Support Vector Machines for Predicting Distribution of Sudden Oak Death in California

QH Guo, M Kelly, CH Graham
DOI: https://doi.org/10.1016/j.ecolmodel.2004.07.012
IF: 3.1
2005-01-01
Ecological Modelling
Abstract: In the central California coastal forests, a newly discovered virulent pathogen (Phytophthora ramorum) has killed hundreds of thousands of native oak trees. Predicting the potential distribution of the disease in California remains an urgent need for regulators and scientists. Most methods used to map potential ranges of species (e.g. multivariate or logistic regression) require both presence and absence data, the latter of which cannot always be feasibly collected, and thus the methods often require the generation of ‘pseudo’ absence data. Other methods (e.g. BIOCLIM and DOMAIN) seek to model the presence-only data directly. In this study, we present alternatives to conventional modeling approaches by developing support vector machines (SVMs), a new generation of machine learning algorithms used to find optimal separability between classes within datasets, to predict the potential distribution of Sudden Oak Death in California. We compared the performance of two types of SVM models: two-class SVMs with ‘pseudo’ absence data and one-class SVMs. Both models performed well. The one-class SVMs have a slightly better true-positive rate (0.9272 ± 0.0460 S.D.) than the two-class SVMs (0.9105 ± 0.0712 S.D.). However, the area predicted to be at risk for the disease using the one-class SVMs (18,441 km²) is much larger than that of the two-class SVMs (13,828 km²). Both models show that the majority of disease risk will occur in coastal areas. Compared with the results of the two-class SVMs, the one-class SVMs predict a potential risk in the foothills of the Sierra Nevada mountain ranges; much greater risks are also found in Los Angeles and Humboldt Counties. We believe that support vector machines, when coupled with geographic information systems (GIS), will be a useful method for dealing with presence-only data in ecological analysis over a range of scales.
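The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `SVC` and `OneClassSVM`, uses synthetic data in place of the real presence records and environmental layers, and all predictor names, sample sizes, and hyperparameters (`C`, `nu`, `gamma`) are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): two environmental predictors per site,
# e.g. a temperature-like and a moisture-like variable.
presence = rng.normal(loc=[15.0, 0.7], scale=[2.0, 0.1], size=(200, 2))
# Background cells sampled uniformly across a hypothetical study area.
background = rng.uniform(low=[0.0, 0.0], high=[30.0, 1.0], size=(2000, 2))

# Hold out some presence records to estimate the true-positive rate.
train_pres, test_pres = presence[:150], presence[150:]

# Put both predictors on a comparable scale before fitting RBF kernels.
scaler = StandardScaler().fit(background)
train_pres_s = scaler.transform(train_pres)
test_pres_s = scaler.transform(test_pres)
background_s = scaler.transform(background)

# Two-class SVM: draw 'pseudo' absences at random from the background.
idx = rng.choice(len(background_s), size=len(train_pres_s), replace=False)
pseudo_abs_s = background_s[idx]
X = np.vstack([train_pres_s, pseudo_abs_s])
y = np.concatenate([np.ones(len(train_pres_s)), np.zeros(len(pseudo_abs_s))])
two_class = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# One-class SVM: fit on presence records only, with no absences at all.
one_class = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(train_pres_s)

# True-positive rate: share of held-out presences predicted as suitable.
# SVC predicts the label 1 for presence; OneClassSVM returns +1 for inliers.
tpr_two = float(np.mean(two_class.predict(test_pres_s) == 1))
tpr_one = float(np.mean(one_class.predict(test_pres_s) == 1))

# Predicted at-risk share of the study area (fraction of background cells).
area_two = float(np.mean(two_class.predict(background_s) == 1))
area_one = float(np.mean(one_class.predict(background_s) == 1))

print(f"two-class: TPR={tpr_two:.3f}, at-risk fraction={area_two:.3f}")
print(f"one-class: TPR={tpr_one:.3f}, at-risk fraction={area_one:.3f}")
```

Reporting the predicted at-risk fraction next to the true-positive rate mirrors the trade-off noted in the abstract: a model can gain sensitivity simply by flagging more area, so the two figures are best read together. The numbers this sketch prints depend entirely on the synthetic data and say nothing about the paper's results.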