Symposium 23. Toward Ecological Forecasting
Yiqi Luo, James Clark, Tom Hobbs, S. Lakshmivarahan, Andrew Latimer, Kiona Ogle, David Schimel, Xuhui Zhou
Bulletin of the Ecological Society of America
Abstract: A symposium titled “Toward Ecological Forecasting: Applications of Model-Data Fusion Techniques,” organized by Yiqi Luo (Department of Botany and Microbiology, University of Oklahoma) and D. S. Schimel (NEON Inc.), was held during the 93rd ESA Annual Meeting in Milwaukee, Wisconsin, USA, on 8 August 2008. Seven presentations were delivered at the symposium. The key message that emerged was that, as more and more data become available from measurement networks, there is an urgent need to develop techniques and expertise in the ecological community for integrating data and models to enhance our capability for ecological forecasting. The symposium was organized for at least two reasons: the increase in data available from measurement networks, and the societal need to develop a capability for forecasting future changes in ecosystem services. During the past decade, ecology has been rapidly transformed from a data-limited field into a data-rich one. This unusually rapid transformation is due largely to the long-term accumulation of research data from networks such as the NSF Long Term Ecological Research (LTER) network, and to the development of new sensors used in numerous studies in the ecological community, such as those in AmeriFlux. To enhance our ability to forecast future changes in ecosystem services, the National Ecological Observatory Network (NEON) is being implemented. NEON is a continental-scale research platform designed to gather long-term data on ecological responses of the biosphere to changes in land use and climate. NEON consists of distributed sensor networks, observations, and experiments, linked by advanced cyberinfrastructure to record and archive ecological data for at least 30 years. NEON will provide numerous types of observations at more than 60 locations across the nation.
There is an unprecedented demand to convert these raw data into ecologically meaningful products that support basic research as well as policy making on resource management and climate change mitigation. Ecology is the scientific discipline that studies our planet's life-supporting systems. Human survival and well-being depend on Earth's ecosystems to provide essential goods such as food, timber, forage, fuels, pharmaceuticals, and precursors to industrial products. Ecosystems also serve other important functions, such as recycling water and chemicals, mitigating floods, and cleansing the atmosphere of pollutants. The continuing rapid increase in human population demands ever more services from ecosystems, while human-induced deterioration of the environment, including global change, diminishes or threatens the essential ecosystem services needed to maintain a healthy, productive global society. To sustain long-term ecosystem services and to prevent or mitigate ecological disasters, we must develop the capability to anticipate changes in ecological systems and their potential impacts. Quantitative forecasting requires the use of ecological models. During the past several decades, many models have been developed in population, community, and ecosystem ecology. Ecosystem biogeochemical cycling models, for example, have been incorporated into Earth system models to project carbon sinks and sources in land ecosystems and their feedback effects on climate. While such models are essential for forecasting future states of ecosystems and climate, they are necessarily abstractions of ecological reality. The usefulness of their projections depends on how well the models capture key processes and make use of observations and other information. Most model-based projections of biosphere–atmosphere feedbacks are a collection of interesting, but largely untested, hypotheses about the future state of ecosystems and climate.
It is therefore imperative to evaluate model structures against, and estimate parameters from, experimental and observational data in order to improve the accuracy of ecological forecasting. Data-model fusion is an essential technology for exploiting the knowledge contained in past and current observations and summarizing it in models. Data-model fusion combines the best information from imperfect models with the empirical evidence contained in data, toward a predictive understanding of ecological systems. It incorporates data into a model using statistical and computational tools. Technically, data-model fusion is defined as a method that seeks to characterize the state of an ecosystem that best fits a given model under specified constraints, starting from a “background field” and preserving the information content of the observations. Data-model fusion is a necessary tool for improving model parameterization, choosing between alternative model structures, designing sensor networks and experiments for data collection, and conducting uncertainty analysis for ecological forecasting. It has been applied successfully in weather forecasting and has great potential to significantly improve ecological forecasting. In the data-rich era of ecological research, data-model fusion is central to eco-informatics, which encompasses not only the acquisition, archiving, and retrieval of data and metadata but also sophisticated new means of analysis that effectively serve human society and protect the environment (Fig. 1). To date, however, no broad-based framework or capacity exists for assimilating massive data from sensor networks toward ecological forecasting. Developing our capability for data-model fusion and ecological forecasting is essential to fully realize the potential of NEON and other networks.

Fig. 1. Eco-informatics in the data-rich era. Information is transferred from sensor networks and cyberinfrastructure to support decision making and resource management. The most critical step is data-model fusion, built upon theory and models, toward forecasting changes in ecosystem services and anticipated disastrous events.

Among the seven presentations delivered during the symposium, Dr. Schimel offered a conceptual framework for ecological forecasting and data-model fusion in the NEON context. Dr. Lakshmivarahan provided a historical perspective on the development of data-assimilation and forecasting techniques in meteorology and other scientific disciplines, and also described some new developments in data-assimilation technology. Drs. Latimer, Zhou, Hobbs, and Ogle used examples from plant diversity, carbon cycling, infectious diseases, and soil–plant interactions, respectively, to illustrate how data and models can be combined to advance scientific understanding and ecological forecasting. Dr. Clark used two further examples to highlight the need for major scientific questions to guide the design of observatory networks. Below are some key concepts and major examples of data-model fusion and ecological forecasting.

1. Ecological forecasting and data-model fusion in the NEON context

Ecological forecasting aims to quantitatively characterize the most likely future state of an ecological system, either under prevailing conditions or under different what-if scenarios. Under prevailing conditions, short-term forecasts can generally be made from the system's own dynamics. For example, the likely rate of spread of an invasive species can be forecast from its invasiveness and the environmental conditions. What-if forecasts, by contrast, project the spread of an invasive species when alternative land-use management actions or climate change scenarios are being considered.
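To make the distinction between prevailing-condition and what-if forecasts concrete, the following minimal Python sketch forecasts the spread of a hypothetical invasive species. It assumes logistic growth of the occupied fraction of suitable habitat, an uncertain spread rate, and a what-if scenario represented as a multiplier on that rate. The function names, parameter values, and the 50% threshold are illustrative assumptions, not results from the symposium. Because the spread rate is drawn from a distribution, the output is a probability rather than a single predicted value.

```python
import math
import random


def logistic_spread(a0, r, years):
    """Fraction of suitable area occupied after `years` under logistic growth.

    Closed-form solution of da/dt = r * a * (1 - a) with a(0) = a0.
    """
    return 1.0 / (1.0 + (1.0 / a0 - 1.0) * math.exp(-r * years))


def forecast_exceedance(a0, r_mean, r_sd, years, threshold,
                        r_multiplier=1.0, n_draws=5000, seed=1):
    """Monte Carlo probability that the occupied fraction exceeds `threshold`.

    r_multiplier = 1.0 is the prevailing-conditions run; a value below 1
    stands for a hypothetical what-if scenario (e.g. a management action
    that slows spread).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # Parameter uncertainty: the spread rate is drawn, not fixed.
        r = max(0.0, rng.gauss(r_mean, r_sd)) * r_multiplier
        if logistic_spread(a0, r, years) > threshold:
            exceed += 1
    return exceed / n_draws


# Hypothetical inputs: 1% of suitable habitat occupied today.
baseline = forecast_exceedance(0.01, 0.5, 0.15, years=10, threshold=0.5)
managed = forecast_exceedance(0.01, 0.5, 0.15, years=10, threshold=0.5,
                              r_multiplier=0.6)
print(f"P(>50% occupied in 10 yr): baseline {baseline:.2f}, managed {managed:.2f}")
```

With these made-up numbers, the baseline run gives roughly a 60% chance that more than half of the suitable habitat is occupied within 10 years, while the managed scenario gives around 4%.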
While ecological forecasting typically requires mechanistic knowledge of the process being modeled, forecasts are usually probabilistic: they estimate the probability of the future state, not just a point estimate of its value. For example, although weather models are deterministic, precipitation forecasts are commonly expressed as a probability (e.g., a 70% chance of rain). Even with a perfect model, ecological forecasting with data-model fusion requires accurate estimates of initial conditions and parameters before the future states of an ecological system can be quantitatively estimated. In a forecast system, initial conditions and parameters are usually estimated iteratively in a cycle: a model is initialized with observations and integrated forward to produce a forecast, the forecast is compared with new observations, and the model is re-initialized and integrated forward again. For processes that evolve over long periods of time, such as the impacts of climate change, iterative comparison of predictions with data is critical. Even when parameters are initially estimated accurately from data, a model's predictive accuracy may drift because of long-term dynamic processes such as physiological adaptation, shifts in community composition, or evolution. The need for cyclic evaluation of models against observations motivates a research strategy built on long-term observations, such as NEON. While a dedicated researcher may generate a few time series suitable for long-term forecasting studies, forecasting at the continental scale requires observatory networks to ensure long-term data collection. At the same time, data collection in an observatory network can be made far more efficient when key variables are targeted. If error analysis shows that gaps in, or inaccurate measurements of, a certain ecosystem variable cause large forecast errors, that state variable should be targeted for improved measurement.
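One standard way to implement the cyclic forecast and re-initialization scheme described above, borrowed from meteorological data assimilation, is the Kalman filter. The sketch below is a scalar, linear version for a single hypothetical state variable (for example, a pool size) following x[t+1] = a·x[t] + b plus process noise. The model form, names, and all numbers are assumptions for illustration, not a method presented at the symposium. Each analysis value minimizes (x − x_f)²/p_f + (y − x)²/r, so it weighs the model forecast (the background) against the observation according to their error variances. Missing observations simply skip the analysis step, and the forecast variance grows until new data arrive.

```python
import random


def kalman_cycle(observations, a, b, q, r, x0, p0):
    """Scalar forecast-analysis cycle for x[t+1] = a*x[t] + b + noise.

    observations: list of floats, or None where no measurement exists.
    q: process-noise variance; r: observation-error variance.
    x0, p0: initial state estimate and its variance.
    Returns a list of (forecast, analysis, analysis_variance) per step.
    """
    x_a, p_a = x0, p0
    history = []
    for y in observations:
        # Forecast: integrate the model one step forward; uncertainty grows.
        x_f = a * x_a + b
        p_f = a * a * p_a + q
        if y is None:
            # No data: the forecast becomes the next initial condition.
            x_a, p_a = x_f, p_f
        else:
            # Analysis: blend forecast and observation by their variances.
            k = p_f / (p_f + r)
            x_a = x_f + k * (y - x_f)
            p_a = (1.0 - k) * p_f
        history.append((x_f, x_a, p_a))
    return history


# Synthetic "truth" and noisy observations (all values hypothetical).
rng = random.Random(42)
truth, x = [], 80.0
for _ in range(20):
    x = 0.9 * x + 10.0 + rng.gauss(0.0, 1.0)
    truth.append(x)
obs = [t + rng.gauss(0.0, 2.0) for t in truth]
obs[8:12] = [None] * 4  # hypothetical four-step sensor outage

steps = kalman_cycle(obs, a=0.9, b=10.0, q=1.0, r=4.0, x0=80.0, p0=25.0)
for t, (x_f, x_a, p_a) in enumerate(steps):
    print(t, round(truth[t], 1), round(x_a, 1), round(p_a, 2))
```

During the simulated outage the analysis variance rises from about 1.4 to about 3.6 and falls again once observations resume, which is the kind of error analysis that can indicate where better measurements would pay off.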
The measurements in that system will then change gradually over time, as forecast accuracy improves and the targeted state variables evolve through cyclic prediction–observation comparison and analysis of forecast errors. Ecological forecasting is important to ecology because quantitative prediction is critical for documenting and advancing scientific understanding, and it is useful in the societal application of knowledge. NEON is critical to forecasting and other modeling and analysis activities because it provides long-term, regular, standardized observations at the continental scale. The interplay between NEON and ecological forecasting will substantially advance ecological research and serve society effectively.

2. Data-model fusion to improve model prediction of species distribution dynamics

Modeling and predicting species distributions and abundances provides a case study for applying data-model fusion approaches. Despite much attention and the development of increasingly powerful and flexible models for predicting the presence/absence of species from environmental data, species distribution models are still mostly single-level models that do not represent the multiple sources of uncertainty from different data sources. Perhaps most importantly, observed presences (and sometimes absences) of species are taken as unbiased reflections of the species' environmental response, when clearly other factors, including land use and population and metapopulation dynamics, affect whether sites that are suitable and available are actually occupied. The study by Dr. Latimer and his colleagues focused on how to use a data-integration approach, with hierarchical Bayesian modeling as the tool for integrating diverse data sources, to move toward more robustly predictive species distribution models.
As a case study, we focused on shrub species in the family Proteaceae in the Cape Floristic Region of South Africa. Even with phenomenological, primarily statistical, static models of species distributions, informative data allow us to improve the models by representing the distinct processes that contribute to species distributions at different hierarchical levels of the model. For example, the environmental response of a species can be modeled as a latent, or unobserved, variable (call it “suitability”). Rather than relying on the raw observational data alone to inform us about suitability, we need to consider other factors. Human transformation of the landscape for agriculture or urbanization effectively excludes these species, so we use remote-sensing data to incorporate the effect of land use at a second hierarchical level (“availability”). Finally, we model the probability of occurrence (“occupancy”) given suitability and availability, using a spatial neighborhood model that reflects (in a crude, area-based way) the availability of propagules or the spatially structured effects of historical events. We can then use this full model to make inferences about processes of interest that we have not observed directly: the environmental suitability of areas, the potential range of the species in the absence of land use, and the effect of land use on the species' current distribution. Of course, the extent to which we can accurately describe these distinct processes depends heavily on having enough informative data, a point that becomes even more critical as we move to more mechanistic models of populations and their demographic rates. The core goal of modeling species' responses to environmental factors is to determine how species will respond to environmental variation and directional change.
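The three levels described above can be written down compactly. The sketch below is a hypothetical forward model on a one-dimensional transect of cells, not the published Proteaceae model. Suitability is a logistic function of a single environmental covariate, availability is the untransformed fraction of each cell, and occupancy scales with the suitable, available habitat in the two adjacent cells as a crude neighborhood effect. All functional forms, names, and numbers are assumptions. Setting availability to 1 everywhere gives the potential occupancy in the absence of land use, and the Bernoulli log-likelihood is the term that a hierarchical Bayesian fit would combine with priors on b0, b1, and c.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def occupancy_probs(env, avail, b0, b1, c):
    """Occupancy probability per cell along a 1-D transect (hypothetical).

    env:    environmental covariate per cell (e.g. a rainfall index)
    avail:  fraction of each cell not transformed by land use, in [0, 1]
    b0, b1: suitability coefficients; c: baseline colonization in [0, 1]
    """
    suit = [logistic(b0 + b1 * x) for x in env]        # level 1: suitability
    habitat = [s * a for s, a in zip(suit, avail)]      # level 2: availability
    probs = []
    for i in range(len(env)):
        # Level 3: occupancy rises with suitable, available neighbours
        # (a crude stand-in for propagule supply or historical effects).
        nbrs = [habitat[j] for j in (i - 1, i + 1) if 0 <= j < len(env)]
        n = sum(nbrs) / len(nbrs) if nbrs else 0.0
        probs.append(habitat[i] * (c + (1.0 - c) * n))
    return probs


def log_likelihood(presence, probs, eps=1e-12):
    """Bernoulli log-likelihood of observed presence (1) / absence (0)."""
    return sum(math.log(p + eps) if y else math.log(1.0 - p + eps)
               for y, p in zip(presence, probs))


env = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
avail = [1.0, 1.0, 0.2, 0.0, 1.0, 1.0]  # cells 2-3 largely converted to farmland
current = occupancy_probs(env, avail, b0=0.0, b1=1.0, c=0.3)
potential = occupancy_probs(env, [1.0] * len(env), b0=0.0, b1=1.0, c=0.3)
# Land-use effect per cell: potential minus current occupancy probability.
effect = [p - q for p, q in zip(potential, current)]
print([round(e, 3) for e in effect])
print(log_likelihood([0, 0, 0, 0, 1, 1], current))
```

In a full analysis, b0, b1, and c would not be fixed as here but estimated, for example by MCMC, from presence/absence records together with the remote-sensing availability layer.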
If we are going to predict the fates of individual species, we will need to focus on population dynamics to identify the key demographic rates and the components of environmental variation that most strongly affect them. Only then will we be in a position to make relatively robust forecasts using model predictions of climate and other factors. But here we run up against a limitation: despite increasing data richness, data on demographic rates across multiple populations and years, particularly at a scale representing the distribution of a species, remain a pocket of data poverty. We illustrate progress toward more mechanistic approaches with a data set, still being collected, on the same South African species. We are collecting demographic rates for dozens of populations across the species' distributions and across aridity and seasonality gradients. Here we focus on the growth rates of individual plants, which we model using spatiotemporal models with population and temporal random effects, as well as random effects reflecting individual variation. Using an initial data set of 14 populations, we found a significant effect of annual precipitation on growth. These early results suggest that we can obtain useful information about the responsiveness of demographic rates to climate, as well as about background rates of variability, by integrating field and climate data sets; but they also suggest that field data will be the limiting factor. In effect, we need all the demographic field data we can get. In addition, for modeling and prediction at scales relevant to individual populations and regional management, we will need high-quality downscaled climate data and forecasts for many more regions and time periods, as well as validation of these model outputs.

3. Uncertainties in parameter estimation for forecasting regional carbon sinks

The rising concentration of CO2 in the atmosphere and the resulting climate change alter carbon cycling in terrestrial ecosystems, which, in turn, may amplify or dampen climate change through positive or negative carbon–climate feedbacks. To better understand the responses of terrestrial ecosystems to climate change, it is essential to quantify carbon sequestration. The carbon sequestration capacity of plant and soil pools is largely determined by both carbon residence time and changes in net primary production (NPP). The carbon residence time is the length of time a carbon atom remains in an ecosystem, from its entry via photosynthesis to its release back to the atmosphere via autotrophic and heterotrophic respiration. Several methods have been used to estimate carbon residence time, but its uncertainty has not been well quantified. Moreover, uncertainty is an inherent component of ecological forecasting. In carbon cycle research, carbon sink potentials cannot be fully understood, and policies based on current understanding to mitigate climate change will fall short of the targets of the Kyoto Protocol if the uncertainty issue is not adequately addressed. Dr. Zhou and his colleagues quantified the uncertainty of regional-scale carbon residence times and its propagation to forecasted carbon sink capacity. Bayesian probabilistic inversion and the Markov chain Monte Carlo (MCMC) technique were applied to a regional terrestrial ecosystem model (TECO-R) to quantify carbon residence times and assess their uncertainty in the conterminous USA. The results showed that almost all parameters had nearly Gaussian distributions, but with considerably different variability. Most parameters with large variability were related to litter pools, largely because of a lack of experimental data.
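TECO-R tracks several plant, litter, and soil pools. The sketch below collapses that structure to a single hypothetical carbon pool, dC/dt = NPP − C/τ, to show how Bayesian inversion with MCMC turns noisy pool observations into a posterior distribution for the residence time τ. The data are synthetic, and the prior and sampler settings are assumptions. Random-walk Metropolis is used here only as one of the simplest MCMC algorithms; it is not necessarily the sampler used in the study. The posterior spread of τ then propagates directly into the forecast sink, NPP − C/τ.

```python
import math
import random


def pool_trajectory(tau, npp, c0, times):
    """One-pool model dC/dt = NPP - C/tau, solved in closed form."""
    c_eq = npp * tau
    return [c_eq + (c0 - c_eq) * math.exp(-t / tau) for t in times]


def log_posterior(log_tau, obs, times, npp, c0, sigma, prior_mu, prior_sd):
    """Gaussian likelihood for pool observations + normal prior on log(tau)."""
    tau = math.exp(log_tau)
    pred = pool_trajectory(tau, npp, c0, times)
    log_lik = -0.5 * sum(((y - m) / sigma) ** 2 for y, m in zip(obs, pred))
    log_prior = -0.5 * ((log_tau - prior_mu) / prior_sd) ** 2
    return log_lik + log_prior


def metropolis_tau(obs, times, npp, c0, sigma, n_iter=20000, step=0.05,
                   prior_mu=math.log(40.0), prior_sd=1.0, seed=0):
    """Random-walk Metropolis on log(tau); returns post-burn-in tau samples."""
    rng = random.Random(seed)
    current = prior_mu
    lp = log_posterior(current, obs, times, npp, c0, sigma, prior_mu, prior_sd)
    samples = []
    for i in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, obs, times, npp, c0, sigma,
                               prior_mu, prior_sd)
        # Symmetric proposal: accept with probability min(1, exp(lp_new - lp)).
        if math.log(1.0 - rng.random()) < lp_new - lp:
            current, lp = proposal, lp_new
        if i >= n_iter // 4:  # discard the first quarter as burn-in
            samples.append(math.exp(current))
    return samples


# Synthetic observations from a "true" residence time of 50 years.
rng = random.Random(1)
times = list(range(0, 45, 5))
true_tau, npp, c0, sigma = 50.0, 1.0, 20.0, 1.0
obs = [c + rng.gauss(0.0, sigma)
       for c in pool_trajectory(true_tau, npp, c0, times)]

tau = metropolis_tau(obs, times, npp, c0, sigma)
mean = sum(tau) / len(tau)
sd = (sum((x - mean) ** 2 for x in tau) / (len(tau) - 1)) ** 0.5
# Sink at the last observation time: NPP minus outflux C/tau.
sink = npp - pool_trajectory(mean, npp, c0, [times[-1]])[0] / mean
print(f"tau = {mean:.1f} +/- {sd:.1f} yr; current sink = {sink:.3f}")
```

Because the sink depends on τ, repeating the last calculation for every posterior sample of τ, rather than only for the posterior mean, would give the sink its own uncertainty interval.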
Estimated ecosystem carbon residence times ranged from 16.6 ± 1.8 years (cropland) to 85.9 ± 15.3 years (evergreen needleleaf forest, ENF), with an average of 56.8 ± 8.8 years in the conterminous USA. The ecosystem carbon residence times and their standard deviations were spatially heterogeneous and varied with vegetation type and climate conditions. Large uncertainty, as represented by the coefficient of variation, appeared in the southern and eastern United States. Driven by current increases in net primary production (NPP), terrestrial ecosystems in the conterminous USA sequestered 0.20 ± 0.05 Pg C/yr. The spatial pattern of ecosystem carbon sequestration was closely related to the summer greenness map, ranging from −60 to >140 g·m⁻²·yr⁻¹, with greater sequestration in the central and southeastern regions. Uncertainties in carbon residence times were spatially related to the distribution of data points, whereas uncertainties in ecosystem carbon sequestration were related to carbon residence times and NPP. These results suggest that the Bayesian approach with MCMC inversion provides an effective tool for estimating spatially distributed carbon residence times and assessing their uncertainty in the conterminous USA.

4. Forecasting infectious disease: Fusing process models with data

Models that accurately forecast the behavior of infectious diseases represent one of ecology's most appreciated contributions to human welfare worldwide. The assimilation of data into disease models has followed two somewhat disparate traditions. The epidemiological tradition has emphasized the statistical analysis of empirical models of disease risk, models that portray relationships in data but not processes in nature. The theoretical tradition has emphasized the mathematical analysis of models representing disease processes, notably transmission and recovery, and the epidemic dynamics that result from them.
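As a minimal illustration of a process model in the theoretical tradition, paired with an explicit observation model, the sketch below simulates a discrete-time stochastic SIR (susceptible–infected–recovered) epidemic in a closed population. It then thins the true new infections into reported cases with a reporting probability rho. The chain-binomial form, the parameter values (beta/gamma = 2), and the function names are assumptions for illustration. Keeping the process layer (transmission and recovery) separate from the observation layer (imperfect reporting) is the structure a state-space fit would exploit when estimating beta, gamma, and rho from reported counts.

```python
import math
import random


def binomial(rng, n, p):
    """Binomial(n, p) draw by summing Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sir(n_pop, i0, beta, gamma, rho, days, seed=0):
    """Discrete-time stochastic SIR with imperfect case reporting.

    Process model: transmission and recovery are binomial events.
    Observation model: each new infection is reported with probability rho.
    Returns (true_new_infections, reported_cases, susceptible_series).
    """
    rng = random.Random(seed)
    s, i, r = n_pop - i0, i0, 0
    true_new, reported, s_series = [], [], []
    for _ in range(days):
        p_inf = 1.0 - math.exp(-beta * i / n_pop)  # per-susceptible daily risk
        p_rec = 1.0 - math.exp(-gamma)             # per-infective daily recovery
        new_inf = binomial(rng, s, p_inf)
        new_rec = binomial(rng, i, p_rec)
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        true_new.append(new_inf)
        reported.append(binomial(rng, new_inf, rho))
        s_series.append(s)
    return true_new, reported, s_series


true_new, reported, s_series = simulate_sir(1000, 5, beta=0.4, gamma=0.2,
                                            rho=0.5, days=80)
print("true infections:", sum(true_new), "reported:", sum(reported))
```

Treating the reported series as if it were the true infection series would confound reporting error with the stochasticity of the epidemic process itself.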
The fusion of process models with data played a fundamental role in supporting policy to control recent epidemics, notably the outbreak of Severe Acute Respiratory Syndrome (SARS) in Asia and the epidemic of foot-and-mouth disease in the United Kingdom. These modeling efforts were useful for allocating limited resources, justifying difficult policy options, and assessing the nonlinear impacts of multiple interventions on disease spread. Most importantly, the models allowed decision makers to assess whether interventions were succeeding in containing the disease, and why. There are notable needs to enhance the effectiveness of data-model fusion in supporting interventions to control infectious diseases. First, there is a surprising absence of hierarchical, state-space approaches that estimate uncertainty arising from both variance in the process and errors in observations. Recent stochastic models have ignored observation uncertainty, which may have led to excessively optimistic confidence intervals. Second, a rich opportunity exists for merging time-tested approaches to studying population dynamics, for example, matrix models parameterized by mark–recapture methods, with epidemiological models in the SIR (susceptible–infected–recovered) tradition. This is particularly true for zoonotic diseases. A statistical framework is also badly needed for estimating the parameters of individual-based and network models. Finally, educating the next generation of ecologists to build reliable models of infectious disease and fuse them with data requires an entirely new approach to statistical training for graduate students. We must offer grounding in distribution theory and practice in developing process models that properly include all sources of uncertainty. Contemporary hierarchical frameworks are particularly promising and should form the backbone of quantitative training for disease ecologists.

5. Soil processes with a Bayesian approach

Data obtained under an experimental design framework can be analyzed within a hierarchical Bayesian statistical modeling framework that integrates relevant information, in the form of data and models, to capture key ecological interactions and nonlinearities. In particular, the hierarchical Bayesian approach provides a straightforward method for integrating diverse data and process models related to belowground carbon dynamics, as illustrated by Dr. Ogle and her colleagues. Data from a soil incubation study included measurements of soil respiration rates, microbial biomass, and soil carbon content. All of these data sources had missing values, and the Bayesian approach was able to accommodate them, allowing inference to proceed from all available information. Dr. Ogle's example also illustrated that simultaneous analysis of multiple data sets is often highly desirable, because the data sets may inform similar or interconnected processes that share common parameters. Applying the hierarchical Bayesian approach to the soil incubation data suggested that soil carbon content differed greatly among shrub, grass, and bare microsites in a Sonoran Desert ecosystem, and that shrub microsites supported significantly more microbial biomass than bare-soil and grass microsites. The simultaneous analysis of the different data sets from the soil incubation study, together with the ability of the Bayesian approach to deal with missing data, allowed the researchers to estimate the total amounts of carbon and microbes under each microsite type, in addition to their depth-dependent distributions within the soil profile. Data-model fusion is thus a powerful tool for integrating multiple, heterogeneous data sets toward major syntheses and analyses of ecological systems.

Data-model fusion is an essential step toward improving ecological forecasting. Ecological models can be used to predict changes in ecological systems.
The accuracy of these predictions will be greatly improved when models are conditioned on experimental and observational data. Measurement networks provide the long-term, standardized data sets needed to train models and discover new principles for ecological forecasting. Ecological forecasting, in turn, can help design research strategies for measurement and observatory networks, such as NEON. Observatory networks and ecological forecasting can therefore reinforce each other with great synergy. It is also important to train the next generation of ecologists to master the skills of data-model fusion and ecological forecasting.