Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts

Guy Shalev, Frederik Kratzert
2024-11-14
Abstract: The Caravan large-sample hydrology dataset (Kratzert et al., 2023) was created to standardize and harmonize streamflow data from various regional datasets, combined with globally available meteorological forcing and catchment attributes. This community-driven project also allows researchers to conveniently extend the dataset for additional basins, as done 6 times to date (see <a class="link-external link-https" href="https://github.com/kratzert/Caravan/discussions/10" rel="external noopener nofollow">this https URL</a>). We present a novel extension to Caravan, focusing on enriching the meteorological forcing data. Our extension adds three precipitation nowcast products (CPC, IMERG v07 Early, and CHIRPS) and three weather forecast products (ECMWF IFS HRES, GraphCast, and CHIRPS-GEFS) to the existing ERA5-Land reanalysis data. The inclusion of diverse data sources, particularly weather forecasts, enables more robust evaluation and benchmarking of hydrological models, especially for real-time forecasting scenarios. To the best of our knowledge, this extension makes Caravan the first large-sample hydrology dataset to incorporate weather forecast data, significantly enhancing its capabilities and fostering advancements in hydrological research, benchmarking, and real-time hydrologic forecasting. The data is publicly available under a CC-BY-4.0 license on Zenodo in two parts (<a class="link-external link-https" href="https://zenodo.org/records/14161235" rel="external noopener nofollow">this https URL</a>, <a class="link-external link-https" href="https://zenodo.org/records/14161281" rel="external noopener nofollow">this https URL</a>) and on Google Cloud Platform (GCP) - see more under the Data Availability chapter.
Machine Learning
What problem does this paper attempt to address?
This paper aims to address the limitations of current hydrological models in real-time prediction by extending the Caravan large-sample hydrology dataset with multiple weather nowcast and forecast products. Specifically, the study addresses the following key issues:

1. **Limitations of a single meteorological data source**:
   - Meteorological products differ in spatio-temporal accuracy and reliability, so relying on a single global reanalysis product (such as ERA5-Land) is limiting. Introducing multiple precipitation nowcast products (CPC, IMERG v07 Early, CHIRPS) and weather forecast products (ECMWF IFS HRES, GraphCast, CHIRPS-GEFS) can improve the performance of data-driven hydrological models.
2. **Lack of historical real-time forecast data**:
   - Most hydrological models are currently limited to hindcasting. To advance hydrological forecasting and allow models to be compared under operational conditions, historical real-time weather forecast data must be included. This helps to evaluate model performance under realistic conditions, analyze how forecast uncertainties propagate through hydrological models, and ultimately improve prediction accuracy for water resource management and disaster mitigation.
3. **The need for standardized and diverse datasets**:
   - Caravan integrates and standardizes multiple large-sample hydrology datasets. The underlying regional datasets usually rely on high-resolution, locally available meteorological forcing data and attribute maps, which makes them difficult to compare directly. By introducing diverse meteorological data sources, especially weather forecast data, this extension makes Caravan the first large-sample hydrology dataset to include weather forecast data, significantly enhancing its capabilities and promoting progress in hydrological research, benchmarking, and real-time hydrological forecasting.
4. **Challenges in real-time hydrological forecasting**:
   - Real-time hydrological forecasting requires processing a large amount of real-time data and producing reliable predictions in a short time. With multiple forecast products available, researchers can rigorously evaluate model performance under different meteorological conditions and develop more reliable real-time hydrological forecasting systems, improving the ability to respond to water-resource-related challenges.

In summary, by extending the Caravan dataset with multiple weather forecast and nowcast products, this paper aims to overcome the limitations of existing hydrological models in real-time prediction and to promote the development of hydrological research and applications.
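As a rough illustration of what comparing model skill across forcing products could look like, the sketch below scores several simulated streamflow series against one observed series. It uses the Nash-Sutcliffe efficiency (NSE), a skill score commonly used in hydrology; the choice of metric is an assumption here, not something stated in the paper summary. The product names (`product_a`, etc.) and all streamflow values are made-up placeholders, not Caravan MultiMet data.

```python
from statistics import fmean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency.

    1.0 is a perfect fit; 0.0 means the simulation is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    mean_obs = fmean(observed)
    # Variance-like term of the observations around their mean.
    denom = sum((o - mean_obs) ** 2 for o in observed)
    if denom == 0:
        raise ValueError("observed series has zero variance")
    # Sum of squared simulation errors.
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - num / denom


# Hypothetical streamflow values (mm/day), for illustration only.
observed = [1.0, 2.0, 4.0, 3.0, 2.0]

# Hypothetical simulations from the same model driven by different
# forcing products; the keys are placeholders, not real product names.
simulations = {
    "product_a": [1.0, 2.0, 4.0, 3.0, 2.0],
    "product_b": [2.4, 2.4, 2.4, 2.4, 2.4],
    "product_c": [1.5, 2.5, 3.0, 3.5, 1.5],
}

scores = {name: nse(observed, sim) for name, sim in simulations.items()}

# Print products from best to worst score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: NSE = {score:.3f}")
```

In practice the simulated series would come from a hydrological model run once per forcing product over the same basin and period, so that differences in the score reflect the forcing data and not the model setup.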