Abstract: Model calibration and validation are critical in assessing hydrological model robustness. Unfortunately, the commonly used split‐sample test (SST) framework for data splitting requires modelers to make subjective decisions without clear guidelines. This large‐sample study empirically assesses how different data splitting methods influence model performance in a post‐validation testing period, thereby identifying optimal data splitting methods under different conditions. It investigates the performance of two lumped conceptual hydrological models calibrated and tested in 463 catchments across the United States using 50 different data splitting schemes. These schemes are defined by the data availability, length, and data recentness of continuous calibration sub‐periods (CSPs). A full‐period CSP, which skips model validation, is also included in the experiment. The assessment approach is novel in multiple ways, including framing model building decisions as a decision tree problem and viewing the model building process as a formal testing period classification problem that aims to accurately predict model success/failure in the testing period. Results span different climate and catchment conditions across a 35‐year period with available data, making the conclusions quite generalizable. Calibrating to older data and then validating models on newer data produces inferior model testing period performance in every single analysis conducted and should be avoided. Calibrating to the full available data and skipping model validation entirely is the most robust split‐sample decision. Experimental findings remain consistent no matter how model building factors (i.e., catchments, model types, data availability, and testing periods) are varied. Results strongly support revising the traditional split‐sample approach in hydrological modeling.
Hydrological model calibration is a critical model building process that infers key model parameter values from observed system response data. Conventionally, this process requires the historical period to be split into a calibration period for tuning parameters and a validation period for testing model robustness (i.e., the split‐sample approach). Unfortunately, there is a lack of empirical evidence on how exactly to define the split. We designed an exhaustive and novel experiment comparing the range of possible split‐sampling schemes, including calibrating to older or more recent years, calibrating to a short or long period, and calibrating to the full period of available system response data. Each scheme was evaluated on its performance, assessed in three different ways, across numerous post‐validation model testing periods for each of the 926 calibration case studies (two different hydrological models applied in each of 463 catchments). Results show that using older data for model calibration and then using newer data for validation, which is the typical practice in the literature, is an inferior choice and should be avoided. The results also show that calibrating to the full historical data and skipping model validation entirely is the most robust choice. Therefore, the split‐sample approach applied in this community for decades should be revised.
A unique split‐sample experiment is performed across 463 catchments to provide guidance on split‐sample decision‐making in model calibration.
Calibrating models to the full available data period and skipping model validation entirely is the most robust choice.
Calibrating models to older data and then validating models on newer data, a very common approach in the literature, is an inferior choice.
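
The three kinds of split compared above (calibrating to older years, calibrating to recent years, and calibrating to the full period with no validation) can be illustrated with a minimal Python sketch. This is not the authors' experimental code and does not reproduce their 50 schemes; the `Split` class, the `make_splits` function, the scheme names, and the 1981-2000 record are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """One hypothetical way to divide a historical record."""
    name: str
    calibration: List[int]  # years used to tune model parameters
    validation: List[int]   # years used to check robustness (may be empty)


def make_splits(years: List[int], csp_length: int) -> List[Split]:
    """Build three illustrative splits of a historical record.

    csp_length is the number of years in the continuous calibration
    sub-period (CSP) for the two partial-period schemes.
    """
    years = sorted(years)
    if not 0 < csp_length < len(years):
        raise ValueError("csp_length must be between 1 and len(years) - 1")
    return [
        # Calibrate on the oldest years, validate on the newer remainder.
        Split("older-CSP", years[:csp_length], years[csp_length:]),
        # Calibrate on the most recent years, validate on the older remainder.
        Split("recent-CSP", years[-csp_length:], years[:-csp_length]),
        # Calibrate on everything and skip validation entirely.
        Split("full-period", years, []),
    ]


# Hypothetical 20-year historical record; a later testing period is
# assumed to be held out separately and is not shown here.
historical_years = list(range(1981, 2001))
for split in make_splits(historical_years, csp_length=10):
    cal = f"{split.calibration[0]}-{split.calibration[-1]}"
    val = (f"{split.validation[0]}-{split.validation[-1]}"
           if split.validation else "none")
    print(f"{split.name}: calibrate {cal}, validate {val}")
```

In a sketch like this, each split would then be used to calibrate a model, and the calibrated model would be scored on the held-out testing period so that the schemes can be compared on equal footing.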

Time to Update the Split‐Sample Approach in Hydrological Model Calibration
