Data assimilation systems are powerful, but they are also computationally intensive. The National Weather Service (NWS), for example, runs them on supercomputers more than 10,000 times faster than typical desktop computers, producing daily forecasts in hours rather than days. Their impact is especially pronounced in hurricane prediction.
However, as extreme natural disasters such as regional floods occur more frequently, the demand for precise, high-resolution, frequently updated forecasts is growing. In such circumstances, traditional data assimilation methods alone can become prohibitively computationally demanding and may not deliver the timely, cost-effective results needed for preparation and response efforts.
Moreover, even with significant computing resources and extensive observational data, model errors can still occur due to approximations in model physics or a still-limited understanding of the relationships between current observations and future predictions. Machine learning (ML) and AI can significantly enhance data assimilation by detecting complex patterns and relationships in data that traditional statistical methods may not easily identify.
While some recent literature shows AI/ML models improving predictions of extreme conditions at longer lead times (10+ days in the future) (Price et al., 2025), traditional numerical models and data assimilation remain valuable and provide more physically interpretable explanations. AI approaches are highly sensitive to data coverage and quality, and despite ongoing expansions in observational networks, current observation density and quality are not yet sufficient to replace physics-based numerical weather models and data assimilation systems at scale. These two approaches complement each other rather than conflict.
AI can enhance bias correction, better identify forecast uncertainty, improve data assimilation inputs, and integrate with physics-based constraints, ultimately creating hybrid systems that utilize both data-driven insights and fundamental atmospheric dynamics. This hybrid approach ensures forecasts that are more accurate, scientifically grounded, and practically useful.
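To make the bias-correction idea concrete, the sketch below trains a simple regressor to learn the systematic difference between a model forecast and verifying observations, then applies that learned correction to a new forecast. It is a minimal illustration with synthetic data and made-up predictors (forecast temperature, lead time, hour of day), not a description of any operational system.

```python
# Minimal sketch of ML-based forecast bias correction (illustrative only).
# All data below are synthetic; a real system would use verified forecast/observation pairs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# Synthetic training set: forecast 2 m temperature plus simple predictors
# (lead time in hours, hour of day). A real system would use many more.
n = 5000
forecast_t2m = rng.normal(288.0, 8.0, n)            # model forecast (K)
lead_hours = rng.integers(6, 121, n).astype(float)  # forecast lead time
hour_of_day = rng.integers(0, 24, n).astype(float)

# Pretend the model has a lead-time- and diurnally-dependent warm bias.
true_bias = 0.01 * lead_hours + 0.5 * np.sin(2 * np.pi * hour_of_day / 24)
observed_t2m = forecast_t2m - true_bias + rng.normal(0.0, 0.5, n)

X = np.column_stack([forecast_t2m, lead_hours, hour_of_day])
y = observed_t2m - forecast_t2m  # target: forecast error (observation minus forecast)

model = GradientBoostingRegressor().fit(X, y)

# Apply the learned correction to a new forecast case.
new_case = np.array([[290.0, 72.0, 15.0]])
corrected = new_case[0, 0] + model.predict(new_case)[0]
print(f"raw forecast: {new_case[0, 0]:.2f} K, bias-corrected: {corrected:.2f} K")
```

The same pattern extends naturally to gridded fields and ensemble output, where the learned correction is applied before or after the physics-based model step in a hybrid workflow.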
As part of research and development at Booz Allen, we augment data assimilation and numerical weather forecasting with ML and AI techniques to enhance data quality and processing efficiency, improve model bias corrections and prediction accuracy, and scale model execution.
1. Enhance data quality and data processing
Observational data often contain errors, gaps, and inconsistencies that degrade model performance. ML and AI can address these issues by automating data quality control tasks (such as labeling, cleaning, and error detection), resulting in more reliable inputs for numerical weather prediction (NWP). For example, ML-based methods can accurately classify different types of observations (Jones, 2017), ensuring that spurious data are flagged or excluded. In addition, AI can facilitate data assimilation by discerning the unique error characteristics of each observational dataset and assigning appropriate weights to improve initial conditions for NWP models. This can dramatically speed up and enhance analyses and forecasts at a reduced computational cost (Keller & Potthast, 2024). By improving both data integrity and the assimilation process, ML and AI provide a stronger foundation for downstream modeling tasks.
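As a minimal sketch of what automated quality control can look like, the example below uses an anomaly detector (an Isolation Forest) to flag implausible surface reports before they reach the assimilation step. The station variables, thresholds, and synthetic data are assumptions for illustration; operational quality control combines many more checks.

```python
# Minimal sketch of ML-assisted observation quality control (illustrative only).
# An Isolation Forest flags anomalous surface reports for exclusion or down-weighting.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic station reports: temperature (K), pressure (hPa), wind speed (m/s).
good = np.column_stack([
    rng.normal(285.0, 5.0, 980),
    rng.normal(1013.0, 8.0, 980),
    rng.gamma(2.0, 2.0, 980),
])
# A few corrupted reports (sensor spikes, unit errors, transmission noise).
bad = np.column_stack([
    rng.normal(350.0, 5.0, 20),   # implausible temperatures
    rng.normal(850.0, 5.0, 20),   # implausible pressures
    rng.normal(60.0, 10.0, 20),   # implausible winds
])
obs = np.vstack([good, bad])

# Fit the detector and flag suspect observations before assimilation.
qc = IsolationForest(contamination=0.02, random_state=0).fit(obs)
flags = qc.predict(obs)           # +1 = accept, -1 = flag as suspect
print(f"flagged {np.sum(flags == -1)} of {len(obs)} observations")
```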
2. Improve model prediction accuracy
Once the data are cleaned and better organized, ML and AI can further refine prediction accuracy by uncovering complex patterns and relationships that traditional methods often miss. By analyzing large volumes of historical and real-time data, ML and AI can effectively capture the initial state of the atmosphere, which is crucial for reliable NWP. This approach not only accelerates data assimilation but also helps correct biases and fill coverage gaps, ensuring that models have access to a more complete and consistent dataset. For instance, ML can dynamically adjust observational weighting based on data reliability, allowing high-quality observations to influence model initialization more strongly. In this way, ML and AI can leverage multiple datasets and known relationships to support better forecasts, leading to enhanced predictive capabilities and more timely weather insights for decision makers.
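The weighting idea can be illustrated with a toy scalar analysis update: each observation contributes in inverse proportion to its estimated error variance, so a reliability score (which an ML model could supply) directly shapes the initial condition. The numbers below are made up, and the update is deliberately simplified relative to a full variational or ensemble scheme.

```python
# Minimal sketch of reliability-dependent observation weighting (illustrative only).
# A scalar inverse-variance analysis update: observations judged more reliable
# (smaller estimated error variance) pull the analysis more strongly.
import numpy as np

background = 287.0        # model first guess (K)
background_var = 1.0      # background error variance

# Three observations of the same quantity, with (hypothetical) ML-estimated
# error variances; the third observation is judged less reliable.
obs = np.array([288.4, 286.9, 291.0])
obs_var = np.array([0.5, 0.8, 4.0])

# Combine background and observations with inverse-variance weighting.
precisions = np.concatenate([[1.0 / background_var], 1.0 / obs_var])
values = np.concatenate([[background], obs])
analysis = np.sum(precisions * values) / np.sum(precisions)
analysis_var = 1.0 / np.sum(precisions)

print(f"analysis: {analysis:.2f} K (variance {analysis_var:.3f})")
```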
3. Accelerate model execution
Many Earth system models, including weather forecasting models, are primarily implemented in Fortran, but most modern ML and AI models and libraries are developed in Python. While Fortran excels in scientific computing and complex numerical calculations, it lacks built-in support for automatic differentiation, creating challenges in integrating ML and AI methods and enabling hybrid models. Fortran also has more limited native Graphics Processing Unit (GPU) support, requiring additional tools or libraries to fully utilize GPU acceleration.
Historically, rewriting these systems in other languages was considered prohibitively complex, leading to continued development in their original languages and making system modifications difficult. Now, with the help of generative AI, switching coding languages has become more feasible. For example, Zhou et al. (2024) used a large language model (GPT-4) to translate a photosynthesis model from the Community Earth System Model from Fortran to Python/JAX, achieving a significantly faster runtime through GPU parallelization and parameter estimation via automatic differentiation. With generative AI's support, modernizing traditional weather models has become more achievable, offering faster performance and the ability to leverage recent advancements in computer science, thereby supporting novel cross-disciplinary collaborations.
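The toy example below is not the Zhou et al. code; it simply sketches what a Fortran-style physics kernel gains once it lives in Python/JAX: just-in-time compilation (which can target a GPU when one is available) and parameter gradients via automatic differentiation. The kernel, loss, and synthetic "observations" are illustrative assumptions.

```python
# Minimal sketch of a physics-style kernel in Python/JAX (illustrative only).
# JIT compilation can run on a GPU when available; jax.grad supplies parameter
# gradients for calibration without hand-coded adjoints.
import jax
import jax.numpy as jnp

@jax.jit
def saturation_vapor_pressure(temp_c):
    """Magnus-type approximation of saturation vapor pressure (hPa)."""
    return 6.112 * jnp.exp(17.62 * temp_c / (243.12 + temp_c))

def loss(scale, temp_c, target):
    """Toy calibration loss: fit a scale factor against synthetic observations."""
    return jnp.mean((scale * saturation_vapor_pressure(temp_c) - target) ** 2)

temps = jnp.linspace(-20.0, 35.0, 256)
target = 1.02 * saturation_vapor_pressure(temps)   # synthetic "observations"

# Gradient of the loss with respect to the scale parameter, via automatic differentiation.
grad_fn = jax.grad(loss)
print(grad_fn(1.0, temps, target))
```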
Generative AI can enhance the data assimilation process by generating synthetic data to fill observation gaps. Unlike traditional methods that rely on assumptions such as linearity or Gaussianity, models such as generative adversarial networks and diffusion models produce realistic, high-resolution synthetic data that capture underlying nonlinear dynamics (Qu et al., 2024). These models can incorporate physical constraints to ensure the synthetic data align with atmospheric dynamics. Incorporating synthetic data into assimilation frameworks helps achieve optimal initial conditions quickly, especially in regions with sparse data. This approach is valuable for time-sensitive operations like hurricane tracking, providing near-real-time data for faster assimilation.
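A full GAN or diffusion model is beyond a short sketch, but the example below conveys the core idea of physics-constrained gap filling: a small neural generator infills a masked field, trained with a data-fit term where observations exist plus a penalty term standing in for physical constraints. The network, the 1-D field, and the smoothness penalty are all simplifying assumptions for illustration.

```python
# Minimal sketch of physics-constrained gap filling with a neural generator
# (illustrative only; a real system would use a GAN or diffusion model).
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

n = 64
x = torch.linspace(0, 2 * math.pi, n)
true_field = torch.sin(x) + 0.1 * torch.randn(n)   # synthetic "analysis" field
mask = torch.rand(n) > 0.3                          # True where observations exist
observed = torch.where(mask, true_field, torch.zeros(n))

# Tiny generator: maps the masked field (plus the mask) to a complete field.
generator = nn.Sequential(
    nn.Linear(2 * n, 128), nn.ReLU(),
    nn.Linear(128, n),
)
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

inputs = torch.cat([observed, mask.float()]).unsqueeze(0)
for _ in range(500):
    optimizer.zero_grad()
    filled = generator(inputs).squeeze(0)
    # Data term: match the field where observations exist.
    data_loss = ((filled - true_field)[mask] ** 2).mean()
    # "Physics" term: penalize rough fields as a stand-in for dynamical constraints.
    physics_loss = ((filled[1:] - filled[:-1]) ** 2).mean()
    loss = data_loss + 0.1 * physics_loss
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```

In an assimilation setting, the infilled field would then be treated like any other input, with its error characteristics estimated and weighted accordingly.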