Data drought: The challenge of AI weather forecasting in India

As the world moves towards AI-based weather forecasting, a lack of precise data spells a rough patch ahead for Indian climate scientists.

Amid the surge of extreme weather events globally, billions of dollars are pouring into developing cutting-edge weather forecasting models based on artificial intelligence (AI) and machine learning (ML). Tech giants such as Google and IBM are spearheading efforts to make forecasting faster and more precise.

In India, climate scientists have also begun experimenting with AI. In December 2023, Kiren Rijiju, a minister at the Ministry of Earth Sciences (MoES), said the department had established a virtual centre dedicated to developing and refining various AI and ML techniques for enhanced weather predictions.

There has since been considerable excitement around AI-based weather forecasting in the country. But there is a problem: a lack of credible data.

Amitabha Bagchi, a computer science professor at the Indian Institute of Technology Delhi, explains that AI-based modelling “extrapolates and builds scenarios based on the available data and past trends.” According to Bagchi, 95 per cent of the development process of AI models revolves around data management, and robust data is critical to the process.

Compiling such data is a challenge in India, especially in the Himalayas, says Irfan Rashid, an assistant geoinformatics professor at the University of Kashmir. Rashid is working on a MoES project profiling 15 glacial lakes in Jammu & Kashmir and Ladakh to improve data collection in the Himalayan cryosphere (the frozen part of the Earth system), which could enhance AI predictions of glacial lake outburst floods (GLOFs).

The Geological Survey of India has recorded over 9,575 glaciers in the Himalayas, yet detailed glaciological studies cover fewer than 30, he explains. This data scarcity undermines the development of AI-based early warning systems (EWS). “At present, if we want to know the volume of water in a glacial lake, there is no credible in-situ data. The data based on empirical models is associated with a high degree of uncertainty. Using such data to build an AI- and ML-based model may simulate scenarios/forecasts that might not be robust,” says Rashid.

His concerns are shared by Madhavan Nair Rajeevan, one of India’s top climate scientists and former earth sciences secretary. He notes the country’s data sets do not extend to the Himalayas, affecting the reliability of AI/ML predictions for the region’s complex terrain: “In India, we have good data sets on rainfall, temperature, humidity, wind speed etc, which are the basic meteorological parameters. However, we don’t have adequate data over the Himalayas and hardly any data to work on GLOFs,” says Rajeevan.

Machine learning and weather forecasting

Traditional weather forecasting typically relies on computer calculations based on physics to predict the weather. In contrast, AI and deep learning (a subset of machine learning) use large volumes of raw, unfiltered and processed data to predict weather. When used in combination with traditional physical models and statistical methods, they can enhance the accuracy and reliability of weather forecasts.

Mohak Shah, founder and managing director of Praescivi Advisors, a strategic AI advisory firm based in California, USA, tells Dialogue Earth: “Traditionally, weather forecasting models use a bunch of different starting points and then use physics equations to build models that give out various probabilistic scenarios.” ML, however, speeds up weather forecasting by using historical data correlations.
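The "different starting points" idea Shah describes is often called ensemble forecasting. A minimal sketch in Python, under loud assumptions, may help illustrate it: the logistic map below is only a toy stand-in for real physics equations, and the names `step`, `ensemble_forecast` and `exceedance_probability` are hypothetical, not drawn from any operational forecasting system.

```python
import random


def step(x, r=3.9):
    """One step of the logistic map: a toy, chaotic stand-in for a physics model."""
    return r * x * (1.0 - x)


def run_member(x0, n_steps):
    """Advance a single ensemble member from its starting point."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x


def ensemble_forecast(x0, n_members=50, n_steps=20, spread=0.01, seed=0):
    """Run many members, each from a slightly perturbed starting point."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    members = []
    for _ in range(n_members):
        # Perturb the initial state, then clamp it to the map's valid range [0, 1].
        start = min(max(x0 + rng.uniform(-spread, spread), 0.0), 1.0)
        members.append(run_member(start, n_steps))
    return members


def exceedance_probability(members, threshold):
    """Fraction of members above a threshold, read as a forecast probability."""
    return sum(m > threshold for m in members) / len(members)


if __name__ == "__main__":
    members = ensemble_forecast(x0=0.4)
    print(exceedance_probability(members, threshold=0.5))
```

Because the toy model is chaotic, tiny differences in the starting point grow into a wide spread of outcomes, which is why the result is reported as a probability rather than a single value.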

According to Shah, like any technology, machine learning has its advantages and challenges: “It is relatively low-cost … scalable too and can democratise weather forecasting. But we assume that there is enough granular data available, which isn’t the case [in India], at least not yet. Lack of local-level data can pose a fundamental problem.”

To mitigate data scarcity, ML can approximate missing information using data from similar areas, effectively giving forecasters a head start, though for optimal results, there is no substitute for high-quality data, Shah tells Dialogue Earth.
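One simple way to picture "using data from similar areas" is a nearest-neighbour estimate. The sketch below is an illustration under stated assumptions, not a method attributed to Shah: the station records, the feature choice and the function name `fill_from_similar` are all hypothetical, and the features are assumed to be already scaled to comparable units.

```python
import math


def fill_from_similar(target_features, stations, k=2):
    """Estimate a missing reading as the similarity-weighted mean of the
    k most similar stations that do have a reading.

    Each station is a dict with "features" (a tuple of numbers, assumed
    pre-scaled) and "value" (a number, or None if missing).
    """
    scored = []
    for station in stations:
        if station["value"] is None:
            continue  # stations with no reading cannot contribute
        distance = math.dist(target_features, station["features"])
        scored.append((distance, station["value"]))
    if not scored:
        raise ValueError("no stations with observed values")

    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # Closer (more similar) stations get larger weights; the small constant
    # avoids division by zero when a station matches the target exactly.
    weights = [1.0 / (distance + 1e-9) for distance, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Hypothetical, made-up records purely for illustration.
    stations = [
        {"features": (1.0, 0.0), "value": 10.0},
        {"features": (0.0, 1.0), "value": 20.0},
        {"features": (9.0, 9.0), "value": 500.0},
        {"features": (0.5, 0.5), "value": None},
    ]
    print(fill_from_similar((0.0, 0.0), stations, k=2))
```

An estimate like this only gives forecasters a starting point; it inherits whatever errors the neighbouring records carry, which is consistent with Shah's caveat that high-quality local data has no substitute.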

Shah raises concerns about the opaque “black box” nature of ML models. Traditional weather models come with a quantifiable margin of error, which allows for the identification and correction of specific errors based on the physics equations on which they are built. AI/ML models often lack such transparency since they are based on past correlations, making it difficult to ascertain the exact reasons behind their inaccuracies.

The data dilemma

Roxy Mathew Koll, a climate scientist at the Pune-based Indian Institute of Tropical Meteorology (IITM), which operates under the MoES, has struggled to obtain the data required to build an AI-based forecasting model for dengue, a climate-sensitive disease.

“We have used past data of several factors that affect the incidence of dengue, including rainfall, temperature and humidity. But getting health data on the daily disease caseload in the city was a huge challenge. Concerned agencies were not ready to share the data. We had to knock on several doors and getting permission to use and publish the data was a tedious task,” says Koll.

Koll highlights the direct correlation between data quality and AI’s predictive capabilities: “If AI is trained on very high-resolution data, it would be able to provide high-resolution forecasting [for] climate-sensitive diseases such as dengue, malaria, chikungunya, etc,” he says. “The AI-based modelling for dengue in Pune can be replicated in other places provided there is access to data from the respective health departments, which is a challenge.”

Government scientists have also faced this problem. “Even when I was secretary [the highest level administrative officer in the government], I tried to gather some health data from the highest government officials. Nothing came. We lack the culture of compiling and archiving social-economic data at a granular scale. If we want impact studies, we need such data,” says Rajeevan. Without it, the research doesn’t translate into real-world benefits, adds the former secretary. Like Koll, he insists the technologies are only as good as the data they’re fed.

Bagchi agrees. “Machine learning has the best mathematical tools available to us and is the future. But, in the Indian context, data integrity, data quality and quantity are a challenge, which may mar the development of AI-based weather forecasting.”

Shah sees AI/ML as a complementary addition, rather than a fix-all replacement. “We have to see machine learning like an additional tool in our arsenal,” he says. 

Can AI help predict GLOFs in the Indian Himalayas?

The Wadia Institute of Himalayan Geology, situated in Doon Valley, Uttarakhand, is pioneering the development of an advanced warning system for glacial hazards, with its director, Kalachand Sain, advocating for the integration of AI and ML in these efforts.

Sain conducted extensive research into the Chamoli disaster, where an avalanche in February 2021 severely damaged two hydropower projects in Uttarakhand’s Chamoli district, leading to over 200 casualties.

“Our study found that the rock-ice avalanche appears to have been initiated by seismic precursors which were continuously active for 2.5 hours prior to main detachment, but we do not monitor seismic activity around glaciers,” Sain tells Dialogue Earth.

Sain’s institute has been identifying potential risk zones for GLOFs in Uttarakhand. He singles out the tectonically active Alaknanda-Dhauliganga-Rishiganga basin as a priority area because it has 29 existing hydropower projects in various stages of completion, in addition to 54 proposed ones.

“For an AI-based integrated early warning system for glacial hazards, we need satellite data, real-time meteorological data, real-time hydrological data, real-time seismic and GPS data and general field survey,” says Sain. He underscores the urgency of setting up a dedicated glaciological centre in the region, requiring an investment of Rs 10-12 crore (US$ 1.2m-1.4m).

Rashid seconds this view, citing the disastrous 2021 Chamoli event and a 2023 GLOF in Sikkim, which occurred despite the presence of monitoring equipment that malfunctioned.

“At present, no seismic data is collected around glaciers in the entire Indian Himalayan region. Also, there is no detailed GLOF risk,” he says. Existing studies are fragmented, offering an incomplete picture of the Himalayas’ glacial risk.

Rashid advocates for a standardised method of collecting field data on glaciers and glacial lakes across the region. This data would be instrumental in developing a comprehensive AI-based forecast and alert system. “This massive exercise will need funds and money will come only if there is a strong political will,” Rashid concludes.

This article was originally published on Dialogue Earth under a Creative Commons licence.
