D-AI2-M: Ethanol Production Forecasting in Brazil Using Data-Centric Artificial Intelligence Methodology

Antonio Mello, Lucas Giusti, Tarsila Tavares, Fernando Alexandrino, Gustavo Guedes, Jorge Soares, Rafael Barbastefano, Fabio Porto, Diego Carvalho, Eduardo Ogasawara
DOI: https://doi.org/10.1109/tla.2024.10735449
IF: 0.967
2024-10-30
IEEE Latin America Transactions
Abstract: Ethanol is one of Brazil's primary biofuels. The country produces two main types of ethanol: i) hydrous ethanol, used directly as vehicle fuel, and ii) anhydrous ethanol, currently blended into regular gasoline at a rate of 27%. In 2023, data from the National Agency of Petroleum, Natural Gas, and Biofuels (ANP) indicated that the total volume of ethanol sold in Brazil (hydrous and anhydrous) was just over 28 million cubic meters (m³), corresponding to almost 22% of the total volume of liquid fuels sold in the country. These numbers illustrate the importance of this biofuel in Brazil. Just six states account for approximately 90% of Brazilian ethanol production. The logistical challenge arises from production seasonality and the need to transport ethanol from production sites to distribution and resale networks. Production forecasts for this planning are commonly obtained from econometric models, such as ARIMA. Considering the recent advances in Artificial Intelligence, this challenge prompts two research questions: Can we enhance monthly hydrous and anhydrous ethanol production prediction for the primary Brazilian-producing states using Artificial Intelligence Models (AIM)? How should data be prepared for such an approach? This study aims to contribute to logistical planning by employing D-AI2-M, a Data-Centric Artificial Intelligence (DAI) methodology, to aid in selecting AIM for ethanol production time series in the principal Brazilian-producing states. Our quantitative experimental evaluation demonstrates the superior forecasting performance of D-AI2-M in two approaches: i) Local: where different D-AI2-M outperform the benchmark models depending on the specific time series, and ii) Global: where a single D-AI2-M achieves the best mean performance across the complete set of evaluated time series.
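The distinction between the Local and Global approaches can be illustrated with a minimal sketch. It assumes a table of forecast errors per model and per time series is already available. The model names, series names, and error values below are hypothetical placeholders, not results or identifiers from the paper, and the selection rule (lowest error, lowest mean error) is only one plausible reading of the abstract.

```python
from statistics import mean

# Hypothetical error table: errors[model][series] = forecast error (lower is better).
# All names and numbers are made-up placeholders, NOT results from the paper.
errors = {
    "arima":   {"SP_hydrous": 0.20, "GO_hydrous": 0.30, "MG_anhydrous": 0.25},
    "model_a": {"SP_hydrous": 0.15, "GO_hydrous": 0.35, "MG_anhydrous": 0.20},
    "model_b": {"SP_hydrous": 0.18, "GO_hydrous": 0.22, "MG_anhydrous": 0.24},
}


def local_selection(errors):
    """Local approach: pick the lowest-error model separately for each series."""
    series_names = next(iter(errors.values())).keys()
    return {s: min(errors, key=lambda m: errors[m][s]) for s in series_names}


def global_selection(errors):
    """Global approach: pick the single model with the lowest mean error overall."""
    return min(errors, key=lambda m: mean(errors[m].values()))


if __name__ == "__main__":
    print(local_selection(errors))   # one winner per series
    print(global_selection(errors))  # one winner for the whole set
```

In this toy table the Local approach picks different models for different series, while the Global approach settles on the one model with the best average, which need not be the best for any individual series.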
engineering, electrical & electronic; computer science, information systems