Autonomous Droplet Microfluidic Design Framework with Large Language Models

Dinh-Nguyen Nguyen, Raymond Kai-Yu Tong, Ngoc-Duy Dinh
2024-11-11
Abstract: Droplet-based microfluidic devices have substantial promise as cost-effective alternatives to current assessment tools in biological research. Moreover, machine learning models that leverage tabular data, including input design parameters and their corresponding efficiency outputs, are increasingly utilised to automate the design process of these devices and to predict their performance. However, these models fail to fully leverage the data presented in the tables, neglecting crucial contextual information, including column headings and their associated descriptions. This study presents MicroFluidic-LLMs, a framework designed for processing and feature extraction, which effectively captures contextual information from tabular data formats. MicroFluidic-LLMs overcomes processing challenges by transforming the content into a linguistic format and leveraging pre-trained large language models (LLMs) for analysis. We evaluate our MicroFluidic-LLMs framework on 11 prediction tasks, covering aspects such as geometry, flow conditions, regimes, and performance, utilising a publicly available dataset on flow-focusing droplet microfluidics. We demonstrate that our MicroFluidic-LLMs framework can empower deep neural network models to be highly effective and straightforward while minimising the need for extensive data preprocessing. Moreover, the exceptional performance of deep neural network models, particularly when combined with advanced natural language processing models such as DistilBERT and GPT-2, reduces the mean absolute error in the droplet diameter and generation rate by nearly 5- and 7-fold, respectively, and enhances the regime classification accuracy by over 4%, compared with the performance reported in a previous study. This study lays the foundation for the huge potential applications of LLMs and machine learning in a wider spectrum of microfluidic applications.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses data-processing challenges in the design and performance prediction of droplet-generating microfluidic devices. Specifically, existing machine-learning models do not fully use the contextual information in tabular data, such as column headers and their descriptions. This limits their performance when predicting droplet diameter and generation rate and when classifying generation regimes. In addition, inconsistent units and data types across tabular data systems make data processing more complex.

To address these issues, the authors propose a framework named MicroFluidic-LLMs (μ-Fluidic-LLMs). The framework captures the contextual information in tabular data by converting each record into natural-language sentences and using pre-trained large language models (LLMs) to generate embedding vectors. These embedding vectors are then used as inputs to standard machine-learning models to improve prediction performance and support design automation.

### Main contributions:

1. **Data-processing method**: A new data-processing method converts tabular data into natural-language sentences and uses large language models to generate embedding vectors, thereby better capturing contextual information.
2. **Performance improvement**: Combining deep neural networks (DNNs) with large language models such as DistilBERT and GPT-2 improves the prediction accuracy for droplet diameter and generation rate and also improves regime classification accuracy.
3. **Design automation**: The framework is applied to design-automation tasks, where it predicts design parameters more accurately.
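The table-to-sentence step described above can be illustrated with a minimal sketch. The column names, descriptions, values, and the sentence template below are hypothetical and chosen only for illustration; the summary does not state the exact template used by the authors.

```python
from typing import Mapping

# Hypothetical column descriptions; the real dataset's headers may differ.
COLUMN_DESCRIPTIONS = {
    "orifice_width": "orifice width",
    "aspect_ratio": "aspect ratio",
    "flow_rate_ratio": "flow rate ratio",
    "capillary_number": "capillary number",
}


def row_to_sentence(
    row: Mapping[str, object],
    descriptions: Mapping[str, str] = COLUMN_DESCRIPTIONS,
) -> str:
    """Serialise one table row as 'The <description> is <value>.' clauses."""
    clauses = []
    for column, value in row.items():
        # Fall back to the raw header when no description is available.
        name = descriptions.get(column, column.replace("_", " "))
        clauses.append(f"The {name} is {value}.")
    return " ".join(clauses)


# Illustrative values only, not taken from the paper's dataset.
example_row = {"orifice_width": 175, "aspect_ratio": 2.5, "flow_rate_ratio": 6}
sentence = row_to_sentence(example_row)
print(sentence)
```

In a full pipeline, a sentence like this would be tokenised and passed through a pre-trained model such as DistilBERT or GPT-2 to obtain an embedding vector, which then serves as the input features for the downstream predictor. That step is left out here because it depends on external model weights.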
### Specific improvements:

- **Predicting droplet diameter**: The model combining a DNN with GPT-2 reduces the mean absolute error (MAE) from approximately 7.5 to approximately 2.5; the abstract reports the gain over the previous study as nearly 5-fold.
- **Predicting droplet generation rate**: The same DNN and GPT-2 combination reduces the MAE from approximately 20 to 3.1921; the abstract reports the gain over the previous study as nearly 7-fold.
- **Classifying generation regimes**: Classification accuracy improves by more than 4%.

### Experimental verification:

The authors evaluated 11 prediction tasks, covering geometric parameters, flow conditions, generation regimes, and performance, on a publicly available dataset containing 998 data points. The results show that the MicroFluidic-LLMs framework performs well across multiple tasks, with the largest gains when a DNN is combined with GPT-2.

In conclusion, by combining large language models with natural language processing techniques, this research improves design and performance prediction for droplet-generating microfluidic devices and provides new tools for future microfluidic applications.
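For readers unfamiliar with the metric behind the comparisons above, the following is a minimal sketch of how a mean absolute error and a fold reduction between two models can be computed. All numbers are hypothetical and are not taken from the paper; the function names are illustrative.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the absolute differences between observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fold_reduction(baseline_mae: float, new_mae: float) -> float:
    """How many times smaller the new error is than the baseline error."""
    return baseline_mae / new_mae


# Hypothetical observed values and predictions from two models.
observed = [50.0, 80.0, 120.0]
baseline_pred = [60.0, 70.0, 130.0]
model_pred = [52.0, 78.0, 122.0]

baseline_mae = mean_absolute_error(observed, baseline_pred)
model_mae = mean_absolute_error(observed, model_pred)
print(baseline_mae, model_mae, fold_reduction(baseline_mae, model_mae))
```

A fold reduction is only meaningful when both errors are measured on the same target, in the same units, and on comparable test data.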