An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant

Mohit Tomar, Abhisek Tiwari, Tulika Saha, Prince Jha, Sriparna Saha
2024-01-11
Abstract: In recent times, awareness of imminent environmental challenges has grown, and people are showing a stronger dedication to taking care of the environment and nurturing green life. The current $19.6 billion indoor gardening industry reflects this sentiment: it signifies not only a monetary value but also a profound human desire to reconnect with the natural world. However, several recent surveys cast a revealing light on the fate of plants in our care, with more than half dying, primarily because of improper care. Thus, the need for accessible expertise that can guide individuals through the intricacies of plant care is more pressing than ever. In this work, we make the very first attempt at building a plant care assistant, which aims to assist people with plant(-ing) concerns through conversations. We propose a plant care conversational dataset named Plantational, which contains around 1K dialogues between users and plant care experts. Our proposed end-to-end approach is two-fold: (i) we first benchmark the dataset with various large language models (LLMs) and a visual language model (VLM), studying the impact of instruction tuning (zero-shot and few-shot prompting) and fine-tuning techniques on this task; (ii) we then build EcoSage, a multi-modal plant care assisting dialogue generation framework that incorporates adapter-based modality infusion using a gated mechanism. We performed an extensive examination (both automated and manual evaluation) of the performance of the various LLMs and the VLM in generating domain-specific dialogue responses, to underscore the respective strengths and weaknesses of these models.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following problem: how to help people solve plant care issues by building a multimodal plant care dialogue assistant. Specifically, it focuses on three aspects:

1. **Growth in plant care demand**: As awareness of environmental challenges deepens, more people are paying attention to plant care. However, lacking professional knowledge and guidance, many of them lose the plants in their care.
2. **Insufficiency of existing resources**: Online forums and communities (such as Reddit and Houzz) host plant care discussions, but these platforms suffer from slow responses and advice of varying quality.
3. **Lack of multimodal data**: Existing plant care datasets focus mainly on image classification and lack multimodal dialogue data that combines text and images.

To address these issues, the paper sets the following objectives:

- Construct a multimodal plant care dialogue dataset named Plantational, containing approximately 1K dialogues between users and plant care experts.
- Investigate the performance of different large language models (LLMs) and vision-language models (VLMs) on the plant care assistant task, and explore the impact of instruction tuning and fine-tuning techniques.
- Develop a multimodal plant care dialogue generation framework named EcoSage, which uses an adapter mechanism to integrate visual information into the model and thereby improve the quality and relevance of responses.

Through these efforts, the paper aims to provide users with an intelligent assistant that can answer plant care questions accurately and in real time.
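
The text here names an adapter-based, gated infusion of visual information into the language model but gives no equations. As a rough illustration of what such a gate can look like, the sketch below projects an image feature vector into the text hidden size (the "adapter"), computes a per-dimension sigmoid gate from the concatenated text and visual vectors, and adds the gated visual signal to the text state as a residual. The function names, the weight layout, and the exact fusion formula are all assumptions made for illustration; they are not taken from the EcoSage implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    # Plain dense layer: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_fusion(text_hidden: Sequence[float], image_feat: Sequence[float],
                 w_proj: Sequence[Sequence[float]], b_proj: Sequence[float],
                 w_gate: Sequence[Sequence[float]], b_gate: Sequence[float]) -> List[float]:
    # Hypothetical gated modality infusion (illustrative, not the paper's code).
    # 1) Adapter: map image features into the text hidden dimension.
    visual = linear(w_proj, b_proj, image_feat)
    # 2) Gate: one value in (0, 1) per hidden dimension, from [text; visual].
    gate = [sigmoid(z) for z in linear(w_gate, b_gate, list(text_hidden) + visual)]
    # 3) Residual update: the gate decides how much visual signal is mixed in.
    return [h + g * v for h, g, v in zip(text_hidden, gate, visual)]


# Toy usage: zero gate weights and zero bias give a gate of exactly 0.5.
fused = gated_fusion(
    text_hidden=[1.0, 2.0],
    image_feat=[1.0, 0.0, 2.0],
    w_proj=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    b_proj=[0.0, 0.0],
    w_gate=[[0.0] * 4, [0.0] * 4],
    b_gate=[0.0, 0.0],
)
print(fused)  # [1.5, 3.0]
```

With a strongly negative gate bias the gate closes and the text representation passes through almost unchanged, which is the usual motivation for gating: the model can ignore the image when it is uninformative.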