WiP: A Solution for Reducing MLLM-Based Agent Interaction Overhead

Wenjie Li, Xiaoyang Liu, Zihao Zheng, Jishun Wang, Kang Ling, Ming Fu
DOI: https://doi.org/10.1145/3662006.3662062
2024-01-01
Abstract: Current multi-modal LLM (MLLM)-based mobile agents raise concerns over high inference time and cost. We propose to tackle these issues by building a lightweight UI Transition Graph (UTG) and executing automated tasks locally. Specifically, we build a lightweight HTML-based UTG for both system-level and third-party applications, which avoids heavy computational overhead and laborious effort. We then simplify the interaction phase with the LLM: once the LLM has derived a target option, we perform a local shortest-path search on the UTG. Small-scale experiments demonstrate the benefits of our method.
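The abstract describes consulting the LLM only to pick a target option, after which navigation is resolved by a local shortest-path search on the UTG. The following is a minimal sketch of that search step. It assumes that the UTG is stored as an adjacency map from a screen identifier to (action, next screen) pairs, that the LLM's target option has already been mapped to a target screen, and that every transition costs one step, so breadth-first search yields a shortest path. The names `shortest_action_path` and `EXAMPLE_UTG`, and the screen and action labels, are hypothetical. The paper's HTML-based graph encoding and its exact search procedure are not described here.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical UTG encoding (assumption): screen -> list of (action, next_screen).
UTG = Dict[str, List[Tuple[str, str]]]


def shortest_action_path(utg: UTG, start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search over an unweighted UTG.

    Returns the shortest list of actions leading from `start` to `target`,
    or None if `target` is unreachable. Runs locally, with no LLM calls.
    """
    if start == target:
        return []
    parent: Dict[str, Tuple[str, str]] = {}  # screen -> (previous screen, action taken)
    visited = {start}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for action, nxt in utg.get(screen, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            parent[nxt] = (screen, action)
            if nxt == target:
                # Walk parent links back to the start screen, then reverse.
                path: List[str] = []
                node = target
                while node != start:
                    prev, act = parent[node]
                    path.append(act)
                    node = prev
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical toy UTG for a settings app (illustrative only).
EXAMPLE_UTG: UTG = {
    "Home": [("tap Settings", "Settings"), ("tap Camera", "Camera")],
    "Settings": [("tap Display", "Display"), ("tap Wi-Fi", "WiFi")],
    "Display": [("tap Dark mode", "DarkMode")],
    "Camera": [],
}

if __name__ == "__main__":
    # Suppose the LLM selected "Dark mode" as the target option.
    print(shortest_action_path(EXAMPLE_UTG, "Home", "DarkMode"))
```

Run as a script, this prints `['tap Settings', 'tap Display', 'tap Dark mode']`. Because the LLM is queried once for the target rather than once per step, the remaining steps in this sketch cost only a graph traversal on the device.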