mPLUG-Octopus: The Versatile Assistant Empowered by a Modularized End-to-End Multimodal LLM

Qinghao Ye, Haiyang Xu, Ming Yan, Chenlin Zhao, Junyang Wang, Xiaoshan Yang, Ji Zhang, Fei Huang, Jitao Sang, Changsheng Xu
DOI: https://doi.org/10.1145/3581783.3612665
2023-01-01
Abstract: Inspired by recent developments in large language models (LLMs), we propose mPLUG-Octopus, a versatile conversational assistant designed to provide users with coherent, engaging, and helpful interactions in both text-only and multimodal scenarios. Unlike traditional pipeline-based chat systems, mPLUG-Octopus offers a diverse range of creative capabilities, including open-domain QA, multi-turn chatting, and multimodal creation, all built on a unified multimodal LLM without relying on any external API. With its modularized end-to-end multimodal LLM technology, mPLUG-Octopus efficiently supports engaging, open-domain conversation. It exhibits a wide range of unimodal and multimodal elemental capabilities, enabling it to communicate seamlessly with users on open-domain topics and to engage in multi-turn conversations. It also assists users in accomplishing various content creation and application tasks. Our conversational assistant can also be deployed on smart hardware to drive advanced AIGC applications.