Anime Popularity Prediction Before Huge Investments: a Multimodal Approach Using Deep Learning

Jesús Armenta-Segura, Grigori Sidorov
2024-06-22
Abstract: In the Japanese anime industry, predicting whether an upcoming product will be popular is crucial. This paper presents a dataset and methods for predicting anime popularity using a multimodal text-image dataset constructed exclusively from freely available internet sources. The dataset was built following rigorous standards based on real-life investment experiences. A deep neural network architecture leveraging GPT-2 and ResNet-50 to embed the data was employed to investigate the correlation between the multimodal text-image input and a popularity score, discovering relevant strengths and weaknesses in the dataset. To measure the accuracy of the model, mean squared error (MSE) was used, obtaining a best result of 0.011 when considering all inputs and the full version of the deep neural network, compared to the benchmark MSE of 0.412 obtained with traditional TF-IDF and PILToTensor vectorizations. This is the first proposal to address such a task with multimodal datasets, revealing the substantial benefit of incorporating image information, even when a relatively small model (ResNet-50) was used to embed the images.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to predict, before significant investments are made, whether an upcoming work in the Japanese anime industry will be popular. Specifically, the authors propose a dataset and a method that use multimodal text-image data to predict anime popularity. The dataset is sourced entirely from freely available internet resources and is constructed following strict standards based on real investment experience. Using a deep neural network that embeds its inputs with GPT-2 and ResNet-50, the authors study the correlation between the multimodal text-image inputs and a popularity score, revealing strengths and weaknesses of the dataset.

### Main Research Questions:

1. **Predicting the Popularity of Anime Works**: How to accurately predict whether an anime work will be liked by the audience in the early stages of production, before making large-scale investments.
2. **Application of Multimodal Data**: How to use multimodal text and image data to improve the accuracy of predictions.
3. **Model Performance Evaluation**: Evaluating the predictive performance of the model using metrics such as Mean Squared Error (MSE) and comparing it with traditional methods.

### Background and Motivation:

- **Industry Demand**: The success of the anime industry largely depends on the popularity of its works. Accurate predictions can help investors make better decisions, avoid financial disasters, and create successful and profitable series.
- **Existing Challenges**: Currently, there is no direct method to predict the popularity of new anime. For example, a movie project of "Dragon Ball Z" lost over 9 billion yen despite featuring well-known characters, whereas "Demon Slayer" set box office records even though its original manga was not particularly popular.
- **Data Limitations**: In the early stages of production, the available information is very limited, usually only a plot summary and simple sketches of the main characters. Therefore, any prediction system must be based on this limited information.

### Methods and Contributions:

- **Dataset Construction**: A large number of plot summaries, main character descriptions, and portraits of anime works were collected from platforms like MyAnimeList to construct a multimodal dataset.
- **Model Design**: GPT-2 was used to embed the text data and ResNet-50 to embed the image data; a multi-input deep neural network then performed the regression.
- **Performance Evaluation**: The model's performance was evaluated using Mean Squared Error (MSE), the Pearson correlation coefficient, Spearman's rank correlation coefficient, and Kendall's tau coefficient.

### Experimental Results:

- **Best Model**: The model that combined all inputs (text and images) performed best, with an MSE of 0.011, compared with 0.412 for the baseline built on TF-IDF and PILToTensor vectorizations.
- **Correlation Analysis**: There was a moderate correlation between the combination of all inputs and MyAnimeList scores, with text inputs showing a stronger correlation than image inputs.

### Discussion and Future Work:

- **Model Complexity**: Experimental results indicate that larger and more complex models are needed to capture the complexity of the dataset. Future improvements include using larger language models such as Llama2 or GPT-4.
- **Data Processing**: Due to memory limitations of Transformer models, the current model loses information when processing longer text descriptions. Future work can explore more efficient processing methods to include more information.

In summary, this paper provides a new solution for predicting the popularity of anime works by constructing a multimodal dataset and a deep neural network model, demonstrating the potential of multimodal data in this task.
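
The four evaluation metrics named in the summary (MSE, Pearson, Spearman, and Kendall's tau) can be sketched in plain Python as below. This is only an illustration of what each metric measures, not the authors' evaluation code: the function names, the tie handling (average ranks for Spearman, the uncorrected tau-a variant for Kendall), and the example scores are all assumptions made for this sketch.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between target scores and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores, invented for this example (not taken from the paper).
true_scores = [0.2, 0.5, 0.7, 0.9]
predicted_scores = [0.3, 0.4, 0.8, 0.7]

print("MSE:", mse(true_scores, predicted_scores))
print("Pearson:", pearson(true_scores, predicted_scores))
print("Spearman:", spearman(true_scores, predicted_scores))
print("Kendall tau:", kendall_tau(true_scores, predicted_scores))
```

MSE penalizes the size of each prediction error, while the three correlation coefficients only ask whether predictions move with the targets (Pearson) or preserve their ordering (Spearman, Kendall). Reporting them together, as the summary describes, separates calibration errors from ranking errors.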