Abstract: In the Japanese anime industry, predicting whether an upcoming product will be popular is crucial. This paper presents a dataset and methods for predicting anime popularity using a multimodal text-image dataset constructed exclusively from freely available internet sources. The dataset was built following rigorous standards based on real-life investment experiences. A deep neural network architecture leveraging GPT-2 and ResNet-50 to embed the data was employed to investigate the correlation between the multimodal text-image input and a popularity score, discovering relevant strengths and weaknesses in the dataset. To measure the accuracy of the model, mean squared error (MSE) was used, obtaining a best result of 0.011 when considering all inputs and the full version of the deep neural network, compared to the benchmark MSE of 0.412 obtained with traditional TF-IDF and PILtotensor vectorizations. This is the first proposal to address such a task with multimodal datasets, revealing the substantial benefit of incorporating image information, even when a relatively small model (ResNet-50) was used to embed the images.
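As a rough illustration of the idea in the abstract, the sketch below concatenates a text embedding and an image embedding, passes the result through a small regression head, and scores the predictions with MSE. It is a minimal sketch under stated assumptions, not the authors' network: random vectors stand in for GPT-2 and ResNet-50 embeddings, the layer sizes are toy values, the weights are untrained, the sigmoid output assumes a popularity score normalized to (0, 1), and all names (`predict_score`, `init_layer`, `TEXT_DIM`, and so on) are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rng, n_in, n_out):
    """Small random weights; a real model would learn these by gradient descent."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_score(text_emb, image_emb, params):
    """Late fusion: concatenate both embeddings, then regress a single score."""
    fused = list(text_emb) + list(image_emb)
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(fused, w1, b1))
    logit = linear(hidden, w2, b2)[0]
    # Assumption: the target score is normalized to (0, 1), hence the sigmoid.
    return 1.0 / (1.0 + math.exp(-logit))


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
TEXT_DIM, IMAGE_DIM, HIDDEN = 8, 16, 4  # toy sizes; real embeddings are far larger
params = (init_layer(rng, TEXT_DIM + IMAGE_DIM, HIDDEN), init_layer(rng, HIDDEN, 1))

# Random vectors stand in for the text and image embeddings of three titles.
samples = [
    ([rng.gauss(0, 1) for _ in range(TEXT_DIM)],
     [rng.gauss(0, 1) for _ in range(IMAGE_DIM)])
    for _ in range(3)
]
targets = [0.72, 0.55, 0.81]  # made-up normalized scores, for illustration only
preds = [predict_score(text, image, params) for text, image in samples]
print("MSE of the untrained toy model:", mse(targets, preds))
```

Because the weights are random, the printed MSE says nothing about the paper's results; the point is only the data flow from two embeddings to one score.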

What problem does this paper attempt to address?

The paper attempts to address the issue of how to predict whether an upcoming work will be popular in the Japanese anime industry before making significant investments. Specifically, the authors propose a dataset and method that utilize multimodal text-image data to predict the popularity of anime. The dataset is entirely sourced from free internet resources and is constructed following strict standards based on real investment experience. By combining GPT-2 and ResNet-50 in a deep neural network architecture, the correlation between multimodal text-image inputs and popularity scores is studied, revealing the strengths and weaknesses of the dataset.

### Main Research Questions:

1. **Predicting the Popularity of Anime Works**: How to accurately predict whether an anime work will be liked by the audience in the early stages of production, before making large-scale investments.
2. **Application of Multimodal Data**: How to use multimodal data of text and images to improve the accuracy of predictions.
3. **Model Performance Evaluation**: Evaluating the predictive performance of the model using metrics such as Mean Squared Error (MSE) and comparing it with traditional methods.

### Background and Motivation:

- **Industry Demand**: The success of the anime industry largely depends on the popularity of its works. Accurate predictions can help investors make better decisions, avoid financial disasters, and create successful and profitable series.
- **Existing Challenges**: Currently, there is no direct method to predict the popularity of new anime. For example, a movie project of "Dragon Ball Z" lost over 9 billion yen despite featuring well-known characters, whereas "Demon Slayer" set box office records even though its original manga was not particularly popular.
- **Data Limitations**: In the early stages of production, the available information is very limited, usually only a plot summary and simple sketches of the main characters. Therefore, any prediction system must be based on this limited information.

### Methods and Contributions:

- **Dataset Construction**: A large number of plot summaries, main character descriptions, and portraits of anime works were collected from platforms like MyAnimeList to construct a multimodal dataset.
- **Model Design**: GPT-2 was used to process text data, ResNet-50 to process image data, and a multi-input deep neural network model was used for regression prediction.
- **Performance Evaluation**: The model's performance was evaluated using metrics such as Mean Squared Error (MSE), Pearson correlation coefficient, Spearman's rank correlation coefficient, and Kendall's tau coefficient.

### Experimental Results:

- **Best Model**: The model that combined all inputs (text and images) performed the best, with an MSE of 0.011, significantly lower than the baseline model's 0.412.
- **Correlation Analysis**: There was a moderate correlation between the combination of all inputs and MyAnimeList scores, with text inputs showing a stronger correlation than image inputs.

### Discussion and Future Work:

- **Model Complexity**: Experimental results indicate that larger and more complex models are needed to capture the complexity of the dataset. Future improvements include using larger language models such as Llama2 or GPT-4.
- **Data Processing**: Due to memory limitations of Transformer models, the current model loses information when processing longer text descriptions. Future work can explore more efficient processing methods to include more information.

In summary, this paper provides a new solution for predicting the popularity of anime works by constructing a multimodal dataset and deep neural network model, demonstrating the potential of multimodal data in this task.
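The correlation metrics named in the evaluation (Pearson, Spearman, and Kendall's tau) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions rather than the paper's evaluation code: the function names are hypothetical, Kendall's tau is computed as the simple tau-a variant without tie correction (the summary does not say which variant was used), and the scores in the usage lines are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


# Made-up reference scores and predictions, for illustration only.
reference = [7.1, 8.3, 6.4, 9.0]
predicted = [7.0, 7.9, 6.8, 8.5]
print(pearson(reference, predicted))
print(spearman(reference, predicted))
print(kendall_tau_a(reference, predicted))
```

MSE measures how far predictions are from the scores, while the rank-based metrics only ask whether titles are ordered correctly, which is often closer to what an investor choosing between projects needs.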

Anime Popularity Prediction Before Huge Investments: a Multimodal Approach Using Deep Learning
