Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers

Stefan Haller, Adina Aldea, Christin Seifert, Nicola Strisciuglio
DOI: https://doi.org/10.48550/arXiv.2204.03503
2022-03-11
Abstract: Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students. Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG, of which we survey the recent research advancements. We complement previous surveys by providing a comprehensive analysis of recently published methods that deploy deep learning approaches. In particular, we focus our analysis on the transition from hand-engineered features to representation learning approaches, which learn representative features for the task at hand automatically from large corpora of data. We structure our analysis of deep learning methods along three categories: word embeddings, sequential models, and attention-based methods. Deep learning impacted ASAG differently than other fields of NLP: we noticed that the learned representations alone do not achieve the best results, but rather work in a complementary way with hand-engineered features. The best performance is indeed achieved by methods that combine carefully hand-engineered features with the power of the semantic descriptions provided by the latest models, such as transformer architectures. We identify challenges and provide an outlook on research directions that can be addressed in the future.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the challenges of applying Automated Short Answer Grading (ASAG) in education. As the number of students grows, educational tasks need to scale, and manually grading students' answers is both time-consuming and labor-intensive. ASAG aims to grade students' short answers automatically, improving grading efficiency while maintaining or improving grading quality. The paper pays particular attention to how recent advances in natural language processing (NLP) and machine learning (ML) have influenced the ASAG field, and it provides a comprehensive review and analysis of deep-learning-based methods.

### Main contributions of the paper

1. **Provide the latest ASAG methods and their performance comparison**: The paper compares in detail the performance of recently proposed ASAG methods on different datasets.
2. **Outline the main benchmark datasets**: Introduces several main benchmark datasets for evaluating ASAG systems.
3. **Identify current trends and the most promising model architectures**: Analyzes the current research trends and points out possible future development directions.
4. **Analyze the impact of the advances in NLP and deep-learning methods on the ASAG field**: Explores how these technological advances promote the development of ASAG.

### Historical perspective

The paper reviews the development of the ASAG field, from early concept-mapping methods to the application of information retrieval techniques, and then to the introduction of semantic features and the development of machine-learning methods. In particular, the emergence of word-embedding methods has significantly enhanced the capabilities of ASAG systems. Subsequently, the introduction of sequence-based models (such as RNN and LSTM) and attention-based models further improved the quality of text representation, making ASAG systems more effective in handling long-distance dependencies.

### Method classification

The paper divides the existing ASAG methods into two major categories:

1. **Methods based on hand-crafted features and classical machine learning**:
   - **Lexical features**: Including word frequency, part-of-speech tagging, etc.
   - **Syntactic features**: Including sentence structure, dependency relations, etc.
   - **Semantic features**: Including word vectors, semantic similarity, etc.
2. **Deep-learning-based methods**:
   - **Word embeddings**: Such as Word2Vec, which maps words to a continuous vector space in which semantically similar words lie close together.
   - **Sequence-based models**: Such as RNN and LSTM, which can capture long-distance dependencies in sentences.
   - **Attention-mechanism-based models**: Such as Transformer, which relaxes strict sequential analysis through multi-head attention mechanisms and can more effectively model the dependencies between words.

### Conclusion

The paper emphasizes the promise of deep-learning methods for ASAG and notes that the best performance is achieved when hand-crafted features are combined with automatically learned features. Future research directions include improving text understanding in cross-language settings and enhancing the generalization of existing methods across different domains.
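
The idea of combining hand-crafted features with learned representations can be illustrated with a minimal sketch. It is not a method from the paper: the tiny `TOY_EMBEDDINGS` table, the helper names, and the choice of features (Jaccard overlap, length ratio, cosine similarity of averaged word vectors) are all assumptions made for illustration. A real system would use trained embeddings or a transformer encoder and would feed the resulting features to a trained classifier or regressor.

```python
import math

# Hypothetical 3-dimensional word vectors with made-up values.
# A real system would load trained embeddings (e.g. Word2Vec) instead.
TOY_EMBEDDINGS = {
    "plants": [0.9, 0.1, 0.0],
    "leaves": [0.8, 0.2, 0.1],
    "use": [0.1, 0.9, 0.1],
    "absorb": [0.2, 0.8, 0.2],
    "sunlight": [0.1, 0.2, 0.9],
    "light": [0.1, 0.1, 0.8],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def mean_vector(tokens):
    """Average the vectors of all known tokens; zero vector if none are known."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def jaccard(tokens_a, tokens_b):
    """Hand-crafted lexical feature: overlap of the two token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def combined_features(reference, answer):
    """Build one feature dict mixing hand-crafted and embedding-based features."""
    ref, ans = tokenize(reference), tokenize(answer)
    return {
        "lexical_overlap": jaccard(ref, ans),
        "length_ratio": len(ans) / len(ref) if ref else 0.0,
        "embedding_similarity": cosine(mean_vector(ref), mean_vector(ans)),
    }


# A paraphrased student answer that shares no words with the reference answer.
features = combined_features("plants use sunlight", "leaves absorb light")
print(features)
```

In this toy case the lexical overlap is zero because the two answers share no words, while the embedding similarity is close to 1. That gap is the kind of complementary signal that motivates using both feature families together.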