Abstract: Essay writing is an important skill that lets a writer express ideas and understanding of a topic clearly through articulate language and examples. Grading essays is a skill as well. It takes considerable effort, and the task becomes tedious and repetitive when the student-to-teacher ratio is high. As with other repetitive tasks, automating essay grading has long been considered. The main challenge, however, lies in understanding language construction, word usage, and the presentation of an idea, argument, or narration; the complexity of language makes natural language understanding difficult. In this work, we report experiments with pre-trained static word embeddings (GloVe and fastText) and a pre-trained contextual model, Bidirectional Encoder Representations from Transformers (BERT), for automated essay grading. For the regression task, we use Long Short-Term Memory (LSTM) and Support Vector Regression (SVR) models under various feature settings built from the learnt embeddings. We report results on all 8 prompts of the ASAP-AES dataset. On in-domain test essays, our approach achieves an average Quadratic Weighted Kappa (QWK) of 0.81 with SVR and 0.71 with LSTM. The SVR model's QWK exceeds the human-human agreement of 0.75. To the best of our knowledge, our SVR model with pre-trained BERT embeddings achieves the highest average QWK reported on the ASAP-AES dataset. We further evaluate our approach on adversarial samples generated from permuted essays and off-topic essays. We show experimentally that the LSTM model, although it does not reach a high QWK against human-assigned grades, is robust in the adversarial settings considered.
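
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how Quadratic Weighted Kappa between two sets of integer grades is commonly computed. It is not taken from this work: the function name, its parameters, and the toy grades are illustrative assumptions, and the grade range would need to be set per prompt.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Illustrative QWK between two equal-length lists of integer grades.

    Hypothetical helper, not the authors' implementation.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two distinct grade levels")

    # Observed agreement matrix: rows index rater A's grade, columns rater B's.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared grade distance.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Count expected by chance if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave one identical grade to every essay.
        return 1.0
    return 1.0 - numerator / denominator


# Toy usage with made-up grades on a 1-3 scale.
perfect = quadratic_weighted_kappa([1, 2, 3], [1, 2, 3], 1, 3)
opposite = quadratic_weighted_kappa([1, 3], [3, 1], 1, 3)
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate agreement worse than chance. The quadratic weight penalises large grade discrepancies more heavily than near misses.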
