Transformer Based Essay Generation and Automatic Evaluation Framework

Authors

  • Israr Hanif Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan
  • Zoha Latif Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan
  • Fareeha Shafique Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan
  • Humaira Afzal Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan
  • Muhammad Rafiq Mufti Department of Computer Science, COMSATS University Islamabad, Vehari Campus, Vehari, Pakistan

DOI:

https://doi.org/10.53560/PPASA(62-2)694

Keywords:

Transformer, Essay Generation, BERT Model, Natural Language Processing, Automatic Essay Grading

Abstract

The purpose of Automated Essay Grading (AEG) systems is to evaluate and assign grades to essays efficiently, reducing manual effort, time, and cost. Traditional AEG systems focus mainly on extractive rather than abstractive evaluation. The objective of this research is to examine how grading differs between extractively and abstractively generated essays. We develop a transformer-based framework that combines extractive and abstractive essay generation with the evaluation of both essay types, using the Bidirectional Encoder Representations from Transformers (BERT) model for extractive essay generation and Quillbot for abstractive paraphrasing. To this end, we created the Long Essay Poets (LEP) dataset and evaluated it across four modes using four models: Random Forest, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and a combined CNN-LSTM approach. The experiments show that 46% of grades declined in Mode 3 and 44% of grades improved in Mode 4. For essay evaluation, the Random Forest model performs best in the extractive and merged scenarios, while the LSTM model outperforms the others in abstractive essay evaluation.
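The pipeline sketched in the abstract can be pictured as an extract-then-grade loop. The Python snippet below is purely illustrative and is not the authors' implementation: TF-IDF cosine similarity stands in for BERT sentence embeddings in the extractive step, the Quillbot paraphrasing stage is omitted, and the toy lexical features, example texts, and 0-2 grade scale are all assumptions rather than details from the paper.

# Illustrative extract-then-grade sketch (not the paper's code).
# TF-IDF cosine similarity stands in for BERT sentence embeddings;
# the LEP dataset, Quillbot step, and real feature set are omitted.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_essay(topic, sentences, k=3):
    # Extractive generation: keep the k sentences closest to the topic.
    vec = TfidfVectorizer().fit([topic] + sentences)
    scores = cosine_similarity(vec.transform([topic]),
                               vec.transform(sentences)).ravel()
    top = sorted(np.argsort(scores)[::-1][:k])
    return " ".join(sentences[i] for i in top)

def lexical_features(essay):
    # Toy grading features: length, vocabulary size, mean word length.
    words = essay.split()
    return [len(words), len(set(words)),
            sum(len(w) for w in words) / len(words)]

# Hypothetical (essay, grade) pairs standing in for the LEP dataset.
train = [("short weak essay", 0),
         ("a longer and somewhat more varied essay text", 1),
         ("an extensive essay with rich and diverse vocabulary throughout", 2)]
grader = RandomForestClassifier(n_estimators=100, random_state=0)
grader.fit([lexical_features(e) for e, _ in train], [g for _, g in train])

essay = extract_essay("autumn poetry",
                      ["Autumn leaves fall in amber light.",
                       "The stock market closed higher today.",
                       "Poets have long praised the fading season.",
                       "Cold winds carry the scent of rain."])
print(essay)
print("predicted grade:", grader.predict([lexical_features(essay)])[0])

An abstractive variant of this sketch would paraphrase the extracted sentences before grading, which is where the paper's Quillbot step and its LSTM grader come in.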

References

1. H.M. Alawadh, T. Meraj, L. Aldosari, and H.T. Rauf. An efficient text-mining framework of automatic essay grading using discourse macrostructural and statistical lexical features. SAGE Open 14(4): 1-14 (2024).

2. B.R. Lu, N. Haduong, C.Y. Lin, H. Cheng, N.A. Smith, and M. Ostendorf. Efficient encoder-decoder transformer decoding for decomposable tasks. arXiv:2403.13112 (2024).

3. G. Yenduri, M. Ramalingam, G.C. Selvi, Y. Supriya, G. Srivastava, P.K. Maddikunta, G.D. Raj, R.H. Jhaveri, B. Prabadevi, W. Wang, and A.V. Vasilakos. GPT (generative pre-trained transformer) - a comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. IEEE Access 12: 54608-54649 (2024).

4. N. Delpisheh and Y. Chali. Improving faithfulness in abstractive text summarization with EDUs using BART (student abstract). Proceedings of the AAAI Conference on Artificial Intelligence, (20th - 27th February 2024), Vancouver, British Columbia, Canada (2024).

5. W. Sun, C. Fang, Y. Chen, Q. Zhang, G. Tao, Y. You, T. Han, Y. Ge, Y. Hu, B. Luo, and Z. Chen. An extractive-and-abstractive framework for source code summarization. ACM Transactions on Software Engineering and Methodology 33(3): 75 (2024).

6. J. Devlin, M.W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (3rd - 5th June 2019), Minneapolis, Minnesota, USA (2019).

7. I. Schlag, K. Irie, and J. Schmidhuber. Linear transformers are secretly fast weight programmers. International Conference on Machine Learning, (18th - 24th July 2021), Vienna, Austria (2021).

8. M.V. Koroteev. BERT: A review of applications in natural language processing and understanding. arXiv:2103.11943 (2021).

9. Y. Qu, P. Liu, W. Song, L. Liu, and M. Cheng. A text generation and prediction system: pre-training on new corpora using BERT and GPT-2. 10th IEEE International Conference on Electronics Information and Emergency Communication, (17th - 19th July 2020), Beijing, China (2020).

10. Y.C. Chen, Z. Gan, Y. Cheng, J. Liu, and J. Liu. Distilling knowledge learned in BERT for text generation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, (5th - 10th July 2020), Washington, USA (2020).

11. F. Lin, X. Ma, Y. Chen, J. Zhou, and B. Liu. PC-SAN: Pretraining-based contextual self-attention model for topic essay generation. KSII Transactions on Internet and Information Systems 14(8): 3168-3186 (2020).

12. Y.H. Chan and Y.C. Fan. A recurrent BERT-based model for question generation. Proceedings of the 2nd Workshop on Machine Reading for Question Answering, (4th November 2019), Hong Kong, China (2019).

13. W. Antoun, F. Baly, and H. Hajj. AraGPT2: Pre-trained transformer for Arabic language generation. 6th Arabic Natural Language Processing Workshop, (19th April 2021), Kyiv, Ukraine (2021).

14. A. Doewes, A. Saxena, Y. Pei, and M. Pechenizkiy. Individual fairness evaluation for automated essay scoring system. 15th International Conference on Educational Data Mining, (24th - 27th July 2022), Durham, United Kingdom (2022).

15. N. Süzen, A.N. Gorban, J. Levesley, and E.M. Mirkes. Automatic short answer grading and feedback using text mining methods. Procedia Computer Science 169: 726-743 (2020).

16. A. Kumar, M. Sharma, and R. Singh. Automatic question-answer pair generation using deep learning. 3rd IEEE International Conference on Inventive Research in Computing Applications, (11th - 13th July 2021), Coimbatore, India (2021).

17. N.H. Hameed and A.T. Sadiq. Automatic short answer grading system based on semantic networks and support vector machine. Iraqi Journal of Science 64(11): 6025-6040 (2023).

18. M. Ramamurthy and I. Krishnamurthi. Design and development of a framework for an automatic answer evaluation system based on similarity measures. Journal of Intelligent Systems 26(2): 243-262 (2017).

19. L. Xia, J. Liu, and Z. Zhang. Automatic essay scoring model based on two-layer bi-directional long-short term memory network. 3rd International Conference on Computer Science and Artificial Intelligence, (6th - 8th December 2019), Beijing, China (2019).

20. Z. Wang, J. Liu, and R. Dong. Intelligent auto-grading system. 5th IEEE International Conference on Cloud Computing and Intelligence Systems, (23rd - 25th November 2018), Nanjing, China (2018).

21. B. Riordan, A. Horbach, A. Cahill, T. Zesch, and C. Lee. Investigating neural architectures for short answer scoring. 12th Workshop on Innovative Use of NLP for Building Educational Applications, (8th September 2017), Copenhagen, Denmark (2017).

22. S. Zhao, Y. Zhang, X. Xiong, A. Botelho, and N. Heffernan. A memory-augmented neural model for automated grading. 4th ACM Conference on Learning @ Scale (L@S 2017), (20th - 21st April 2017), Cambridge, MA, USA (2017).

23. F. Dong, Y. Zhang, and J. Yang. Attention-based recurrent convolutional neural network for automatic essay scoring. 21st Conference on Computational Natural Language Learning, (3rd - 4th August 2017), Vancouver, Canada (2017).

24. S. Sheng, J. Jing, Z. Wang, and H. Zhang. Cosine similarity knowledge distillation for surface anomaly detection. Scientific Reports 14(1): 8150 (2024).

25. D. Jurafsky and J.H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd Edition. Prentice Hall, Pearson Education, India (2000).

26. N. Jain and P.K. Jana. LRF: A logically randomized forest algorithm for classification and regression problems. Expert Systems with Applications 213: 119225 (2023).

27. M. Krichen. Convolutional neural networks: A survey. Computers 12(8): 151 (2023).

28. H. Alizadegan, B. Rashidi, A. Radmehr, H. Karimi, and M.A. Ilani. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Exploration & Exploitation 43(1): 281-301 (2025).

29. A. Sbei, K. ElBedoui, and W. Barhoumi. Assessing the efficiency of transformer models with varying sizes for text classification: A study of rule-based annotation with DistilBERT and other transformers. Vietnam Journal of Computer Science 2024: 1-28 (2024).

30. H. Wang, Q. Liang, J.T. Hancock, and T.M. Khoshgoftaar. Feature selection strategies: a comparative analysis of SHAP value and importance based methods. Journal of Big Data 11: 44 (2024).

31. R. Nallapati, B. Zhou, C.N. dos Santos, Ç. Gülçehre, and B. Xiang. Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv:1602.06023 (2016).

32. R. Johnsi and G.B. Kumar. Enhancing automated essay scoring by leveraging LSTM networks with hyper-parameter tuned word embeddings and fine-tuned LLMs. Engineering Research Express 7(2): 025272 (2025).

Published

2025-06-06

How to Cite

Hanif, I., Latif, Z., Shafique, F., Afzal, H., & Mufti, M. R. (2025). Transformer Based Essay Generation and Automatic Evaluation Framework. Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences, 62(2), 137–148. https://doi.org/10.53560/PPASA(62-2)694

Issue

Vol. 62 No. 2 (2025)

Section

Research Articles
