Case Report | Published empirical study | 2025
AAB-CASE-2026-RV-066

Automated Assessment of Student Self-Explanation in Code Comprehension Using Pre-Trained Language Models

Assessing students’ responses, especially natural language responses, is a major challenge in education. Automatically evaluating what learners do or say matters because it enables personalized instruction: for example, tasks and feedback can be tailored to what the learner knows.

This page documents an AI literacy or AI education case for registry purposes. It is descriptive and does not imply AAB endorsement of any specific tool, provider, or intervention.
01

Implementation

Source publication / research team or educational organization described in paper

02

Learning context

Research / curriculum design context

03

AI role

Tutor

04

Outcome signal

Conceptual understanding

Registry Facets

0
Education Level
  • Unspecified / broad education
Subject Area
  • AI for education
  • programming education assessment
  • NLP / text classification
  • ML concepts / supervised learning
Use Case Type
  • Curriculum / course design
  • Assessment support
Stakeholder Group
  • Students
AI Capability Type
  • NLP / text classification
  • ML concepts / supervised learning
  • Assessment / tutoring analytics
Implementation Model
  • Research / curriculum design context
Evidence Type
  • Design / conceptual evidence
Outcomes Domain
  • Conceptual understanding
  • Assessment / feedback quality

Implementing Organization

1
Organization Type

Source publication / research team or educational organization described in paper

Location

USA

Primary Facilitator Role

Researchers, educators, instructors, or facilitators as described in the source publication

Learning Context

2
Setting Type
  • Research / curriculum design context
Session Format

Classroom, course, or resource-based AI education activity

Duration

Not specified in extracted text

Group Size

Not specified in extracted text

Devices

Not specified in extracted text

Constraints
  • The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.

Learner Profile

3
Age Range

Unspecified / broad education

Prior AI Exposure Assumed

Mixed or not explicitly specified; infer from target learner group and intervention design.

Prior Programming Background Assumed

Varies by intervention; not specified unless the paper explicitly describes prerequisites.

Educational Intent

4
Primary Learning Goals
  • Document the AI education intervention, course, tool, or resource described in the source publication.
  • Extract the learner context, AI role, pedagogy, outcomes, and constraints for AAB registry comparison.
  • Explore how well fine-tuned pre-trained transformer models (BERT, RoBERTa, CodeBERT, and SciBERT) can automatically judge students’ free-text responses to code comprehension tasks.
Secondary Learning Goals
  • Support AAB comparison across AI literacy, AI education, teacher training, higher education, and workforce contexts.
  • Capture evidence maturity, transferability, and limitations rather than treating the publication as product endorsement.
What This Was Not
  • Not an AAB endorsement of the tool, curriculum, provider, or result.
  • Not a direct replication record unless the source paper reports implementation details sufficient for replication.

AI Tool Description

5
Tool Type

NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics

Languages

Language context discussed in source publication

AI Role
  • Tutor
  • Evaluator
User Interaction Model
  • Primary interaction pattern inferred from publication: Curriculum / course design, Assessment support.
  • AI capability focus: NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics.
Safeguards
  • Apply standard AAB safeguards: privacy, transparency, human oversight, and documentation of limitations.

Activity Design

6
Activity Flow
  • Review the publication’s reported context, learner group, AI tool or curriculum, implementation process, and outcome evidence.
  • Map the case to AAB registry fields for comparison across educational levels and AI capability types.
  • Use the source publication and PDF for any manual verification before public registry release.
Human Vs AI Responsibilities
  • Human educators/researchers remain responsible for instructional design, supervision, interpretation, and ethical safeguards.
  • AI systems or AI concepts provide the learning object, support tool, evaluator, simulator, or automation context depending on the paper.
Scaffolding Strategies
  • Tutoring / feedback-supported learning
  • Registry extraction emphasizes explicit learning goals, observed outcomes, constraints, and safety limitations.

Observed Challenges

7
Educators Reported
  • The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.

Design Adaptations

8
Adaptations
  • Case classified under: Published empirical study.
  • Pedagogical pattern: Tutoring / feedback-supported learning.
  • Any additional adaptations should be verified against the full paper before public-facing publication.

Reported Outcomes

9
Engagement
  • Engagement evidence should be interpreted according to the source paper’s reported method and sample.
Learning Signals
  • The source abstract reports that fine-tuned transformer models (BERT, RoBERTa, CodeBERT, and SciBERT) can significantly enhance the accuracy and reliability of automated assessment of students’ responses to code comprehension tasks.
  • The abstract notes that deep learning has recently produced state-of-the-art NLP methods, such as transformer-based methods, which brought significant performance improvements on many NLP tasks, including text classification and question answering.
Educators Reflection

Assessing students’ responses, especially natural language responses, is a major challenge in education. Automatic evaluation of what learners do or say enables personalized instruction, such as tasks and feedback tailored to what the learner knows.

Ethical & Privacy Considerations

10
Privacy
  • Apply standard AAB safeguards: privacy, transparency, human oversight, and documentation of limitations.

Evidence Type

11
Evidence
  • Design / conceptual evidence

Relevance to Research

12
Potential Research Use
  • Can be used as an AAB evidence record for cross-case comparison, standards drafting, and evidence-maturity mapping.
  • Supports identification of recurring patterns in AI literacy, AI education implementation, teacher preparation, assessment, and responsible AI learning.
Relevant Research Domains
  • Conceptual understanding
  • Assessment / feedback quality
  • Curriculum / course design
  • Assessment support
  • NLP / text classification
  • ML concepts / supervised learning
  • Assessment / tutoring analytics

Case Status

13
Case Status
  • Completed

AAB Classification Tags

14
Age

Unspecified / broad education

Setting

Research / curriculum design context

AI Function

NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics

Pedagogy

Tutoring / feedback-supported learning

Risk Level

Medium

Data Sensitivity

Medium

Source Publication

15
Title

Automated Assessment of Student Self-Explanation in Code Comprehension Using Pre-Trained Language Models

Authors
  • Jeevan Chapagain
  • Vasile Rus
Venue

Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39 No. 28, EAAI-25

Year

2025

DOI

10.1609/aaai.v39i28.35169

Source URL

https://ojs.aaai.org/index.php/AAAI/article/view/35169

PDF URL

https://ojs.aaai.org/index.php/AAAI/article/view/35169/37324

PDF Filename

006_Automated Assessment of Student Self-Explanation in Code Comprehension Using Pre-Trained Language Models.pdf

Page Count

8

Abstract

Assessing students’ responses, especially natural language responses, is a major challenge in education. In general, in education contexts, automatically evaluating what learners do or say is important as it enables personalized instruction, e.g., based on what the learner knows tailored tasks and feedback are given to the learner. Recently, deep learning techniques led to state-of-the-art methods in NLP such as transformer-based methods which resulted in significant performance improvements for many NLP tasks such as text classification and question answering. However, there is not much work exploring such methods for assessing students’ free answers, particularly in the context of code comprehension, which brings additional challenges as the student explanations include code references as well. This paper explores the potential of applying automated assessments methods using transformers to code comprehension. We fine-tuned pre-trained transformer models, including BERT, RoBERTa, CodeBERT, and SciBERT, to see how well they can automatically judge students’ responses to code comprehension tasks. Our results demonstrate that these models can significantly enhance the accuracy and reliability of automated assessments, offering insights into how the latest NLP techniques can be leveraged in computer science education to support personalized learning experiences.
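The abstract reports gains in accuracy and reliability, but the extracted text does not say which metrics were used. As a minimal sketch, assuming binary correct/incorrect labels and assuming plain accuracy and Cohen's kappa as the agreement measures (both choices are assumptions made here, not taken from the paper), a model's judgments could be compared against human labels roughly as follows. The label lists are hypothetical toy data.

```python
from collections import Counter


def accuracy(human, model):
    """Fraction of responses where the model label matches the human label."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Chance-corrected agreement between two label sequences."""
    n = len(human)
    observed = accuracy(human, model)
    human_counts = Counter(human)
    model_counts = Counter(model)
    # Agreement expected if each rater labeled independently at its own base rates.
    expected = sum(human_counts[c] * model_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (1 = correct explanation, 0 = incorrect); not data from the paper.
human_labels = [1, 1, 0, 0, 1, 0, 1, 1]
model_labels = [1, 0, 0, 0, 1, 0, 1, 1]

print(f"accuracy = {accuracy(human_labels, model_labels):.3f}")
print(f"kappa    = {cohen_kappa(human_labels, model_labels):.3f}")
```

Kappa is included alongside accuracy because raw agreement can look high when one label dominates; whether the source study used it should be verified against the full paper.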

Transferability

16
Best Fit Contexts
  • Research / curriculum design context
Likely Failure Modes
  • The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.

Cost And Operations

17
Time Cost Notes

Not specified in extracted text unless noted in duration field.

Staffing Notes

Requires educators/researchers/facilitators with sufficient AI literacy and pedagogy knowledge for the target learners.

Infra Notes

Infrastructure depends on AI tool type, learner devices, data access, and institutional policy context.

Extraction Notes

18
Confidence

High

Missing Information
  • group_size
  • duration
Reasoning Limits

This entry was automatically extracted from the PDF text and manifest metadata. Fields should be manually verified before public registry publication, especially group size, location, duration, and outcome claims.

Duplicate Check Against Uploaded Cases JSON
Closest Existing Title

Behavioral-pattern exploration and development of an instructional tool for young children to learn AI

Similarity Score

0.369

Likely Duplicate

false
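The record does not state how the similarity score was computed. As an illustrative sketch only, assuming a character-level ratio from Python's standard `difflib` module and a hypothetical duplicate threshold of 0.8 (neither is confirmed by the registry), a title-based duplicate check could look like this. It is not expected to reproduce the 0.369 value above, since the actual method is unknown.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two case titles."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def likely_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a pair as a likely duplicate when similarity reaches the threshold.

    The 0.8 default is a hypothetical choice for illustration.
    """
    return title_similarity(a, b) >= threshold


new_title = (
    "Automated Assessment of Student Self-Explanation in Code Comprehension "
    "Using Pre-Trained Language Models"
)
closest_title = (
    "Behavioral-pattern exploration and development of an instructional tool "
    "for young children to learn AI"
)

score = title_similarity(new_title, closest_title)
flag = likely_duplicate(new_title, closest_title)
print(f"similarity = {score:.3f}, likely duplicate = {flag}")
```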

Registry Metadata

19
Case ID
AAB-CASE-2026-RV-066
Publication Status
Published empirical study
Tags
  • case
  • Unspecified / broad education
  • USA
  • Research / curriculum design context
  • NLP / text classification
  • AI for education
  • programming education assessment
  • ML concepts / supervised learning
  • Curriculum / course design
  • Assessment support