Automated Assessment of Student Self-Explanation in Code Comprehension Using Pre-Trained Language Models
Assessing students’ responses, especially natural language responses, is a major challenge in education. In educational contexts, automatically evaluating what learners do or say matters because it enables personalized instruction: tailored tasks and feedback can be given to the learner based on what the learner knows.
Implementation
Source publication / research team or educational organization described in paper
Learning context
Research / curriculum design context
AI role
Tutor
Outcome signal
Conceptual understanding
Registry Facets
- Unspecified / broad education
- AI for education
- programming education assessment
- NLP / text classification
- ML concepts / supervised learning
- Curriculum / course design
- Assessment support
- Students
- Assessment / tutoring analytics
- Research / curriculum design context
- Design / conceptual evidence
- Conceptual understanding
- Assessment / feedback quality
Implementing Organization
Source publication / research team or educational organization described in paper
USA
Researchers, educators, instructors, or facilitators as described in the source publication
Learning Context
- Research / curriculum design context
Classroom, course, or resource-based AI education activity
Not specified in extracted text
Not specified in extracted text
NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics
- The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.
Learner Profile
Unspecified / broad education
Mixed or not explicitly specified; infer from target learner group and intervention design.
Varies by intervention; not specified unless the paper explicitly describes prerequisites.
Educational Intent
- Document the AI education intervention, course, tool, or resource described in the source publication.
- Extract the learner context, AI role, pedagogy, outcomes, and constraints for AAB registry comparison.
- Record the paper’s motivating problem: assessing students’ responses, especially natural language responses, is a major challenge in education.
- Support AAB comparison across AI literacy, AI education, teacher training, higher education, and workforce contexts.
- Capture evidence maturity, transferability, and limitations rather than treating the publication as product endorsement.
- Not an AAB endorsement of the tool, curriculum, provider, or result.
- Not a direct replication record unless the source paper reports implementation details sufficient for replication.
AI Tool Description
NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics
Language context discussed in source publication
- Tutor
- Evaluator
- Primary interaction pattern inferred from publication: Curriculum / course design, Assessment support.
- AI capability focus: NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics.
- Apply standard AAB safeguards: privacy, transparency, human oversight, and documentation of limitations.
Activity Design
- Review the publication’s reported context, learner group, AI tool or curriculum, implementation process, and outcome evidence.
- Map the case to AAB registry fields for comparison across educational levels and AI capability types.
- Use the source publication and PDF for any manual verification before public registry release.
- Human educators/researchers remain responsible for instructional design, supervision, interpretation, and ethical safeguards.
- AI systems or AI concepts provide the learning object, support tool, evaluator, simulator, or automation context depending on the paper.
- Tutoring / feedback-supported learning
- Registry extraction emphasizes explicit learning goals, observed outcomes, constraints, and safety limitations.
Observed Challenges
- The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.
Design Adaptations
- Case classified under: Published empirical study.
- Pedagogical pattern: Tutoring / feedback-supported learning.
- Any additional adaptations should be verified against the full paper before public-facing publication.
Reported Outcomes
- Engagement evidence should be interpreted according to the source paper’s reported method and sample.
- In educational contexts, automatically evaluating what learners do or say enables personalized instruction: tailored tasks and feedback can be given based on what the learner knows.
- Recently, deep learning techniques such as transformer-based methods have produced state-of-the-art results in NLP, yielding significant performance improvements on tasks such as text classification and question answering.
Ethical & Privacy Considerations
- Apply standard AAB safeguards: privacy, transparency, human oversight, and documentation of limitations.
Evidence Type
- Design / conceptual evidence
Relevance to Research
- Can be used as an AAB evidence record for cross-case comparison, standards drafting, and evidence-maturity mapping.
- Supports identification of recurring patterns in AI literacy, AI education implementation, teacher preparation, assessment, and responsible AI learning.
- Conceptual understanding
- Assessment / feedback quality
- Curriculum / course design
- Assessment support
- NLP / text classification
- ML concepts / supervised learning
- Assessment / tutoring analytics
Case Status
- Completed
AAB Classification Tags
Unspecified / broad education
Research / curriculum design context
NLP / text classification, ML concepts / supervised learning, Assessment / tutoring analytics
Tutoring / feedback-supported learning
Medium
Medium
Source Publication
Automated Assessment of Student Self-Explanation in Code Comprehension Using Pre-Trained Language Models
- Jeevan Chapagain
- Vasile Rus
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39 No. 28, EAAI-25
2025
10.1609/aaai.v39i28.35169
https://ojs.aaai.org/index.php/AAAI/article/view/35169
https://ojs.aaai.org/index.php/AAAI/article/view/35169/37324
006_Automated Assessment of Student Self-Explanation in Code Comprehension Using Pre-Trained Language Models.pdf
8
Assessing students’ responses, especially natural language responses, is a major challenge in education. In general, in education contexts, automatically evaluating what learners do or say is important as it enables personalized instruction, e.g., based on what the learner knows, tailored tasks and feedback are given to the learner. Recently, deep learning techniques led to state-of-the-art methods in NLP such as transformer-based methods which resulted in significant performance improvements for many NLP tasks such as text classification and question answering. However, there is not much work exploring such methods for assessing students’ free answers, particularly in the context of code comprehension, which brings additional challenges as the student explanations include code references as well. This paper explores the potential of applying automated assessment methods using transformers to code comprehension. We fine-tuned pre-trained transformer models, including BERT, RoBERTa, CodeBERT, and SciBERT, to see how well they can automatically judge students’ responses to code comprehension tasks. Our results demonstrate that these models can significantly enhance the accuracy and reliability of automated assessments, offering insights into how the latest NLP techniques can be leveraged in computer science education to support personalized learning experiences.
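The abstract describes fine-tuning pre-trained transformer models (BERT, RoBERTa, CodeBERT, SciBERT) to judge students’ self-explanations of code. The sketch below illustrates what such a pipeline could look like, assuming a Hugging Face transformers/datasets workflow; the example data, label scheme, checkpoint choice, output path, and hyperparameters are illustrative assumptions, not the paper’s reported configuration.

```python
# Minimal sketch: fine-tuning a pre-trained transformer to classify
# student self-explanations of code as correct/incorrect.
# The dataset, labels, and hyperparameters here are hypothetical.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

# Hypothetical labeled examples: a student's explanation of a code
# snippet, labeled correct (1) or incorrect (0).
examples = {
    "text": [
        "The loop adds each element of the array to total.",
        "The function sorts the list alphabetically.",
    ],
    "label": [1, 0],
}

# CodeBERT is one of the models the abstract names; any BERT-style
# checkpoint would slot in the same way.
model_name = "microsoft/codebert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="self_explanation_clf",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()

# Inference: score a new student explanation.
inputs = tokenizer("It returns the largest value in the list.",
                   return_tensors="pt")
pred = model(**inputs).logits.argmax(dim=-1).item()
print("predicted label:", pred)
```

Swapping model_name for bert-base-uncased, roberta-base, or allenai/scibert_scivocab_uncased would mirror the model comparison the abstract describes; the paper’s actual training data and settings would need to be taken from the full text.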
Transferability
- Research / curriculum design context
- The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.
Cost and Operations
Not specified in extracted text unless noted in duration field.
Requires educators/researchers/facilitators with sufficient AI literacy and pedagogy knowledge for the target learners.
Infrastructure depends on AI tool type, learner devices, data access, and institutional policy context.
Extraction Notes
High
- group_size
- duration
This entry was automatically extracted from the PDF text and manifest metadata. Fields should be manually verified before public registry publication, especially group size, location, duration, and outcome claims.
- Manifest near-match (presumably a deduplication check): "Behavioral-pattern exploration and development of an instructional tool for young children to learn AI" (similarity score 0.369; flagged as duplicate: false).
