Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments
Simulating learner actions helps stress-test open-ended interactive learning environments and prototype new adaptations before deployment. While recent studies show the promise of using large language models (LLMs) for simulating human behavior, such approaches have not gone beyond rudimentary proof-of-concept stages due to key limitations.
Implementation
Source publication / research team or educational organization described in paper
Learning context
Research / curriculum design context
AI role
Learning object / concept model
Outcome signal
Conceptual understanding
Registry Facets
- Unspecified / broad education
- AI for education
- learner simulation
- LLM/Chat
- NLP / text classification
- Learning tool / resource design
- Students
- Adult learners / professionals
- Researchers
- LLM/Chat
- NLP / text classification
- Research / curriculum design context
- Design / conceptual evidence
- Conceptual understanding
Implementing Organization
Source publication / research team or educational organization described in paper
Not specified in extracted text
Researchers, educators, instructors, or facilitators as described in the source publication
Learning Context
- Research / curriculum design context
Tool / platform-supported learning activity
100 hours of work per instructional hour (Blessing and Gilbert 2008
Not specified in extracted text
LLM/Chat, NLP / text classification
- AI output reliability, hallucination, academic integrity, and age-appropriate use require safeguards.
Learner Profile
Unspecified / broad education
Mixed or not explicitly specified; infer from target learner group and intervention design.
Varies by intervention; not specified unless the paper explicitly describes prerequisites.
Educational Intent
- Document the AI education intervention, course, tool, or resource described in the source publication.
- Extract the learner context, AI role, pedagogy, outcomes, and constraints for AAB registry comparison.
- Simulating learner actions helps stress-test open-ended interactive learning environments and prototype new adaptations before deployment.
- Support AAB comparison across AI literacy, AI education, teacher training, higher education, and workforce contexts.
- Capture evidence maturity, transferability, and limitations rather than treating the publication as product endorsement.
- Not an AAB endorsement of the tool, curriculum, provider, or result.
- Not a direct replication record unless the source paper reports implementation details sufficient for replication.
AI Tool Description
LLM/Chat, NLP / text classification
Language context discussed in source publication
- Learning object / concept model
- Primary interaction pattern inferred from publication: Learning tool / resource design.
- AI capability focus: LLM/Chat, NLP / text classification.
- Require human review of generated outputs and explicit guidance against over-reliance or answer copying.
Activity Design
- Review the publication’s reported context, learner group, AI tool or curriculum, implementation process, and outcome evidence.
- Map the case to AAB registry fields for comparison across educational levels and AI capability types.
- Use the source publication and PDF for any manual verification before public registry release.
- Human educators/researchers remain responsible for instructional design, supervision, interpretation, and ethical safeguards.
- AI systems or AI concepts provide the learning object, support tool, evaluator, simulator, or automation context depending on the paper.
- Scenario / case-based learning
- Registry extraction emphasizes explicit learning goals, observed outcomes, constraints, and safety limitations.
Observed Challenges
- AI output reliability, hallucination, academic integrity, and age-appropriate use require safeguards.
Design Adaptations
- Case classified under: Published empirical study.
- Pedagogical pattern: Scenario / case-based learning.
- Any additional adaptations should be verified against the full paper before public-facing publication.
Reported Outcomes
- Engagement evidence should be interpreted according to the source paper’s reported method and sample.
- While recent studies show the promise of using large language models (LLMs) for simulating human behavior, such approaches have not gone beyond rudimentary proof-of-concept stages due to key limitations.
- Moreover, apparently successful outcomes can often be unreliable, either because domain experts unintentionally guide LLMs to produce expected results, leading to self-fulfilling prophecies; or because the LLM has encountered highly similar scenarios in its training data, meaning that models may not be simulating behavior so much as regurgitating memorized content.
- To address these challenges, we propose HYP-MIX, a simulation authoring framework that allows experts to develop and evaluate simulations by combining testable hypotheses about learner behavior.
Ethical & Privacy Considerations
- Require human review of generated outputs and explicit guidance against over-reliance or answer copying.
Evidence Type
- Design / conceptual evidence
Relevance to Research
- Can be used as an AAB evidence record for cross-case comparison, standards drafting, and evidence-maturity mapping.
- Supports identification of recurring patterns in AI literacy, AI education implementation, teacher preparation, assessment, and responsible AI learning.
- Conceptual understanding
- Learning tool / resource design
- LLM/Chat
- NLP / text classification
Case Status
- Completed
AAB Classification Tags
Unspecified / broad education
Research / curriculum design context
LLM/Chat, NLP / text classification
Scenario / case-based learning
Medium
Medium
Source Publication
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments
- Amogh Mannekote
- Adam Davies
- Jina Kang
- Kristy Elizabeth Boyer
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39 No. 28, EAAI-25
2025
10.1609/aaai.v39i28.35175
https://ojs.aaai.org/index.php/AAAI/article/view/35175
https://ojs.aaai.org/index.php/AAAI/article/view/35175/37330
012_Can LLMs Reliably Simulate Human Learner Actions_ A Simulation Authoring Framework for Open-Ended Learning Environments.pdf
9
Simulating learner actions helps stress-test open-ended interactive learning environments and prototype new adaptations before deployment. While recent studies show the promise of using large language models (LLMs) for simulating human behavior, such approaches have not gone beyond rudimentary proof-of-concept stages due to key limitations. First, LLMs are highly sensitive to minor prompt variations, raising doubts about their ability to generalize to new scenarios without extensive prompt engineering. Moreover, apparently successful outcomes can often be unreliable, either because domain experts unintentionally guide LLMs to produce expected results, leading to self-fulfilling prophecies; or because the LLM has encountered highly similar scenarios in its training data, meaning that models may not be simulating behavior so much as regurgitating memorized content. To address these challenges, we propose HYP-MIX, a simulation authoring framework that allows experts to develop and evaluate simulations by combining testable hypotheses about learner behavior. Testing this framework in a physics learning environment, we found that GPT-4 Turbo maintains calibrated behavior even as the underlying learner model changes, providing the first evidence that LLMs can be used to simulate realistic behaviors in open-ended interactive learning environments, a necessary prerequisite for useful LLM behavioral simulation.
Transferability
- Research / curriculum design context
- AI output reliability, hallucination, academic integrity, and age-appropriate use require safeguards.
Cost And Operations
Not specified in extracted text unless noted in duration field.
Requires educators/researchers/facilitators with sufficient AI literacy and pedagogy knowledge for the target learners.
Infrastructure depends on AI tool type, learner devices, data access, and institutional policy context.
Extraction Notes
High
- group_size
This entry was automatically extracted from the PDF text and manifest metadata. Fields should be manually verified before public registry publication, especially group size, location, duration, and outcome claims.
