Data Labeling for Machine Learning Engineers: Project-Based Curriculum and Data-Centric Competitions
The process of training and evaluating machine learning (ML) models relies on high-quality and timely annotated datasets. While a significant portion of academic and industrial research is focused on creating new ML methods, these communities rely on open datasets and benchmarks.
Implementation
Source publication / research team or educational organization described in paper
Learning context
Higher education
AI role
Learning object / concept model
Outcome signal
AI literacy
Registry Facets
- Higher education
- Adult / workforce
- Professional / adult learning
- ML engineering
- Data-centric AI
- ML concepts / supervised learning
- Curriculum / course design
- Students
- Adult learners / professionals
- Activity documentation
- AI literacy
- Conceptual understanding
Implementing Organization
Source publication / research team or educational organization described in paper
Not specified in extracted text
Researchers, educators, instructors, or facilitators as described in the source publication
Learning Context
- Higher education
- Professional / adult learning
Course implementation or course design
Not specified in extracted text
ML concepts / supervised learning
- The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.
Learner Profile
Higher education, Adult / workforce
Mixed or not explicitly specified; infer from target learner group and intervention design.
Varies by intervention; not specified unless the paper explicitly describes prerequisites.
Educational Intent
- Document the AI education intervention, course, tool, or resource described in the source publication.
- Extract the learner context, AI role, pedagogy, outcomes, and constraints for AAB registry comparison.
- Support AAB comparison across AI literacy, AI education, teacher training, higher education, and workforce contexts.
- Capture evidence maturity, transferability, and limitations rather than treating the publication as product endorsement.
- Not an AAB endorsement of the tool, curriculum, provider, or result.
- Not a direct replication record unless the source paper reports implementation details sufficient for replication.
AI Tool Description
ML concepts / supervised learning
Not specified in extracted text
- Learning object / concept model
- Primary interaction pattern inferred from publication: Curriculum / course design.
- AI capability focus: ML concepts / supervised learning.
- Apply standard AAB safeguards: privacy, transparency, human oversight, and documentation of limitations.
Activity Design
- Review the publication’s reported context, learner group, AI tool or curriculum, implementation process, and outcome evidence.
- Map the case to AAB registry fields for comparison across educational levels and AI capability types.
- Use the source publication and PDF for any manual verification before public registry release.
- Human educators/researchers remain responsible for instructional design, supervision, interpretation, and ethical safeguards.
- AI systems or AI concepts provide the learning object, support tool, evaluator, simulator, or automation context depending on the paper.
- Project-based learning
- Registry extraction emphasizes explicit learning goals, observed outcomes, constraints, and safety limitations.
Observed Challenges
- The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.
Design Adaptations
- Case classified under: Published curriculum / implementation paper.
- Pedagogical pattern: Project-based learning.
- Any additional adaptations should be verified against the full paper before public-facing publication.
Reported Outcomes
- Engagement evidence should be interpreted according to the source paper’s reported method and sample.
- To fill the need for this competency, we created a semester course on Data Collection and Labeling for Machine Learning, integrated into a bachelor program that trains data analysts and ML engineers.
Ethical & Privacy Considerations
- Apply standard AAB safeguards: privacy, transparency, human oversight, and documentation of limitations.
Evidence Type
- Activity documentation
Relevance to Research
- Can be used as an AAB evidence record for cross-case comparison, standards drafting, and evidence-maturity mapping.
- Supports identification of recurring patterns in AI literacy, AI education implementation, teacher preparation, assessment, and responsible AI learning.
- AI literacy
- Conceptual understanding
- Curriculum / course design
- ML concepts / supervised learning
Case Status
- Completed
AAB Classification Tags
Higher education, Adult / workforce
Higher education, Professional / adult learning
ML concepts / supervised learning
Project-based learning
Low to Medium
Medium
Source Publication
Data Labeling for Machine Learning Engineers: Project-Based Curriculum and Data-Centric Competitions
- Anastasia Zhdanovskaya
- Daria Baidakova
- Dmitry Ustalov
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37 No. 13, EAAI-23
2023
10.1609/aaai.v37i13.26886
https://ojs.aaai.org/index.php/AAAI/article/view/26886
https://ojs.aaai.org/index.php/AAAI/article/view/26886/26658
077_Data Labeling for Machine Learning Engineers_ Project-Based Curriculum and Data-Centric Competitions.pdf
8
The process of training and evaluating machine learning (ML) models relies on high-quality and timely annotated datasets. While a significant portion of academic and industrial research is focused on creating new ML methods, these communities rely on open datasets and benchmarks. However, practitioners often face issues with unlabeled and unavailable data specific to their domain. We believe that building scalable and sustainable processes for collecting data of high quality for ML is a complex skill that needs focused development. To fill the need for this competency, we created a semester course on Data Collection and Labeling for Machine Learning, integrated into a bachelor program that trains data analysts and ML engineers. The course design and delivery illustrate how to overcome the challenge of putting university students with a theoretical background in mathematics, computer science, and physics through a program that is substantially different from their educational habits. Our goal was to motivate students to focus on practicing and mastering a skill that was considered unnecessary to their work. We created a system of inverse ML competitions that showed the students how high-quality and relevant data affect their work with ML models, and their mindset changed completely in the end. Project-based learning with increasing complexity of conditions at each stage helped to raise the satisfaction index of students accustomed to difficult challenges. During the course, our invited industry practitioners drew on their firsthand experience with data, which helped us avoid overtheorizing and made the course highly applicable to the students' future career paths.
Transferability
- Higher education
- Professional / adult learning
- The paper provides limited implementation detail in the extracted abstract; additional manual review may be needed for local replication.
Cost And Operations
Not specified in extracted text unless noted in duration field.
Requires educators/researchers/facilitators with sufficient AI literacy and pedagogy knowledge for the target learners.
Infrastructure depends on AI tool type, learner devices, data access, and institutional policy context.
Extraction Notes
High
- group_size
- duration
This entry was automatically extracted from the PDF text and manifest metadata. Fields should be manually verified before public registry publication, especially group size, location, duration, and outcome claims.
