Abstract: Despite the promise of ML in education, its adoption in the classroom has surfaced numerous issues regarding fairness, accountability, and transparency, as well as concerns about data privacy and student consent. A root cause of these issues is a lack of understanding of the complex dynamics of education, including teacher-student interactions, collaborative learning, and the classroom environment. To overcome these challenges and realize the potential of ML in education, software practitioners need to work closely with educators and students to understand the context of the data (the backbone of ML applications) and collaboratively define ML data specifications. To gain a deeper understanding of such a collaborative process, we conduct ten co-design sessions with ML software practitioners, educators, and students. In the sessions, teachers and students work with ML engineers, UX designers, and legal practitioners to define dataset characteristics for a given ML application. We find that stakeholders contextualize data based on their domain and procedural knowledge, proactively design data requirements to mitigate downstream harms and data reliability concerns, and exhibit role-based collaborative strategies and contribution patterns. Further, we find that beyond a seat at the table, meaningful stakeholder participation in ML requires structured supports: defined processes for continuous iteration and co-evaluation, shared contextual data quality standards, and information scaffolds for both technical and non-technical stakeholders to traverse expertise boundaries.

A domain-specific language for describing machine learning datasets

Using Large Language Models to Enrich the Documentation of Datasets for Machine Learning

DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

Datasheets for Datasets

A Technology for BigData Analysis Task Description using Domain-Specific Languages

A Survey on Domain-Specific Languages for Machine Learning in Big Data

Metamorphic Domain-Specific Languages: A Journey Into the Shapes of a Language

A domain-specific language for managing ETL processes

SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle

DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning

A Domain-Specific Language for Programming in the Tile Assembly Model

One DSL to Rule Them All: IDE-Assisted Code Generation for Agile Data Analysis

Understanding the Dataset Practitioners Behind Large Language Model Development

Exploring the Language of Data

Is a Seat at the Table Enough? Engaging Teachers and Students in Dataset Specification for ML in Education

A Domain Specific Transformation Language

Developing Web-based Geographic Information Systems with a DSL: Proposal and Case Study

Dsco: A Language Modeling Approach For Time Series Classification

Metamodel specialization based DSL for DL lifecycle data management

Towards Coding Social Science Datasets with Language Models

Design and implementation of DeepDSL: A DSL for deep learning