Abstract: Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and as academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures that have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the text descriptions of elements onto this behaviorally informed representation. In this study, we use the course enrollment histories of 124,000 students at a public university to learn vector representations of its courses. From these course-selection-informed representations, a notable 88% of course attribute information was recovered, as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Honors Math 1B as Physics 7B is to Honors Physics 7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions via regression. We find that representations learned from enrollment histories resolved courses to a level of semantic fidelity exceeding that of their catalog descriptions, revealing nuanced content differences between similar courses and accurately describing departments for which the dataset had no course descriptions. We end with a discussion of the possible mechanisms by which this semantic structure may be informed and of the implications for the nascent research and practice of data science.
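The analogy evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common vector-offset formulation scored by cosine similarity, which the abstract does not spell out; the three-dimensional vectors, the extra course "Chemistry 1A", and the function name `solve_analogy` are hypothetical and chosen by hand for illustration, not learned from enrollment data.

```python
import numpy as np

# Toy, hand-made course vectors (hypothetical values, not learned from data).
# The third dimension loosely plays the role of an "honors" direction.
course_vectors = {
    "Math 1B": np.array([1.0, 0.0, 0.0]),
    "Honors Math 1B": np.array([1.0, 0.0, 1.0]),
    "Physics 7B": np.array([0.0, 1.0, 0.0]),
    "Honors Physics 7B": np.array([0.0, 1.0, 1.0]),
    "Chemistry 1A": np.array([0.5, 0.5, 0.0]),
}


def solve_analogy(a, b, c, vectors):
    """Return the course d that best completes 'a is to b as c is to d'.

    Assumed formulation: d is the course whose vector has the highest
    cosine similarity to (b - a + c), excluding a, b, and c themselves.
    """
    target = vectors[b] - vectors[a] + vectors[c]
    best_name, best_score = None, -np.inf
    for name, vec in vectors.items():
        if name in (a, b, c):
            continue  # the query courses are not valid answers
        score = float(
            np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        )
        if score > best_score:
            best_name, best_score = name, score
    return best_name


answer = solve_analogy("Math 1B", "Honors Math 1B", "Physics 7B", course_vectors)
print(answer)
```

Under this formulation, a course relationship counts as recovered when the top-ranked candidate matches the course expected from prior domain knowledge.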

Anna Karenina Strikes Again: Pre-Trained LLM Embeddings May Favor High-Performing Learners

Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning

An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?

Data-driven unsupervised clustering of online learner behaviour

A Map of Knowledge

Understanding Privacy Risks of Embeddings Induced by Large Language Models

Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs

United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections

Learners Demographics Classification on MOOCs During the COVID-19: Author Profiling via Deep Learning Based on Semantic and Syntactic Representations

LANA: Towards Personalized Deep Knowledge Tracing Through Distinguishable Interactive Sequences

A university map of course knowledge

Knowledge Tracing Model and Student Profile Based on Clustering-Neural-Network

Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs

Evaluation of LLMs Biases Towards Elite Universities: A Persona-Based Exploration

Choosing Between an LLM versus Search for Learning: A HigherEd Student Perspective

Dr. GPT in Campus Counseling: Understanding Higher Education Students' Opinions on LLM-assisted Mental Health Services

Crafting Interpretable Embeddings by Asking LLMs Questions

From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning

LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users

Eliciting Latent Knowledge from Quirky Language Models