Modernizing use of regression models in physics education research: a review of hierarchical linear modeling

Ben Van Dusen, Jayson Nissen
DOI: https://doi.org/10.1103/PhysRevPhysEducRes.15.020108
2019-07-04
Abstract: Physics education researchers (PER) often analyze student data with single-level regression models (e.g., linear and logistic regression). However, education datasets can have hierarchical structures, such as students nested within courses, that single-level models fail to account for. The improper use of single-level models to analyze hierarchical datasets can lead to biased findings. Hierarchical models (a.k.a. multi-level models) account for this nested structure in the data. In this publication, we outline the theoretical differences between how single-level and multi-level models handle hierarchical datasets. We then present an analysis of a dataset from 112 introductory physics courses using both multiple linear regression and hierarchical linear modeling to illustrate the potential impact of using an inappropriate analytical method on PER findings and implications. Research can leverage multi-institutional datasets to improve the field's understanding of how to support student success in physics. There is no post hoc fix, however, if researchers use inappropriate single-level models to analyze multi-level datasets. To continue developing reliable and generalizable knowledge, PER should use hierarchical models when analyzing hierarchical datasets. The supplemental materials include a sample dataset, R code for the model building and analysis presented in the paper, and an HTML output from the R code.
Physics Education
What problem does this paper attempt to address?
This paper addresses how to analyze hierarchical datasets in physics education research appropriately, so as to avoid the biased results that single-level regression models (such as linear regression) can produce. Specifically, it explores how hierarchical linear modeling (HLM) handles the nested structure of student data, for example students nested within courses. This nesting violates the independence assumption of single-level models and can bias their results. Through theoretical discussion and analysis of real data, the paper aims to show the advantages of multi-level models over traditional single-level models, and it emphasizes the importance of using multi-level models when the data have a hierarchical structure.
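To make the independence violation concrete, the following is a minimal Python sketch that is not taken from the paper or its R supplement. It simulates students nested in courses, estimates the intraclass correlation (ICC) with the standard one-way ANOVA estimator for balanced groups, and computes the Kish design effect, which approximates how much a single-level model understates the variance of a mean. The function names, the 30 students per course, and the standard deviations are all illustrative assumptions; only the count of 112 courses echoes the abstract.

```python
import random
from statistics import mean


def simulate_courses(n_courses, n_students, course_sd, student_sd, seed=0):
    """Simulate scores for students nested in courses (hypothetical data).

    Each course gets its own random offset, so students who share a
    course have correlated scores.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_courses):
        course_effect = rng.gauss(0.0, course_sd)
        scores = [50.0 + course_effect + rng.gauss(0.0, student_sd)
                  for _ in range(n_students)]
        data.append(scores)
    return data


def intraclass_correlation(groups):
    """One-way ANOVA estimator of the ICC for equally sized groups."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equally sized groups")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


def design_effect(icc, group_size):
    """Kish design effect: variance inflation caused by clustering."""
    return 1.0 + (group_size - 1) * icc


if __name__ == "__main__":
    # 112 courses as in the abstract; every other number is an assumption.
    courses = simulate_courses(112, 30, course_sd=5.0, student_sd=10.0)
    icc = intraclass_correlation(courses)
    print(f"estimated ICC: {icc:.3f}")
    print(f"design effect: {design_effect(icc, 30):.2f}")
```

With these assumed variances the true ICC is 25 / (25 + 100) = 0.2. An ICC well above zero indicates that students in the same course are not independent observations, which is the situation in which a multi-level model is preferable to a single-level one.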