Abstract:

Background: Few gamified cognitive tasks are subjected to rigorous examination of their psychometric properties, despite their use in experimental and clinical settings. Even small manipulations to cognitive tasks require extensive research to understand their effects.

Objective: This study aims to investigate how game elements can affect the reliability of scores on a Stroop task. We specifically investigated performance consistency within and across sessions.

Methods: We created 2 versions of the Stroop task, with and without game elements, and then tested each task with participants at 2 time points. The gamified task used points and feedback as game elements. In this paper, we report on the reliability of the gamified Stroop task in terms of internal consistency and test-retest reliability, compared with the control task. We used a permutation approach to evaluate internal consistency. For test-retest reliability, we calculated the Pearson correlation and intraclass correlation coefficients between the 2 time points. We also descriptively compared the reliability of scores on a trial-by-trial basis, considering the different trial types.

Results: At the first time point, the Stroop effect was reduced in the game condition, indicating improved performance. Participants in the game condition had faster reaction times (P=.005) and lower error rates (P=.04) than those in the basic task condition. Furthermore, the game condition led to higher measures of internal consistency at both time points for reaction times and error rates, which indicates a more consistent response pattern. For reaction time in the basic task condition, the Spearman-Brown coefficient was 0.78 (95% CI 0.64-0.89) at time 1 and 0.64 (95% CI 0.40-0.81) at time 2. For reaction time in the game condition, it was 0.83 (95% CI 0.71-0.91) at time 1 and 0.76 (95% CI 0.60-0.88) at time 2.
Similarly, for error rates in the basic task condition, the Spearman-Brown coefficient was 0.76 (95% CI 0.62-0.87) at time 1 and 0.74 (95% CI 0.58-0.86) at time 2. For error rates in the game condition, it was 0.76 (95% CI 0.62-0.87) at time 1 and 0.74 (95% CI 0.58-0.86) at time 2. Test-retest reliability analysis revealed a distinctive performance pattern depending on the trial type, which may reflect motivational differences between the task versions. In short, especially in the incongruent trials, where cognitive conflict occurs, performance in the game condition reached peak consistency after 100 trials, whereas performance consistency in the basic version dropped after 50 trials and only caught up with the game version after 250 trials.

Conclusions: Even subtle gamification can affect task performance, and not only as a direct difference in performance between conditions. People playing the game reach peak performance sooner, and their performance is more consistent within and across sessions. We advocate for a closer examination of the impact of game elements on performance.
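
The reliability analyses named in the Methods (a permutation approach to internal consistency with a Spearman-Brown correction, plus Pearson and intraclass correlations across the 2 time points) can be sketched roughly as below. This is an illustrative sketch, not the authors' analysis code. The function names, the data layout, the number of random splits, and the choice of ICC(2,1) (two-way random effects, absolute agreement, single measure) are all assumptions, because the abstract does not specify them.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r):
    """Step a half-length correlation up to full test length."""
    return 2 * r / (1 + r)


def stroop_effect(trials):
    """Mean incongruent RT minus mean congruent RT for a set of trials."""
    inc = [rt for kind, rt in trials if kind == "incongruent"]
    con = [rt for kind, rt in trials if kind == "congruent"]
    return mean(inc) - mean(con)


def permuted_split_half(data, n_splits=500, seed=0):
    """Mean Spearman-Brown-corrected correlation over random half splits.

    data (hypothetical layout): participant id -> list of (trial_type, rt).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for trials in data.values():
            a, b = [], []
            # Split within each trial type so both halves contain both types.
            for kind in ("congruent", "incongruent"):
                subset = [t for t in trials if t[0] == kind]
                rng.shuffle(subset)
                mid = len(subset) // 2
                a += subset[:mid]
                b += subset[mid:]
            half_a.append(stroop_effect(a))
            half_b.append(stroop_effect(b))
        estimates.append(spearman_brown(pearson(half_a, half_b)))
    return mean(estimates)


def icc_2_1(t1, t2):
    """ICC(2,1) for two sessions; the ICC form is an assumption."""
    n, k = len(t1), 2
    rows = list(zip(t1, t2))
    grand = mean(list(t1) + list(t2))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (t1, t2))
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)            # between-participant mean square
    msc = ss_cols / (k - 1)            # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Reporting both coefficients is informative because they answer different questions: a uniform practice effect between sessions leaves the Pearson correlation untouched but lowers an absolute-agreement ICC. For example, `pearson([1, 2, 3], [2, 3, 4])` is 1, while `icc_2_1([1, 2, 3], [2, 3, 4])` is 2/3.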

On the reliability of behavioral measures of cognitive control: retest reliability of task-inhibition effect, task-preparation effect, Stroop-like interference, and conflict adaptation effect

On the (un)reliability of common behavioral and electrophysiological measures from the stop signal task: Measures of inhibition lack stability over time

A measure of reliability convergence to select and optimize cognitive tasks for individual differences research

How reliable are the effects of self-control training?: A re-examination using self-report and physical measures

A proof-of-concept study testing the factor structure of the Stop Signal Task: overlap with substance use and mental health symptoms

Consistency within change: Evaluating the psychometric properties of a widely used predictive-inference task

Composite Measures of Brain Activation Predict Individual Differences in Behavioral Stroop Interference

Dissociating proactive and reactive control in the Stroop task

Methods to split cognitive task data for estimating split-half reliability: A comprehensive review and systematic assessment

The complexity of measuring reliability in learning tasks: An illustration using the Alternating Serial Reaction Time Task

Measuring the Reliability of a Gamified Stroop Task: Quantitative Experiment

Evaluating the Stroop Test With Older Adults: Construct Validity, Short Term Test-Retest Reliability, and Sensitivity to Mental Fatigue

Online Independent Versus Laboratory-Based Stop-Signal Task Performance: A Within-Subjects Counterbalanced Comparison Study (Preprint)

Test–Retest Reliability and Measurement Invariance of Executive Function Tasks in Young Children With and Without ADHD

Testing the ego-depletion effect in optimized conditions

Measuring Adaptive Control in Conflict Tasks

Test-retest reliability of the Simon task: a short version proposal

Psychometrics of drift-diffusion model parameters derived from the Eriksen flanker task: Reliability and validity in two independent samples

Reliably Measuring Learning-Dependent Distractor Suppression with Eye Tracking

Test–retest reliability of reinforcement learning parameters

Inhibitory control and academic achievement - a study of the relationship between Stroop Effect and university students' academic performance