Notably Inaccessible -- Data Driven Understanding of Data Science Notebook (In)Accessibility

Venkatesh Potluri, Sudheesh Singanamalla, Nussara Tieanklin, Jennifer Mankoff
DOI: https://doi.org/10.1145/3597638.3608417
2023-08-07
Abstract: Computational notebooks, tools that facilitate storytelling through exploration, data analysis, and information visualization, have become the widely accepted standard in the data science community. These notebooks have been widely adopted through notebook software such as Jupyter, Datalore and Google Colab, both in academia and industry. While there is extensive research to learn how data scientists use computational notebooks, identify their pain points, and enable collaborative data science practices, very little is known about the various accessibility barriers experienced by blind and visually impaired (BVI) users using these notebooks. BVI users are unable to use computational notebook interfaces due to (1) inaccessibility of the interface, (2) common ways in which data is represented in these interfaces, and (3) inability for popular libraries to provide accessible outputs. We perform a large-scale systematic analysis of 100,000 Jupyter notebooks to identify various accessibility challenges in published notebooks affecting the creation and consumption of these notebooks. Through our findings, we make recommendations to improve accessibility of the artifacts of a notebook, suggest authoring practices, and propose changes to infrastructure to make notebooks accessible. An accessible PDF can be obtained at https://blvi.dev/noteably-inaccessible-paper
Human-Computer Interaction, Computers and Society, Software Engineering
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the accessibility issues of computational notebooks (such as Jupyter notebooks) for blind and visually impaired (BVI) users. Specifically, the paper focuses on the following aspects:

1. **Inaccessible Interface**: The user interface of computational notebooks is inaccessible to BVI users.
2. **Data Representation**: Common data representation methods in computational notebooks (such as charts and tables) are difficult for BVI users to understand and use.
3. **Inaccessible Output**: Popular libraries fail to provide accessible output, making it difficult for BVI users to effectively utilize the results in computational notebooks.

### Research Background

Computational notebooks (such as Jupyter, Google Colab, etc.) have become the standard tools widely accepted by the data science community. These notebooks offer interactive data exploration, analysis, and visualization capabilities through code, natural language, and rich data representations. While many studies have explored how data scientists use computational notebooks, identified their pain points, and facilitated collaborative data science practices, there is little research on the various accessibility barriers BVI users encounter when using these notebooks.

### Research Methods

To systematically analyze the accessibility issues of computational notebooks, the authors conducted a large-scale data-driven study. The specific methods include:

1. **Data Collection and Filtering**: Randomly selecting 100,000 notebooks from a dataset of 10 million Jupyter notebooks provided by JetBrains for analysis.
2. **Data Extraction**: Extracting the source code and output information from each notebook, particularly the generated images and tables.
3. **Code Syntax Analysis**: Using the Abstract Syntax Tree (AST) module to parse the source code, extracting imported modules and called functions to understand commonly used libraries and function calls.
4. **Output Type Classification**: Classifying the output into applications, images, and text based on MIME types, and further subdividing into specific categories.
5. **Data Enrichment**: Converting notebooks to HTML format, applying different themes, and evaluating the impact of different themes on accessibility.

### Research Questions

The paper primarily addresses the following three specific research questions:

1. **Accessibility of Data Artifacts**: How accessible are key data artifacts (such as charts and tables) to blind or visually impaired users?
2. **Accessibility of Authoring Process**: How do existing notebook authoring practices affect the ability of screen reader users to quickly browse important information and results? For example, do most notebooks correctly use headings and other landmarks to improve navigation and quick browsing?
3. **Accessibility of Infrastructure**: How do the tools currently used for distributing and customizing notebooks affect accessibility? For example, how do different color themes impact the number of accessibility errors detected by automated tools?

### Contributions

1. **Development of Repeatable Automated Metrics**: These metrics represent an optimistic upper bound estimate of notebook accessibility.
2. **First Systematic Large-Scale Analysis**: Analyzing the current state of accessibility of computational notebooks for blind or visually impaired users, and open-sourcing the dataset and processing pipeline.
3. **Results Presentation**: Demonstrating the overall inaccessibility of notebooks in terms of data artifacts, notebook IDEs, and infrastructure, and describing the most commonly used programming tools by notebook authors.
4. **Improvement Suggestions**: Based on the research findings, proposing suggestions such as encouraging good ALT text authoring practices and automatically generating accessible tables to improve the accessibility of computational notebooks.
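
To make the code syntax analysis and output type classification steps of the research methods more concrete, here is a minimal Python sketch. It also counts Markdown headings, which relates to the authoring-process research question. This is not the authors' pipeline: the function name `analyze_notebook` and the returned dictionary layout are hypothetical, and the code assumes a notebook already loaded as a dictionary in the nbformat v4 JSON layout (`cells`, `cell_type`, `source`, `outputs`, `data`).

```python
import ast
import re
from collections import Counter

# Heuristic: an ATX-style Markdown heading is 1-6 '#' characters followed by whitespace.
HEADING_RE = re.compile(r"#{1,6}\s")


def analyze_notebook(nb: dict) -> dict:
    """Summarize imports, called functions, output MIME types, and headings.

    `nb` is assumed to be a parsed nbformat v4 notebook (e.g. from json.load).
    """
    imports, calls, mime_types = Counter(), Counter(), Counter()
    heading_count = 0

    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or a single string
            source = "".join(source)

        if cell.get("cell_type") == "markdown":
            heading_count += sum(
                1 for line in source.splitlines() if HEADING_RE.match(line)
            )
        elif cell.get("cell_type") == "code":
            try:
                tree = ast.parse(source)
            except SyntaxError:
                # Cells with IPython magics (e.g. "%matplotlib inline") are not
                # valid Python; this sketch simply skips their source.
                tree = None
            if tree is not None:
                for node in ast.walk(tree):
                    if isinstance(node, ast.Import):
                        # Keep only the top-level package name.
                        imports.update(a.name.split(".")[0] for a in node.names)
                    elif isinstance(node, ast.ImportFrom) and node.module:
                        imports[node.module.split(".")[0]] += 1
                    elif isinstance(node, ast.Call):
                        func = node.func
                        if isinstance(func, ast.Attribute):
                            calls[func.attr] += 1
                        elif isinstance(func, ast.Name):
                            calls[func.id] += 1
            # Rich outputs carry a MIME bundle under "data"; stream outputs
            # (plain stdout/stderr) have no "data" key and are ignored here.
            for output in cell.get("outputs", []):
                for mime in output.get("data", {}):
                    mime_types[mime.split("/")[0]] += 1

    return {
        "imports": imports,
        "calls": calls,
        "mime_types": mime_types,
        "headings": heading_count,
    }
```

A notebook file could be passed in with `analyze_notebook(json.load(open(path)))`, where `path` is a placeholder for a local `.ipynb` file. Grouping MIME types by their top-level part (`application`, `image`, `text`) mirrors the three categories named in the methods; any finer subdivision would need the full MIME string.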
### Conclusion

Through this study, the authors hope to reveal the accessibility issues in computational notebooks, thereby accelerating the improvement of data science tools and authoring practices, and reducing the need for customized, specialized accessibility solutions.