Understanding and Improving Usability of Data Dashboards for Simplified Privacy Control of Voice Assistant Data (Extended Version)

Vandit Sharma, Mainack Mondal
DOI: https://doi.org/10.48550/arXiv.2110.03080
2021-10-07
Abstract: Today, intelligent voice assistant (VA) software like Amazon's Alexa, Google's Voice Assistant (GVA) and Apple's Siri has millions of users. These VAs often collect and analyze large amounts of user data to improve their functionality. However, this collected data may contain sensitive information (e.g., personal voice recordings) that users might not feel comfortable sharing with others and that might cause significant privacy concerns. To counter such concerns, service providers like Google present their users with a personal data dashboard (called 'My Activity Dashboard'), allowing them to manage all data collected by the voice assistant. However, a real-world, GVA-data-driven understanding of user perceptions and preferences regarding this data (and data dashboards) remained relatively unexplored in prior research. To that end, in this work we focused on GVA users and investigated their perceptions and preferences regarding data and dashboards while grounding them in real GVA-collected user data. Specifically, we conducted an 80-participant survey-based user study to collect both generic perceptions regarding GVA usage and desired privacy preferences for a stratified sample of participants' GVA data. We show that most participants had superficial knowledge about the type of data collected by GVA. Worryingly, we found that participants felt uncomfortable sharing a non-trivial 17.7% of GVA-collected data elements with Google. The current My Activity dashboard, although useful, did not help long-time GVA users effectively manage their data privacy. Our real-data-driven study found that showing users even one sensitive data element can significantly improve the usability of data dashboards. To that end, we built a classifier that can detect sensitive data for data dashboard recommendations with a 95% F1-score and shows a 76% improvement over baseline models.
Cryptography and Security, Human-Computer Interaction
What problem does this paper attempt to address?
This paper examines how well users understand and can control the data that intelligent voice assistants such as Google Voice Assistant (GVA) collect about them. Specifically, the research focuses on the following aspects:

1. **Users' awareness of voice assistant data collection and storage practices**: Although most users know that Google collects and stores some form of data, about 40% are unclear about which types of data are collected (such as audio clips), indicating that users' understanding of data collection is rather superficial.
2. **Users' preferences for access control of data elements**: Participants wanted to restrict Google's access to a non-trivial share of the data elements collected by their voice assistants (the abstract reports 17.7%). In particular, for data collected through smartphones and smart speakers, users feel more uneasy when they do not know or cannot remember the device.
3. **The effectiveness of data dashboards in controlling user privacy**: Although most participants find Google's "My Activity" data dashboard easy to use, it is not effective for long-term GVA users who must deal with a large amount of data, indicating that additional support is needed.
4. **How to improve the data dashboard by automated means to encourage privacy-protecting behavior**: The research proposes a machine-learning-based method that automatically detects sensitive data and recommends it to users, thereby significantly increasing users' tendency to control their collected data. This method achieves an F1-score of over 95% in identifying sensitive content, a 76% improvement over the baseline models.

In conclusion, this paper aims to understand users' attitudes and preferences regarding voice assistant data collection and privacy control through empirical research, and explores improving the existing data dashboard through technical means (such as machine-learning-assisted sensitive-content detection) to better support users' needs in managing their personal data privacy.
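
For readers unfamiliar with the F1-score used to report the classifier's quality, the following is a minimal sketch of how the metric is computed from a binary "sensitive / not sensitive" classification. The function name and the counts in the usage example are hypothetical and chosen only for illustration; they are not the paper's data, and the sketch says nothing about how the authors' classifier itself works.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    true_pos:  sensitive elements correctly flagged as sensitive
    false_pos: non-sensitive elements wrongly flagged as sensitive
    false_neg: sensitive elements the classifier missed
    """
    flagged = true_pos + false_pos   # everything the classifier flagged
    actual = true_pos + false_neg    # everything that was truly sensitive
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): 95 correct flags,
# 5 false alarms, 5 missed sensitive elements.
example = f1_score(true_pos=95, false_pos=5, false_neg=5)
print(f"F1 = {example:.2f}")
```

Because F1 combines precision and recall, a high score indicates that a classifier both avoids flagging harmless data elements and misses few sensitive ones, which matters when recommendations are surfaced to users in a dashboard.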