Abstract: Today, smartphone devices are owned by a large portion of the population and have become a very popular platform for accessing the Internet. Smartphones provide the user with immediate access to information and services. However, they can easily expose the user to many privacy risks. Applications that are installed on the device and entities with access to the device's Internet traffic can reveal private information about the smartphone user and steal sensitive content stored on the device or transmitted by the device over the Internet. In this paper, we present a method to reveal various demographics and technical computer skills of smartphone users by their Internet traffic records, using machine learning classification models. We implement and evaluate the method on real-life data of smartphone users and show that smartphone users can be classified by their gender, smoking habits, software programming experience, and other characteristics.

What problem does this paper attempt to address?

The paper uses machine-learning classification models to infer demographic characteristics and technical computer skills of smartphone users from their Internet traffic records. Specifically, it aims to answer the following questions:

1. **Privacy Risks**: When smartphone users use the Internet, does their traffic contain information that can be used to infer their personal characteristics? Such information may include the user's gender, age group, educational background, programming experience, etc.
2. **Classification Ability**: Can users be accurately classified from Internet traffic data, for example by gender, smoking habits, and programming experience?
3. **Feature Importance**: Which types of features (domain-name, application-layer, statistical, and deep-packet-inspection features) matter most in the classification process?

### Main Research Contents

- **Data Collection**: Internet traffic data were collected from 143 smartphone users in an experiment, and participants filled in questionnaires reporting their demographic characteristics and technical skills.
- **Feature Extraction**: Four types of features were extracted from the Internet traffic records:
  - **Statistical Features**: Such as packet-size statistics of transmitted and received data, the number of bytes in a session, etc.
  - **Application-Layer Features**: Such as the SSL/TLS version in the HTTP/HTTPS protocol, the number of cookies, the Content-Type field, etc.
  - **Domain-Name Features**: Such as Alexa ranking, WoT security score, website category, etc.
  - **Deep-Packet-Inspection Features**: Such as the number of HTTP forms, the presence of email addresses, username and password fields, etc.
- **Machine-Learning Models**: Supervised-learning methods were used to train and evaluate the classification models, mainly the Random Forest (RF) and Extra Trees (ET) algorithms, and model performance was evaluated through Leave-One-Out Cross-Validation.

### Research Results

- **Classification Accuracy**: Some user characteristics can be classified relatively accurately from Internet traffic data. For example, gender is classified with 83.9% accuracy and programming experience with 77.8% accuracy.
- **Feature Importance Analysis**: Domain-name features dominate in the classification models, indicating that the types and frequencies of websites a user visits have an important impact on the classification results.

### Conclusions and Prospects

- **Privacy Risks**: Internet traffic does contain content that can be used to infer personal information about users, which may pose a threat to user privacy.
- **Future Work**: Planned future work includes expanding the size and diversity of the sample and exploring additional classifications, such as users' network-security scores.

Through this research, the paper emphasizes the importance of Internet traffic data in privacy protection and calls for measures to prevent potential privacy-leakage risks.
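The evaluation protocol summarized above (statistical features per user, then Leave-One-Out Cross-Validation) can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: a simple nearest-centroid classifier stands in for Random Forest and Extra Trees so that only the Python standard library is needed, and the packet sizes, the labels `class_a`/`class_b`, and all function names are hypothetical placeholders.

```python
from statistics import mean, pstdev


def session_features(packet_sizes):
    """Statistical features for one user: mean size, std of sizes, total bytes."""
    return [mean(packet_sizes), pstdev(packet_sizes), float(sum(packet_sizes))]


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x.

    Stand-in classifier (assumption): the paper uses Random Forest and
    Extra Trees, not nearest centroid.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_out_accuracy(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, report the hit rate."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Synthetic toy data (hypothetical): packet sizes in bytes for six users.
toy_users = [
    ([60, 70, 80], "class_a"),
    ([50, 60, 70], "class_a"),
    ([65, 75, 85], "class_a"),
    ([1200, 1300, 1400], "class_b"),
    ([1100, 1250, 1400], "class_b"),
    ([1000, 1200, 1500], "class_b"),
]

X = [session_features(sizes) for sizes, _ in toy_users]
y = [label for _, label in toy_users]

if __name__ == "__main__":
    print("leave-one-out accuracy:", leave_one_out_accuracy(X, y))
```

Leave-one-out suits small cohorts such as the 143 participants described above, because every user serves as a test case exactly once while the remaining users are used for training. With a library such as scikit-learn, the stand-in classifier could be replaced by a tree ensemble without changing the evaluation loop.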

Classification of Smartphone Users Using Internet Traffic

An Automatic User-Adapted Physical Activity Classification Method Using Smartphones.

Analysis and Applications of Smartphone User Mobility

Demographic Attributes Prediction Through App Usage Behaviors on Smartphones.

Investigating Smartphone User Differences in Their Application Usage Behaviors: an Empirical Study

User profiling using smartphone network traffic analysis

WHO ARE THE SMARTPHONE USERS?: Identifying user groups with apps usage behaviors.

Discovering Different Kinds Of Smartphone Users Through Their Application Usage Behaviors

Characterizing a User from Large-Scale Smartphone-Sensed Data.

User profiling from their use of smartphone applications: A survey

Who are the smartphone users?

Demographic Information Prediction: A Portrait of Smartphone Application Users

Gender Profiling From a Single Snapshot of Apps Installed on a Smartphone: An Empirical Study

Demographic Information Prediction Based on Smartphone Application Usage

Investigation on the Spatio-Temporal Mobility and Smartphone Usage of College Students.

Robust Smartphone App Identification via Encrypted Network Traffic Analysis

Information We Can Extract About a User From 'One Minute Mobile Application Usage'

Distinguishing Between Smartphones and IoT Devices Via Network Traffic

Smartphone Privacy Leakage of Social Relationships and Demographics from Surrounding Access Points.

Nasal Polyps and Biomarkers.

When Simpler Data Does Not Imply Less Information: A Study of User Profiling Scenarios with Constrained View of Mobile HTTP(S) Traffic