Henrique S. Xavier
Abstract: This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic. Leveraging a dataset comprising over 250,000 websites, we estimate the total web traffic and investigate its distribution among domains and industry sectors. We detail the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic. Our analysis scrutinizes various attributes of these domains, including their content sources and types, access requirements, offline presence, and ownership features. Our analysis reveals a significant concentration of web traffic, with a small number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content emerge as primary attractors of web traffic, which is also highly concentrated on platforms and USA-owned websites. Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.
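A per-sector breakdown of traffic can be computed by summing each domain's visits into its sector and dividing by the overall total. The sketch below is only an illustration of that arithmetic, not the paper's pipeline. The function name `sector_shares`, the `"Unknown"` fallback for unlabeled domains, and the example domains, visit counts, and sector labels are all hypothetical placeholders, not SimilarWeb data.

```python
from collections import defaultdict


def sector_shares(visits_by_domain: dict[str, float],
                  sector_of: dict[str, str]) -> dict[str, float]:
    """Return each sector's fraction of total visits (hypothetical helper).

    Domains missing from `sector_of` are grouped under "Unknown".
    """
    totals: dict[str, float] = defaultdict(float)
    for domain, visits in visits_by_domain.items():
        totals[sector_of.get(domain, "Unknown")] += visits

    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total visits must be positive")

    # Order sectors from largest to smallest share of traffic.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {sector: total / grand_total for sector, total in ranked}


if __name__ == "__main__":
    # Made-up monthly visits (arbitrary units) and sector labels.
    visits = {"search.example": 80.0, "news.example": 30.0,
              "social.example": 50.0, "blog.example": 10.0,
              "shop.example": 30.0}
    sectors = {"search.example": "Search engines",
               "news.example": "News and media",
               "social.example": "Social networks",
               "shop.example": "E-commerce"}
    for sector, share in sector_shares(visits, sectors).items():
        print(f"{sector}: {share:.0%}")
```

Real sector labels would come from whatever taxonomy the data provider assigns. Routing unlabeled domains to an explicit bucket keeps the shares summing to one.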
What problem does this paper attempt to address?
This paper sets out to provide a quantitative analysis of global web usage. Specifically, using data from SimilarWeb, the author analyzes how global web traffic is distributed among different websites and industry sectors. The main objectives of the study are:
1. **Estimating how many domains are needed to form a representative map of global web usage, and identifying those domains**: Knowing which websites account for most web traffic helps clarify the browsing habits of web users.
2. **Assessing the relative importance of the topics people explore on the web**: Knowing which topics attract the most traffic sheds light on how web content and user interests are distributed.
3. **Analyzing broad, common features of the websites people visit**: Characteristics such as access barriers, content sources, and ownership reveal key patterns in web usage.
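The first objective reduces to a cumulative-share question: rank domains by visits and find the smallest number *k* whose combined traffic reaches a target fraction of the total. The paper reports that its top 116 domains reach roughly one-third. The sketch below assumes that framing. The helper name `domains_needed` is hypothetical, and the example uses synthetic Zipf-like visits (proportional to 1/rank), not SimilarWeb figures.

```python
def domains_needed(visits: list[float], target_share: float) -> int:
    """Smallest k such that the k most-visited domains account for at least
    `target_share` of all visits (hypothetical helper)."""
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be in (0, 1]")
    ranked = sorted(visits, reverse=True)
    total = sum(ranked)
    threshold = target_share * total
    cumulative = 0.0
    for k, v in enumerate(ranked, start=1):
        cumulative += v
        if cumulative >= threshold:
            return k
    # Guard against floating-point round-off when target_share == 1.
    return len(ranked)


if __name__ == "__main__":
    # Synthetic traffic: the domain at rank r gets 1000 / r visits.
    synthetic = [1000 / rank for rank in range(1, 1001)]
    print(domains_needed(synthetic, 1 / 3))  # prints 7
```

Under an idealized 1/rank law, a handful of domains already covers a third of traffic. Real data has a flatter head and a long tail, which is why the paper's figure is much larger.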
Based on monthly visit data, the paper examines the following aspects:
- **Changes in monthly visit volume**: Analyzes how monthly visit counts vary over time and how this variation affects the stability of the data.
- **Traffic distribution**: Finds that web traffic follows a power-law distribution, meaning that a small number of top websites attract most of the visits.
- **Popular web sectors**: Analyzes in detail how traffic is distributed across industry sectors, especially search engines, news and media, social networks, streaming, and adult content.
- **Manual inspection of top domains**: Manually inspects the 116 domains with the highest visit volume to gather more detailed information about each site, including whether it offers SaaS, how its content is produced, whether login is required, whether it charges for access, whether its business is conducted mainly online, and who its ultimate owner is.
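A quick way to check for power-law behaviour is a rank-size plot: if visits scale as C · rank^(-α), then log(visits) is linear in log(rank) with slope -α. The sketch below estimates α by ordinary least squares on that log-log relation. This is a crude diagnostic, and the source does not say which fitting procedure the paper uses. Maximum-likelihood estimators are generally preferred for rigorous power-law fitting. The function name `rank_size_exponent` and the synthetic data are hypothetical.

```python
import math


def rank_size_exponent(visits: list[float]) -> float:
    """Estimate alpha in visits ~ C * rank**(-alpha) via least squares on
    log(visits) vs. log(rank) (hypothetical helper; rough diagnostic only)."""
    ranked = sorted((v for v in visits if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive values")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # OLS slope = cov(x, y) / var(x); the exponent is its negation.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


if __name__ == "__main__":
    # Synthetic data following an exact power law with alpha = 1.2.
    synthetic = [1000 * rank ** -1.2 for rank in range(1, 501)]
    print(round(rank_size_exponent(synthetic), 3))  # prints 1.2
```

On real traffic data the fitted slope depends strongly on which ranks are included, so it is worth reporting the rank range used alongside any exponent.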
Through these analyses, the paper aims to provide a comprehensive, data-driven overview of global web usage that clarifies how web traffic is distributed and the socio-economic factors behind it.