Network Report: A Structured Description for Network Datasets

Xinyi Zheng, Ryan A. Rossi, Nesreen Ahmed, Dominik Moritz
DOI: https://doi.org/10.48550/arXiv.2206.03635
2022-06-08
Abstract: The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers only share links, while others provide some context or basic statistics. As a result, critical information may be unintentionally dropped, and network dataset consumers may misunderstand or overlook critical aspects. Inappropriately using a network dataset can lead to severe consequences (e.g., discrimination), especially when machine learning models on networks are deployed in high-stakes domains. Challenges arise because networks are often used across different domains (e.g., network science, physics, etc.) and have complex structures. To facilitate communication between network dataset providers and consumers, we propose the network report. A network report is a structured description that summarizes and contextualizes a network dataset. The network report extends the idea of dataset reports (e.g., Datasheets for Datasets) from prior work with network-specific descriptions of the non-i.i.d. nature, demographic information, network characteristics, etc. We hope network reports encourage transparency and accountability in network research and development across different fields.
Social and Information Networks, Computers and Society, Human-Computer Interaction, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the lack of a standard practice for reporting and sharing network datasets. Providers currently take inconsistent approaches: some share only links, while others add some background information or basic statistics. As a result, critical information can be omitted unintentionally, and users of network datasets may misunderstand or overlook key aspects. Improper use of a network dataset can have serious consequences, such as discrimination, especially when network-based machine learning models are deployed in high-stakes domains (such as healthcare or finance).

To facilitate communication between network dataset providers and users, the authors propose the "Network Report": a structured description that summarizes and contextualizes a network dataset. It extends previous work on dataset reporting (such as Datasheets for Datasets) with network-specific descriptions, including the non-i.i.d. (not independent and identically distributed) nature of network data, demographic information, and network characteristics. The authors hope that Network Reports will encourage transparency and accountability in network research and development, and reduce dataset misuse and bias.