A Methodology and System For Big-Thick Data Collection

Ivan Kayongo, Haonan Zhao, Leonardo Malcotti, Fausto Giunchiglia
2024-07-01
Abstract: Pervasive sensors have become essential in research for gathering real-world data. However, current studies often focus solely on objective data, neglecting subjective human contributions. We introduce an approach and system for collecting big-thick data, combining extensive sensor data (big data) with qualitative human feedback (thick data). This fusion enables effective collaboration between humans and machines, allowing machine learning to benefit from human behavior and interpretations. Emphasizing data quality, our system incorporates continuous monitoring and adaptive learning mechanisms to optimize data collection timing and context, ensuring relevance, accuracy, and reliability. The system comprises three key components: a) a tool for collecting sensor data and user feedback, b) components for experiment planning and execution monitoring, and c) a machine-learning component that enhances human-machine interaction.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses how to collect high-quality "Big-Thick Data": datasets that combine large volumes of sensor data (big data) with qualitative human feedback (thick data). Current research often focuses only on objective data and neglects subjective human contributions, so the resulting data often falls short of what studies need. Specifically, the paper identifies three main issues:

1. **Insufficient data quality**: Existing data collection methods typically rely on fixed or random schedules, which may not align with participants' availability and willingness to respond, and this lowers data quality.
2. **Excessive machine questioning**: Machines need to ask humans a large number of questions, which may result in low-quality responses.
3. **Lack of context**: Current methods do not adequately account for the specific context participants are in during data collection, which reduces the relevance and accuracy of the data.

To address these issues, the paper proposes a new system and method that improve the data collection process in the following ways:

- **Combining sensor data and human feedback**: Integrating large amounts of sensor data with qualitative user feedback to obtain a more comprehensive view of the data.
- **Adaptive learning mechanism**: Using a machine learning component to dynamically adjust the timing and context of data collection based on participants' availability and willingness, ensuring data relevance and accuracy.
- **Flexible user interaction**: Allowing participants to adjust the data collection schedule to suit their own circumstances, reducing interruptions to users.
- **Real-time monitoring and visualization**: Providing real-time data monitoring and visualization through a dashboard to help researchers and participants better understand and manage the data.
In summary, the goal of this paper is to develop a system that collects high-quality "Big-Thick Data" efficiently while remaining user-friendly.
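The adaptive timing idea described above can be illustrated with a small sketch. This is not the paper's machine-learning component, whose details are not given here; it is a minimal stand-in that assumes the system logs whether a participant answered each prompt and, from that, estimates a per-hour response rate. The class name `PromptScheduler` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PromptScheduler:
    """Hypothetical per-hour response tracker (an illustration, not the paper's model)."""

    # Counts start at 1 each, which acts as a uniform Beta(1, 1) prior per hour.
    answered: list = field(default_factory=lambda: [1] * 24)
    ignored: list = field(default_factory=lambda: [1] * 24)

    def record(self, hour: int, responded: bool) -> None:
        """Log the outcome of one prompt sent at the given hour (0-23)."""
        if responded:
            self.answered[hour] += 1
        else:
            self.ignored[hour] += 1

    def response_rate(self, hour: int) -> float:
        """Posterior mean probability that a prompt at this hour is answered."""
        return self.answered[hour] / (self.answered[hour] + self.ignored[hour])

    def best_hours(self, k: int = 3) -> list:
        """Return the k hours with the highest estimated response rate."""
        return sorted(range(24), key=self.response_rate, reverse=True)[:k]


scheduler = PromptScheduler()
# Assumed prompt log: (hour of day, whether the participant answered).
for hour, responded in [(9, True), (9, True), (14, False), (20, True), (14, False)]:
    scheduler.record(hour, responded)

print(scheduler.best_hours(2))  # [9, 20]
```

A real system would likely condition on more than the hour of day, for example sensor-derived context such as location or activity, but the same pattern applies: log outcomes, update an estimate, and schedule prompts where a useful answer is most likely.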