Abstract: Crowdsourcing is a common approach to rapidly annotate large volumes of data in machine learning applications. Typically, crowd workers are compensated with a flat rate based on an estimated completion time to meet a target hourly wage. Unfortunately, prior work has shown that variability in completion times among crowd workers led to overpayment by 168% in one case, and underpayment by 16% in another. However, by setting a time limit for task completion, it is possible to manage the risk of overpaying or underpaying while still facilitating flat rate payments. In this paper, we present an analysis of the impact of a time limit on crowd worker performance and satisfaction. We conducted a human study with a maximum view time for a crowdsourced image classification task. We find that the impact on overall crowd worker performance diminishes as view time increases. Despite some images being challenging under time limits, a consensus algorithm remains effective at preserving data quality and filters images needing more time. Additionally, crowd workers' consistent performance throughout the time-limited task indicates sustained effort, and their psychometric questionnaire scores show they prefer shorter limits. Based on our findings, we recommend implementing task time limits as a practical approach to making compensation more equitable and predictable.
### What problem does this paper attempt to solve?
This paper addresses unfair worker compensation in crowdsourced image classification tasks. Crowdsourcing tasks typically use a flat-rate compensation model: workers are paid a fixed amount based on an estimated completion time, chosen so that the target hourly wage is met. This approach has two significant problems:
1. **Unfair compensation**: Completion times vary greatly among workers. Prior work cited in the paper found overpayment by 168% in one case and underpayment by 16% in another.
2. **Job satisfaction and data quality**: Long tasks may cause workers to become fatigued or lose interest, which in turn affects the quality of data labeling and workers' satisfaction.
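The arithmetic behind over- and underpayment can be made concrete with a minimal sketch. The function names and the dollar and time figures below are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch assumes the flat payment is set so that a worker who takes exactly the estimated time earns the target hourly wage.

```python
def effective_hourly_wage(flat_payment: float, completion_seconds: float) -> float:
    """Hourly wage a worker actually earns from a flat payment."""
    return flat_payment * 3600.0 / completion_seconds


def payment_deviation(target_hourly_wage: float,
                      estimated_seconds: float,
                      actual_seconds: float) -> float:
    """Relative deviation from the target wage.

    Positive values mean the worker was overpaid relative to the target,
    negative values mean underpaid. 1.0 corresponds to +100%.
    """
    # Flat payment chosen so that the estimated time yields the target wage.
    flat_payment = target_hourly_wage * estimated_seconds / 3600.0
    actual_wage = effective_hourly_wage(flat_payment, actual_seconds)
    return (actual_wage - target_hourly_wage) / target_hourly_wage


# Hypothetical numbers: $15/hour target, task estimated at 60 seconds.
fast = payment_deviation(15.0, 60.0, 30.0)     # worker finishes in half the time
slow = payment_deviation(15.0, 60.0, 120.0)    # worker takes twice as long
capped = payment_deviation(15.0, 60.0, 60.0)   # time limit equals the estimate
```

Under these assumptions, a worker who finishes in half the estimated time is overpaid by 100%, one who takes twice as long is underpaid by 50%, and a worker whose time is fixed at the estimate is paid exactly the target wage.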
To address these problems, the authors propose **setting a time limit on task viewing**. The intended goals are:
- **Fair compensation**: Ensure that workers spend the same amount of time on each task, making compensation fairer.
- **Cost control**: Avoid overpayment and reduce costs.
- **Maintain data quality**: Ensure that data quality is not significantly affected, even under time limits.
- **Improve worker satisfaction**: Design tasks so that workers can complete them efficiently in a short time while remaining satisfied.
To this end, the authors conducted a human study with different image viewing time limits (100 ms, 1000 ms, and 2500 ms) and analyzed the impact of these limits on worker performance and satisfaction.
### Experimental design
To evaluate this approach, the authors used the Stanford Dogs dataset, selected 10 dog breeds, and tested three different time limits. The experiment includes the following stages:
1. **Training phase**: Participants first view 50 randomly selected training pictures to familiarize themselves with each dog breed.
2. **Qualification test**: Participants need to correctly identify 27 out of 30 randomly selected pictures to ensure that they have sufficient ability.
3. **Time-limit test**: Participants view pictures and classify them within the set time limits (100 ms, 1000 ms, or 2500 ms).
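The qualification step above amounts to a simple pass/fail check on labeled images. The sketch below is one way to express it; the function name is hypothetical, and it assumes that "27 out of 30" means at least 27 correct answers, which the summary does not state explicitly.

```python
from typing import Sequence


def passes_qualification(predicted: Sequence[str],
                         actual: Sequence[str],
                         required_correct: int = 27) -> bool:
    """Return True if the participant labeled enough images correctly.

    Assumption: the threshold is inclusive (at least `required_correct`).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct >= required_correct


# Hypothetical example: 30 qualification images, 28 labeled correctly.
truth = ["beagle"] * 30
answers = ["beagle"] * 28 + ["pug"] * 2
qualified = passes_qualification(answers, truth)
```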
### Main research questions
The authors pose five main research questions (RQs):
1. **RQ1**: How does the time limit affect the accuracy of individual participants?
2. **RQ2**: What is the trade-off between overall accuracy and the length of the time limit?
3. **RQ3**: Which images are more difficult to classify under time limits?
4. **RQ4**: How can a consensus algorithm mitigate the impact of time limits on accuracy?
5. **RQ5**: How does the time limit affect the satisfaction and perceived effort of crowd workers?
By answering these questions, the authors aim to provide a comprehensive understanding of how time limits affect crowdsourcing tasks and to offer guidance for future research and practice.
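To illustrate the idea behind RQ4, the following is a minimal sketch of a consensus step that aggregates several workers' labels for one image and flags images with weak agreement as candidates for a longer viewing time. The summary does not specify which consensus algorithm the paper uses, so simple majority voting is an assumption here, as are the function name, the `min_agreement` threshold, and the example breed labels.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def consensus_label(votes: Sequence[str],
                    min_agreement: float = 0.6) -> Tuple[Optional[str], float]:
    """Majority-vote consensus over one image's labels.

    Returns (label, agreement). The label is None when the share of votes
    for the most common label falls below `min_agreement`, which marks the
    image as possibly needing more viewing time.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= min_agreement:
        return label, agreement
    return None, agreement


# Hypothetical votes from five workers on two images.
clear_image = consensus_label(["beagle", "beagle", "pug", "beagle", "beagle"])
hard_image = consensus_label(["beagle", "pug", "collie", "pug", "beagle"])
```

In this sketch the first image reaches 80% agreement and receives a label, while the second reaches only 40% and is returned without one, so it could be routed to a longer time limit.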