Applications of Data Mining to Electronic Commerce

Ron Kohavi, Foster Provost
DOI: https://doi.org/10.48550/arXiv.cs/0010006
2000-10-02
Abstract: Electronic commerce is emerging as the killer domain for data mining technology. The following are five desiderata for success. Seldom are they all present in one data mining application. 1. Data with rich descriptions. For example, wide customer records with many potentially useful fields allow data mining algorithms to search beyond obvious correlations. 2. A large volume of data. The large model spaces corresponding to rich data demand many training instances to build reliable models. 3. Controlled and reliable data collection. Manual data entry and integration from legacy systems both are notoriously problematic; fully automated collection is considerably better. 4. The ability to evaluate results. Substantial, demonstrable return on investment can be very convincing. 5. Ease of integration with existing processes. Even if pilot studies show potential benefit, deploying automated solutions to previously manual processes is rife with pitfalls. Building a system to take advantage of the mined knowledge can be a substantial undertaking. Furthermore, one often must deal with social and political issues involved in the automation of a previously manual business process.
Machine Learning, Databases
What problem does this paper attempt to address?
The paper addresses two questions: **why e-commerce has become an ideal application area for data mining technology, and how to overcome the challenges of applying data mining to e-commerce**.

### Detailed Interpretation:

1. **Background Problems**:
   - Data mining technology has existed for several decades, but its use has been largely confined to computer scientists, statisticians, and hardcore business analysts. Why has e-commerce become a breakthrough application area for it?
2. **Key Obstacles**:
   - The paper cites Geoffrey Moore's book *Crossing the Chasm*, pointing out that artificial intelligence technology failed to achieve wide adoption for reasons that include difficulty of integration into existing systems, the lack of a mature design methodology, and a shortage of well-trained personnel.
   - Data mining technology has much in common with AI technology, so it may face similar business dilemmas.
3. **Unique Advantages of E-commerce Systems**:
   - **Data Collection Control**: Compared with traditional systems, e-commerce systems allow the data collection process to be designed and controlled more effectively, which improves the quality and reliability of the data.
   - **Large and Rich Data Volume**: E-commerce systems can automatically collect large amounts of customer behavior data, such as browsing history and shopping cart contents, that were previously difficult or costly to obtain.
   - **Easy to Integrate and Evaluate**: Because e-commerce systems are already automated, applying data mining results to the actual business is less difficult, and calculating the return on investment is easier.
4. **Specific Challenges**:
   - **High Risk of Data Mining Projects**: Data mining projects usually carry high risk because 80% of the workload in the knowledge discovery process lies in non-algorithmic parts, such as data preparation, model deployment, and sociopolitical issues.
   - **Participation of Expert Users**: Although data mining algorithms can generate a large number of patterns, expert users must still screen and verify those patterns manually to confirm that they are valid and useful.
5. **Future Research Directions**:
   - Although e-commerce systems provide an ideal environment for data mining, further research is needed on how to better integrate domain-specific knowledge into the data mining process and so improve the performance and practicality of the resulting systems.

In short, the paper explains why e-commerce can become an ideal application area for data mining technology and identifies the key challenges that must be overcome along the way.