Exploring Web Scraping Techniques in Data Mining: A Python-Based Data Search Approach (Eksplorasi Teknik Web Scraping pada Data Mining: Pendekatan Pencarian Data Berbasis Python)

Debora Chrisinta, Justin Eduardo Simarmata
DOI: https://doi.org/10.30998/faktorexacta.v17i1.22393
2024-05-02
Faktor Exacta
Abstract: Web scraping is an automated technique for extracting information from web pages; it is used for data collection and applied in data mining. Two algorithms commonly used in data mining are clustering and classification. The data source in this study is the Google Search Engine. A web scraping script was designed and implemented in Python to collect data, process HTML, and extract information from web pages. Data on tourism was successfully gathered from the Google Search Engine, and the number of links and the processing time were measured. Data processing involved cleaning the data and applying a hierarchical clustering algorithm. The optimal number of clusters was selected using the Dunn index. The data was then used to train a decision tree model, which was evaluated using accuracy, a confusion matrix, and a classification report. The results indicate the importance of web scraping in data mining and offer a comprehensive view of the effectiveness of web scraping techniques and the application of data mining.
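
The abstract mentions choosing the number of clusters with the Dunn index. As an illustration only, the following minimal sketch computes the Dunn index (smallest distance between points in different clusters divided by the largest within-cluster diameter) for two candidate partitions and keeps the one with the higher score. The function name dunn_index, the toy points, and the candidate labelings are hypothetical and are not taken from the paper; in practice the labelings would come from cutting a hierarchical clustering at different numbers of clusters.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points of the same cluster (diameter).
    max_diameter = max(
        (dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters (separation).
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter


# Toy 2-D points (hypothetical, not data from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Hypothetical candidate partitions, e.g. a dendrogram cut at k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

# A higher Dunn index means more compact, better separated clusters.
scores = {k: dunn_index(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the three-cluster partition scores higher, because each cluster has a small diameter relative to the gap between clusters.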