Design and Implementation of Crawler Program Based on Python

Xiaoju Ma, Min Yan
DOI: https://doi.org/10.1088/1742-6596/2033/1/012205
2021-09-01
Journal of Physics: Conference Series
Abstract: With the development of computer and network technology, we often get information through the Internet. However, the large volume and complex formats of network data make it difficult to obtain valuable information from it. Research has found that web crawler technology can automatically obtain information from the Internet. This paper takes the crawling of second-hand housing information from Anjuke Xi'an as an example. Following the crawler principle and process, the structure of Anjuke's pages is first analyzed; a web crawler program is then designed and implemented that uses requests to obtain web pages, lxml to parse them, and SQL Server 2017 to store the data. The program collects and stores housing information for some cities in East China, and the housing price trend is finally analyzed in Excel using the collected data. The results show that the program can automatically obtain housing information from the Internet, providing a data source for later data analysis.
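The fetch, parse, and store pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' program: the HTML snippet, the class names `listing`, `title`, and `price`, and the function names are all hypothetical, and the standard-library `sqlite3` module stands in for SQL Server 2017 so the sketch runs without a database server.

```python
import sqlite3

from lxml import html

# Hypothetical markup: these class names are illustrative only and are
# not taken from Anjuke's real page structure.
SAMPLE_PAGE = """
<ul>
  <li class="listing"><span class="title">Two-bedroom flat</span><span class="price">120</span></li>
  <li class="listing"><span class="title">Three-bedroom flat</span><span class="price">185.5</span></li>
</ul>
"""


def fetch_page(url):
    """Download one page with requests and return its HTML text."""
    import requests  # imported here so the parsing step can be tried offline

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def parse_listings(page_source):
    """Extract (title, price) pairs from a page using lxml and XPath."""
    tree = html.fromstring(page_source)
    rows = []
    for item in tree.xpath('.//li[@class="listing"]'):
        title = item.xpath('string(.//span[@class="title"])').strip()
        price = float(item.xpath('string(.//span[@class="price"])'))
        rows.append((title, price))
    return rows


def store_listings(conn, rows):
    """Insert the parsed rows into a table (SQLite here, as a stand-in)."""
    conn.execute("CREATE TABLE IF NOT EXISTS listings (title TEXT, price REAL)")
    conn.executemany("INSERT INTO listings VALUES (?, ?)", rows)
    conn.commit()


# Run the parse and store steps on the inline sample instead of a live site.
conn = sqlite3.connect(":memory:")
store_listings(conn, parse_listings(SAMPLE_PAGE))
stored = conn.execute("SELECT title, price FROM listings ORDER BY price").fetchall()
```

For a live page, the output of `fetch_page(url)` would replace `SAMPLE_PAGE`, with the XPath expressions adjusted to the structure actually found on the target site.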