Data acquisition strategy for FTP search engine

HU Liang, YUAN Fang, QI Yun-yun
DOI: https://doi.org/10.16208/j.issn1000-7024.2009.03.045
2009-01-01
Abstract: Because traditional FTP search engines usually use centralized spiders to collect data, their major weakness is insufficient timeliness. To solve this problem, an efficient data acquisition model is presented. Its key techniques are the data update frequency and the queue order. The data update frequency is designed to balance a good ratio of available FTP file download links against a high data acquisition frequency. The queue order is designed to optimize the order in which FTP sites are visited within a data acquisition task.
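The abstract does not give the model's details, so the following is only a minimal sketch of how an update frequency and a queue order could interact, not the authors' method. The names `SiteTask`, `adjust_interval`, and `crawl_order`, the 10% dead-link target, and the interval bounds are all hypothetical choices made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class SiteTask:
    # Only next_due takes part in ordering; the other fields are payload.
    next_due: float                          # time (s) at which the site is due for re-crawl
    host: str = field(compare=False)         # FTP host name (hypothetical examples below)
    interval: float = field(compare=False)   # current re-crawl interval (s)


def adjust_interval(interval, dead_ratio, target=0.1, lo=3600.0, hi=7 * 86400.0):
    """Assumed heuristic: halve the interval when the observed share of dead
    links exceeds `target`, otherwise lengthen it by 50%; clamp to [lo, hi]."""
    new = interval / 2 if dead_ratio > target else interval * 1.5
    return max(lo, min(hi, new))


def crawl_order(tasks, now):
    """Return the hosts that are due at `now`, most overdue first."""
    heap = [t for t in tasks if t.next_due <= now]
    heapq.heapify(heap)
    return [heapq.heappop(heap).host for _ in range(len(heap))]


tasks = [
    SiteTask(100.0, "ftp.a.example", 3600.0),
    SiteTask(50.0, "ftp.b.example", 3600.0),
    SiteTask(500.0, "ftp.c.example", 3600.0),
]
order = crawl_order(tasks, now=200.0)
```

In this sketch a site with many dead links is revisited sooner, which improves link availability at the cost of more acquisition work, while the queue simply serves the most overdue sites first.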