Efficiency Analysis on Robots Exclusion Protocol Based on Game Theory

Wei Li, Jian Liao, Jianping Zeng
DOI: https://doi.org/10.1109/icasid.2019.8925189
2019-01-01
Abstract: Web servers and crawlers face a dilemma. A server may allow certain Web crawlers to visit it in order to gain useful feedback, but these interactions can leave the server vulnerable, for example to being overloaded, and can even harm network security through possible distributed denial-of-service (DDoS) attacks. The robots exclusion protocol (REP), which is implemented through robots.txt, serves as a reminder that tells crawlers whether or not they may crawl a specific page. However, several acute cases show that many crawlers do not obey the rules defined in robots.txt. Hence, it is necessary to examine the validity of REP so that the key factors leading to this disobedience can be identified. In this paper, we try to answer the question of why many crawlers do not obey the REP from a game-theoretic point of view. We model the interactions between servers and crawlers as a game. The Nash equilibrium of the game model is derived, and the result shows that, under the present robots.txt, the best strategy profile for Web servers and crawlers is (Disobey, Disobey). Furthermore, several key factors, such as the reputation loss of the crawler, the cost to the server of taking the crawler to court, and the benefit the server can obtain from crawlers, are shown to influence the effective implementation of REP.
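To make the role of robots.txt concrete, the following is a minimal sketch of how a compliant crawler might consult the rules before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the user agent name "ExampleBot", the example.com URLs, and the helper names build_parser and is_allowed are all hypothetical and do not come from the paper. The check is purely advisory: nothing in the protocol prevents a crawler from skipping it, which is the behavior the abstract sets out to explain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only):
# every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A compliant crawler would run this check before each request;
# a non-compliant one simply never calls it.
for url in (
    "https://example.com/public/index.html",
    "https://example.com/private/data.html",
):
    print(url, is_allowed(parser, "ExampleBot", url))
```

Because the decision to call is_allowed rests entirely with the crawler, obeying the protocol is a strategic choice rather than an enforced constraint.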