To Provide API or Not? An Analysis of the Optimal Anti-crawler Strategy

Yunhao Liu, Gengzhong Feng, Yangyang Sun
DOI: https://doi.org/10.2139/ssrn.4557801
2023-01-01
SSRN Electronic Journal
Abstract: The value of big data is increasingly recognized in the digital era. As a result, web crawlers are used ever more aggressively, posing challenges to many websites, especially user-generated content platforms. In response, many websites adopt anti-crawler protection (ACP) measures to combat unauthorized crawling. However, because ACP degrades the user experience, providing an Application Programming Interface (API) to crawlers is another feasible way to control the amount and frequency of crawling. To explore the optimal anti-crawler strategy, we build economic models to study websites' optimal pricing decisions under each strategy. We first find that offering an API without also implementing ACP is not economically feasible. Under the other two strategies, Pure ACP and API, the market can be divided into three regions depending on the unit loss caused by crawlers. Comparing profits, we then find that providing an API is optimal only when the unit loss caused by direct crawling is sufficiently large and the cost caused by crawling via the API is sufficiently small. By addressing open questions about API provision and its effects, this work makes a novel contribution to the literature and provides practical managerial insights for websites dealing with crawling.