Scale Effects in Web Search.

Di He, Aadharsh Kannan, Tie-Yan Liu, R. Preston McAfee, Tao Qin, Justin M. Rao
DOI: https://doi.org/10.1007/978-3-319-71924-5_21
2017-01-01
Abstract: It is a well-known statistical property that learning tends to slow down with each additional data point. Thus, even if scale effects are important in web search, they could be important in a range that any viable entrant could easily achieve. In this paper we address these questions using browsing logs that give click-through rates by query on two major search engines. An ideal experiment would fix the “query difficulty” and exogenously provide more or less historical data. We approximate the ideal experiment by finding queries that were not previously observed. Of these “new queries”, some grow to be moderately popular, receiving 1000–2000 clicks in a calendar year. We examine ranking quality over the lifespan of the query and find statistically significant improvement on the order of 2–3%, with faster learning at lower levels of data. We are careful to rule out alternative explanations for this pattern. In particular, we show that the effect is not explained by new, more relevant documents entering the landscape; rather, it comes mainly from shifting the most relevant documents to the top of the ranking. We thus conclude that these improvements represent direct scale effects. Finally, we show that scale helps link new queries to existing queries with ample historical data by forming edges in the query-document bipartite graph. This “indirect knowledge” is shown to be important for “deflating uniqueness” and improving ranking.