ABSTRACT
Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether researchers are building collections of resources or studying the search engines themselves, the search engines request that they use the APIs and not "scrape" the WUIs. However, anecdotal evidence suggests the two interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the APIs and WUIs of Google, MSN and Yahoo. We queried both interfaces for five months and found significant discrepancies between them in several categories. In general, we found that MSN produced the most consistent results between its two interfaces. Our findings suggest that the API indexes are not older, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results for a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months.
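The abstract does not say how the interface comparison or the decay models were computed, so the following is only a minimal sketch of the two ideas under stated assumptions. The function names `overlap` and `half_life` are hypothetical, the overlap measure is a plain set intersection over URLs, and the decay is assumed to be exponential, which may differ from the models used in the paper.

```python
import math


def overlap(results_a, results_b):
    """Fraction of distinct URLs in results_a that also appear in results_b.

    Assumption: results are compared as sets of URL strings, ignoring rank.
    """
    urls_a = set(results_a)
    if not urls_a:
        return 0.0
    return len(urls_a & set(results_b)) / len(urls_a)


def half_life(surviving_fraction, elapsed_days):
    """Days until half of the original results are replaced.

    Assumption: results decay exponentially, so the fraction still present
    after t days is exp(-rate * t).
    """
    if not 0.0 < surviving_fraction < 1.0:
        raise ValueError("surviving_fraction must be strictly between 0 and 1")
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    rate = -math.log(surviving_fraction) / elapsed_days
    return math.log(2) / rate


if __name__ == "__main__":
    # Hypothetical result lists, not data from the study.
    wui = ["a.example", "b.example", "c.example", "d.example"]
    api = ["b.example", "c.example", "x.example", "y.example"]
    print(overlap(wui, api))
    print(half_life(0.25, 100))
```

Under the exponential assumption, a single observation of how many of the original top 10 results survive after a known number of days is enough to estimate a half-life; a fitted model over many observations would be more robust.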