DOI: 10.1145/1255175.1255237
Article

Agreeing to disagree: search engines and their public interfaces

Published: 18 June 2007

ABSTRACT

Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether researchers are building collections of resources or studying the search engines themselves, the search engines request that they use the APIs and not "scrape" the WUIs. However, anecdotal evidence suggests the two interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the Google, MSN and Yahoo APIs and WUIs. We queried both interfaces for five months and found significant discrepancies between them in several categories. In general, MSN produced the most consistent results between its two interfaces. Our findings suggest that the API indexes are not older than the WUI indexes, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results for a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months.

      • Published in

        JCDL '07: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
        June 2007
        534 pages
        ISBN: 9781595936448
        DOI: 10.1145/1255175

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate: 415 of 1,482 submissions, 28%
