Experiences with selecting search engines using metasearch

Published: 01 July 1997

Abstract

Search engines are among the most useful and high-profile resources on the Internet. The problem of finding information on the Internet has been replaced with the problem of knowing where search engines are, what they are designed to retrieve, and how to use them. This article describes and evaluates SavvySearch, a metasearch engine designed to intelligently select and interface with multiple remote search engines. The primary metasearch issue examined is the importance of carefully selecting and ranking remote search engines for user queries. We studied the efficacy of SavvySearch's incrementally acquired metaindex approach to selecting search engines by analyzing the effect of time and experience on performance. We also compared the metaindex approach to the simpler categorical approach and showed how much experience is required to surpass the simple scheme.
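The abstract contrasts two ways of choosing engines: a simple categorical scheme and an incrementally acquired metaindex. The Python sketch below illustrates both under stated assumptions; it is not SavvySearch's code. The engine and category names, the class `MetaIndex`, and the rule that each feedback event is spread evenly over the query's terms are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-built categories for the categorical approach;
# engine and category names are placeholders, not SavvySearch's.
CATEGORIES = {
    "software": ["engine_a", "engine_b"],
    "news": ["engine_c"],
}


def categorical_select(category):
    """Return the fixed engine list for a category (empty if unknown)."""
    return list(CATEGORIES.get(category, []))


class MetaIndex:
    """Term x engine score table, learned incrementally from query feedback."""

    def __init__(self, engines):
        self.engines = list(engines)
        # scores[term][engine] starts at 0.0 and changes only through record().
        self.scores = defaultdict(lambda: defaultdict(float))

    def record(self, query_terms, engine, delta):
        """Spread one feedback event (positive or negative) evenly over the query terms.

        The even split is an assumption made for this sketch.
        """
        if not query_terms:
            return
        share = delta / len(query_terms)
        for term in query_terms:
            self.scores[term][engine] += share

    def rank(self, query_terms):
        """Order engines by summed score for the query; ties keep the given engine order."""
        totals = {
            e: sum(self.scores.get(t, {}).get(e, 0.0) for t in query_terms)
            for e in self.engines
        }
        return sorted(self.engines, key=lambda e: -totals[e])


if __name__ == "__main__":
    mi = MetaIndex(["engine_a", "engine_b", "engine_c"])
    mi.record(["python", "tutorial"], "engine_b", +1.0)  # user followed a link
    mi.record(["python"], "engine_a", -1.0)              # engine returned nothing
    print(categorical_select("software"))  # ['engine_a', 'engine_b']
    print(mi.rank(["python"]))             # ['engine_b', 'engine_c', 'engine_a']
```

Until feedback has accumulated for a query's terms, most scores are zero and the learned ranking carries little information, which is one way to read the abstract's point that the metaindex needs some experience before it can surpass the categorical scheme.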

      Reviews

      Reviewed by Donald Harris Kraft (Computing Reviews)

      This paper provides a solid background on Web search engines and metasearch engines, along with experiments on SavvySearch, a metasearch engine designed by the authors. It includes easy-to-follow definitions of the concepts needed to understand information retrieval, Web search engines, and metasearch engines. It is nice to see that concepts such as search engines (designed to aid in finding Web sites, given the exponentially growing number of sites) and metasearch engines (designed to aid in deciding which search engines to use, given the rapid growth in search engines), which have long been known to library and information scientists, are being rediscovered by computer scientists. The authors note that a metasearch engine must have a dispatch mechanism to determine which search engines to employ, an interface agent to adapt a user query into a query suitable for each search engine employed, and a display mechanism to return the search results to the user. The paper provides a good literature survey of available metasearch engines, along with their Web site URLs. The authors explain how SavvySearch uses the keywords in the user's query to rank the candidate search engines, which in turn rank the Web sites deemed relevant to the query. They note that the top-ranked search engines can be queried in parallel. To rank search engines, the authors keep track of term frequencies at the sites searched by each search engine, as well as how often each search engine succeeds or fails in finding relevant sites for specific terms. The ranking itself uses a complex formula based on concepts analogous to term weighting in standard document retrieval, and it takes into account concurrency, expected network load, and local CPU load.
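The review says engines are ranked by a formula analogous to term weighting in document retrieval, and that the ranking accounts for concurrency, network load, and CPU load. The Python sketch below illustrates that idea with an IDF-style weight; it is not the paper's formula. The metaindex layout (term to engine to score), the functions `idf_weights`, `rank_engines`, and `choose_concurrency`, and the multiplicative load back-off are all hypothetical.

```python
import math


def idf_weights(metaindex, engines, query_terms):
    """IDF-style weight per query term.

    metaindex maps term -> {engine: learned score} (hypothetical layout).
    A term on which few engines have positive scores discriminates between
    engines and gets a larger weight; a term no engine scores positively on
    gets weight 0.
    """
    n = len(engines)
    weights = {}
    for t in query_terms:
        row = metaindex.get(t, {})
        df = sum(1 for e in engines if row.get(e, 0.0) > 0)
        weights[t] = math.log(n / df) if df else 0.0
    return weights


def rank_engines(metaindex, engines, query_terms):
    """Rank engines by the sum over terms of (learned score x term weight)."""
    w = idf_weights(metaindex, engines, query_terms)

    def score(e):
        return sum(metaindex.get(t, {}).get(e, 0.0) * w[t] for t in query_terms)

    # reverse=True keeps ties in their original order (sorted is stable).
    return sorted(engines, key=score, reverse=True)


def choose_concurrency(network_load, cpu_load, max_engines=5):
    """How many top-ranked engines to query in parallel.

    Loads are clamped to [0, 1]; the multiplicative back-off is a hypothetical
    policy, and at least one engine is always queried.
    """
    net = min(1.0, max(0.0, network_load))
    cpu = min(1.0, max(0.0, cpu_load))
    headroom = (1.0 - net) * (1.0 - cpu)
    return max(1, int(max_engines * headroom))


if __name__ == "__main__":
    metaindex = {
        "jazz": {"engine_a": 2.0, "engine_b": 0.5},
        "chords": {"engine_b": 1.0},
    }
    engines = ["engine_a", "engine_b", "engine_c"]
    ranked = rank_engines(metaindex, engines, ["jazz", "chords"])
    k = choose_concurrency(network_load=0.5, cpu_load=0.0)
    print(ranked[:k])  # ['engine_b', 'engine_a']
```

Here "chords" is known to only one engine, so it outweighs "jazz", and `engine_b` overtakes `engine_a` despite its lower score on "jazz"; this mirrors how rare terms dominate in tf-idf document ranking.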
One nice feature of the search engine ranking mechanism is the inclusion of response-time thresholds, which lead to penalties for slow searches. The paper presents the results of a series of experiments with SavvySearch. A pilot study examined how well search engines were being selected, using a large set of queries (at least 2500). The authors varied the ordering of the search engines and the selection of the first group of search engines to be employed. The results indicate that the approach is viable, that users like the basic approach, that users follow more links found at the beginning of a search, and that past query success can be used to improve future searches. Further experiments examined SavvySearch enhancements, such as penalties for lack of results and frequent updating of the meta-index, the data structure that records search engine successes and failures as well as term frequencies. The results were mixed, but in general SavvySearch's approach is a good one. The bottom line is that SavvySearch has attracted growing interest and use. The system needs some experience to learn enough about what is available before it can improve on categorical searches done by other means. The approach is especially effective at figuring out where not to search. The authors continue to look for more efficient ways to use the Web to find relevant information.
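The review highlights response-time thresholds that penalize slow engines, plus penalties for lack of results. A minimal Python sketch of how such penalties could be folded into a ranking follows. The thresholds (15 seconds, 5 hits), the 10-query window, the `EngineStats` record, and the shortfall-proportional penalty are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class EngineStats:
    """Recent behaviour of one engine (hypothetical record)."""
    response_times: list = field(default_factory=list)  # seconds per query
    hit_counts: list = field(default_factory=list)      # results returned per query


def penalty(stats, time_threshold=15.0, min_avg_hits=5.0, window=10):
    """Non-negative penalty for engines that are slow or return few results.

    Only the last `window` queries count. Each shortfall is penalized in
    proportion to how far it misses its threshold; thresholds are illustrative.
    """
    p = 0.0
    times = stats.response_times[-window:]
    if times:
        avg_time = sum(times) / len(times)
        if avg_time > time_threshold:
            p += (avg_time - time_threshold) / time_threshold
    hits = stats.hit_counts[-window:]
    if hits:
        avg_hits = sum(hits) / len(hits)
        if avg_hits < min_avg_hits:
            p += (min_avg_hits - avg_hits) / min_avg_hits
    return p


def adjusted_ranking(base_scores, stats_by_engine):
    """Subtract each engine's penalty from its base score, then re-rank."""
    adjusted = {
        e: s - penalty(stats_by_engine.get(e, EngineStats()))
        for e, s in base_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


if __name__ == "__main__":
    stats = {
        "slow": EngineStats(response_times=[30.0, 30.0], hit_counts=[10, 10]),
        "fast": EngineStats(response_times=[1.0], hit_counts=[20]),
    }
    # "slow" has the better base score but loses it to a response-time penalty.
    print(adjusted_ranking({"fast": 1.0, "slow": 1.5}, stats))  # ['fast', 'slow']
```

Subtracting a penalty, rather than excluding slow engines outright, still lets an engine with strong term evidence be selected; the review does not say which of these SavvySearch did.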
