article

Probabilistic information retrieval approach for ranking of database query results

Authors:
Surajit Chaudhuri

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

,
Gautam Das

University of Texas at Arlington, Arlington, TX

University of Texas at Arlington, Arlington, TX
View Profile

,
Vagelis Hristidis

Florida International University, Miami, FL

Florida International University, Miami, FL
View Profile

,
Gerhard Weikum

Max Planck Institut fur Informatik, Saarbrücken, Germany

Max Planck Institut fur Informatik, Saarbrücken, Germany
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 31 Issue 3pp 1134–1168https://doi.org/10.1145/1166074.1166085

Published:01 September 2006Publication History

ACM Transactions on Database Systems

Abstract

We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from information retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.

References

Agrawal, S., Chaudhuri, S., and Das, G. 2002. DBXplorer: A system for keyword based search over relational databases. In proceedings of ICDE. Google ScholarDigital Library
Agrawal, S., Chaudhuri, S., Das, G., and Gionis, A. 2003. Automated ranking of database query results. In proceedings of CIDR.Google Scholar
Amer-Yahia, S., Case, P., Roelleke, T., Shanmugasundaram, J., and Weikum. G. 2005a. Report on the DB/IR panel at SIGMOD 2005. ACM SIGMOD Rec. 34, 4, 71--74. Google ScholarDigital Library
Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., and Toman, D. 2005b. Structure and content scoring for XML. In proceedings of VLDB. Google ScholarDigital Library
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. I. 1995. Fast discovery of association rules. In proceedings of KDD.Google Scholar
Barbara, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knoual. Data Eng. 4, 5, 487--502. Google ScholarDigital Library
Bruno, N., Gravano, L., and Chaudhuri, S. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. Google ScholarDigital Library
Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over Web-accessible databases. In proceedings of ICDE. Google ScholarDigital Library
Breese, J., Heckerman, D., and Kadie, C. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. Google ScholarDigital Library
Bhalotia, G., Nakhe, C., Hulgeri, A., Chakrabarti, S., and Sudarshan, S. 2002. Keyword searching and browsing in databases using BANKS. In Proceedings of ICDE. Google ScholarDigital Library
Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval, 1st ed. Addison-Wesley, Reading, MA. Google ScholarDigital Library
Cestnik, B. 1990. Estimating probabilities: A crucial task in machine learning. In Proceedings of the European Conference on artificial Intelligence.Google Scholar
Cavallo, R. and Pittarelli, M. 1987. The theory of probabilistic databases. In Proceedings of VLDB. Google ScholarDigital Library
Chaudhuri, S., Das, G., Hristidis, V., and Weikum, G. 2004. Probabilistic ranking of database query results. In Proceedings of VLDB. Google ScholarDigital Library
Chinenyanga, T. T. and Kushmerick, N. 2002. An expressive and efficient language for XML information retrieval. J. Amer. Soc. Inform. Sci. Tech. 53, 6, 438--453. Google ScholarDigital Library
Croft, W. B. and Lafferty, J. 2003. Language Modeling for Information Retrieval. Kluwer, Norwell, MA. Google ScholarDigital Library
Carmel, D, Maarek, Y. S., Mandelbrod, M., Mass, Y., and Soffer, A. 2003. Searching XML documents via XML fragments. In Proceedings of SIGIR. Google ScholarDigital Library
Cohen, W. 1998a. Integration of heterogeneous databases without common domains using queries based on textual similarity. In Proceedings of SIGMOD. Google ScholarDigital Library
Cohen, W. 1998b. Providing database-like access to the Web using queries based on textual similarity. In Proceedings of SIGMOD. Google ScholarDigital Library
Chakrabarti, K., Porkaew, K., and Mehrotra, S. 2000. Efficient query references in multimedia databases. In Proceedings of ICDE. Google ScholarDigital Library
Dalvi, N. N. and Suciu, D. 2005. Answering queries from statistics and probabilistic Views. In Proceedings of VLDB. Google ScholarDigital Library
Fagin, R. 1998. Fuzzy queries in multimedia database systems. In Proceedings of PODS. Google ScholarDigital Library
Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of PODS. Google ScholarDigital Library
Fuhr, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of VLDB. Google ScholarDigital Library
Fuhr, N. 1993. A probabilistic relational model for the integration of IR and databases. In Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarDigital Library
Fuhr, N. and Grossjohann, K. 2004. XIRQL: An XML query language based on information retrieval concepts. ACM Trans. Inform. Syst. 22, 2, 313--356. Google ScholarDigital Library
Fuhr, N. and Roelleke, T. 1997. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inform. Syst. 15, 1, 32--66. Google ScholarDigital Library
Fuhr, N. and Roelleke, T. 1998. HySpirit---a probabilistic inference engine for hypermedia retrieval in large databases. In Proceedings of EDBT. Google ScholarDigital Library
Grossman, D. and Frieder, O. 2004. Information Retrieval---Algorithms and Heuristics. Springer, Berlin, Germany. Google ScholarDigital Library
Güntzer, U., Balke, W.-T., and Kiessling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of VLDB. Google ScholarDigital Library
Guo, L., Shao, F., Botev, C., and Shanmugasundaram. J. 2003. XRANK: Ranked keyword search over XML documents. In Proceedings of SIGMOD. Google ScholarDigital Library
Harper, D. and Van Rijsbergen, C. J. 1978. An evaluation of feedback in document retrieval using co-occurrence data. J. Document. 34, 3, 189--216.Google ScholarCross Ref
Hristidis, V. and Papakonstantinou, Y. 2002. DISCOVER: Keyword search in relational databases. In Proceedings of VLDB. Google ScholarDigital Library
Hristidis, V., Gravano, L., and Papakonstantinou, Y. 2003a. Efficient IR-style keyword search over relational databases. In Proceedings of VLDB. Google ScholarDigital Library
Hristidis, V., Papakonstantinou, Y., and Balmin, A. 2003b. Keyword proximity search on XML graphs. In Proceedings of ICDE.Google Scholar
Jagadish, H. V., Poosala, V., Koudas, N., Sevcik, K., Muthukrishnan, S., and Suel, T. 1998. Optimal histograms with quality guarantees. In Proceedings of VLDB. Google ScholarDigital Library
Kiessling, W. 2002. Foundations of preferences in database systems. In Proceedings of VLDB. Google ScholarDigital Library
Lakshmanan, L. V. S., Leone, N., Ross, R., and Subrahmanian, V. S. 1997. ProbView: A flexible probabilistic database system. ACM Trans. Database Syst. 22, 3, 419--469. Google ScholarDigital Library
Lalmas, M. and Roelleke, T. 2004. Modeling vague content and structure querying in XML retrieval with a probabilistic object-relational framework. In Proceedings of FQAS.Google Scholar
Martinez, W., Martinez, A., and Wegman, E. 2004. Document classification and clustering using weighted text proximity matrices. In Proceedings of Interface.Google Scholar
Motro, A. 1988. VAGUE: A user interface to relational databases that permits vague queries. ACM Trans. Informat. Syst. 6, 3 (July), 187--214. Google ScholarDigital Library
Nazeri, Z., Bloedorn, E., and Ostwald, P. 2001. Experiences in mining aviation safety data. In Proceedings of SIGMOD. Google ScholarDigital Library
Nepal, S. and Ramakrishna, M. V. 1999. Query processing issues in image (multimedia) databases. In Proceedings of ICDE. Google ScholarDigital Library
Ortega-Binderberger, M., Chakrabarti, K., and Mehrotra, S. 2002. An approach to integrating query refinement in SQL. In Proceedings of EDBT. 15--33. Google ScholarDigital Library
Poosala, V., Ioannidis, Y. E., Haas, P. J., and Shekita, E. J. 1996. Improved histograms for selectivity estimation of range predicates. In Proceedings of SIGMOD. 294--305. Google ScholarDigital Library
Radlinski, F. and Joachims, T. 2005. Query chains: Learning to rank from implicit feedback. In Proceedings of KDD. Google ScholarDigital Library
Rui, Y., Huang, T. S., and Mehrotra, S. 1997. Content-based image retrieval with relevance feedback in MARS. In Proceedings of the IEEE Conference on Image Processing.Google Scholar
Shen, X., Tan, B. and Zhai, C. 2005. Context-sensitive information retrieval using implicit feedback. In Proceedings of SIGIR. Google ScholarDigital Library
Sparck Jones, K., Walker, S., and Robertson, S. E. 2000a. A probabilistic model of information retrieval: Development and comparative experiments---Part 1. Inf. Process. Man. 36, 6, 779--808. Google ScholarDigital Library
Sparck Jones, K., Walker, S., and Robertson, S. E. 2000a. A probabilistic model of information retrieval: Development and comparative experiments---Part 2. Inf. Process. Man. 36, 6, 809--840. Google ScholarDigital Library
Theobald, A. and Weikum, G. 2002. The index-based XXL search engine for querying XML data with relevance ranking. In Proceedings of EDBT. Google ScholarDigital Library
Theobald, M., Schenkel, R., and Weikum, G. 2005. An efficient and versatile query engine for topX search. In Proceedings of VLDB. Google ScholarDigital Library
Wu, L., Faloutsos, C., Sycara, K., and Payne, T. 2000. FALCON: Feedback adaptive loop for content-based retrieval. In Proceedings of VLDB. Google ScholarDigital Library
Whittaker, J. 1990. Graphical Models in Applied Multivariate Statistics. Wiley, New York, NY.Google Scholar
Widom, J. 2005. Trio: A system for integrated management of data, accuracy, and lineage. CIDR.Google Scholar
Wimmers, L., Haas, L. M., Roth, M. T., and Braendli, C. 1999. Using Fagin's algorithm for merging ranked results in multimedia middleware. In Proceedings of CoopIS. Google ScholarDigital Library
Xu, J. and Croft, W. B. 1996. Query expansion using local and global document analysis, In Proceedings of SIGIR. 4--11. Google ScholarDigital Library
Yu, C.T. and Meng, W. 1998. Principles of Database Query Processing for Advanced Applications. Morgan Kaufmann, San Francisco, CA. Google ScholarDigital Library

Index Terms

Probabilistic information retrieval approach for ranking of database query results
1. Information systems
  1. Data management systems
    1. Database management system engines
  2. Information retrieval

Recommendations

Re-ranking search results using query logs
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

This work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the user's search intent, and fresh and relevant pages may not get ...
Read More
Similarity Based Ranking of Query Results from Real Web Databases
ICSIP '14: Proceedings of the 2014 Fifth International Conference on Signal and Image Processing

The information available in the World Wide Web is stored using many real web databases (e.g. vehicle database). Accessing the information from these real web databases has become increasingly important for the users to find the desired information. Web ...
Read More
Query Interoperation Among Object-Oriented and Relational Databases
ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering

We develop an efficient algorithm for the query interoperation among existing heterogeneous object-oriented and relational databases. Our algorithm utilizes a canonical deductive database as a uniform representation of object-oriented schema and data. ...
Read More

Reviews

Reviewer: Maria Bielikova

When many tuples are returned to a database query, ranking them can be a problem. This paper focuses on the problem of ranking answers to such queries. The problem is similar to the problem of information retrieval, where we operate on documents that are ranked according to a user query. The authors propose a domain-independent automated ranking approach that leverages the structure of the data, database workload statistics, and correlations present in the structured data. A proposal is evaluated experimentally in terms of the quality of results produced and performance. The proposed ranking function depends on a global score that captures the global importance of unspecified attribute values, and on a conditional score that captures the strengths of correlations between specified and unspecified attribute values. Correlations are automatically estimated via workload as well as data analysis. The ranking functions are based on probabilistic information retrieval ranking models. The novelty of this approach is in the use of probabilistic techniques in a systematic way, and in considering dependencies and correlations between data values. I see space for improvement by combining this approach with methods for the automatic discovery of user characteristics and interests, since the authors currently create profiles (that are, in fact, user stereotypes) manually. The paper is well written and properly organized. The intended audience of the paper is researchers in the area of information retrieval, especially people within the database management community. The paper could also be useful for researchers in general Web information retrieval, since some ideas of the proposed approach seem to be suitable for document retrievals that are less structured than those associated with a relational database.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 31, Issue 3
September 2006
400 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1166074
Issue’s Table of Contents

Copyright © 2006 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 September 2006
Published in tods Volume 31, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Probabilistic information retrieval
automatic ranking
experimentation
indexing
relational queries
user survey
workload
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 95
  Total Citations
  View Citations
- 1,871
  Total Downloads
- Downloads (Last 12 months)9
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Probabilistic information retrieval approach for ranking of database query results

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Re-ranking search results using query logs

Similarity Based Ranking of Query Results from Real Web Databases

Query Interoperation Among Object-Oriented and Relational Databases

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Probabilistic information retrieval approach for ranking of database query results

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Re-ranking search results using query logs

Similarity Based Ranking of Query Results from Real Web Databases

Query Interoperation Among Object-Oriented and Relational Databases

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media