Abstract
We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from information retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.
- Agrawal, S., Chaudhuri, S., and Das, G. 2002. DBXplorer: A system for keyword based search over relational databases. In proceedings of ICDE. Google ScholarDigital Library
- Agrawal, S., Chaudhuri, S., Das, G., and Gionis, A. 2003. Automated ranking of database query results. In proceedings of CIDR.Google Scholar
- Amer-Yahia, S., Case, P., Roelleke, T., Shanmugasundaram, J., and Weikum. G. 2005a. Report on the DB/IR panel at SIGMOD 2005. ACM SIGMOD Rec. 34, 4, 71--74. Google ScholarDigital Library
- Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., and Toman, D. 2005b. Structure and content scoring for XML. In proceedings of VLDB. Google ScholarDigital Library
- Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. I. 1995. Fast discovery of association rules. In proceedings of KDD.Google Scholar
- Barbara, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knoual. Data Eng. 4, 5, 487--502. Google ScholarDigital Library
- Bruno, N., Gravano, L., and Chaudhuri, S. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. Google ScholarDigital Library
- Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over Web-accessible databases. In proceedings of ICDE. Google ScholarDigital Library
- Breese, J., Heckerman, D., and Kadie, C. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. Google ScholarDigital Library
- Bhalotia, G., Nakhe, C., Hulgeri, A., Chakrabarti, S., and Sudarshan, S. 2002. Keyword searching and browsing in databases using BANKS. In Proceedings of ICDE. Google ScholarDigital Library
- Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval, 1st ed. Addison-Wesley, Reading, MA. Google ScholarDigital Library
- Cestnik, B. 1990. Estimating probabilities: A crucial task in machine learning. In Proceedings of the European Conference on artificial Intelligence.Google Scholar
- Cavallo, R. and Pittarelli, M. 1987. The theory of probabilistic databases. In Proceedings of VLDB. Google ScholarDigital Library
- Chaudhuri, S., Das, G., Hristidis, V., and Weikum, G. 2004. Probabilistic ranking of database query results. In Proceedings of VLDB. Google ScholarDigital Library
- Chinenyanga, T. T. and Kushmerick, N. 2002. An expressive and efficient language for XML information retrieval. J. Amer. Soc. Inform. Sci. Tech. 53, 6, 438--453. Google ScholarDigital Library
- Croft, W. B. and Lafferty, J. 2003. Language Modeling for Information Retrieval. Kluwer, Norwell, MA. Google ScholarDigital Library
- Carmel, D, Maarek, Y. S., Mandelbrod, M., Mass, Y., and Soffer, A. 2003. Searching XML documents via XML fragments. In Proceedings of SIGIR. Google ScholarDigital Library
- Cohen, W. 1998a. Integration of heterogeneous databases without common domains using queries based on textual similarity. In Proceedings of SIGMOD. Google ScholarDigital Library
- Cohen, W. 1998b. Providing database-like access to the Web using queries based on textual similarity. In Proceedings of SIGMOD. Google ScholarDigital Library
- Chakrabarti, K., Porkaew, K., and Mehrotra, S. 2000. Efficient query references in multimedia databases. In Proceedings of ICDE. Google ScholarDigital Library
- Dalvi, N. N. and Suciu, D. 2005. Answering queries from statistics and probabilistic Views. In Proceedings of VLDB. Google ScholarDigital Library
- Fagin, R. 1998. Fuzzy queries in multimedia database systems. In Proceedings of PODS. Google ScholarDigital Library
- Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of PODS. Google ScholarDigital Library
- Fuhr, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of VLDB. Google ScholarDigital Library
- Fuhr, N. 1993. A probabilistic relational model for the integration of IR and databases. In Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarDigital Library
- Fuhr, N. and Grossjohann, K. 2004. XIRQL: An XML query language based on information retrieval concepts. ACM Trans. Inform. Syst. 22, 2, 313--356. Google ScholarDigital Library
- Fuhr, N. and Roelleke, T. 1997. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inform. Syst. 15, 1, 32--66. Google ScholarDigital Library
- Fuhr, N. and Roelleke, T. 1998. HySpirit---a probabilistic inference engine for hypermedia retrieval in large databases. In Proceedings of EDBT. Google ScholarDigital Library
- Grossman, D. and Frieder, O. 2004. Information Retrieval---Algorithms and Heuristics. Springer, Berlin, Germany. Google ScholarDigital Library
- Güntzer, U., Balke, W.-T., and Kiessling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of VLDB. Google ScholarDigital Library
- Guo, L., Shao, F., Botev, C., and Shanmugasundaram. J. 2003. XRANK: Ranked keyword search over XML documents. In Proceedings of SIGMOD. Google ScholarDigital Library
- Harper, D. and Van Rijsbergen, C. J. 1978. An evaluation of feedback in document retrieval using co-occurrence data. J. Document. 34, 3, 189--216.Google ScholarCross Ref
- Hristidis, V. and Papakonstantinou, Y. 2002. DISCOVER: Keyword search in relational databases. In Proceedings of VLDB. Google ScholarDigital Library
- Hristidis, V., Gravano, L., and Papakonstantinou, Y. 2003a. Efficient IR-style keyword search over relational databases. In Proceedings of VLDB. Google ScholarDigital Library
- Hristidis, V., Papakonstantinou, Y., and Balmin, A. 2003b. Keyword proximity search on XML graphs. In Proceedings of ICDE.Google Scholar
- Jagadish, H. V., Poosala, V., Koudas, N., Sevcik, K., Muthukrishnan, S., and Suel, T. 1998. Optimal histograms with quality guarantees. In Proceedings of VLDB. Google ScholarDigital Library
- Kiessling, W. 2002. Foundations of preferences in database systems. In Proceedings of VLDB. Google ScholarDigital Library
- Lakshmanan, L. V. S., Leone, N., Ross, R., and Subrahmanian, V. S. 1997. ProbView: A flexible probabilistic database system. ACM Trans. Database Syst. 22, 3, 419--469. Google ScholarDigital Library
- Lalmas, M. and Roelleke, T. 2004. Modeling vague content and structure querying in XML retrieval with a probabilistic object-relational framework. In Proceedings of FQAS.Google Scholar
- Martinez, W., Martinez, A., and Wegman, E. 2004. Document classification and clustering using weighted text proximity matrices. In Proceedings of Interface.Google Scholar
- Motro, A. 1988. VAGUE: A user interface to relational databases that permits vague queries. ACM Trans. Informat. Syst. 6, 3 (July), 187--214. Google ScholarDigital Library
- Nazeri, Z., Bloedorn, E., and Ostwald, P. 2001. Experiences in mining aviation safety data. In Proceedings of SIGMOD. Google ScholarDigital Library
- Nepal, S. and Ramakrishna, M. V. 1999. Query processing issues in image (multimedia) databases. In Proceedings of ICDE. Google ScholarDigital Library
- Ortega-Binderberger, M., Chakrabarti, K., and Mehrotra, S. 2002. An approach to integrating query refinement in SQL. In Proceedings of EDBT. 15--33. Google ScholarDigital Library
- Poosala, V., Ioannidis, Y. E., Haas, P. J., and Shekita, E. J. 1996. Improved histograms for selectivity estimation of range predicates. In Proceedings of SIGMOD. 294--305. Google ScholarDigital Library
- Radlinski, F. and Joachims, T. 2005. Query chains: Learning to rank from implicit feedback. In Proceedings of KDD. Google ScholarDigital Library
- Rui, Y., Huang, T. S., and Mehrotra, S. 1997. Content-based image retrieval with relevance feedback in MARS. In Proceedings of the IEEE Conference on Image Processing.Google Scholar
- Shen, X., Tan, B. and Zhai, C. 2005. Context-sensitive information retrieval using implicit feedback. In Proceedings of SIGIR. Google ScholarDigital Library
- Sparck Jones, K., Walker, S., and Robertson, S. E. 2000a. A probabilistic model of information retrieval: Development and comparative experiments---Part 1. Inf. Process. Man. 36, 6, 779--808. Google ScholarDigital Library
- Sparck Jones, K., Walker, S., and Robertson, S. E. 2000a. A probabilistic model of information retrieval: Development and comparative experiments---Part 2. Inf. Process. Man. 36, 6, 809--840. Google ScholarDigital Library
- Theobald, A. and Weikum, G. 2002. The index-based XXL search engine for querying XML data with relevance ranking. In Proceedings of EDBT. Google ScholarDigital Library
- Theobald, M., Schenkel, R., and Weikum, G. 2005. An efficient and versatile query engine for topX search. In Proceedings of VLDB. Google ScholarDigital Library
- Wu, L., Faloutsos, C., Sycara, K., and Payne, T. 2000. FALCON: Feedback adaptive loop for content-based retrieval. In Proceedings of VLDB. Google ScholarDigital Library
- Whittaker, J. 1990. Graphical Models in Applied Multivariate Statistics. Wiley, New York, NY.Google Scholar
- Widom, J. 2005. Trio: A system for integrated management of data, accuracy, and lineage. CIDR.Google Scholar
- Wimmers, L., Haas, L. M., Roth, M. T., and Braendli, C. 1999. Using Fagin's algorithm for merging ranked results in multimedia middleware. In Proceedings of CoopIS. Google ScholarDigital Library
- Xu, J. and Croft, W. B. 1996. Query expansion using local and global document analysis, In Proceedings of SIGIR. 4--11. Google ScholarDigital Library
- Yu, C.T. and Meng, W. 1998. Principles of Database Query Processing for Advanced Applications. Morgan Kaufmann, San Francisco, CA. Google ScholarDigital Library
Index Terms
- Probabilistic information retrieval approach for ranking of database query results
Recommendations
Re-ranking search results using query logs
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge managementThis work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the user's search intent, and fresh and relevant pages may not get ...
Query Interoperation Among Object-Oriented and Relational Databases
ICDE '95: Proceedings of the Eleventh International Conference on Data EngineeringWe develop an efficient algorithm for the query interoperation among existing heterogeneous object-oriented and relational databases. Our algorithm utilizes a canonical deductive database as a uniform representation of object-oriented schema and data. ...
Similarity Based Ranking of Query Results from Real Web Databases
ICSIP '14: Proceedings of the 2014 Fifth International Conference on Signal and Image ProcessingThe information available in the World Wide Web is stored using many real web databases (e.g. vehicle database). Accessing the information from these real web databases has become increasingly important for the users to find the desired information. Web ...
Comments