ABSTRACT
Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade.
To determine objects that have the best overall grades, the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm (“Fagin's Algorithm”, or FA) that is much more efficient. For some distributions on grades, and for some monotone aggregation functions, FA is optimal in a high-probability sense.
We analyze an elegant and remarkably simple algorithm (“the threshold algorithm”, or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer.
We distinguish two types of access: sorted access (where the middleware system obtains the grade of an object in some sorted list by proceeding through the list sequentially from the top), and random access (where the middleware system requests the grade of object in a list, and obtains it in one step). We consider the scenarios where random access is either impossible, or expensive relative to sorted access, and provide algorithms that are essentially optimal for these cases as well.
- D Aksoy and M Franklin RxW A scheduling approach for large scale on demand data broadcast IEEE ACM Transactions On Networking 7(6):846-880, December 1999. Google ScholarDigital Library
- A Borodin and R El Yaniv Online Computation and Competitive Analysis Cambridge University Press New York, 1998. Google ScholarDigital Library
- M J Carey L M Haas P M Schwarz M Arya W F Cody R Fagin M Flickner A W Luniewski W Niblack D Petkovic J Thomas J H Williams and E L Wimmers Towards heterogeneous multimedia information systems the Garlic approach In RIDE DOM th Int l Workshop on Research Issues in Data Engineering Distributed Object Management pages 124-131, 1995 Google ScholarDigital Library
- R Fagin Combining fuzzy information from multiple systems J Comput System Sci., 58:83-99, 1999. Google ScholarDigital Library
- U Guntzer W T Balke and W Kiessling Optimizing multi feature queries in image databases In Proc th Very Large Databases VLDB Conference pages Cairo Egypt, 2000. Google ScholarDigital Library
- U G untzer W T Balke and W Kiessling Towards e cientmulti feature queries in heterogeneous environments In Proc of the IEEE International Conference on Information Technology Coding and Computing ITCC LasVegas USA April 2001 Google ScholarDigital Library
- D S Hochbaum editor Approximation Algorithms for NP HardProblems PWS Publishing Company Boston, MA, 1997 Google ScholarDigital Library
- R Motwani and P Raghavan RandomizedAlgorithms Cambridge University Press Cambridge U.K., 1995 Google ScholarDigital Library
- S Nepal and M V Ramakrishna Query processing issues in image multimedia databases In Proc th International Conference on Data Engineering ICDE pages March, 1999. Google ScholarDigital Library
- W Niblack R Barber W Equitz M Flickner E Glasman D Petkovic and P Yanker The QBIC project Querying images bycontent using color texture and shape In SPIE Conference on Storage and Retrieval for Image and Video Databases volume 1908, pages 173-187, 1983. QBIC Web server is http wwwqbic almaden ibm comGoogle Scholar
- G Salton Automatic Text Processing the Transformation Analysis and Retrieval of Information by Computer Addison Wesley Reading MA, 1989 Google ScholarDigital Library
- D Sleator and R E Tarjan Amortized efficiency of list update and paging rules Comm ACM 28:202-208, 1985 Google ScholarDigital Library
- E L Wimmers L M Haas M T Roth and C Braendli Using Fagin s algorithm for merging ranked results in multimedia middleware In Fourth IFCIS International ConferenceonCooperative Information Systems pages IEEE Computer Society Press September 1999 Google ScholarDigital Library
- L A Zadeh Fuzzy sets Information and Control 8:338-363, 1999Google ScholarCross Ref
- H J Zimmermann Fuzzy Set Theory Kluwer Academic Publishers Boston 3rd edition, 1996.Google Scholar
Index Terms
- Optimal aggregation algorithms for middleware
Recommendations
Optimal aggregation algorithms for middleware
Special issu on PODS 2001Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted ...
On optimal algorithms and optimal proof systems
STACS'99: Proceedings of the 16th annual conference on Theoretical aspects of computer scienceA deterministic algorithm O accepting a language L is called (polynomially) optimal if for any algorithm A accepting L there is a polynomial p such that timeO(x) ≤ p(|x|+ timeA(x)) for every x ∈ L. It is shown that an optimal acceptor for a language L ...
Near-optimal algorithms for unique games
STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of ComputingUnique games are constraint satisfaction problems that can be viewed as a generalization of Max-Cut to a larger domain size. The Unique Games Conjecture states that it is hard to distinguish between instances of unique games where almost all constraints ...
Comments