Efficient and Practical Approach for Private Record Linkage

Published: 01 August 2012

Abstract

Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how much their customer bases overlap so that they can better assess the benefits of the merger. As another example, a database of people whom regulators have forbidden from a certain activity may need to be compared with a list of people engaged in that activity. The autonomous entities that wish to carry out the record-matching computation are often reluctant to fully share their data: they fear losing control over its subsequent dissemination and usage, they want to ensure privacy because the data is proprietary or confidential, or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better performance than previous schemes in terms of execution time while maintaining acceptable quality of output compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching by carrying out a very fast (but not accurate) matching between pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair without any expensive cryptographic operations such as modular exponentiations. Our experimental evaluation validates these claims.
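
The two-phase structure described in the abstract (a fast, inexact pass that proposes candidate pairs, followed by a distance computation on each candidate) can be illustrated with a small sketch. This is not the paper's protocol: it runs in the clear with no privacy protection, and the bigram Dice filter, the Levenshtein distance, the function names, and the thresholds `min_sim` and `max_dist` are all assumptions chosen only to show the filter-then-verify pattern.

```python
from itertools import product


def bigrams(s: str) -> set:
    """Character bigrams of a lowercased string padded with '_' on both ends."""
    padded = f"_{s.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def cheap_similarity(a: str, b: str) -> float:
    """Phase 1 stand-in (assumed): Dice coefficient on bigram sets."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def edit_distance(a: str, b: str) -> int:
    """Phase 2 stand-in (assumed): Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def link(left, right, min_sim=0.5, max_dist=2):
    """Keep pairs that pass the cheap filter, then verify them by distance."""
    candidates = [(a, b) for a, b in product(left, right)
                  if cheap_similarity(a, b) >= min_sim]
    return [(a, b) for a, b in candidates
            if edit_distance(a.lower(), b.lower()) <= max_dist]


matches = link(["Jon Smith", "Alice Wong"], ["John Smith", "Bob Stone"])
```

In this sketch the expensive distance is computed only for pairs that survive the cheap filter, which is the general motivation for a candidate-generation phase; how the paper achieves both phases privately is not shown here.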

Published in

Journal of Data and Information Quality, Volume 3, Issue 3
August 2012
53 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/2287714

Copyright © 2012 ACM


Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 August 2012
• Accepted: 1 April 2012
• Revised: 1 March 2012
• Received: 1 September 2009
