skip to main content
10.5555/2483628.2483644dlproceedingsArticle/Chapter ViewAbstractPublication PagesausdmConference Proceedingsconference-collections
research-article
Free Access

An efficient two-party protocol for approximate matching in private record linkage

Published:01 December 2011Publication History

ABSTRACT

The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used for the identification of matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL). While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is well applicable to real-world conditions needs to address the major aspect of scalability of linking very large databases while preserving security and linkage quality.

We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, and allows them to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records yield linkage results that have a linear scalability to large databases and high linkage accuracy, allowing for approximate matching in the privacy-preserving context. Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, the protocol can be useful for many real-world applications requiring PPRL.

References

  1. Agrawal, R., Evfimievski, A. & Srikant, R. (2003), Information sharing across private databases, in 'ACM SIGMOD', ACM, pp. 86--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Al-Lawati, A., Lee, D. & McDaniel, P. (2005), Blocking-aware private record linkage, in 'International Workshop on Information Quality in Information Systems', pp. 59--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Atallah, M., Kerschbaum, F. & Du, W. (2003), Secure and private sequence comparisons, in 'ACM workshop on Privacy in the Electronic Society', pp. 39--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bachteler, T., Schnell, R. & Reiher, J. (2010), An empirical comparison of approaches to approximate string matching in private record linkage, in 'Proceedings of Statistics Canada Symposium 2010. Social Statistics: The Interplay among Censuses, Surveys and Administrative Data'.Google ScholarGoogle Scholar
  5. Christen, P. (2006a), A comparison of personal name matching: Techniques and practical issues, in 'Workshop on Mining Complex Data, held at IEEE ICDM'06', Hong Kong. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Christen, P. (2006b), Privacy-preserving data linkage and geocoding: Current approaches and research directions, in 'Workshop on Privacy Aspects of Data Mining, held at IEEE ICDM'06', Hong Kong, pp. 497--501. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Christen, P. (2011), 'A survey of indexing techniques for scalable record linkage and deduplication', IEEE Transactions on Knowledge and Data Engineering. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Churches, T. & Christen, P. (2004), 'Some methods for blindfolded record linkage', BioMed Central Medical Informatics and Decision Making 4(9).Google ScholarGoogle Scholar
  9. Du, W., Atallah, M. & Kerschbaum, F. (2000), Protocols for secure remote database access with approximate matching, in 'Proceedings of the 1st ACM Workshop on Security and Privacy in E-Commerce'.Google ScholarGoogle Scholar
  10. Durham, E., Xue, Y., Kantarcioglu, M. & Malin, B. (2011), 'Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage', Information Fusion In Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Elmagarmid, A. K., Ipeirotis, P. G. & Verykios, V. S. (2007), 'Duplicate record detection: A survey', IEEE Transactions on Knowledge and Data Engineering 19(1), 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fellegi, I. P. & Sunter, A. B. (1969), 'A theory for record linkage', Journal of the American Statistical Society 64(328).Google ScholarGoogle ScholarCross RefCross Ref
  13. Hall, R. & Fienberg, S. (2010), Privacy-preserving record linkage, in 'Privacy in Statistical Databases, Springer LNCS 6344', Corfu, Greece, pp. 269--283. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Hernandez, M. A. & Stolfo, S. J. (1995), The merge/purge problem for large databases, in 'ACM SIGMOD'95', San Jose. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Herzog, T., Scheuren, F. & Winkler, W. (2007), Data quality and record linkage techniques, Springer Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Inan, A., Kantarcioglu, M., Bertino, E. & Scannapieco, M. (2008), A hybrid approach to private record linkage, in 'IEEE ICDE', pp. 496--505. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Inan, A., Kantarcioglu, M., Ghinita, G. & Bertino, E. (2010), Private record matching using differential privacy, in 'International Conference on Extending Database Technology'. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Karakasidis, A. & Verykios, V. (2009), Privacy preserving record linkage using phonetic codes, in 'Fourth Balkan Conference in Informatics', IEEE, pp. 101--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Karakasidis, A. & Verykios, V. (2010), Advances in privacy preserving record linkage, in 'E-activity and Innovative Technology, Advances in Applied Intelligence Technologies Book Series', IGI Global, pp. 22--34.Google ScholarGoogle Scholar
  20. Kissner, L. & Song, D. (2005), Private and threshold set-intersection, Technical report, Carnegie Mellon University.Google ScholarGoogle Scholar
  21. Krawczyk, H., Bellare, M. & Canetti, R. (1997), HMAC: Keyed-hashing for message authentication, in 'Internet RFCs'. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Manning, C. & Schüütze, H. (1999), Foundations of statistical natural language processing, Vol. 59, MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. O'Keefe, C., Yung, M., Gu, L. & Baxter, R. (2004), Privacy-preserving data linkage protocols, in 'Proceedings of the 2004 ACM workshop on Privacy in the electronic society', pp. 94--102. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Pang, C., Gu, L., Hansen, D. & Maeder, A. (2009), 'Privacy-preserving fuzzy matching using a public reference table', Intelligent Patient Management pp. 71--89.Google ScholarGoogle Scholar
  25. Raghavan, V., Bollmann, P. & Jung, G. (1989), 'A critical investigation of recall and precision as measures of retrieval system performance', ACM Transactions on Information Systems (TOIS) 7(3), 205--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Ravikumar, P., Cohen, W. & Fienberg, S. (2004), A secure protocol for computing string distance metrics, in 'Workshop on Privacy and Security Aspects of Data Mining held at IEEE ICDM'04', pp. 40--46.Google ScholarGoogle Scholar
  27. Scannapieco, M., Figotin, I., Bertino, E. & Elmagarmid, A. (2007), Privacy preserving schema and data matching, in 'ACM SIGMOD', pp. 653--664. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Schneier, B. (1995), Applied cryptography: Protocols, algorithms, and source code in C, 2nd Edition, John Wiley & Sons, Inc., New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Song, D., Wagner, D. & Perrig, A. (2000), 'Practical techniques for searches on encrypted data', Security and Privacy, IEEE Symposium on p. 44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Trepetin, S. (2008), 'Privacy-preserving string comparisons in record linkage systems: a review', Information Security Journal: A Global Perspective 17(5), 253--266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Verykios, V., Karakasidis, A. & Mitrogiannis, V. (2009), 'Privacy preserving record linkage approaches', Int. J. of Data Mining, Modelling and Management 1(2), 206--221.Google ScholarGoogle ScholarCross RefCross Ref
  32. Winkler, W. E. (2006), Overview of record linkage and current research directions, Technical Report RR2006/02, US Bureau of the Census.Google ScholarGoogle Scholar
  33. Witten, I. H., Moffat, A. & Bell, T. C. (1999), Managing Gigabytes, 2nd Edition, Morgan Kaufmann.Google ScholarGoogle Scholar
  34. Yakout, M., Atallah, M. & Elmagarmid, A. (2009), Efficient private record linkage, in 'IEEE ICDE', pp. 1283--1286. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An efficient two-party protocol for approximate matching in private record linkage

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader