skip to main content
10.5555/2525373.2525389dlproceedingsArticle/Chapter ViewAbstractPublication PagesausdmConference Proceedingsconference-collections
research-article
Free Access

An iterative two-party protocol for scalable privacy-preserving record linkage

Authors Info & Claims
Published:05 December 2012Publication History

ABSTRACT

Record linkage is the process of identifying which records in different databases refer to the same real-world entities. When personal details of individuals, such as names and addresses, are used to link databases across different organisations, then privacy becomes a major concern. Often it is not permissible to exchange identifying data among organisations. Linking databases in situations where no private or confidential information can be revealed is known as 'privacy-preserving record linkage' (PPRL). We propose a novel protocol for scalable and approximate PPRL based on Bloom filters in a scenario where no third party is available to conduct a linkage.

While two-party protocols are more secure because there is no possibility of collusion between one of the database owners and the third party, these protocols generally require more complex and expensive techniques to ensure that a database owner cannot infer any sensitive information about the other party's data during the linkage process. Our two-party protocol uses an efficient privacy technique called Bloom filters, and conducts an iterative classification of record pairs into matches and non-matches, as selected bits of the Bloom filters are revealed. Experiments conducted on real-world databases that contain nearly two million records, show that our protocol is scalable to large databases while providing sufficient privacy characteristics and achieving high linkage quality.

References

  1. Agrawal, R., Evfimievski, A. & Srikant, R. (2003), Information sharing across private databases, in 'ACM SIGMOD', San Diego, pp. 86--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Al-Lawati, A., Lee, D. & McDaniel, P. (2005), Blocking-aware private record linkage, in 'Journal of Data and Information Quality', pp. 59--68.Google ScholarGoogle Scholar
  3. Atallah, M., Kerschbaum, F. & Du, W. (2003), Secure and private sequence comparisons, in 'ACM Workshop on Privacy in the Electronic Society', pp. 39--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bloom, B. (1970), 'Space/time trade-offs in hash coding with allowable errors', Communications of the ACM 13(7), 422--426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Broder, A., Mitzenmacher, M. & Mitzenmacher, A. (2002), Network applications of Bloom filters: A survey, in 'Internet Mathematics'.Google ScholarGoogle Scholar
  6. Christen, P. (2006), Privacy-preserving data linkage and geocoding: Current approaches and research directions, in 'IEEE ICDM Workshop on Privacy Aspects of Data Mining', Hong Kong. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Christen, P. (2009), Geocode matching and privacy preservation, in 'Workshop on Privacy, Security, and Trust in KDD', Springer, pp. 7--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Christen, P. (2011), 'A survey of indexing techniques for scalable record linkage and deduplication', IEEE Transactions on Knowledge and Data Engineering. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Christen, P. (2012), Data Matching, Data-Centric Systems and Applications, Springer.Google ScholarGoogle Scholar
  10. Clifton, C., Kantarcioglu, M., Doan, A., Schadow, G., Vaidya, J., Elmagarmid, A. & Suciu, D. (2004), Privacy-preserving data integration and sharing, in 'ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery', pp. 19--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X. & Zhu, M. (2002), 'Tools for privacy preserving distributed data mining', SIGKDD Explorations 4(2), 28--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Durham, E. (2012), A framework for accurate, efficient private record linkage, PhD thesis, Vanderbilt University.Google ScholarGoogle Scholar
  13. Durham, E., Xue, Y., Kantarcioglu, M. & Malin, B. (2010), Private medical record linkage with approximate matching, in 'AMIA Annual Symposium Proceedings', p. 182.Google ScholarGoogle Scholar
  14. Durham, E., Xue, Y., Kantarcioglu, M. & Malin, B. (2011), 'Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage', Information Fusion. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dusserre, L., Quantin, C. & Bouzelat, H. (1995), 'A one way public key cryptosystem for the linkage of nominal files in epidemiological studies', Medinfo 8, 644--647.Google ScholarGoogle Scholar
  16. Freedman, M., Ishai, Y., Pinkas, B. & Reingold, O. (2005), 'Keyword search and oblivious pseudorandom functions', Theory of Cryptography. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Gionis, A., Indyk, P. & Motwani, R. (1999), Similarity search in high dimensions via hashing, in 'Proceedings of the International Conference on Very Large Data Bases', pp. 518--529. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Goldreich, O. (2004), Foundations of cryptography: Basic applications, Vol. 2, Cambridge Univ Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Hall, R. & Fienberg, S. (2010), Privacy-preserving record linkage, in 'Privacy in Statistical Databases, Springer LNCS 6344', Corfu, Greece, pp. 269--283. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Inan, A., Kantarcioglu, M., Bertino, E. & Scannapieco, M. (2008), A hybrid approach to private record linkage, in 'IEEE ICDE', Cancun, Mexico, pp. 496--505. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Inan, A., Kantarcioglu, M., Ghinita, G. & Bertino, E. (2010), Private record matching using differential privacy, in 'EDBT'. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kantarcioglu, M., Jiang, W. & Malin, B. (2008), A privacy-preserving framework for integrating person-specific databases, in 'Privacy in Statistical Databases', Springer, pp. 298--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Karakasidis, A. & Verykios, V. (2010), Advances in privacy preserving record linkage, in 'E-activity and Innovative Technology, Advances in Applied Intelligence Technologies Book Series', IGI Global.Google ScholarGoogle Scholar
  24. Karakasidis, A. & Verykios, V. (2011), 'Secure blocking+secure matching = secure record linkage', Journal of Computing Science and Engineering 5, 223--235.Google ScholarGoogle ScholarCross RefCross Ref
  25. Karakasidis, A. & Verykios, V. (2012), Reference table based k-anonymous private blocking, in 'ACM Symposium on Applied Computing', Riva del Garda, Italy. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Karakasidis, A., Verykios, V. & Christen, P. (2011), Fake injection strategies for private phonetic matching, in 'International Workshop on Data Privacy Management', Leuven, Belgium. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kargupta, H., Datta, S., Wang, Q. & Sivakumar, K. (2003), On the privacy preserving properties of random data perturbation techniques, in 'ICDM', IEEE, pp. 99--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Kissner, L. & Song, D. (2004), Private and threshold set-intersection, in 'Technical Report', Carnegie Mellon University.Google ScholarGoogle Scholar
  29. Kuzu, M., Kantarcioglu, M., Durham, E. & Malin, B. (2011), A constraint satisfaction cryptanalysis of Bloom filters in private record linkage, in 'Privacy Enhancing Technologies', Springer, pp. 226--245. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lai, P., Yiu, S., Chow, K., Chong, C. & Hui, L. (2006), An Efficient Bloom filter based Solution for Multiparty Private Matching, in 'International Conference on Security and Management'.Google ScholarGoogle Scholar
  31. Lindell, Y. & Pinkas, B. (2009), 'Secure multiparty computation for privacy-preserving data mining', Journal of Privacy and Confidentiality 1(1), 5.Google ScholarGoogle ScholarCross RefCross Ref
  32. Mohammed, N., Fung, B. & Debbabi, M. (2011), 'Anonymity meets game theory: secure data integration with malicious participants', VLDB 20(4), 567--588. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. O'Keefe, C., Yung, M., Gu, L. & Baxter, R. (2004), Privacy-preserving data linkage protocols, in 'ACM Workshop on Privacy in the Electronic Society'. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Pang, C., Gu, L., Hansen, D. & Maeder, A. (2009), 'Privacy-preserving fuzzy matching using a public reference table', Intelligent Patient Management pp. 71--89.Google ScholarGoogle Scholar
  35. Ravikumar, P., Cohen, W. & Fienberg, S. (2004), A secure protocol for computing string distance metrics, in 'Workshop on Privacy and Security Aspects of Data Mining held at IEEE ICDM'04', pp. 40--46.Google ScholarGoogle Scholar
  36. Scannapieco, M., Figotin, I., Bertino, E. & Elmagarmid, A. (2007), Privacy preserving schema and data matching, in 'ACM SIGMOD', pp. 653--664. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Schnell, R., Bachteler, T. & Reiher, J. (2009), 'Privacy-preserving record linkage using Bloom filters', BMC Medical Informatics and Decision Making 9(1).Google ScholarGoogle Scholar
  38. Song, D., Wagner, D. & Perrig, A. (2000), 'Practical techniques for searches on encrypted data', sp. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Trepetin, S. (2008), 'Privacy-preserving string comparisons in record linkage systems: a review', Information Security Journal: A Global Perspective 17(5), 253--266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Van Eycken, E., Haustermans, K., Buntinx, F., Ceuppens, A., Weyler, J., Wauters, E., VAN, O. et al. (2000), 'Evaluation of the encryption procedure and record linkage in the Belgian National Cancer Registry', Archives of public health 58(6), 281--294.Google ScholarGoogle Scholar
  41. Vatsalan, D., Christen, P. & Verykios, V. (2011), An efficient two-party protocol for approximate matching in private record linkage, in 'AusDM, CRPIT 121'. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Vatsalan, D., Christen, P. & Verykios, V. (2013), 'A taxonomy of privacy-preserving record linkage techniques', to appear in Journal of Information Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Verykios, V., Karakasidis, A. & Mitrogiannis, V. (2009), 'Privacy preserving record linkage approaches', Int. J. of Data Mining, Modelling and Management 1(2), 206--221.Google ScholarGoogle ScholarCross RefCross Ref
  44. Weber, S., Lowe, H., Das, A. & Ferris, T. (2012), 'A simple heuristic for blindfolded record linkage', Journal of the American Medical Informatics Association.Google ScholarGoogle ScholarCross RefCross Ref
  45. Yakout, M., Atallah, M. & Elmagarmid, A. (2012), 'Efficient and practical approach for private record linkage', Journal of Data and Information Quality 3(3), 5. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image DL Hosted proceedings
    AusDM '12: Proceedings of the Tenth Australasian Data Mining Conference - Volume 134
    December 2012
    233 pages
    ISBN:9781921770142

    Publisher

    Australian Computer Society, Inc.

    Australia

    Publication History

    • Published: 5 December 2012

    Qualifiers

    • research-article

    Acceptance Rates

    AusDM '12 Paper Acceptance Rate25of55submissions,45%Overall Acceptance Rate98of232submissions,42%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader