DOI: 10.1145/3196494.3196553
research-article

Augmenting Telephone Spam Blacklists by Mining Large CDR Datasets

Published: 29 May 2018

ABSTRACT

Telephone spam has become an increasingly prevalent problem in many countries around the world. For example, the cumulative number of spam/scam call complaints submitted to the US Federal Trade Commission's (FTC) National Do Not Call Registry reached 30.9 million in 2016. Naturally, telephone carriers can play an important role in the fight against spam. However, because of the extremely large volume of calls that transit large carrier networks, it is challenging to mine their vast amounts of call detail records (CDRs) to accurately detect and block spam phone calls. This is because CDRs contain only high-level metadata about each phone call (e.g., source and destination numbers, call start time, and call duration). In addition, ground truth about both benign and spam-related phone numbers is often very scarce: only a tiny fraction of all phone numbers can be labeled. More importantly, telephone carriers are extremely sensitive to false positives, as they need to avoid blocking any non-spam calls, which makes the detection of spam-related numbers even more challenging. In this paper, we present a novel detection system that aims to discover telephone numbers involved in spam campaigns. Given a small seed of known spam phone numbers, our system uses a combination of unsupervised and supervised machine learning methods to mine new, previously unknown spam numbers from large CDR datasets. Our objective is not to detect all possible spam phone calls crossing a carrier's network, but rather to expand the list of known spam numbers while aiming for zero false positives, so that the newly discovered numbers can be added, for example, to a phone blacklist. To evaluate our system, we conducted experiments over a large dataset of real-world CDRs provided by a leading telephony provider in China, while tuning the system to produce no false positives. The experimental results show that our system is able to greatly expand the initial seed of known spam numbers, by up to about 250%.
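
The seed-expansion idea described in the abstract can be illustrated with a toy sketch. This is not the paper's system: the abstract does not specify the features, the clustering algorithm, or the classifier, so everything below is an assumption made for illustration. The record layout `(caller, callee, start, duration)`, the three per-caller features, the fixed thresholds in `bucket`, and the function names are all hypothetical. The grouping step stands in for the unsupervised stage, and the use of labeled numbers to accept or reject a group stands in for the supervised stage.

```python
from collections import defaultdict
from statistics import mean


def caller_features(cdrs):
    """Aggregate hypothetical per-caller features from CDR tuples.

    Each record is assumed to be (caller, callee, start, duration_seconds).
    """
    calls = defaultdict(list)
    for caller, callee, _start, duration in cdrs:
        calls[caller].append((callee, duration))
    feats = {}
    for caller, recs in calls.items():
        n_calls = len(recs)
        n_callees = len({callee for callee, _ in recs})
        mean_duration = mean(d for _, d in recs)
        feats[caller] = (n_calls, n_callees / n_calls, mean_duration)
    return feats


def bucket(feat):
    """Coarse grouping key; a stand-in for a real clustering step.

    The thresholds are arbitrary illustration values, not from the paper.
    """
    n_calls, callee_ratio, mean_duration = feat
    return (n_calls >= 5, callee_ratio >= 0.8, mean_duration < 30)


def expand_blacklist(cdrs, seed_spam, known_benign):
    """Return unlabeled callers that share a group with seed spam numbers.

    A group is accepted only if it contains at least one seed spam number
    and no known benign number, which is a conservative rule meant to
    limit false positives.
    """
    groups = defaultdict(set)
    for caller, feat in caller_features(cdrs).items():
        groups[bucket(feat)].add(caller)
    candidates = set()
    for members in groups.values():
        if members & seed_spam and not members & known_benign:
            candidates |= members - seed_spam
    return candidates


# Synthetic records for illustration only.
SAMPLE_CDRS = (
    [("S1", f"a{i}", i, 10) for i in range(6)]      # seed spam: many short calls
    + [("U1", f"b{i}", i, 12) for i in range(6)]    # unlabeled, spam-like
    + [("B1", "x", 0, 200), ("B1", "x", 1, 200),
       ("B1", "x", 2, 200), ("B1", "y", 3, 180)]    # known benign
    + [("U2", "z", 0, 300), ("U2", "z", 1, 300)]    # unlabeled, benign-like
)

new_numbers = expand_blacklist(SAMPLE_CDRS, seed_spam={"S1"}, known_benign={"B1"})
```

Rejecting any group that contains a known benign number trades recall for precision, which matches the stated goal of growing a blacklist rather than catching every spam call. A real deployment would need far richer features and validation than this sketch suggests.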


          • Published in

            ASIACCS '18: Proceedings of the 2018 on Asia Conference on Computer and Communications Security
            May 2018
            866 pages
            ISBN: 9781450355766
            DOI: 10.1145/3196494

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher: Association for Computing Machinery, New York, NY, United States


            Acceptance Rates

            ASIACCS '18 Paper Acceptance Rate: 52 of 310 submissions, 17%
            Overall Acceptance Rate: 418 of 2,322 submissions, 18%
