research-article

Augmenting Telephone Spam Blacklists by Mining Large CDR Datasets

Authors:
Jienan Liu, University of Georgia, Athens, GA, USA
Babak Rahbarinia, Auburn University Montgomery, Montgomery, AL, USA
Roberto Perdisci, University of Georgia, Athens, GA, USA
Haitao Du, China Mobile Research Institute, Beijing, China
Li Su, China Mobile Research Institute, Beijing, China

ASIACCS '18: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, May 2018, Pages 273–284, https://doi.org/10.1145/3196494.3196553

Published: 29 May 2018

ABSTRACT

Telephone spam has become an increasingly prevalent problem in many countries around the world. For example, cumulative spam/scam call complaints submitted to the US Federal Trade Commission's (FTC) National Do Not Call Registry reached 30.9 million in 2016. Telephone carriers can naturally play an important role in the fight against spam. However, because of the extremely large volume of calls that transit large carrier networks, it is challenging to mine their vast amounts of call detail records (CDRs) to accurately detect and block spam phone calls. This is because CDRs contain only high-level metadata related to each phone call (e.g., source and destination numbers, call start time, and call duration). In addition, ground truth about both benign and spam-related phone numbers is often very scarce (only a tiny fraction of all phone numbers can be labeled). More importantly, telephone carriers are extremely sensitive to false positives, as they need to avoid blocking any non-spam calls, which makes the detection of spam-related numbers even more challenging. In this paper, we present a novel detection system that aims to discover telephone numbers involved in spam campaigns. Given a small seed of known spam phone numbers, our system uses a combination of unsupervised and supervised machine learning methods to mine new, previously unknown spam numbers from large CDR datasets. Our objective is not to detect all possible spam phone calls crossing a carrier's network, but rather to expand the list of known spam numbers while aiming for zero false positives, so that the newly discovered numbers can be added to a phone blacklist, for example. To evaluate our system, we conducted experiments on a large dataset of real-world CDRs provided by a leading telephony provider in China, with the system tuned to produce no false positives. The experimental results show that our system can greatly expand the initial seed of known spam numbers, by up to about 250%.
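The abstract describes mining per-number behavior from CDR metadata and tuning the system so that no known benign number is flagged. The Python sketch below illustrates that general idea only, under stated assumptions: the `CDR` field names, the four per-caller features, the toy score, and the rule of flagging only numbers that score strictly above the highest-scoring labeled benign number are all hypothetical. They are not the paper's actual features, its combination of unsupervised and supervised models, or its tuning procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CDR:
    """One call detail record (hypothetical field names)."""
    src: str         # calling (source) number
    dst: str         # called (destination) number
    start: float     # call start time, seconds since epoch
    duration: float  # call duration in seconds


def caller_features(cdrs, short_call_secs=10):
    """Aggregate CDRs into per-caller feature dicts (illustrative features only)."""
    by_caller = defaultdict(list)
    for c in cdrs:
        by_caller[c.src].append(c)
    features = {}
    for number, calls in by_caller.items():
        durations = [c.duration for c in calls]
        features[number] = {
            "n_calls": len(calls),
            "n_distinct_callees": len({c.dst for c in calls}),
            "mean_duration": mean(durations),
            # Fraction of calls shorter than short_call_secs.
            "short_call_ratio": sum(d < short_call_secs for d in durations) / len(durations),
        }
    return features


def zero_fp_threshold(scores, benign):
    """Highest score among labeled benign numbers; flagging only scores strictly
    above it yields zero false positives on the benign labels."""
    benign_scores = [scores[n] for n in benign if n in scores]
    # With no benign labels the threshold is -inf and every scored number passes.
    return max(benign_scores, default=float("-inf"))


def expand_blacklist(scores, seed, benign):
    """Numbers outside the seed whose score exceeds the zero-FP threshold."""
    threshold = zero_fp_threshold(scores, benign)
    return {n for n, s in scores.items() if s > threshold and n not in seed}


if __name__ == "__main__":
    cdrs = [
        CDR("A", "x1", 0, 3), CDR("A", "x2", 60, 4), CDR("A", "x3", 120, 2),
        CDR("B", "y1", 0, 300), CDR("B", "y1", 3600, 240),
        CDR("C", "z1", 0, 5), CDR("C", "z2", 30, 6),
    ]
    feats = caller_features(cdrs)
    # Toy score standing in for a trained classifier's output (assumption).
    scores = {n: f["short_call_ratio"] * f["n_distinct_callees"] for n, f in feats.items()}
    print(expand_blacklist(scores, seed={"A"}, benign={"B"}))  # prints {'C'}
```

Setting the threshold at the maximum benign score enforces zero false positives only with respect to the labeled benign set. Because ground truth is scarce, unlabeled benign numbers could still be flagged, so this is a conservative illustration of the trade-off rather than a guarantee.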

Index Terms

Security and privacy

Published in
ASIACCS '18: Proceedings of the 2018 on Asia Conference on Computer and Communications Security
May 2018
866 pages
ISBN: 9781450355766
DOI: 10.1145/3196494
General Chairs: Jong Kim (Pohang University of Science and Technology, South Korea), Gail-Joon Ahn (Arizona State University, USA & Samsung Electronics, South Korea), Seungjoo Kim (Korea University, South Korea)
Program Chairs: Yongdae Kim (KAIST, South Korea), Javier Lopez (University of Malaga, Spain), Taesoo Kim (Georgia Tech, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
blacklisting
cdr mining
machine learning
telephone spam
voip
Acceptance Rates
ASIACCS '18 Paper Acceptance Rate: 52 of 310 submissions, 17%
Overall Acceptance Rate: 418 of 2,322 submissions, 18%
Article Metrics
- Total Citations: 7
- Total Downloads: 444
- Downloads (last 12 months): 79
- Downloads (last 6 weeks): 10