Sound Non-Statistical Clustering of Static Analysis Alarms

Authors:
Woosuk Lee

Seoul National University, Seoul, Korea

Seoul National University, Seoul, Korea
View Profile

,
Wonchan Lee

Stanford University

Stanford University
View Profile

,
Dongok Kang

Seoul National University, Seoul, Korea

Seoul National University, Seoul, Korea
View Profile

,
Kihong Heo

Seoul National University, Seoul, Korea

Seoul National University, Seoul, Korea
View Profile

,
Hakjoo Oh

Korea University, Seongbuk-gu, Seoul, Korea

Korea University, Seongbuk-gu, Seoul, Korea
View Profile

,
Kwangkeun Yi

Seoul National University, Seoul, Korea

Seoul National University, Seoul, Korea
View Profile

ACM Transactions on Programming Languages and Systems Volume 39 Issue 4Article No.: 16pp 1–35https://doi.org/10.1145/3095021

Published:17 August 2017Publication History

ACM Transactions on Programming Languages and Systems

Abstract

We present a sound method for clustering alarms from static analyzers. Our method clusters alarms by discovering sound dependencies between them such that if the dominant alarms of a cluster turns out to be false, all the other alarms in the same cluster are guaranteed to be false. We have implemented our clustering algorithm on top of a realistic buffer-overflow analyzer and proved that our method reduces 45% of alarm reports. Our framework is applicable to any abstract interpretation-based static analysis and orthogonal to abstraction refinements and statistical ranking schemes.

References

Gogul Balakrishnan, Sriram Sankaranarayanan, Franjo Ivančić, Ou Wei, and Aarti Gupta. 2008. SLR: Path-sensitive analysis through infeasible-path detection and syntactic language refinement. In Proceedings of the 15th International Symposium on Static Analysis (SAS’08). Springer-Verlag, Berlin, 238--254. Google ScholarDigital Library
Thomas Ball, Mayur Naik, and Sriram K. Rajamani. 2003. From symptom to cause: Localizing errors in counterexample traces. In Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’03). ACM, New York, NY, 97--105. Google ScholarDigital Library
Mike Beaven and Ryan Stansifer. 1993. Explaining type errors in polymorphic languages. ACM Lett. Program. Lang. Syst. 2, 1--4 (March 1993), 17--30. 1057-4514 Google ScholarDigital Library
Sam Blackshear, Bor-Yuh Evan Chang, and Manu Sridharan. 2013. Thresher: Precise refutations for heap reachability. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). ACM, New York, NY, 275--286. Google ScholarDigital Library
Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérome Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2003. A static analyzer for large safety-critical software. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI’03). ACM, New York, NY, 196--207. Google ScholarDigital Library
Olaf Chitil. 2001. Compositional explanation of types and algorithmic debugging of type errors. In Proceedings of the 6th ACM SIGPLAN International Conference on Functional Programming (ICFP’01). ACM, New York, NY, 193--204. Google ScholarDigital Library
Cristina Cifuentes, Christian Hoermann, Nathan Keynes, Lian Li, Simon Long, Erica Mealy, Michael Mounteney, and Bernhard Scholz. 2009. BegBunch: Benchmarking for C bug detection tools. In Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in Conjunction with the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2009) (DEFECTS’09). ACM, New York, NY, 16--20. Google ScholarDigital Library
Patrick Cousot and Rahida Cousot. 1992. Abstract interpretation and application to logic programs. J. Log. Program. 13, 2--3 (July 1992), 103--179. 0743-1066 Google ScholarDigital Library
P. Cousot, P. Ganty, and J.-F. Raskin. 2007. Fixpoint-guided abstraction refinements. In Proceedings of the 14th International Symposium on Static Analysis, SAS’07, G. Filé and H. Riis Nielson (Eds.). Springer, Berlin, 333--348. Google ScholarDigital Library
Vijay D’Silva, Leopold Haller, Daniel Kroening, and Michael Tautschnig. 2012. Numeric bounds analysis with conflict-driven learning. In Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’12). Springer-Verlag, Berlin, 48--63. Google ScholarDigital Library
Dominic Duggan and Frederick Bent. 1995. Explaining type inference. In Science of Computer Programming. 37--83. Google ScholarDigital Library
Alex Groce and Willem Visser. 2003. What went wrong: Explaining counterexamples. In Proceedings of the 10th International Conference on Model Checking Software (SPIN’03). Springer-Verlag, Berlin, 121--136. http://dl.acm.org/citation.cfm?id=1767111.1767119 Google ScholarDigital Library
Bhargav S. Gulavani and Sriram K. Rajamani. 2006. Counterexample driven refinement for abstract interpretation. In Tools and Algorithms for the Construction and Analysis of Systems, Holger Hermanns and Jens Palsberg (Eds.). Lecture Notes in Computer Science, Vol. 3920. Springer, Berlin, 474--488. Google ScholarDigital Library
Bhargav S. Gulavani, Supratik Chakraborty, Aditya V. Nori, and Sriram K. Rajamani. 2008. Automatically refining abstract interpretations. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08/ETAPS’08). Springer-Verlag, Berlin, 443--458. Google ScholarDigital Library
Gregory F. Johnson and Janet A. Walz. 1986. A maximum-flow approach to anomaly isolation in unification-based incremental type inference. In Proceedings of the 13th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’86). ACM, New York, NY, 44--57. Google ScholarDigital Library
Yungbum Jung, Jaehwang Kim, Jaeho Shin, and Kwangkeun Yi. 2005. Taming false alarms from a domain-unaware C analyzer by a bayesian statistical post analysis. In Proceedings of the 12th International Conference on Static Analysis (SAS’05). Springer-Verlag, Berlin, 203--217. Google ScholarDigital Library
Heejung Kim, Yungbum Jung, Sunghun Kim, and Kwankeun Yi. 2011. MeCC: Memory comparison-based clone detector. In Proceedings of the 33rd International Conference on Software Engineering (ICSE’11). ACM, New York, NY, 301--310. Google ScholarDigital Library
Youil Kim, Jooyong Lee, Hwansoo Han, and Kwang-Moo Choe. 2010. Filtering false alarms of buffer overflow analysis using SMT solvers. Inf. Softw. Technol. 52, 2 (Feb. 2010), 210--219. 0950-5849 Google ScholarDigital Library
Ted Kremenek, Ken Ashcraft, Junfeng Yang, and Dawson Engler. 2004. Correlation exploitation in error ranking. In Proceedings of the 12th ACM SIGSOFT Twelfth International Symposium on Foundations of Software Engineering (SIGSOFT’04/FSE-12). ACM, New York, NY, 83--93. Google ScholarDigital Library
Ted Kremenek and Dawson Engler. 2003. Z-ranking: Using statistical analysis to counter the impact of static analysis approximations. In Proceedings of the 10th International Conference on Static Analysis (SAS’03). Springer-Verlag, Berlin, 295--315. Google ScholarDigital Library
Wei Le and Mary Lou Soffa. 2010. Path-based fault correlations. In Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’10). ACM, New York, NY, 307--316. Google ScholarDigital Library
Woosuk Lee, Wonchan Lee, and Kwangkeun Yi. 2012. Sound non-statistical clustering of static analysis alarms. In Proceedings of the 13th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI’12). Springer-Verlag, Berlin, 299--314. Google ScholarDigital Library
Percy Liang, Omer Tripp, and Mayur Naik. 2011. Learning minimal abstractions. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’11). ACM, New York, NY, 31--42. Google ScholarDigital Library
Ravi Mangal, Xin Zhang, Aditya V. Nori, and Mayur Naik. 2015. A user-guided approach to program analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, 462--473. Google ScholarDigital Library
MathWorks. 2015. Polyspace Embedded Software Verification. Retrieved from http://www.mathworks.com/products/polyspace/index.html.Google Scholar
Laurent Mauborgne and Xavier Rival. 2005. Trace partitioning in abstract interpretation based static analyzers. In Proceedings of the 14th European Conference on Programming Languages and Systems (ESOP’05). Springer-Verlag, Berlin, 5--20. Google ScholarDigital Library
Microsoft. 2015. Code Contracts. Retrieved from http://msdn.microsoft.com/en-us/devlabs/dd491992.aspx.Google Scholar
Antoine Miné. 2006. The octagon abstract domain. Higher Order Symbol. Comput. 19, 1 (March 2006), 31--100. 1388-3690 Google ScholarDigital Library
Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, Daejun Park, Jeehoon Kang, and Kwangkeun Yi. 2014. Global sparse analysis framework. ACM Trans. Program. Lang. Syst. 36, 3, Article 8 (Sept. 2014), 44 pages. 0164-0925 Google ScholarDigital Library
Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, and Kwangkeun Yi. 2012. Design and implementation of sparse global analyses for C-like languages. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, New York, NY, 229--238. Google ScholarDigital Library
Xavier Rival. 2005. Understanding the origin of alarms in ASTRÉE. In Proceedings of the 12th International Conference on Static Analysis (SAS’05). Springer-Verlag, Berlin, 303--319. Google ScholarDigital Library
ROPAS. 2017. The Sparrow Static Analyzer. Retrieved from https://github.com/ropas/sparrow.Google Scholar
F. Tip and T. B. Dinesh. 2001. A slicing-based approach for locating type errors. ACM Trans. Softw. Eng. Methodol. 10, 1 (Jan. 2001), 5--55. 1049-331X Google ScholarDigital Library
Mitchell Wand. 1986. Finding the source of type errors. In Proceedings of the 13th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’86). ACM, New York, NY, 38--43. Google ScholarDigital Library

Index Terms

Sound Non-Statistical Clustering of Static Analysis Alarms
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Sensor networks
    2. Sensors and actuators
2. Theory of computation
  1. Logic
    1. Abstraction

Recommendations

Sound non-statistical clustering of static analysis alarms
VMCAI'12: Proceedings of the 13th international conference on Verification, Model Checking, and Abstract Interpretation

We present a sound method for clustering alarms from static analyzers. Our method clusters alarms by discovering sound dependencies between them such that if the dominant alarm of a cluster turns out to be false (respectively true) then it is assured ...
Read More
Pushdown control-flow analysis for free
POPL '16

Traditional control-flow analysis (CFA) for higher-order languages introduces spurious connections between callers and callees, and different invocations of a function may pollute each other's return flows. Recently, three distinct approaches have been ...
Read More
Machine-learning-guided selectively unsound static analysis
ICSE '17: Proceedings of the 39th International Conference on Software Engineering

We present a machine-learning-based technique for selectively applying unsoundness in static analysis. Existing bug-finding static analyzers are unsound in order to be precise and scalable in practice. However, they are uniformly unsound and hence at ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Programming Languages and Systems Volume 39, Issue 4
December 2017
191 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/3133234
Editor:
Andrew Myers
Cornell University, USA
Issue’s Table of Contents
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 August 2017
- Accepted: 1 May 2017
- Revised: 1 December 2016
- Received: 1 June 2015
Published in toplas Volume 39, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Static analysis
abstract interpretation
false alarms
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 13
  Total Citations
  View Citations
- 309
  Total Downloads
- Downloads (Last 12 months)25
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Sound Non-Statistical Clustering of Static Analysis Alarms

ACM Transactions on Programming Languages and Systems

Abstract

References

Cited By

Index Terms

Recommendations

Sound non-statistical clustering of static analysis alarms

Pushdown control-flow analysis for free

Machine-learning-guided selectively unsound static analysis