research-article

AntMiner: mining more bugs by reducing noise interference

Authors:
Bin Liang

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China
View Profile

,
Pan Bian

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China
View Profile

,
Yan Zhang

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China
View Profile

,
Wenchang Shi

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China
View Profile

,
Wei You

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China

Renmin University of China, MOE, Beijing, China and Renmin University of China, Beijing, China
View Profile

,
Yan Cai

Institute of Software, Beijing, China

Institute of Software, Beijing, China
View Profile

ICSE '16: Proceedings of the 38th International Conference on Software EngineeringMay 2016Pages 333–344https://doi.org/10.1145/2884781.2884870

Published:14 May 2016Publication History

ICSE '16: Proceedings of the 38th International Conference on Software Engineering

Pages 333–344

ABSTRACT

Detecting bugs with code mining has proven to be an effective approach. However, the existing methods suffer from reporting serious false positives and false negatives. In this paper, we developed an approach called AntMiner to improve the precision of code mining by carefully preprocessing the source code. Specifically, we employ the program slicing technique to decompose the original source repository into independent sub-repositories, taking critical operations (automatically extracted from source code) as slicing criteria. In this way, the statements irrelevant to a critical operation are excluded from the corresponding sub-repository. Besides, various semantics-equivalent representations are normalized into a canonical form. Eventually, the mining process can be performed on a refined code database, and false positives and false negatives can be significantly pruned. We have implemented AntMiner and applied it to detect bugs in the Linux kernel. It reported 52 violations that have been either confirmed as real bugs by the kernel development community or fixed in new kernel versions. Among them, 41 cannot be detected by a widely used representative analysis tool Coverity. Besides, the result of a comparative analysis shows that our approach can effectively improve the precision of code mining and detect subtle bugs that have previously been missed.

References

Bugzilla for kernel, https://bugzilla.kernel.org.Google Scholar
Common Weakness Enumeration, http://cwe.mitre.org.Google Scholar
Coverity, http://www.coverity.com, v7.5.Google Scholar
Klocwork, www.klocwork.com.Google Scholar
M. Acharya, T. Xie, J. Pei, and J. Xu. Mining API patterns as partial orders from source code: from usage scenarios to specifications. In Proceedings of the 11th European Software Engineering Conference Held Jointly with 15th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 163--173, 2007. Google ScholarDigital Library
H. Agrawal, J. R. Horgan, S. London, and W. E. Wong. Fault localization using execution slices and dataflow tests. In Proceedings of International Symposium on Software Reliability Engineering, pages 143--151, 1995.Google ScholarCross Ref
A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: principles, techniques, and tools. 1986. Google ScholarDigital Library
G. Ammons, R. Bodik, J. R. Larus. Mining specifications. In Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 4--16, 2002. Google ScholarDigital Library
Y. Cai, and L. Cao. Effective and precise dynamic detection of hidden races for Java programs. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pages 450--461, 2015. Google ScholarDigital Library
Y. Cai, and W. K. Chan. Magiclock: Scalable Detection of Potential Deadlocks in Large-Scale Multithreaded Programs {J}. IEEE Transactions on Software Engineering, 40(3), pages 266--281, 2014. Google ScholarDigital Library
R-Y. Chang, A. Podgurski, and j. Yang. Finding what's not there: a new approach to revealing neglected conditions in software. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, pages 163--173, 2007. Google ScholarDigital Library
T. Y. Chen and Y. Y. Cheung. Dynamic program dicing. In Proceedings of the Conference on Software Maintenance, pages 378--385, 1993. Google ScholarDigital Library
B. Chess and G. McGraw. Static analysis for security {J}. IEEE Security & Privacy, 2(6), pages 76--79, 2004. Google ScholarDigital Library
S. Christey. 2011 CWE/SANS Top 25 Most Dangerous Software Errors. http://cwe.mitre.org/top25.Google Scholar
D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In Proceedings of the 18th ACM Symposium on Operating System Principles, pages 57--72, 2001. Google ScholarDigital Library
J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. In Proceedings of ACM Transactions on Programming Languages and Systems, vol. 9, pages 319--349, 1987. Google ScholarDigital Library
V. Ganapathy, D. King, T. Jaeger, and S. Jha. Mining security-sensitive operations in legacy code using concept analysis. In Proceedings of the 29th international conference on Software Engineering, pages 458--467, 2007. Google ScholarDigital Library
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proceedings of Workshop on Frequent Itemset Mining Implementations, 2003.Google Scholar
H. S. Gunawi, C. Rubio-González, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and B. Liblit. EIO: Error handling is occasionally correct. In 6th USENIX Conference on File and Storage Technologies, pages 1--16, 2008. Google ScholarDigital Library
S. Hallem, B. Chelf, Y. Xie, and D. Engler. A system and language for building system-specific, static analysis. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 69--82, 2002. Google ScholarDigital Library
M. Harman, S. Danicic. Using program slicing to simplify testing {J}. Software Testing, Verification and Reliability, 5(3), pages 143--162, 1995.Google Scholar
S. Horwitz, H. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In Proceedings of ACM Transactions on Programming Languages and Systems, vol. 12, pages 26--60, 1990. Google ScholarDigital Library
T. Kremenek, P. Twohey, G. Back, A. Y. Ng, and D. Engler. From uncertainty to belief: inferring the specification within. In Proceedings of 7th Symposium on Operating Systems Design and Implementation, pages 161--176, 2006. Google ScholarDigital Library
Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proceedings of 21st ACM SIGOPS symposium on Operating systems principles, pages 145--158, 2007. Google ScholarDigital Library
Z. Li and Y. Zhou. PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. In Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 306--315, 2005. Google ScholarDigital Library
B. Livshits and T. Zimmermann. DynaMine: finding common error patterns by mining software revision histories. In Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296--305, 2005. Google ScholarDigital Library
D. Lo, S-C. Khoo, C. Liu. Mining past-time temporal rules from execution traces. In Proceedings of the 2008 international workshop on dynamic analysis, pages 50--56, 2008. Google ScholarDigital Library
S. Lu, S. Park, C. Hu, X. Ma, W. Jiang, Z. Li, R. A. Popa, and Y. Zhou. MUVI: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In Proceedings of 21st ACM SIGOPS symposium on Operating systems principles, pages 103--116, 2007. Google ScholarDigital Library
K. J. Ottenstein, and L. M. Ottenstein. The program dependence graph in a software develop environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 177--184, 1984. Google ScholarDigital Library
M. Pradel, C. Jaspan, J. Aldrich, and T. R. Gross. Statically checking API protocol conformance with mined multi-object specifications. In Proceedings of the 34th International Conference on Software Engineering, pages 521--530, 2012. Google ScholarDigital Library
M. K. Ramanathan, A. Grama, and S. Jagannathan. Static specification inference using predicate mining. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 123--134, 2007. Google ScholarDigital Library
C. Rubio-González, B. Liblit. Defective error/pointer interactions in the linux kernel. In Proceedings of the 2011 International Symposium on Software Testing and Analysis, pages 111--121, 2011. Google ScholarDigital Library
R. M. Stallman and the GCC Developer Community. GNU compiler collection internals (for GCC version 4.5.0), http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gccint.ps.gz, 2010.Google Scholar
B. Sun, G. Shu, A. Podgurski, and B. Robinson. Extending static analysis by mining project-specific rules. In Proceedings of the 34th International Conference on Software Engineering, pages 1054--1063, 2012. Google ScholarDigital Library
L. Tan, D. Yuan, G. Krishna, and Y. Zhou. /* iComment: Bugs or bad comments?*/. In In Proceedings of the 21st ACM Symposium on Operating Systems Principles, pages 145--158, 2007. Google ScholarDigital Library
L. Tan, X. Zhang, X. Ma, W. Xiong, and Y. Zhou. AutoISES: automatically inferring security specifications and detecting violations. In Proceedings of USENIX Security Symposium '08, pages 379--394, 2008. Google ScholarDigital Library
L. Tan, Y. Zhou, and Y. Padioleau. aComment: mining annotations from comments and code to detect interrupt related concurrency bugs. In Proceedings of the 33rd International Conference on Software Engineering, pages 11--20, 2011. Google ScholarDigital Library
S. Thummalapenta and T. Xie. Alattin: mining alternative patterns for detecting neglected conditions. In Proceedings of 24th IEEE/ACM International Conference on Automated Software Engineering, pages 283--294, 2009. Google ScholarDigital Library
W. Visser, J. Geldenhuys, and M. B. Dwyer. Green: reducing, reusing and recycling constraints in program analysis. In Proceedings of FSE, 2012. Google ScholarDigital Library
T. Wang, K. Wang, X. Su, and P. Ma. Detection of semantically similar code {J}. Frontiers of Computer Science, 8(6), pages 996--1011, 2014. Google ScholarDigital Library
A. Wasylkowski and A. Zeller. Mining temporal specifications from object usage. In Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering, pages 263--292, 2009. Google ScholarDigital Library
W. Weimer and G. C. Necula. Mining temporal specifications for error detection. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 461--476, 2005. Google ScholarDigital Library
M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, pages 439--449, 1981. Google ScholarDigital Library
X. Xiao, A. Paradkar, S. Thummalapenta, and T. Xie. Automated extraction of security policies from natural-language software documents. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012. Google ScholarDigital Library
S. Xu, Y. S. Chee, Transformation-based diagnosis of student programs for programming tutoring systems, In IEEE Trans. Software Eng. 29 (4), pages 360--384, 2003. Google ScholarDigital Library
Z. Xu, J. Zhang, and Z. Xu. Melton: a practical and precise memory leak detection tool for C programs {J}. Frontiers of Computer Science, 9(1), pages 34--54, 2015. Google ScholarDigital Library
F. Yamaguchi, N. Golde, D. Arp, K. Rieck. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of IEEE Symposium on Security and Privacy, pages 590--604, 2014. Google ScholarDigital Library
F. Yamaguchi, A. Maier, H. Gascon, and K. Rieck. Automatic inference of search patterns for taint-style vulnerabilities. In Proceedings of IEEE Symposium on Security and Privacy, pages 797--812, 2015. Google ScholarDigital Library
F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck. Chucky: Exposing missing checks in source code for vulnerability discovery. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 499--510, 2013. Google ScholarDigital Library
H. Zhong, L. Zhang, and T. Xie, M. Hong. Inferring resource specifications from natural language API documentation. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, pages 307--318, 2009. Google ScholarDigital Library

Index Terms

AntMiner: mining more bugs by reducing noise interference
1. Software and its engineering
  1. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Automated static analysis

Recommendations

DeepBugs: a learning approach to name-based bug detection

Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug ...
Read More
NAR-miner: discovering negative association rules from code for bug detection
ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Inferring programming rules from source code based on data mining techniques has been proven to be effective to detect software bugs. Existing studies focus on discovering positive rules in the form of A ⇒ B, indicating that when operation A appears, ...
Read More
Locating Source Code to Be Fixed Based on Initial Bug Reports - A Case Study on the Eclipse Project
IWESEP '12: Proceedings of the 2012 Fourth International Workshop on Empirical Software Engineering in Practice

In most software development, a Bug Tracking System is used to improve software quality. Based on bug reports managed by the bug tracking system, triagers who assign a bug to fixers and fixers need to pinpoint buggy files that should be fixed. However if ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN:9781450339001
DOI:10.1145/2884781
General Chair:
Laura Dillon
Michigan State University
,
Program Chairs:
Willem Visser
Stellenbosch University, South Africa
,
Laurie Williams
North Carolina State University
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 May 2016
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
bug detection
code mining
program slicing
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate276of1,856submissions,15%

Upcoming Conference

ICSE 2025

2025 IEEE/ACM 46th International Conference on Software Engineering

April 26 - May 3, 2025

Ottawa , ON , Canada
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 54
  Total Citations
  View Citations
- 712
  Total Downloads
- Downloads (Last 12 months)18
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

AntMiner: mining more bugs by reducing noise interference

ICSE '16: Proceedings of the 38th International Conference on Software Engineering

ABSTRACT

References

Cited By

Index Terms

Recommendations

DeepBugs: a learning approach to name-based bug detection

NAR-miner: discovering negative association rules from code for bug detection

Locating Source Code to Be Fixed Based on Initial Bug Reports - A Case Study on the Eclipse Project