ABSTRACT
Detecting bugs with code mining has proven to be an effective approach. However, the existing methods suffer from reporting serious false positives and false negatives. In this paper, we developed an approach called AntMiner to improve the precision of code mining by carefully preprocessing the source code. Specifically, we employ the program slicing technique to decompose the original source repository into independent sub-repositories, taking critical operations (automatically extracted from source code) as slicing criteria. In this way, the statements irrelevant to a critical operation are excluded from the corresponding sub-repository. Besides, various semantics-equivalent representations are normalized into a canonical form. Eventually, the mining process can be performed on a refined code database, and false positives and false negatives can be significantly pruned. We have implemented AntMiner and applied it to detect bugs in the Linux kernel. It reported 52 violations that have been either confirmed as real bugs by the kernel development community or fixed in new kernel versions. Among them, 41 cannot be detected by a widely used representative analysis tool Coverity. Besides, the result of a comparative analysis shows that our approach can effectively improve the precision of code mining and detect subtle bugs that have previously been missed.
- Bugzilla for kernel, https://bugzilla.kernel.org.Google Scholar
- Common Weakness Enumeration, http://cwe.mitre.org.Google Scholar
- Coverity, http://www.coverity.com, v7.5.Google Scholar
- Klocwork, www.klocwork.com.Google Scholar
- M. Acharya, T. Xie, J. Pei, and J. Xu. Mining API patterns as partial orders from source code: from usage scenarios to specifications. In Proceedings of the 11th European Software Engineering Conference Held Jointly with 15th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 163--173, 2007. Google ScholarDigital Library
- H. Agrawal, J. R. Horgan, S. London, and W. E. Wong. Fault localization using execution slices and dataflow tests. In Proceedings of International Symposium on Software Reliability Engineering, pages 143--151, 1995.Google ScholarCross Ref
- A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: principles, techniques, and tools. 1986. Google ScholarDigital Library
- G. Ammons, R. Bodik, J. R. Larus. Mining specifications. In Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 4--16, 2002. Google ScholarDigital Library
- Y. Cai, and L. Cao. Effective and precise dynamic detection of hidden races for Java programs. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pages 450--461, 2015. Google ScholarDigital Library
- Y. Cai, and W. K. Chan. Magiclock: Scalable Detection of Potential Deadlocks in Large-Scale Multithreaded Programs {J}. IEEE Transactions on Software Engineering, 40(3), pages 266--281, 2014. Google ScholarDigital Library
- R-Y. Chang, A. Podgurski, and j. Yang. Finding what's not there: a new approach to revealing neglected conditions in software. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, pages 163--173, 2007. Google ScholarDigital Library
- T. Y. Chen and Y. Y. Cheung. Dynamic program dicing. In Proceedings of the Conference on Software Maintenance, pages 378--385, 1993. Google ScholarDigital Library
- B. Chess and G. McGraw. Static analysis for security {J}. IEEE Security & Privacy, 2(6), pages 76--79, 2004. Google ScholarDigital Library
- S. Christey. 2011 CWE/SANS Top 25 Most Dangerous Software Errors. http://cwe.mitre.org/top25.Google Scholar
- D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In Proceedings of the 18th ACM Symposium on Operating System Principles, pages 57--72, 2001. Google ScholarDigital Library
- J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. In Proceedings of ACM Transactions on Programming Languages and Systems, vol. 9, pages 319--349, 1987. Google ScholarDigital Library
- V. Ganapathy, D. King, T. Jaeger, and S. Jha. Mining security-sensitive operations in legacy code using concept analysis. In Proceedings of the 29th international conference on Software Engineering, pages 458--467, 2007. Google ScholarDigital Library
- G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proceedings of Workshop on Frequent Itemset Mining Implementations, 2003.Google Scholar
- H. S. Gunawi, C. Rubio-González, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and B. Liblit. EIO: Error handling is occasionally correct. In 6th USENIX Conference on File and Storage Technologies, pages 1--16, 2008. Google ScholarDigital Library
- S. Hallem, B. Chelf, Y. Xie, and D. Engler. A system and language for building system-specific, static analysis. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 69--82, 2002. Google ScholarDigital Library
- M. Harman, S. Danicic. Using program slicing to simplify testing {J}. Software Testing, Verification and Reliability, 5(3), pages 143--162, 1995.Google Scholar
- S. Horwitz, H. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In Proceedings of ACM Transactions on Programming Languages and Systems, vol. 12, pages 26--60, 1990. Google ScholarDigital Library
- T. Kremenek, P. Twohey, G. Back, A. Y. Ng, and D. Engler. From uncertainty to belief: inferring the specification within. In Proceedings of 7th Symposium on Operating Systems Design and Implementation, pages 161--176, 2006. Google ScholarDigital Library
- Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proceedings of 21st ACM SIGOPS symposium on Operating systems principles, pages 145--158, 2007. Google ScholarDigital Library
- Z. Li and Y. Zhou. PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. In Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 306--315, 2005. Google ScholarDigital Library
- B. Livshits and T. Zimmermann. DynaMine: finding common error patterns by mining software revision histories. In Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296--305, 2005. Google ScholarDigital Library
- D. Lo, S-C. Khoo, C. Liu. Mining past-time temporal rules from execution traces. In Proceedings of the 2008 international workshop on dynamic analysis, pages 50--56, 2008. Google ScholarDigital Library
- S. Lu, S. Park, C. Hu, X. Ma, W. Jiang, Z. Li, R. A. Popa, and Y. Zhou. MUVI: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In Proceedings of 21st ACM SIGOPS symposium on Operating systems principles, pages 103--116, 2007. Google ScholarDigital Library
- K. J. Ottenstein, and L. M. Ottenstein. The program dependence graph in a software develop environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 177--184, 1984. Google ScholarDigital Library
- M. Pradel, C. Jaspan, J. Aldrich, and T. R. Gross. Statically checking API protocol conformance with mined multi-object specifications. In Proceedings of the 34th International Conference on Software Engineering, pages 521--530, 2012. Google ScholarDigital Library
- M. K. Ramanathan, A. Grama, and S. Jagannathan. Static specification inference using predicate mining. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 123--134, 2007. Google ScholarDigital Library
- C. Rubio-González, B. Liblit. Defective error/pointer interactions in the linux kernel. In Proceedings of the 2011 International Symposium on Software Testing and Analysis, pages 111--121, 2011. Google ScholarDigital Library
- R. M. Stallman and the GCC Developer Community. GNU compiler collection internals (for GCC version 4.5.0), http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gccint.ps.gz, 2010.Google Scholar
- B. Sun, G. Shu, A. Podgurski, and B. Robinson. Extending static analysis by mining project-specific rules. In Proceedings of the 34th International Conference on Software Engineering, pages 1054--1063, 2012. Google ScholarDigital Library
- L. Tan, D. Yuan, G. Krishna, and Y. Zhou. /* iComment: Bugs or bad comments?*/. In In Proceedings of the 21st ACM Symposium on Operating Systems Principles, pages 145--158, 2007. Google ScholarDigital Library
- L. Tan, X. Zhang, X. Ma, W. Xiong, and Y. Zhou. AutoISES: automatically inferring security specifications and detecting violations. In Proceedings of USENIX Security Symposium '08, pages 379--394, 2008. Google ScholarDigital Library
- L. Tan, Y. Zhou, and Y. Padioleau. aComment: mining annotations from comments and code to detect interrupt related concurrency bugs. In Proceedings of the 33rd International Conference on Software Engineering, pages 11--20, 2011. Google ScholarDigital Library
- S. Thummalapenta and T. Xie. Alattin: mining alternative patterns for detecting neglected conditions. In Proceedings of 24th IEEE/ACM International Conference on Automated Software Engineering, pages 283--294, 2009. Google ScholarDigital Library
- W. Visser, J. Geldenhuys, and M. B. Dwyer. Green: reducing, reusing and recycling constraints in program analysis. In Proceedings of FSE, 2012. Google ScholarDigital Library
- T. Wang, K. Wang, X. Su, and P. Ma. Detection of semantically similar code {J}. Frontiers of Computer Science, 8(6), pages 996--1011, 2014. Google ScholarDigital Library
- A. Wasylkowski and A. Zeller. Mining temporal specifications from object usage. In Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering, pages 263--292, 2009. Google ScholarDigital Library
- W. Weimer and G. C. Necula. Mining temporal specifications for error detection. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 461--476, 2005. Google ScholarDigital Library
- M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, pages 439--449, 1981. Google ScholarDigital Library
- X. Xiao, A. Paradkar, S. Thummalapenta, and T. Xie. Automated extraction of security policies from natural-language software documents. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012. Google ScholarDigital Library
- S. Xu, Y. S. Chee, Transformation-based diagnosis of student programs for programming tutoring systems, In IEEE Trans. Software Eng. 29 (4), pages 360--384, 2003. Google ScholarDigital Library
- Z. Xu, J. Zhang, and Z. Xu. Melton: a practical and precise memory leak detection tool for C programs {J}. Frontiers of Computer Science, 9(1), pages 34--54, 2015. Google ScholarDigital Library
- F. Yamaguchi, N. Golde, D. Arp, K. Rieck. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of IEEE Symposium on Security and Privacy, pages 590--604, 2014. Google ScholarDigital Library
- F. Yamaguchi, A. Maier, H. Gascon, and K. Rieck. Automatic inference of search patterns for taint-style vulnerabilities. In Proceedings of IEEE Symposium on Security and Privacy, pages 797--812, 2015. Google ScholarDigital Library
- F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck. Chucky: Exposing missing checks in source code for vulnerability discovery. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 499--510, 2013. Google ScholarDigital Library
- H. Zhong, L. Zhang, and T. Xie, M. Hong. Inferring resource specifications from natural language API documentation. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, pages 307--318, 2009. Google ScholarDigital Library
Index Terms
- AntMiner: mining more bugs by reducing noise interference
Recommendations
DeepBugs: a learning approach to name-based bug detection
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug ...
NAR-miner: discovering negative association rules from code for bug detection
ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software EngineeringInferring programming rules from source code based on data mining techniques has been proven to be effective to detect software bugs. Existing studies focus on discovering positive rules in the form of A ⇒ B, indicating that when operation A appears, ...
Locating Source Code to Be Fixed Based on Initial Bug Reports - A Case Study on the Eclipse Project
IWESEP '12: Proceedings of the 2012 Fourth International Workshop on Empirical Software Engineering in PracticeIn most software development, a Bug Tracking System is used to improve software quality. Based on bug reports managed by the bug tracking system, triagers who assign a bug to fixers and fixers need to pinpoint buggy files that should be fixed. However if ...
Comments