skip to main content
10.1145/2884781.2884870acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

AntMiner: mining more bugs by reducing noise interference

Authors Info & Claims
Published:14 May 2016Publication History

ABSTRACT

Detecting bugs with code mining has proven to be an effective approach. However, the existing methods suffer from reporting serious false positives and false negatives. In this paper, we developed an approach called AntMiner to improve the precision of code mining by carefully preprocessing the source code. Specifically, we employ the program slicing technique to decompose the original source repository into independent sub-repositories, taking critical operations (automatically extracted from source code) as slicing criteria. In this way, the statements irrelevant to a critical operation are excluded from the corresponding sub-repository. Besides, various semantics-equivalent representations are normalized into a canonical form. Eventually, the mining process can be performed on a refined code database, and false positives and false negatives can be significantly pruned. We have implemented AntMiner and applied it to detect bugs in the Linux kernel. It reported 52 violations that have been either confirmed as real bugs by the kernel development community or fixed in new kernel versions. Among them, 41 cannot be detected by a widely used representative analysis tool Coverity. Besides, the result of a comparative analysis shows that our approach can effectively improve the precision of code mining and detect subtle bugs that have previously been missed.

References

  1. Bugzilla for kernel, https://bugzilla.kernel.org.Google ScholarGoogle Scholar
  2. Common Weakness Enumeration, http://cwe.mitre.org.Google ScholarGoogle Scholar
  3. Coverity, http://www.coverity.com, v7.5.Google ScholarGoogle Scholar
  4. Klocwork, www.klocwork.com.Google ScholarGoogle Scholar
  5. M. Acharya, T. Xie, J. Pei, and J. Xu. Mining API patterns as partial orders from source code: from usage scenarios to specifications. In Proceedings of the 11th European Software Engineering Conference Held Jointly with 15th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 163--173, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. H. Agrawal, J. R. Horgan, S. London, and W. E. Wong. Fault localization using execution slices and dataflow tests. In Proceedings of International Symposium on Software Reliability Engineering, pages 143--151, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  7. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: principles, techniques, and tools. 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Ammons, R. Bodik, J. R. Larus. Mining specifications. In Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 4--16, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. Cai, and L. Cao. Effective and precise dynamic detection of hidden races for Java programs. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pages 450--461, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Cai, and W. K. Chan. Magiclock: Scalable Detection of Potential Deadlocks in Large-Scale Multithreaded Programs {J}. IEEE Transactions on Software Engineering, 40(3), pages 266--281, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R-Y. Chang, A. Podgurski, and j. Yang. Finding what's not there: a new approach to revealing neglected conditions in software. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, pages 163--173, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Y. Chen and Y. Y. Cheung. Dynamic program dicing. In Proceedings of the Conference on Software Maintenance, pages 378--385, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. Chess and G. McGraw. Static analysis for security {J}. IEEE Security & Privacy, 2(6), pages 76--79, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Christey. 2011 CWE/SANS Top 25 Most Dangerous Software Errors. http://cwe.mitre.org/top25.Google ScholarGoogle Scholar
  15. D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In Proceedings of the 18th ACM Symposium on Operating System Principles, pages 57--72, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. In Proceedings of ACM Transactions on Programming Languages and Systems, vol. 9, pages 319--349, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. V. Ganapathy, D. King, T. Jaeger, and S. Jha. Mining security-sensitive operations in legacy code using concept analysis. In Proceedings of the 29th international conference on Software Engineering, pages 458--467, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proceedings of Workshop on Frequent Itemset Mining Implementations, 2003.Google ScholarGoogle Scholar
  19. H. S. Gunawi, C. Rubio-González, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and B. Liblit. EIO: Error handling is occasionally correct. In 6th USENIX Conference on File and Storage Technologies, pages 1--16, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Hallem, B. Chelf, Y. Xie, and D. Engler. A system and language for building system-specific, static analysis. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 69--82, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Harman, S. Danicic. Using program slicing to simplify testing {J}. Software Testing, Verification and Reliability, 5(3), pages 143--162, 1995.Google ScholarGoogle Scholar
  22. S. Horwitz, H. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In Proceedings of ACM Transactions on Programming Languages and Systems, vol. 12, pages 26--60, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Kremenek, P. Twohey, G. Back, A. Y. Ng, and D. Engler. From uncertainty to belief: inferring the specification within. In Proceedings of 7th Symposium on Operating Systems Design and Implementation, pages 161--176, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proceedings of 21st ACM SIGOPS symposium on Operating systems principles, pages 145--158, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Z. Li and Y. Zhou. PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. In Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 306--315, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. B. Livshits and T. Zimmermann. DynaMine: finding common error patterns by mining software revision histories. In Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296--305, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Lo, S-C. Khoo, C. Liu. Mining past-time temporal rules from execution traces. In Proceedings of the 2008 international workshop on dynamic analysis, pages 50--56, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Lu, S. Park, C. Hu, X. Ma, W. Jiang, Z. Li, R. A. Popa, and Y. Zhou. MUVI: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In Proceedings of 21st ACM SIGOPS symposium on Operating systems principles, pages 103--116, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. K. J. Ottenstein, and L. M. Ottenstein. The program dependence graph in a software develop environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 177--184, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. Pradel, C. Jaspan, J. Aldrich, and T. R. Gross. Statically checking API protocol conformance with mined multi-object specifications. In Proceedings of the 34th International Conference on Software Engineering, pages 521--530, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. K. Ramanathan, A. Grama, and S. Jagannathan. Static specification inference using predicate mining. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 123--134, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. C. Rubio-González, B. Liblit. Defective error/pointer interactions in the linux kernel. In Proceedings of the 2011 International Symposium on Software Testing and Analysis, pages 111--121, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. R. M. Stallman and the GCC Developer Community. GNU compiler collection internals (for GCC version 4.5.0), http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gccint.ps.gz, 2010.Google ScholarGoogle Scholar
  34. B. Sun, G. Shu, A. Podgurski, and B. Robinson. Extending static analysis by mining project-specific rules. In Proceedings of the 34th International Conference on Software Engineering, pages 1054--1063, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. L. Tan, D. Yuan, G. Krishna, and Y. Zhou. /* iComment: Bugs or bad comments?*/. In In Proceedings of the 21st ACM Symposium on Operating Systems Principles, pages 145--158, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. L. Tan, X. Zhang, X. Ma, W. Xiong, and Y. Zhou. AutoISES: automatically inferring security specifications and detecting violations. In Proceedings of USENIX Security Symposium '08, pages 379--394, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. L. Tan, Y. Zhou, and Y. Padioleau. aComment: mining annotations from comments and code to detect interrupt related concurrency bugs. In Proceedings of the 33rd International Conference on Software Engineering, pages 11--20, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Thummalapenta and T. Xie. Alattin: mining alternative patterns for detecting neglected conditions. In Proceedings of 24th IEEE/ACM International Conference on Automated Software Engineering, pages 283--294, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. W. Visser, J. Geldenhuys, and M. B. Dwyer. Green: reducing, reusing and recycling constraints in program analysis. In Proceedings of FSE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. T. Wang, K. Wang, X. Su, and P. Ma. Detection of semantically similar code {J}. Frontiers of Computer Science, 8(6), pages 996--1011, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. A. Wasylkowski and A. Zeller. Mining temporal specifications from object usage. In Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering, pages 263--292, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. W. Weimer and G. C. Necula. Mining temporal specifications for error detection. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 461--476, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, pages 439--449, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. X. Xiao, A. Paradkar, S. Thummalapenta, and T. Xie. Automated extraction of security policies from natural-language software documents. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. S. Xu, Y. S. Chee, Transformation-based diagnosis of student programs for programming tutoring systems, In IEEE Trans. Software Eng. 29 (4), pages 360--384, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Z. Xu, J. Zhang, and Z. Xu. Melton: a practical and precise memory leak detection tool for C programs {J}. Frontiers of Computer Science, 9(1), pages 34--54, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. F. Yamaguchi, N. Golde, D. Arp, K. Rieck. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of IEEE Symposium on Security and Privacy, pages 590--604, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. F. Yamaguchi, A. Maier, H. Gascon, and K. Rieck. Automatic inference of search patterns for taint-style vulnerabilities. In Proceedings of IEEE Symposium on Security and Privacy, pages 797--812, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck. Chucky: Exposing missing checks in source code for vulnerability discovery. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 499--510, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. H. Zhong, L. Zhang, and T. Xie, M. Hong. Inferring resource specifications from natural language API documentation. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, pages 307--318, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. AntMiner: mining more bugs by reducing noise interference

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      ICSE '16: Proceedings of the 38th International Conference on Software Engineering
      May 2016
      1235 pages
      ISBN:9781450339001
      DOI:10.1145/2884781

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 14 May 2016

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate276of1,856submissions,15%

      Upcoming Conference

      ICSE 2025

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader