DOI: 10.1145/2884781.2884881
research-article

Automatic model generation from documentation for Java API functions

Published: 14 May 2016

ABSTRACT

Modern software systems are becoming increasingly complex and rely heavily on third-party libraries. Library behavior is therefore an integral part of software behavior, and analyzing it is as important as analyzing the software itself. However, analyzing libraries is highly challenging: their source code is often unavailable, they may be implemented in different languages, and they are often heavily optimized. We observe that many Java library functions come with excellent documentation that concisely describes what each function does. We develop a novel technique that constructs models for Java API functions by analyzing this documentation. These models are Java implementations that are simpler than the originals and hence easier to analyze. More importantly, they provide the same functionality as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis of Android apps and in dynamic slicing of Java programs, demonstrating their effectiveness and efficiency.
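To make the idea of a "simpler implementation" more concrete, the following is an illustrative sketch of what a documentation-derived model might look like for `java.lang.String.concat`. It is written by hand from the Javadoc description (an empty argument returns the receiver itself; otherwise the result is the receiver's characters followed by the argument's). The class name `ConcatModel` and the `char[]` representation are assumptions made for illustration only; this is not output of the paper's tool, nor the model format the authors use.

```java
/**
 * Hypothetical hand-written model of String.concat, derived only from its
 * Javadoc description. Strings are represented as char arrays for simplicity.
 */
public final class ConcatModel {

    // Model of self.concat(str).
    static char[] concat(char[] self, char[] str) {
        if (str.length == 0) {
            return self; // documented special case: the receiver itself is returned
        }
        char[] result = new char[self.length + str.length];
        // Plain copy loops (no native calls), so every data flow
        // from self and str into result is explicit to an analysis.
        for (int i = 0; i < self.length; i++) {
            result[i] = self[i];
        }
        for (int i = 0; i < str.length; i++) {
            result[self.length + i] = str[i];
        }
        return result;
    }

    public static void main(String[] args) {
        char[] a = "Hello, ".toCharArray();
        char[] b = "world".toCharArray();
        System.out.println(new String(concat(a, b)));   // Hello, world
        System.out.println(concat(a, new char[0]) == a); // true
    }
}
```

Because the copy is spelled out as ordinary Java loops, a taint analysis or dynamic slicer can see directly that each character of both inputs flows into the result. The library version, by contrast, typically routes the copy through internal arrays and, in JDK 8 for instance, the native `System.arraycopy`, which such analyses cannot inspect.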

    Published in

      ACM Conferences
      ICSE '16: Proceedings of the 38th International Conference on Software Engineering
      May 2016
      1235 pages
      ISBN:9781450339001
      DOI:10.1145/2884781

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 276 of 1,856 submissions, 15%
