research-article

Automatic model generation from documentation for Java API functions

Authors:
Juan Zhai, Nanjing University
Jianjun Huang, Purdue University
Shiqing Ma, Purdue University
Xiangyu Zhang, Purdue University
Lin Tan, University of Waterloo
Jianhua Zhao, Nanjing University
Feng Qin, Ohio State University

ICSE '16: Proceedings of the 38th International Conference on Software Engineering, May 2016, Pages 380–391, https://doi.org/10.1145/2884781.2884881

Published: 14 May 2016


ABSTRACT

Modern software systems are becoming increasingly complex and rely heavily on third-party libraries. Library behaviors are hence an integral part of software behaviors, and analyzing them is as important as analyzing the software itself. However, analyzing libraries is highly challenging due to the lack of source code, implementation in different languages, and complex optimizations. We observe that many Java library functions come with excellent documentation that concisely describes their functionality. We develop a novel technique that constructs models for Java API functions by analyzing this documentation. These models are Java implementations that are simpler than the original ones and hence easier to analyze. More importantly, they provide the same functionality as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis on Android apps and dynamic slicing for Java programs, demonstrating the effectiveness and efficiency of our models.
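To make the notion of a documentation-derived "model" more concrete, the sketch below shows what such models might look like for two `java.lang.String` methods. It is a hand-written illustration under stated assumptions, not output of the technique described in the paper: the class name `StringModels` and the convention of passing the receiver as a `self` parameter are hypothetical choices. Each model re-implements only the behavior stated in the method's Javadoc, using plain character-array loops that an analysis can inspect directly.

```java
// Hypothetical, hand-written models for two java.lang.String methods.
// Each body follows the behavior described in the method's Javadoc and uses
// only simple array operations. Illustrative sketch only; not the paper's
// generated models.
public final class StringModels {

    private StringModels() {}

    // Model of String.concat(String). Per its Javadoc: if the argument is
    // empty, this string itself is returned; otherwise the result is the
    // characters of this string followed by the characters of the argument.
    public static String concat(String self, String str) {
        if (str.length() == 0) {
            return self; // documented special case: return the receiver itself
        }
        char[] buf = new char[self.length() + str.length()];
        for (int i = 0; i < self.length(); i++) {
            buf[i] = self.charAt(i); // copy the receiver's characters first
        }
        for (int j = 0; j < str.length(); j++) {
            buf[self.length() + j] = str.charAt(j); // then append the argument
        }
        return new String(buf);
    }

    // Model of String.replace(char, char). Per its Javadoc: every occurrence
    // of oldChar is replaced by newChar; if oldChar does not occur, a
    // reference to this string is returned.
    public static String replace(String self, char oldChar, char newChar) {
        char[] buf = self.toCharArray();
        boolean found = false;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == oldChar) {
                buf[i] = newChar;
                found = true;
            }
        }
        return found ? new String(buf) : self; // unchanged receiver returned as-is
    }

    public static void main(String[] args) {
        // Differential check: each model should agree with the real library method.
        String[][] cases = {{"foo", "bar"}, {"foo", ""}, {"", "x"}};
        for (String[] c : cases) {
            if (!concat(c[0], c[1]).equals(c[0].concat(c[1]))) {
                throw new AssertionError("concat model disagrees on " + c[0] + ", " + c[1]);
            }
        }
        System.out.println(replace("mississippi", 's', 'z')); // prints mizzizzippi
    }
}
```

Because each model body is ordinary Java, an analysis such as taint tracking or dynamic slicing can presumably follow data from `self` and `str` into the result through the explicit array copies, instead of treating the library call as an opaque native or heavily optimized routine.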


Index Terms
Computing methodologies > Artificial intelligence


Published in
ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN:9781450339001
DOI:10.1145/2884781
General Chair: Laura Dillon, Michigan State University
Program Chairs: Willem Visser, Stellenbosch University, South Africa; Laurie Williams, North Carolina State University
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%

Article Metrics
- Total Citations: 33
- Total Downloads: 711
- Downloads (Last 12 months): 39
- Downloads (Last 6 weeks): 10
