ABSTRACT
Modern software systems are increasingly complex and rely heavily on third-party libraries. Library behavior is therefore an integral part of software behavior, and analyzing it is as important as analyzing the software itself. However, analyzing libraries is highly challenging because source code is often unavailable, implementations may span different languages, and the code is heavily optimized. We observe that many Java library functions come with excellent documentation that concisely describes their functionality. We develop a novel technique that constructs models for Java API functions by analyzing this documentation. The models are Java implementations that are simpler than the originals and hence easier to analyze. More importantly, they provide the same functionality as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis of Android apps and dynamic slicing of Java programs, demonstrating their effectiveness and efficiency.
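To make the notion of a model concrete, the following is a minimal hand-written sketch, not output of the technique described in the abstract. It assumes a hypothetical class `ListModel` that mirrors what the Javadoc for `List.add(int, E)` states (insert at the given position and shift subsequent elements to the right), and a hypothetical helper `agreesWithArrayList` that replays the same calls on `java.util.ArrayList` to check that both behave alike.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a simplified model; all names here are assumptions. */
public class ListModel {
    // A plain array with no capacity management, so data flow stays easy to follow.
    private Object[] items = new Object[0];

    /** Follows the documented behavior: insert at index, shift later elements right. */
    public void add(int index, Object element) {
        if (index < 0 || index > items.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Object[] grown = new Object[items.length + 1];
        for (int i = 0; i < index; i++) {
            grown[i] = items[i];
        }
        grown[index] = element;
        for (int i = index; i < items.length; i++) {
            grown[i + 1] = items[i];
        }
        items = grown;
    }

    /** Returns the element at the given position. */
    public Object get(int index) {
        if (index < 0 || index >= items.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return items[index];
    }

    /** Returns the number of stored elements. */
    public int size() {
        return items.length;
    }

    /** Replays one call sequence on the model and on ArrayList and compares the results. */
    public static boolean agreesWithArrayList() {
        ListModel model = new ListModel();
        List<Object> real = new ArrayList<>();
        String[] values = {"a", "b", "c"};
        int[] positions = {0, 0, 1};
        for (int i = 0; i < values.length; i++) {
            model.add(positions[i], values[i]);
            real.add(positions[i], values[i]);
        }
        if (model.size() != real.size()) {
            return false;
        }
        for (int i = 0; i < real.size(); i++) {
            if (!model.get(i).equals(real.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithArrayList());
    }
}
```

A single replayed call sequence is only a spot check, not evidence of full equivalence with the library class.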
Automatic model generation from documentation for Java API functions