Abstract
Regression test selection (RTS) aims to speed up regression testing by rerunning only the tests that are affected by code changes. RTS can be performed using static or dynamic analysis techniques. Our prior study showed that static and dynamic RTS perform similarly for medium-sized Java projects. However, that study also showed that static RTS can be unsafe, failing to select tests that dynamic RTS selects, and that reflection was the only cause of unsafety observed among the evaluated projects.
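To make the reflection problem concrete, the following is a minimal sketch of class-level static selection on a toy dependency graph. It is not the STARTS implementation. The class names (`Loader`, `Plugin`, `ParserTest`, `LoaderTest`) and the helper methods are hypothetical, and the sketch assumes that `Loader` instantiates `Plugin` only through a reflective call such as `Class.forName("Plugin")`, so a purely static analysis sees no `Loader -> Plugin` edge.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StaticRtsSketch {
    // Hypothetical test classes in the toy project.
    static final Set<String> TESTS =
            new TreeSet<>(Arrays.asList("ParserTest", "LoaderTest"));

    /** Builds the toy class-level dependency graph (class -> classes it references). */
    static Map<String, Set<String>> demoGraph(boolean includeReflectiveEdge) {
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("ParserTest", new HashSet<>(Arrays.asList("Parser")));
        deps.put("LoaderTest", new HashSet<>(Arrays.asList("Loader")));
        // Assumption: Loader reaches Plugin only via reflection,
        // so the statically computed graph has no outgoing edge here.
        deps.put("Loader", new HashSet<>());
        if (includeReflectiveEdge) {
            deps.get("Loader").add("Plugin");
        }
        return deps;
    }

    /** Returns all classes transitively reachable from start, including start itself. */
    static Set<String> reachable(Map<String, Set<String>> deps, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String current = work.pop();
            if (seen.add(current)) {
                for (String next : deps.getOrDefault(current, Collections.emptySet())) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    /** Selects every test whose transitive dependencies include a changed class. */
    static Set<String> select(Map<String, Set<String>> deps,
                              Set<String> tests, Set<String> changed) {
        Set<String> selected = new TreeSet<>();
        for (String test : tests) {
            if (!Collections.disjoint(reachable(deps, test), changed)) {
                selected.add(test);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Set<String> changed = Collections.singleton("Plugin");
        // Without the reflective edge, the affected test is missed (unsafe).
        System.out.println("Reflection-unaware: " + select(demoGraph(false), TESTS, changed));
        // With the edge recovered, the affected test is selected.
        System.out.println("Reflection-aware:   " + select(demoGraph(true), TESTS, changed));
    }
}
```

When only `Plugin` changes, the reflection-unaware graph selects no tests even though `LoaderTest` exercises `Plugin` at run time. Adding the missing edge restores the selection.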
In this paper, we investigate five techniques (three purely static techniques and two hybrid static-dynamic techniques) that aim to make static RTS safe with respect to reflection. We implement these reflection-aware (RA) techniques by extending the reflection-unaware (RU) class-level static RTS technique in a tool called STARTS. To evaluate these RA techniques, we compare their end-to-end times with RU, and with RetestAll, which reruns all tests after every code change. We also compare the safety and precision of the RA techniques with Ekstazi, a state-of-the-art dynamic RTS technique; precision reflects how many unaffected tests a technique selects.
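One plausible way to compare a static selection against a dynamic one is with set differences, sketched below. This is an illustrative assumption, not the paper's exact metric definitions: it treats the dynamic selection as the reference, counts tests that only the dynamic technique selects as potential safety violations, and counts tests that only the static technique selects as potential precision losses. The class, method, and test names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class SelectionComparison {
    /** Tests the dynamic technique selected but the static one missed (potential unsafety). */
    static Set<String> missedTests(Set<String> staticSelected, Set<String> dynamicSelected) {
        Set<String> missed = new TreeSet<>(dynamicSelected);
        missed.removeAll(staticSelected);
        return missed;
    }

    /** Tests the static technique selected but the dynamic one did not (potential imprecision). */
    static Set<String> extraTests(Set<String> staticSelected, Set<String> dynamicSelected) {
        Set<String> extra = new TreeSet<>(staticSelected);
        extra.removeAll(dynamicSelected);
        return extra;
    }

    public static void main(String[] args) {
        // Hypothetical selections for a single code change.
        Set<String> staticSelected = new TreeSet<>(Arrays.asList("TestA", "TestB", "TestC"));
        Set<String> dynamicSelected = new TreeSet<>(Arrays.asList("TestB", "TestD"));

        System.out.println("Missed by static: " + missedTests(staticSelected, dynamicSelected));
        System.out.println("Extra in static:  " + extraTests(staticSelected, dynamicSelected));
    }
}
```

In this toy input, `TestD` is selected only by the dynamic technique, while `TestA` and `TestC` are selected only by the static one.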
Our evaluation on 1173 versions of 24 open-source Java projects shows negative results. The RA techniques improve the safety of RU but at very high costs. The purely static techniques are safe in our experiments but decrease the precision of RU, with end-to-end time at best 85.8% of RetestAll time, versus 69.1% for RU. One hybrid static-dynamic technique improves the safety of RU but at high cost, with end-to-end time that is 91.2% of RetestAll. The other hybrid static-dynamic technique provides better precision, is safer than RU, and incurs lower end-to-end time (75.8% of RetestAll), but it can still be unsafe in the presence of test-order dependencies. Our study highlights the challenges involved in making static RTS safe with respect to reflection.