Research Article | Open Access

Reflection-aware static regression test selection

Published: 10 October 2019

Abstract

Regression test selection (RTS) aims to speed up regression testing by rerunning only the tests that are affected by code changes. RTS can be performed using static or dynamic analysis techniques. Our prior study showed that static and dynamic RTS perform similarly for medium-sized Java projects. However, that study also showed that static RTS can be unsafe, failing to select tests that dynamic RTS selects, and that reflection was the only cause of unsafety observed among the evaluated projects.
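To see why reflection can defeat class-level static analysis, consider a minimal sketch (hypothetical classes, not taken from the paper): the dependent class is named only inside a string, so a static scan of the class references in the test's bytecode never sees it.

    import java.lang.reflect.Method;

    // Hypothetical example: the constant pool of ReflectionDependencyDemo
    // contains the string "JsonConfigLoader" but no class reference to it,
    // so reflection-unaware class-level static RTS misses the dependency.
    class JsonConfigLoader {
        String load(String path) { return "config:" + path; }
    }

    public class ReflectionDependencyDemo {
        public static void main(String[] args) throws Exception {
            Class<?> clazz = Class.forName("JsonConfigLoader");
            Method load = clazz.getDeclaredMethod("load", String.class);
            Object cfg = load.invoke(
                clazz.getDeclaredConstructor().newInstance(), "app.json");
            System.out.println(cfg); // prints config:app.json
            // If JsonConfigLoader changes, a reflection-unaware static
            // technique would not re-select this test: the unsafety
            // studied in this paper.
        }
    }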

In this paper, we investigate five techniques—three purely static techniques and two hybrid static-dynamic techniques—that aim to make static RTS safe with respect to reflection. We implement these reflection-aware (RA) techniques by extending the reflection-unaware (RU) class-level static RTS technique in a tool called STARTS. To evaluate the RA techniques, we compare their end-to-end times with those of RU and of RetestAll, which reruns all tests after every code change. We also compare the safety and precision of the RA techniques with those of Ekstazi, a state-of-the-art dynamic RTS technique; precision measures how well a technique avoids selecting unaffected tests.
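For intuition, class-level static RTS can be sketched as a reachability check over a class dependency graph (our simplification, assuming the graph is precomputed; this is not the STARTS implementation): a test is selected if any changed class is in its transitive dependency closure.

    import java.util.*;

    // Minimal sketch of reflection-unaware class-level selection.
    public class ClassLevelRts {
        // deps maps a class to the classes it statically references.
        static Set<String> reachable(String from, Map<String, Set<String>> deps) {
            Set<String> seen = new HashSet<>();
            Deque<String> work = new ArrayDeque<>(List.of(from));
            while (!work.isEmpty()) {
                String c = work.pop();
                if (seen.add(c)) work.addAll(deps.getOrDefault(c, Set.of()));
            }
            return seen;
        }

        static List<String> select(List<String> tests, Set<String> changed,
                                   Map<String, Set<String>> deps) {
            List<String> selected = new ArrayList<>();
            for (String t : tests) {
                // Select t if its closure intersects the changed classes.
                if (!Collections.disjoint(reachable(t, deps), changed)) {
                    selected.add(t);
                }
            }
            return selected;
        }

        public static void main(String[] args) {
            Map<String, Set<String>> deps = Map.of(
                "FooTest", Set.of("Foo"),
                "Foo", Set.of("Util"));
            // Util changed; FooTest reaches it transitively, so it is
            // selected. A reflective edge absent from deps would be missed.
            System.out.println(select(List.of("FooTest"), Set.of("Util"), deps));
        }
    }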

Our evaluation on 1173 versions of 24 open-source Java projects shows negative results. The RA techniques improve the safety of RU but at very high costs. The purely static techniques are safe in our experiments but decrease the precision of RU, with end-to-end time at best 85.8% of RetestAll time, versus 69.1% for RU. One hybrid static-dynamic technique improves the safety of RU but at high cost, with end-to-end time that is 91.2% of RetestAll. The other hybrid static-dynamic technique provides better precision, is safer than RU, and incurs a lower end-to-end time of 75.8% of RetestAll, but it can still be unsafe in the presence of test-order dependencies. Our study highlights the challenges involved in making static RTS safe with respect to reflection.
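A test-order dependency, the remaining source of unsafety, can be sketched as follows (hypothetical tests, not from the paper): one test passes only if another test ran first, so dependencies observed dynamically under one test order need not hold under another.

    // Minimal sketch of an order-dependent test pair.
    class SharedState { static String value; }

    public class OrderDependencyDemo {
        static void testA() { SharedState.value = "ready"; }
        static void testB() { assert "ready".equals(SharedState.value); }

        public static void main(String[] args) {
            testA();
            testB();  // passes, but only because testA ran first;
                      // run testB alone (fresh JVM) and it fails
        }
    }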


Supplemental Material

a187-shi.webm (webm video, 100.7 MB)

