DOI: 10.1145/3084226.3084241
research-article

Automatic Classification of Non-Functional Requirements from Augmented App User Reviews

Published: 15 June 2017

ABSTRACT

Context: The leading App distribution platforms, Apple App Store, Google Play, and Windows Phone Store, host over 4 million Apps. Research shows that user reviews contain abundant useful information that may help developers improve their Apps. Extracting and considering the Non-Functional Requirements (NFRs) hidden in user reviews, which describe the quality attributes users want from an App, can help developers deliver a product that meets users' expectations.

Objective: During software maintenance and evolution, developers need to be aware of the NFRs expressed across massive volumes of user reviews. Automatically classifying user reviews based on an NFR standard offers a feasible way to achieve this goal.

Method: In this paper, user reviews were automatically classified into four types of NFRs (reliability, usability, portability, and performance), Functional Requirements (FRs), and Others. We combined four classification techniques, BoW (Bag-of-Words), TF-IDF (Term Frequency-Inverse Document Frequency), CHI2 (chi-square feature selection), and AUR-BoW (proposed in this work), with three machine learning algorithms, Naive Bayes, J48, and Bagging, to classify user reviews. We conducted experiments comparing the F-measures of the classification results across all twelve combinations of techniques and algorithms.

Results: The combination of AUR-BoW with Bagging achieved the best result among all the combinations, with a precision of 71.4%, a recall of 72.3%, and an F-measure of 71.8%.

Conclusion: Our findings show that augmented user reviews can lead to better classification results, and that the machine learning algorithm Bagging is more suitable than Naive Bayes and J48 for classifying NFRs from user reviews.
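To make the evaluation setup more concrete, here is a minimal sketch of one of the twelve technique/algorithm combinations in simplified form: a BoW representation fed to a multinomial Naive Bayes classifier, written in plain Python with no external libraries. This is an illustration under stated assumptions, not the authors' implementation. The training reviews in `TRAIN` are hypothetical toy examples rather than the paper's dataset, and the names `tokenize`, `BowNaiveBayes`, and `f_measure` are invented for this sketch. AUR-BoW, J48, and Bagging are not reproduced, because the abstract does not describe how the augmentation works. The `f_measure` helper shows that the reported F-measure (71.8%) is consistent with the harmonic mean of the reported precision (71.4%) and recall (72.3%).

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy reviews for illustration only; NOT the paper's dataset.
# Labels follow the abstract's six categories.
TRAIN = [
    ("app crashes every time I open it", "reliability"),
    ("keeps crashing after the update", "reliability"),
    ("the menu is confusing and hard to navigate", "usability"),
    ("interface is hard to use", "usability"),
    ("does not work on my tablet", "portability"),
    ("not supported on android 4", "portability"),
    ("very slow to load and drains battery", "performance"),
    ("loading is slow and laggy", "performance"),
    ("please add a dark mode feature", "FR"),
    ("would love an option to export notes", "FR"),
    ("great app love it", "Others"),
    ("five stars awesome", "Others"),
]


def tokenize(text):
    """Lowercase a review and split it into simple word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BowNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class, used for priors
        self.word_counts = defaultdict(Counter)    # per-class word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            bow = Counter(tokenize(doc))           # BoW vector of one review
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        self.class_totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        bow = Counter(tokenize(doc))
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)    # log prior of class c
            for word, k in bow.items():
                if word not in self.vocab:
                    continue                       # skip words never seen in training
                p = (self.word_counts[c][word] + 1) / (self.class_totals[c] + v)
                score += k * math.log(p)           # smoothed log likelihood
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def f_measure(precision, recall):
    """F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    model = BowNaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
    for review in ["crashes after update", "slow loading", "hard to navigate"]:
        print(review, "->", model.predict(review))
    # Reported AUR-BoW + Bagging figures: precision 71.4%, recall 72.3%
    print(round(f_measure(0.714, 0.723), 3))  # 0.718
```

The other combinations would change only how each review is turned into features: TF-IDF reweights raw term counts by how rare a term is across reviews, and CHI2 keeps only the terms most strongly associated with a class. J48 is Weka's implementation of the C4.5 decision tree, and Bagging trains several classifiers on bootstrap resamples and combines their votes.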


      • Published in

        ACM Other conferences
        EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering
        June 2017
        405 pages
        ISBN:9781450348041
        DOI:10.1145/3084226

        Copyright © 2017 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 71 of 232 submissions, 31%
