Automatic Classification of Non-Functional Requirements from Augmented App User Reviews

Authors:
Mengmeng Lu

State Key Lab of Software Engineering, School of Computer Science, Wuhan University, Luojiasha, Wuhan, China

Peng Liang

State Key Lab of Software Engineering, School of Computer Science, Wuhan University, Luojiasha, Wuhan, China


EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, June 2017, Pages 344–353. https://doi.org/10.1145/3084226.3084241

Published: 15 June 2017

ABSTRACT

Context: The leading App distribution platforms (Apple App Store, Google Play, and Windows Phone Store) host over 4 million Apps. Research shows that user reviews contain abundant useful information that may help developers improve their Apps. Non-Functional Requirements (NFRs) describe the quality attributes wanted for an App and are hidden in user reviews; extracting and considering them can help developers deliver a product that meets users' expectations. Objective: Developers need to be aware of the NFRs expressed in massive numbers of user reviews during software maintenance and evolution. Automatic classification of user reviews based on an NFR standard provides a feasible way to achieve this goal. Method: In this paper, user reviews were automatically classified into four types of NFRs (reliability, usability, portability, and performance), Functional Requirements (FRs), and Others. We combined four classification techniques (BoW, TF-IDF, CHI2, and AUR-BoW, the last proposed in this work) with three machine learning algorithms (Naïve Bayes, J48, and Bagging) to classify user reviews. We conducted experiments to compare the F-measures of the classification results across all combinations of the techniques and algorithms. Results: We found that the combination of AUR-BoW with Bagging achieves the best result (a precision of 71.4%, a recall of 72.3%, and an F-measure of 71.8%) among all the combinations. Conclusion: Our findings show that augmented user reviews can lead to better classification results, and that the machine learning algorithm Bagging is more suitable for NFR classification from user reviews than Naïve Bayes and J48.
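As a rough illustration of two of the text representations named in the abstract (BoW and TF-IDF) and of how the reported F-measure relates to the reported precision and recall, the following is a minimal sketch and not the authors' implementation. The whitespace tokenization, the `tf * log(N / df)` weighting variant, the function names, and the three sample reviews are all assumptions made for illustration; the abstract does not specify them. The sketch also assumes the F-measure is the harmonic mean of precision and recall (F1), which is consistent with the reported figures.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for lowercased, whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight count vectors with tf * log(N / df), one common TF-IDF variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of docs in which each term occurs at least once.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    return [
        [vec[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for vec in vectors
    ]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed to be the paper's F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical reviews, invented for illustration only (not from the paper's data).
reviews = [
    "app crashes on startup",
    "app is slow on startup",
    "love the new design",
]
vocab, counts = bag_of_words(reviews)
weights = tf_idf(counts)

# Precision and recall as reported in the abstract for AUR-BoW with Bagging.
reported_f = f_measure(0.714, 0.723)
```

Under these assumptions, `reported_f` rounds to 0.718, matching the 71.8% F-measure stated in the abstract. Terms that occur in every review would receive a TF-IDF weight of zero, which is why TF-IDF tends to down-weight generic words relative to plain BoW counts.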

Index Terms

Automatic Classification of Non-Functional Requirements from Augmented App User Reviews
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Software and its engineering
  1. Software creation and management
    1. Designing software
      1. Requirements analysis

Published in
EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering
June 2017
405 pages
ISBN:9781450348041
DOI:10.1145/3084226
Conference Chair:
Emilia Mendes,
Program Chairs:
Steve Counsell,
Kai Petersen
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Automatic Classification
Non-Functional Requirements
Textual Semantics
User Reviews
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 71 of 232 submissions, 31%
Article Metrics
- Total Citations: 97
- Total Downloads: 1,298
- Downloads (Last 12 months): 140
- Downloads (Last 6 weeks): 20