Abstract
Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging.
The evolution of the web raises the question of whether browsing behavior is changing and, as a consequence, whether the properties of clickstreams are changing. This article presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify the actual URLs requested by users among the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths followed by people while navigating the web, the fast-increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content.
Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies.
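The clickstream representation described in the abstract, a graph whose vertices are web pages and whose edges are the paths followed by users, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `(referrer, page)` input format, the function names `build_clickstream` and `path_depths`, and the example URLs are all assumptions made for illustration, and the input is assumed to contain only user-requested pages that have already been separated from automatic browser requests.

```python
from collections import defaultdict, deque


def build_clickstream(visits):
    """Build a directed clickstream graph from (referrer, page) pairs.

    `visits` is a hypothetical, already-cleaned list of user clicks.
    A referrer of None marks the start of a path (no previous page).
    """
    graph = defaultdict(set)
    for referrer, page in visits:
        graph[page]  # touch the key so every visited page becomes a vertex
        if referrer is not None:
            graph[referrer].add(page)
    # Sort the targets so the result is deterministic.
    return {page: sorted(targets) for page, targets in graph.items()}


def path_depths(graph, start):
    """Return the number of clicks needed to reach each page from `start`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depths:
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths


# Illustrative data only; these URLs are made up.
visits = [
    (None, "search.example/q"),
    ("search.example/q", "news.example/a"),
    ("news.example/a", "news.example/b"),
    ("search.example/q", "shop.example/item"),
]
graph = build_clickstream(visits)
depths = path_depths(graph, "search.example/q")
```

Here `depths` maps each reachable page to its distance in clicks from the starting page, which is one simple way to look at how long the paths in a clickstream are.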
Index Terms
- You, the Web, and Your Device: Longitudinal Characterization of Browsing Habits