research-article

You, the Web, and Your Device: Longitudinal Characterization of Browsing Habits

Published: 27 September 2018

Abstract

Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data from which to extract clickstreams is, however, challenging.
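To make the clickstream definition concrete, here is a minimal sketch that builds a weighted clickstream graph from a list of page visits. It assumes each visit record carries a user identifier, a timestamp, the visited URL, and the referring page (as in an HTTP `Referer` header); the record layout and the `build_clickstream` and `vertices` helpers are hypothetical illustrations, not the article's data format or code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical input record: (user_id, timestamp, url, referer).
# referer is None when the page was opened directly (typed URL, bookmark).
Record = Tuple[str, float, str, Optional[str]]


def build_clickstream(records: Iterable[Record]) -> Dict[str, Counter]:
    """Return an adjacency map: source page -> Counter of destination pages.

    Every record with a referer adds one directed edge referer -> url; the
    Counter value is how many times that transition was followed.
    """
    graph: Dict[str, Counter] = defaultdict(Counter)
    for _user, _ts, url, referer in records:
        if referer is not None:
            graph[referer][url] += 1
    return dict(graph)


def vertices(graph: Dict[str, Counter]) -> Set[str]:
    """All pages that appear as the source or destination of some edge."""
    nodes = set(graph)
    for dests in graph.values():
        nodes.update(dests)
    return nodes


# Illustrative usage with made-up pages.
records = [
    ("u1", 0.0, "search.example/q", None),
    ("u1", 5.0, "news.example/a", "search.example/q"),
    ("u2", 7.0, "news.example/a", "search.example/q"),
    ("u2", 9.0, "news.example/b", "news.example/a"),
]
g = build_clickstream(records)
print(g["search.example/q"]["news.example/a"])  # 2
print(sorted(vertices(g)))  # ['news.example/a', 'news.example/b', 'search.example/q']
```

Storing edge weights in a `Counter` per source page keeps the graph sparse and makes "most followed transition from page X" a single `most_common(1)` call.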

The evolution of the web raises the question of whether browsing behavior is changing and, consequently, whether the properties of clickstreams are changing. This article presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured at a large ISP to which thousands of households are connected. We first propose a methodology to identify the URLs actually requested by users among the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both their temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths people follow while navigating the web, the fast-increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content.
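The article's own methodology for separating user-requested URLs from browser-fired requests is not described in this abstract, so the sketch below is only a simple rule-based stand-in to show the shape of the problem. It assumes each HTTP request exposes a client identifier, timestamp, URL, and Content-Type; the `HttpRequest` class, the `user_actions` function, the one-second `MIN_GAP_S` threshold, and the `BLOCKED_HOSTS` list are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlsplit


@dataclass
class HttpRequest:
    client: str          # anonymized client/household identifier
    timestamp: float     # seconds since start of trace
    url: str
    content_type: str    # e.g. "text/html; charset=utf-8"
    referer: Optional[str] = None


# Hypothetical parameters chosen for illustration only.
MIN_GAP_S = 1.0
BLOCKED_HOSTS = {"ads.example", "tracker.example"}


def user_actions(requests: List[HttpRequest]) -> List[HttpRequest]:
    """Keep requests that look like pages a user explicitly asked for.

    A request is kept when (1) it returns HTML, (2) its host is not in
    BLOCKED_HOSTS, and (3) at least MIN_GAP_S seconds have passed since the
    last kept request of the same client. HTML fetched inside that window is
    assumed to be embedded content (e.g. iframes) fired by the browser.
    """
    last_kept: Dict[str, float] = {}
    kept: List[HttpRequest] = []
    for req in sorted(requests, key=lambda r: (r.client, r.timestamp)):
        if not req.content_type.startswith("text/html"):
            continue  # images, scripts, stylesheets, ...
        if urlsplit(req.url).hostname in BLOCKED_HOSTS:
            continue  # known third-party ad/tracking hosts
        prev = last_kept.get(req.client)
        if prev is not None and req.timestamp - prev < MIN_GAP_S:
            continue  # too close to the previous page: likely embedded
        last_kept[req.client] = req.timestamp
        kept.append(req)
    return kept


# Illustrative usage: one page load with embedded objects, then a click.
reqs = [
    HttpRequest("h1", 0.0, "http://news.example/a", "text/html"),
    HttpRequest("h1", 0.2, "http://news.example/style.css", "text/css"),
    HttpRequest("h1", 0.3, "http://ads.example/banner", "text/html"),
    HttpRequest("h1", 0.4, "http://widgets.example/frame", "text/html"),
    HttpRequest("h1", 12.0, "http://news.example/b", "text/html",
                "http://news.example/a"),
]
actions = user_actions(reqs)
print([r.url for r in actions])  # ['http://news.example/a', 'http://news.example/b']
```

Fixed rules like these are brittle (modern pages load HTML fragments asynchronously and users can click quickly), which is one reason a data-driven classifier may be preferable in practice.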

Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies.¹



      • Published in

        ACM Transactions on the Web, Volume 12, Issue 4
        November 2018
        215 pages
        ISSN:1559-1131
        EISSN:1559-114X
        DOI:10.1145/3281744

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 September 2018
        • Accepted: 1 June 2018
        • Revised: 1 April 2018
        • Received: 1 May 2017


        Qualifiers

        • research-article
        • Research
        • Refereed
