Mining of Massive Datasets
December 2014
Publisher:
  • Cambridge University Press
  • 40 W. 20 St. New York, NY
  • United States
ISBN: 978-1-107-07723-2
Published: 29 December 2014
Pages: 476
Abstract

Written by leading authorities in database and Web technologies, this book is essential reading for students and practitioners alike. The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, as well as the problems of finding frequent itemsets and clustering. This second edition includes new and extended coverage of social networks, machine learning and dimensionality reduction.

Cited By

  1. Koa K, Ma Y, Ng R and Chua T. Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models, Proceedings of the ACM on Web Conference 2024, (4304-4315)
  2. Chuang Y and Jhang J (2024). Trustworthy retrieval system in mobile P2P wireless network, Ad Hoc Networks, 154:C, Online publication date: 1-Mar-2024.
  3. Roy C, Nourani M, Arya S, Shanbhag M, Rahman T, Ragan E, Ruozzi N and Gogate V (2023). Explainable Activity Recognition in Videos using Deep Learning and Tractable Probabilistic Models, ACM Transactions on Interactive Intelligent Systems, 13:4, (1-32), Online publication date: 31-Dec-2024.
  4. Fan Y, Wang C, Feng F, Cui H, Wu Y and Li Y. Learning What to Ask: Mining Product Attributes for E-commerce Sales from Massive Dialogue Corpora, Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, (5031-5035)
  5. Pellizzoni P, Pietracaprina A and Pucci G. Fully Dynamic Clustering and Diversity Maximization in Doubling Metrics, Algorithms and Data Structures, (620-636)
  6. Fakas G and Kalamatianos G (2023). Proportionality on Spatial Data with Context, ACM Transactions on Database Systems, 48:2, (1-37), Online publication date: 30-Jun-2023.
  7. Cormode G. Applications of Sketching and Pathways to Impact, Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, (5-10)
  8. Peng Z, Wang Z and Deng D (2023). Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation, Proceedings of the ACM on Management of Data, 1:2, (1-18), Online publication date: 13-Jun-2023.
  9. Khatamifard S, Chowdhury Z, Pande N, Razaviyayn M, Kim C and Karpuzcu U (2021). GeNVoM: Read Mapping Near Non-Volatile Memory, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19:6, (3482-3496), Online publication date: 1-Nov-2022.
  10. Chen X, Jiang J and Wang W. Scalable Graph Representation Learning via Locality-Sensitive Hashing, Proceedings of the 31st ACM International Conference on Information & Knowledge Management, (3878-3882)
  11. Nanayakkara C and Christen P. Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage, Proceedings of the 31st ACM International Conference on Information & Knowledge Management, (4354-4358)
  12. Cao Y, Zhou X, Feng J, Huang P, Xiao Y, Chen D and Chen S. Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction, Proceedings of the 31st ACM International Conference on Information & Knowledge Management, (2974-2983)
  13. Alzanin S, Azmi A and Aboalsamh H (2022). Short text classification for Arabic social media tweets, Journal of King Saud University - Computer and Information Sciences, 34:9, (6595-6604), Online publication date: 1-Oct-2022.
  14. Greca R, Miranda B, Gligoric M and Bertolino A. Comparing and combining file-based selection and similarity-based prioritization towards regression test orchestration, Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test, (115-125)
  15. O’hare K, Jurek-Loughrey A and De Campos C (2021). High-Value Token-Blocking: Efficient Blocking Method for Record Linkage, ACM Transactions on Knowledge Discovery from Data, 16:2, (1-17), Online publication date: 30-Apr-2022.
  16. Zhang H, Santos A and Freire J. DSDD, Proceedings of the 30th ACM International Conference on Information & Knowledge Management, (2527-2536)
  17. Amsterdamer Y and Cohen M. Automated Selection of Multiple Datasets for Extension by Integration, Proceedings of the 30th ACM International Conference on Information & Knowledge Management, (27-36)
  18. Yao Y, Ghai T, Ravi S and Szekely P. AMPPERE, Proceedings of the 30th ACM International Conference on Information & Knowledge Management, (2394-2403)
  19. Wang L, Yu Z, Wang M, Zhu X and Zhou Y. MOOC Dropout Prediction Based on Dynamic Embedding Representation Learning, Proceedings of the 5th International Conference on Computer Science and Application Engineering, (1-6)
  20. Naghavi Nozad S, Amir Haeri M and Folino G (2021). SDCOR, Knowledge-Based Systems, 228:C, Online publication date: 27-Sep-2021.
  21. Thirumuruganathan S, Li H, Tang N, Ouzzani M, Govind Y, Paulsen D, Fung G and Doan A (2021). Deep learning for blocking in entity matching, Proceedings of the VLDB Endowment, 14:11, (2459-2472), Online publication date: 1-Jul-2021.
  22. Pandurangan G, Robinson P and Scquizzato M (2021). On the Distributed Complexity of Large-Scale Graph Computations, ACM Transactions on Parallel Computing, 8:2, (1-28), Online publication date: 30-Jun-2021.
  23. Hosseini S and Turhan B. A comparison of similarity based instance selection methods for cross project defect prediction, Proceedings of the 36th Annual ACM Symposium on Applied Computing, (1455-1464)
  24. Huang K, Zhai J, Zheng Z, Yi Y and Shen X. Understanding and bridging the gaps in current GNN performance optimizations, Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (119-132)
  25. Noei E, Zhang F and Zou Y (2021). Too Many User-Reviews! What Should App Developers Look at First?, IEEE Transactions on Software Engineering, 47:2, (367-378), Online publication date: 1-Feb-2021.
  26. Asgari-Chenaghlu M, Feizi-Derakhshi M, Farzinvash L, Balafar M, Motamed C and Xiong F (2021). Topic Detection and Tracking Techniques on Twitter, Complexity, 2021, Online publication date: 1-Jan-2021.
  27. Makrani H, Sayadi H, Nazari N, Dinakarrao S, Sasan A, Mohsenin T, Rafatirad S and Homayoun H (2021). Adaptive Performance Modeling of Data-intensive Workloads for Resource Provisioning in Virtualized Environment, ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 5:4, (1-24), Online publication date: 31-Dec-2021.
  28. Beneventano D, Bergamaschi S, Gagliardelli L and Simonini G (2020). BLAST2, Journal of Data and Information Quality, 12:4, (1-22), Online publication date: 31-Dec-2021.
  29. Khan A and Zubair M (2020). Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning, Multimedia Tools and Applications, 79:43-44, (32749-32767), Online publication date: 1-Nov-2020.
  30. Ceccarello M, Pietracaprina A and Pucci G (2020). A General Coreset-Based Approach to Diversity Maximization under Matroid Constraints, ACM Transactions on Knowledge Discovery from Data, 14:5, (1-27), Online publication date: 31-Oct-2020.
  31. Koumarelas I, Jiang L and Naumann F (2020). Data Preparation for Duplicate Detection, Journal of Data and Information Quality, 12:3, (1-24), Online publication date: 30-Sep-2020.
  32. Liu Z, Lian J, Yang J, Lian D and Xie X. Octopus, Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, (289-298)
  33. Xydis S, Christoforidis E and Soudris D. DDOT, Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, (1-6)
  34. Abboud A, Cohen-Addad V and Klein P. New hardness results for planar graph problems in p and an algorithm for sparsest cut, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, (996-1009)
  35. Patroumpas K and Skoutas D. Similarity search over enriched geospatial data, Proceedings of the Sixth International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, (1-6)
  36. Suresh A, Kumar R and Varatharajan R (2018). Health care data analysis using evolutionary algorithm, The Journal of Supercomputing, 76:6, (4262-4271), Online publication date: 1-Jun-2020.
  37. Guzun G and Canahuate G (2019). High-dimensional similarity searches using query driven dynamic quantization and distributed indexing, Distributed and Parallel Databases, 38:2, (255-286), Online publication date: 1-Jun-2020.
  38. Mongia M, Soudry B, Davoodi A and Mohimani H. Efficient Database Search via Tensor Distribution Bucketing, Advances in Knowledge Discovery and Data Mining, (341-353)
  39. Doan K and Reddy C. Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder, Proceedings of The Web Conference 2020, (684-694)
  40. Rashtchian C, Sharma A and Woodruff D. LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew, Proceedings of The Web Conference 2020, (2998-3004)
  41. Talat R, Muzammal M and Shan R (2019). A decentralised approach to scene completion using distributed feature hashgram, Multimedia Tools and Applications, 79:15-16, (9799-9817), Online publication date: 1-Apr-2020.
  42. Jung Y and Wise A. How and how well do students reflect?, Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, (595-604)
  43. Carrillo-Mondejar J, Castelo Gomez J, Núñez-Gómez C, Roldán Gómez J, Martínez J and Zhang Y (2020). Automatic Analysis Architecture of IoT Malware Samples, Security and Communication Networks, 2020, Online publication date: 1-Jan-2020.
  44. Suarez-Tangil G, Edwards M, Peersman C, Stringhini G, Rashid A and Whitty M (2019). Automatically Dismantling Online Dating Fraud, IEEE Transactions on Information Forensics and Security, 15, (1128-1137), Online publication date: 1-Jan-2020.
  45. Bhih A, Johnson P and Randles M (2019). An optimisation tool for robust community detection algorithms using content and topology information, The Journal of Supercomputing, 76:1, (226-254), Online publication date: 1-Jan-2020.
  46. Scherzinger S (2019). Build your own SQL-on-Hadoop Query Engine, ACM SIGMOD Record, 48:2, (33-38), Online publication date: 19-Dec-2019.
  47. Abboud A, Cohen-Addad V and Houdrouge H. Subquadratic high-dimensional hierarchical clustering, Proceedings of the 33rd International Conference on Neural Information Processing Systems, (11580-11590)
  48. Baharav T and Tse D. Ultra fast medoid identification via correlated sequential halving, Proceedings of the 33rd International Conference on Neural Information Processing Systems, (3655-3664)
  49. Doan K, Yadav P and Reddy C. Adversarial Factorization Autoencoder for Look-alike Modeling, Proceedings of the 28th ACM International Conference on Information and Knowledge Management, (2803-2812)
  50. Rozemberczki B, Davies R, Sarkar R and Sutton C. GEMSEC, Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, (65-72)
  51. Kim H, Bang J, Son S, Joo N, Choi M and Moon Y. Message Latency-Based Load Shedding Mechanism in Apache Kafka, Euro-Par 2019: Parallel Processing Workshops, (731-736)
  52. Li L, Yan J, Yang X and Jin Y. Learning interpretable deep state space model for probabilistic time series forecasting, Proceedings of the 28th International Joint Conference on Artificial Intelligence, (2901-2908)
  53. Kumar S, Zhang X and Leskovec J. Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (1269-1278)
  54. Kunft A, Katsifodimos A, Schelter S, Breß S, Rabl T and Markl V (2019). An intermediate representation for optimizing machine learning pipelines, Proceedings of the VLDB Endowment, 12:11, (1553-1567), Online publication date: 1-Jul-2019.
  55. Simonini G, Gagliardelli L, Bergamaschi S and Jagadish H (2019). Scaling entity resolution, Information Systems, 83:C, (145-165), Online publication date: 1-Jul-2019.
  56. de Pablo-Sánchez C and Herruzo E. About the possibility of recovering a trade network from Bill of Lading data, Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets, (1-4)
  57. Attanasio G, Cagliero L, Garza P and Baralis E. Quantitative cryptocurrency trading, Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets, (1-6)
  58. Farvardin M, Colazzo D, Belhajjame K and Sartiani C. Streaming saturation for large RDF graphs with dynamic schema information, Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages, (42-52)
  59. Shen B, Shen Y and Ji W (2019). Profit optimization in service-oriented data market, Future Generation Computer Systems, 95:C, (17-25), Online publication date: 1-Jun-2019.
  60. Čebirić Š, Goasdoué F, Kondylakis H, Kotzinos D, Manolescu I, Troullinou G and Zneika M (2019). Summarizing semantic graphs, The VLDB Journal — The International Journal on Very Large Data Bases, 28:3, (295-327), Online publication date: 1-Jun-2019.
  61. Liu Y, Safavi T, Dighe A and Koutra D (2018). Graph Summarization Methods and Applications, ACM Computing Surveys, 51:3, (1-34), Online publication date: 31-May-2019.
  62. Cruciani E, Miranda B, Verdecchia R and Bertolino A. Scalable approaches for test suite reduction, Proceedings of the 41st International Conference on Software Engineering, (419-429)
  63. Mohammadi M, Petkov N, Bunte K, Peletier R and Schleif F (2022). Globular cluster detection in the GAIA survey, Neurocomputing, 342:C, (164-171), Online publication date: 21-May-2019.
  64. Ji S, Shao J and Yang T. Efficient Interaction-based Neural Ranking with Locality Sensitive Hashing, The World Wide Web Conference, (2858-2864)
  65. Martín I and Hernández J (2019). CloneSpot, Future Generation Computer Systems, 94:C, (740-748), Online publication date: 1-May-2019.
  66. Banihashemi S, Li J and Abhari A. Scalable machine learning algorithms for a Twitter followee recommender system, Proceedings of the Communications & Networking Symposium, (1-8)
  67. Mbah R, Rege M and Misra B. Using Spark and Scala for Discovering Latent Trends in Job Markets, Proceedings of the 2019 3rd International Conference on Compute and Data Analysis, (55-62)
  68. Ceccarello M, Pietracaprina A and Pucci G (2019). Solving k-center clustering (with outliers) in MapReduce and streaming, almost as accurately as sequentially, Proceedings of the VLDB Endowment, 12:7, (766-778), Online publication date: 1-Mar-2019.
  69. Zolnoori M, Fung K, Patrick T, Fontelo P, Kharrazi H, Faiola A, Wu Y, Eldredge C, Luo J, Conway M, Zhu J, Park S, Xu K, Moayyed H and Goudarzvand S (2022). A systematic approach for developing a corpus of patient reported adverse drug events, Journal of Biomedical Informatics, 90:C, Online publication date: 1-Feb-2019.
  70. Aydar M and Ayvaz S (2019). An improved method of locality-sensitive hashing for scalable instance matching, Knowledge and Information Systems, 58:2, (275-294), Online publication date: 1-Feb-2019.
  71. Charikar M, Chatziafratis V and Niazadeh R. Hierarchical clustering better than average-linkage, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, (2291-2304)
  72. Levy Abitbol J, Fleury E, Karsai M and Huang X (2019). Optimal Proxy Selection for Socioeconomic Status Inference on Twitter, Complexity, 2019, Online publication date: 1-Jan-2019.
  73. Chan G, Xu P, Dai Z and Ren L (2018). ViBr: Visualizing Bipartite Relations at Scale with the Minimum Description Length Principle, IEEE Transactions on Visualization and Computer Graphics, 25:1, (321-330), Online publication date: 1-Jan-2019.
  74. Chatzieleftheriou L, Karaliopoulos M and Koutsopoulos I (2018). Jointly Optimizing Content Caching and Recommendations in Small Cell Networks, IEEE Transactions on Mobile Computing, 18:1, (125-138), Online publication date: 1-Jan-2019.
  75. Lee K, Jeong Y, Lee S and Lee K (2019). Bucket-size balancing locality sensitive hashing using the map reduce paradigm, Cluster Computing, 22:1, (1959-1971), Online publication date: 1-Jan-2019.
  76. Abuzaid F, Kraft P, Suri S, Gan E, Xu E, Shenoy A, Ananthanarayan A, Sheu J, Meijer E, Wu X, Naughton J, Bailis P and Zaharia M (2018). DIFF, Proceedings of the VLDB Endowment, 12:4, (419-432), Online publication date: 1-Dec-2018.
  77. Gultepe E and Makrehchi M (2018). Improving clustering performance using independent component analysis and unsupervised feature learning, Human-centric Computing and Information Sciences, 8:1, (1-19), Online publication date: 1-Dec-2018.
  78. Roughgarden T, Vassilvitskii S and Wang J (2018). Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation), Journal of the ACM, 65:6, (1-24), Online publication date: 26-Nov-2018.
  79. Zhang J, Danescu-Niculescu-Mizil C, Sauper C and Taylor S (2018). Characterizing Online Public Discussions through Patterns of Participant Interactions, Proceedings of the ACM on Human-Computer Interaction, 2:CSCW, (1-27), Online publication date: 1-Nov-2018.
  80. Martínez S, Gérard S and Cabot J. Robust Hashing for Models, Proceedings of the 21th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, (312-322)
  81. Palma-Mendoza R, Rodriguez D and De-Marcos L (2018). Distributed ReliefF-based feature selection in Spark, Knowledge and Information Systems, 57:1, (1-20), Online publication date: 1-Oct-2018.
  82. Pandurangan G, Robinson P and Scquizzato M (2018). Fast Distributed Algorithms for Connectivity and MST in Large Graphs, ACM Transactions on Parallel Computing, 5:1, (1-22), Online publication date: 19-Sep-2018.
  83. Ertl O. BagMinHash - Minwise Hashing Algorithm for Weighted Sets, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (1368-1377)
  84. Harvey N, Liaw C and Liu P. Greedy and Local Ratio Algorithms in the MapReduce Model, Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, (43-52)
  85. Rong K, Yoon C, Bergen K, Elezabi H, Bailis P, Levis P and Beroza G (2018). Locality-sensitive hashing for earthquake detection, Proceedings of the VLDB Endowment, 11:11, (1674-1687), Online publication date: 1-Jul-2018.
  86. Leclercq É and Savonnet M. A Tensor Based Data Model for Polystore, Proceedings of the 22nd International Database Engineering & Applications Symposium, (110-118)
  87. Gupta R, Pujara J, Knoblock C, Sharanappa S, Pulavarti B, Hoberg G and Phillips G. Feature Selection Methods For Understanding Business Competitor Relationships, Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets, (1-6)
  88. Ramanan P. Six Pass MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication, Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, (1-6)
  89. Liu J, Rahbarinia B, Perdisci R, Du H and Su L. Augmenting Telephone Spam Blacklists by Mining Large CDR Datasets, Proceedings of the 2018 on Asia Conference on Computer and Communications Security, (273-284)
  90. Miranda B, Cruciani E, Verdecchia R and Bertolino A. FAST approaches to scalable similarity-based test case prioritization, Proceedings of the 40th International Conference on Software Engineering, (222-232)
  91. Fafalios P, Kasturia V and Nejdl W. Ranking Archived Documents for Structured Queries on Semantic Layers, Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, (155-164)
  92. Singh A, Singh S and Yousef M (2018). A conceptual framework for designing a big data course, Journal of Computing Sciences in Colleges, 33:5, (192-198), Online publication date: 1-May-2018.
  93. Hsu Y, Matsuda K and Matsuoka M. Self-aware workload forecasting in data center power prediction, Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, (321-330)
  94. Budikova P, Batko M and Zezula P (2018). ConceptRank for search-based image annotation, Multimedia Tools and Applications, 77:7, (8847-8882), Online publication date: 1-Apr-2018.
  95. Wegba K, Lu A, Li Y and Wang W. Interactive Storytelling for Movie Recommendation through Latent Semantic Analysis, Proceedings of the 23rd International Conference on Intelligent User Interfaces, (521-533)
  96. Nargesian F, Zhu E, Pu K and Miller R (2018). Table union search on open data, Proceedings of the VLDB Endowment, 11:7, (813-825), Online publication date: 1-Mar-2018.
  97. Ceccarello M, Pietracaprina A and Pucci G. Fast Coreset-based Diversity Maximization under Matroid Constraints, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, (81-89)
  98. Bury M, Schwiegelshohn C and Sorella M. Sketch 'Em All, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, (72-80)
  99. Chuang Y, Yu C and Wu Q (2018). DSLM, The Journal of Supercomputing, 74:2, (738-767), Online publication date: 1-Feb-2018.
  100. Chang N, Baranwal A, Zhuang H, Shih M, Rajan R, Jia Y, Liao H, Li Y, Ku T and Lin R. Machine learning based generic violation waiver system with application on electromigration sign-off, Proceedings of the 23rd Asia and South Pacific Design Automation Conference, (416-421)
  101. S S and Kumar P. Enriching domain ontologies using question-answer datasets, Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, (329-332)
  102. Guha S, Baumer E and Gay G. Regrets, I've Had a Few, Proceedings of the 2018 ACM International Conference on Supporting Group Work, (166-177)
  103. Rastogi A, Narang N and Siddiqui Z. Imbalanced big data classification, Proceedings of the Workshop Program of the 19th International Conference on Distributed Computing and Networking, (1-6)
  104. Thanei G, Meinshausen N and Shah R (2018). The xyz algorithm for fast interaction search in high-dimensional data, The Journal of Machine Learning Research, 19:1, (1343-1384), Online publication date: 1-Jan-2018.
  105. Abdelhameed S, Moussa S and Khalifa M (2018). Privacy-preserving tabular data publishing, Computers and Security, 72:C, (74-95), Online publication date: 1-Jan-2018.
  106. Qiu F, Ge W and Dai X. Code Recommendation with Natural Language Tags and Other Heterogeneous Data, Proceedings of the 2017 International Conference on Computer Science and Artificial Intelligence, (137-142)
  107. Bateni M, Behnezhad S, Derakhshan M, Hajiaghayi M, Kiveris R, Lattanzi S and Mirrokni V. Affinity clustering, Proceedings of the 31st International Conference on Neural Information Processing Systems, (6867-6877)
  108. Drew J, Hahsler M and Moore T (2017). Polymorphic malware detection using sequence classification methods and ensembles, EURASIP Journal on Information Security, 2017:1, (1-12), Online publication date: 1-Dec-2017.
  109. (2017). Minmax Circular Sector Arc for External Plagiarisms Heuristic Retrieval stage, Knowledge-Based Systems, 137:C, (1-18), Online publication date: 1-Dec-2017.
  110. Khan K, Dolgorsuren B, Anh T, Nawaz W and Lee Y (2017). Faster compression methods for a weighted graph using locality sensitive hashing, Information Sciences: an International Journal, 421:C, (237-253), Online publication date: 1-Dec-2017.
  111. Christakopoulou K, Kawale J and Banerjee A. Recommendation with Capacity Constraints, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (1439-1448)
  112. Mondal S, Shukla M and Lodha S. Privacy Aware Temporal Profiling of Emails in Distributed Setup, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (1229-1238)
  113. Ruchansky N, Seo S and Liu Y. CSI, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (797-806)
  114. Rafiq Y, Dickens L, Russo A, Bandara A, Yang M, Stuart A, Levine M, Calikli G, Price B and Nuseibeh B. Learning to share: engineering adaptive decision-support for online social networks, Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, (280-285)
  115. Riazi M, Samragh M and Koushanfar F (2017). CAMsure, ACM Transactions on Embedded Computing Systems, 16:5s, (1-20), Online publication date: 10-Oct-2017.
  116. Singh R, Meduri V, Elmagarmid A, Madden S, Papotti P, Quiané-Ruiz J, Solar-Lezama A and Tang N (2017). Synthesizing entity matching rules by examples, Proceedings of the VLDB Endowment, 11:2, (189-202), Online publication date: 1-Oct-2017.
  117. Theodosiou T, Karapiperis D and Verykios V. Using Wavelets for Matching Records Privately, Proceedings of the 21st Pan-Hellenic Conference on Informatics, (1-6)
  118. Naik N and Purohit S (2017). Comparative Study of Binary Classification Methods to Analyze a Massive Dataset on Virtual Machine, Procedia Computer Science, 112:C, (1863-1870), Online publication date: 1-Sep-2017.
  119. Bultel X, Ciucanu R, Giraud M and Lafourcade P. Secure Matrix Multiplication with MapReduce, Proceedings of the 12th International Conference on Availability, Reliability and Security, (1-10)
  120. ElMessiry A, Zhang Z, Cooper W, Catron T, Karrass J and Singh M. Leveraging Sentiment Analysis for Classifying Patient Complaints, Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, (44-51)
  121. Vesdapunt N and Garcia-Molina H. Link Prediction and Hybrid Strategies for Updating a Social Graph Snapshot via a Limited API, 2017 IEEE International Conference on Information Reuse and Integration (IRI), (207-216)
  122. Frémal S and Lecron F (2017). Weighting strategies for a recommender system using item clustering based on genres, Expert Systems with Applications: An International Journal, 77:C, (105-113), Online publication date: 1-Jul-2017.
  123. Fafalios P, Kasturia V and Nejdl W. Towards a ranking model for semantic layers over digital archives, Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, (336-337)
  124. Deng M and Ramanan P. MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication, Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, (1-10)
  125. Liu Y and McBrien P. SPOWL, Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, (1-10)
  126. Fujiwara Y, Marumo N, Blondel M, Takeuchi K, Kim H, Iwata T and Ueda N. Scaling Locally Linear Embedding, Proceedings of the 2017 ACM International Conference on Management of Data, (1479-1492)
  127. Damaiyanti T, Imawan A, Indikawati F, Choi Y and Kwon J (2017). A similarity query system for road traffic data based on a NoSQL document store, Journal of Systems and Software, 127:C, (28-51), Online publication date: 1-May-2017.
  128. Yang C, Zhong L, Li L and Jie L. Bi-directional Joint Inference for User Links and Attributes on Large Social Graphs, Proceedings of the 26th International Conference on World Wide Web Companion, (564-573)
  129. Sharma A, Seshadhri C and Goel A. When Hashes Met Wedges, Proceedings of the 26th International Conference on World Wide Web, (431-440)
  130. Huang H, Youssef A and Debbabi M. BinSequence, Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, (155-166)
  131. Yu C, Nutanong S, Li H, Wang C and Yuan X (2017). A Generic Method for Accelerating LSH-Based Similarity Join Processing, IEEE Transactions on Knowledge and Data Engineering, 29:4, (712-726), Online publication date: 1-Apr-2017.
  132. Liu L, Wiliem A, Chen S and Lovell B (2017). What is the best way for extracting meaningful attributes from pictures?, Pattern Recognition, 64:C, (314-326), Online publication date: 1-Apr-2017.
  133. C. S and Sherimon V. A proposed onto-Apriori algorithm to mine frequent patterns of high quality seafood, Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing, (1-6)
  134. Csar T, Lackner M, Pichler R and Sallinger E. Winner determination in huge elections with MapReduce, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (451-458)
  135. Beame P and Rashtchian C. Massively-parallel similarity join, edge-isoperimetry, and distance correlations on the hypercube, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, (289-306)
  136. Ceccarello M, Pietracaprina A, Pucci G and Upfal E (2017). MapReduce and streaming algorithms for diversity maximization in metric spaces of bounded doubling dimension, Proceedings of the VLDB Endowment, 10:5, (469-480), Online publication date: 1-Jan-2017.
  137. Afrati F, Dolev S, Korach E, Sharma S and Ullman J (2016). Assignment Problems of Different-Sized Inputs in MapReduce, ACM Transactions on Knowledge Discovery from Data, 11:2, (1-35), Online publication date: 26-Dec-2016.
  138. Tall A, Wang J and Han D. Survey of data intensive computing technologies application to to security log data management, Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, (268-273)
  139. Li Z and Ge T (2016). Stochastic data acquisition for answering queries as time goes by, Proceedings of the VLDB Endowment, 10:3, (277-288), Online publication date: 1-Nov-2016.
  140. Shakiba A and Hooshmandasl M (2016). Neighborhood system S-approximation spaces and applications, Knowledge and Information Systems, 49:2, (749-794), Online publication date: 1-Nov-2016.
  141. Moshfeghi Y, Velinov K and Triantafillou P. Improving Search Results with Prior Similar Queries, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, (1985-1988)
  142. Koutsopoulos I and Spentzouris P. Native Advertisement Selection and Allocation in Social Media Post Feeds, European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 9851, (588-603)
  143. Arnaiz-González Á, Díez-Pastor J, Rodríguez J and García-Osorio C (2016). Instance selection of linear complexity for big data, Knowledge-Based Systems, 107:C, (83-95), Online publication date: 1-Sep-2016.
  144. Tzelepis C, Galanopoulos D, Mezaris V and Patras I (2016). Learning to detect video events from zero or very few video examples, Image and Vision Computing, 53:C, (35-44), Online publication date: 1-Sep-2016.
  145. Jovanovic P, Romero O and Abelló A. A Unified View of Data-Intensive Flows in Business Intelligence Systems, Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIX - Volume 10120, (66-107)
  146. Hao Y, Choi K and Downie J. Exploring J-DISC, Proceedings of the 3rd International Workshop on Digital Libraries for Musicology, (41-44)
  147. Simonini G, Bergamaschi S and Jagadish H (2016). BLAST, Proceedings of the VLDB Endowment, 9:12, (1173-1184), Online publication date: 1-Aug-2016.
  148. ACM
    Smith C and Albarghouthi A (2016). MapReduce program synthesis, ACM SIGPLAN Notices, 51:6, (326-340), Online publication date: 1-Aug-2016.
  149. Shakiba A and Hooshmandasl M (2016). Data volume reduction in covering approximation spaces with respect to twenty-two types of covering based rough sets, International Journal of Approximate Reasoning, 75:C, (13-38), Online publication date: 1-Aug-2016.
  150. ACM
    Belhajjame K and Bonifati A Data Exchange with MapReduce Proceedings of the 28th International Conference on Scientific and Statistical Database Management, (1-4)
  151. ACM
    Bonenfant M, Desai B, Desai D, Fung B, Özsu M and Ullman J Panel Proceedings of the 20th International Database Engineering & Applications Symposium, (2-11)
  152. ACM
    Roughgarden T, Vassilvitskii S and Wang J Shuffles and Circuits Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, (1-12)
  153. ACM
    Pandurangan G, Robinson P and Scquizzato M Fast Distributed Algorithms for Connectivity and MST in Large Graphs Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, (429-438)
  154. ACM
    Koutsopoulos I Optimal advertisement allocation in online social media feeds Proceedings of the 8th ACM International Workshop on Hot Topics in Planet-scale mObile computing and online Social neTworking, (43-48)
  155. ACM
    Ramanan P and Nagar A Tight bounds on one- and two-pass MapReduce algorithms for matrix multiplication Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, (1-9)
  156. ACM
    Grahne G, Harrafi S, Hedayati I and Moallemi A DFA minimization in map-reduce Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, (1-10)
  157. ACM
    Smith C and Albarghouthi A MapReduce program synthesis Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, (326-340)
  158. Modarresi K (2016). Recommendation System Based on Complete Personalization, Procedia Computer Science, 80:C, (2190-2204), Online publication date: 1-Jun-2016.
  159. ACM
    Qiu S, Wang B, Li M, Victors J, Liu J, Shi Y and Wang W Fast, Private and Verifiable Proceedings of the 4th ACM International Workshop on Security in Cloud Computing, (29-36)
  160. ACM
    Ranneries S, Kalør M, Nielsen S, Dalgaard L, Christensen L and Kanhabua N Wisdom of the local crowd Proceedings of the 8th ACM Conference on Web Science, (352-354)
  161. Menezes S and Parpinelli R Quality Analysis of Different Metrics for Data Clustering Using Bio-Inspired Algorithm and MapReduce Architecture Proceedings of the XII Brazilian Symposium on Information Systems on Brazilian Symposium on Information Systems: Information Systems in the Cloud Computing Era - Volume 1, (176-183)
  162. ACM
    Soto A, Mohammad A, Albert A, Islam A, Milios E, Doyle M, Minghim R and Ferreira de Oliveira M Similarity-Based Support for Text Reuse in Technical Writing Proceedings of the 2015 ACM Symposium on Document Engineering, (97-106)
  163. Silvestre G, Sauvanaud C, Kaâniche M and Kanoun K Tejo Proceedings of the 7th International Workshop on Software Engineering for Resilient Systems - Volume 9274, (114-127)
  164. ACM
    Coppa E and Finocchi I On data skewness, stragglers, and MapReduce progress indicators Proceedings of the Sixth ACM Symposium on Cloud Computing, (139-152)
  165. ACM
    Fujiwara Y, Nakatsuji M, Shiokawa H, Ida Y and Toyoda M Adaptive Message Update for Fast Affinity Propagation Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (309-318)
  166. ACM
    Helmer S and Ngo V A Similarity Measure for Weaving Patterns in Textiles Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, (163-172)
  167. Gu Z, Saberi M, Sarvi M and Liu Z Calibration of traffic flow fundamental diagrams for network simulation applications: A two-stage clustering approach 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), (1348-1353)
Contributors
  • Stanford University
  • Stanford University
  • Stanford University
