Real-world physical and abstract data objects are interconnected, forming gigantic networks. By structuring these data objects and the interactions between them into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks; scientific, engineering, or medical information systems; online e-commerce systems; and most database systems, can be structured into heterogeneous information networks. Therefore, effective analysis of large-scale heterogeneous information networks poses an interesting but critical challenge. In this book, we investigate the principles and methodologies of mining heterogeneous information networks. Departing from many existing network models that view interconnected data as homogeneous graphs or networks, our semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and uncovers surprisingly rich knowledge from the network. This semi-structured heterogeneous network modeling leads to a series of new principles and powerful methodologies for mining interconnected data, including (1) rank-based clustering and classification, (2) meta-path-based similarity search and mining, and (3) relation strength-aware mining, along with many other potential developments. This book introduces this new research frontier and points out some promising research directions.
Table of Contents: Introduction / Ranking-Based Clustering / Classification of Heterogeneous Information Networks / Meta-Path-Based Similarity Search / Meta-Path-Based Relationship Prediction / Relation Strength-Aware Clustering with Incomplete Attributes / User-Guided Clustering via Meta-Path Selection / Research Frontiers