TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

Authors:
Chao Zhang (University of Illinois at Urbana-Champaign, Urbana, IL, USA)
Fangbo Tao (Facebook Inc., Menlo Park, CA, USA)
Xiusi Chen (Peking University, Beijing, China)
Jiaming Shen (University of Illinois at Urbana-Champaign, Urbana, IL, USA)
Meng Jiang (University of Notre Dame, Notre Dame, IN, USA)
Brian Sadler (U.S. Army Research Laboratory, Adelphi, MD, USA)
Michelle Vanni (U.S. Army Research Laboratory, Adelphi, MD, USA)
Jiawei Han (University of Illinois at Urbana-Champaign, Urbana, IL, USA)

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2018, Pages 2701–2709
https://doi.org/10.1145/3219819.3220064

Published: 19 July 2018

ABSTRACT

Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
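
The recursive idea in the abstract can be illustrated with a minimal sketch. It is not the TaxoGen implementation: it uses plain spherical k-means (cosine-based clustering of unit vectors) and omits both the adaptive step that allocates terms to proper levels and the local embedding module. The function names, the parameters `k`, `max_depth`, and `min_size`, and the toy 2-D "embeddings" are all assumptions made for illustration.

```python
import numpy as np

def normalize_rows(x):
    """Scale each row to unit L2 norm (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)

def spherical_kmeans(vectors, k, n_iter=20, seed=0):
    """Cluster rows by cosine similarity; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    x = normalize_rows(np.asarray(vectors, dtype=float))
    # Initialise centers from k distinct data points (an illustrative choice).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # On unit vectors, the dot product equals cosine similarity.
        labels = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.sum(axis=0)
        centers = normalize_rows(centers)
    return labels

def build_taxonomy(terms, vectors, k=2, max_depth=2, min_size=4):
    """Recursively split a term set into up to k subtopics (nested dicts)."""
    node = {"terms": list(terms), "children": []}
    if max_depth == 0 or len(terms) < max(k, min_size):
        return node
    vectors = np.asarray(vectors, dtype=float)
    labels = spherical_kmeans(vectors, k)
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if 0 < len(idx) < len(terms):  # skip empty or non-splitting clusters
            child = build_taxonomy([terms[i] for i in idx], vectors[idx],
                                   k, max_depth - 1, min_size)
            node["children"].append(child)
    return node

# Toy data: hypothetical 2-D vectors standing in for learned term embeddings.
terms = ["svm", "neural_net", "boosting", "sql", "index", "transaction"]
vectors = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.0],
           [0.1, 1.0], [0.2, 1.0], [0.0, 0.9]]
tree = build_taxonomy(terms, vectors, k=2, max_depth=1, min_size=4)
for child in tree["children"]:
    print(child["terms"])
```

Each node here is a cluster of terms rather than a single term, which mirrors the abstract's notion of a topic node. In this sketch every term is pushed down to exactly one child, and the same global vectors are reused at every level; those are the two simplifications that the abstract's adaptive clustering and local embedding modules are described as addressing.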

Supplemental Material

zhang_taxogen_construction.mp4 (MP4, 421 MB)


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering


Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819
General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
taxonomy construction
text mining
word embedding
Qualifiers
- research-article
Acceptance Rates
KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%