ABSTRACT
Collaborative knowledge bases that make their data freely available in machine-readable form are central to the data strategies of many projects and organizations. The two major collaborative knowledge bases are Wikimedia's Wikidata and Google's Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. We describe the Primary Sources Tool, which aims to facilitate this and future data migrations. Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and we share and discuss detailed statistics on both knowledge bases.
From Freebase to Wikidata: The Great Migration