research-article

Substructure similarity measurement in Chinese recipes

Authors:
Liping Wang, City University of Hong Kong, Hong Kong
Qing Li, City University of Hong Kong, Hong Kong
Na Li, City University of Hong Kong, Hong Kong
Guozhu Dong, Wright State University, Dayton, OH, USA
Yu Yang, City University of Hong Kong, Hong Kong

WWW '08: Proceedings of the 17th international conference on World Wide Web, April 2008, pages 979–988. https://doi.org/10.1145/1367497.1367629

Published: 21 April 2008


ABSTRACT

Improving the precision of information retrieval has been a challenging issue on the Chinese Web. As exemplified by Chinese recipes on the Web, it is neither easy nor natural for people to search for recipes with keywords (e.g., recipe names), since the names can be so abstract that they carry little, if any, information about the underlying ingredients or cooking methods. In this paper, we investigate the underlying features of Chinese recipes and, based on their workflow-like cooking procedures, model recipes as graphs. We further propose a novel similarity measurement based on frequent patterns, and devise an effective filtering algorithm that prunes unrelated data to support efficient online search. Because recipes are modeled as graphs, frequent common patterns can be mined from a cooking graph database. In our prototype system, RecipeView, we therefore extend the subgraph mining algorithm FSG to cooking graphs and combine it with the proposed similarity measurement, resulting in an approach that caters well to specific users' needs. Our initial experimental studies show that the filtering algorithm efficiently prunes unrelated cooking graphs without affecting retrieval performance, and that the similarity measurement achieves relatively higher precision and recall than its counterparts.
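The abstract does not spell out the graph model, the pattern-based similarity, or the filtering rule, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a cooking graph is a set of directed edges between hypothetical node labels (ingredients and cooking actions in workflow order). It mines only size-1 frequent patterns (single labeled edges), which is the first level of an Apriori-style miner such as FSG rather than FSG itself. It scores similarity with the Jaccard overlap of the frequent patterns two graphs contain, a stand-in for the paper's measure. Finally, it filters out candidates that share fewer than `min_shared` patterns with the query. All function names, labels, parameters, and recipes are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: an edge links two node labels (an ingredient or a
# cooking action) in workflow order; a cooking graph is its set of such edges.
Edge = Tuple[str, str]
CookingGraph = Set[Edge]


def frequent_edges(db: Dict[str, CookingGraph], min_support: int) -> Set[Edge]:
    """Return single labeled edges that occur in at least min_support graphs.

    This is only the size-1 level of an Apriori-style miner such as FSG;
    growing larger candidate subgraphs is deliberately omitted.
    """
    support = Counter(edge for graph in db.values() for edge in set(graph))
    return {edge for edge, count in support.items() if count >= min_support}


def pattern_features(graph: CookingGraph, patterns: Set[Edge]) -> FrozenSet[Edge]:
    """Frequent patterns contained in a graph (set intersection suffices for edges)."""
    return frozenset(graph & patterns)


def jaccard(a: FrozenSet[Edge], b: FrozenSet[Edge]) -> float:
    """Stand-in similarity: overlap of the frequent patterns two graphs contain."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(
    query: CookingGraph,
    db: Dict[str, CookingGraph],
    patterns: Set[Edge],
    min_shared: int = 1,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Drop graphs sharing fewer than min_shared patterns, then rank the rest."""
    q = pattern_features(query, patterns)
    ranked = []
    for name, graph in db.items():
        f = pattern_features(graph, patterns)
        if len(q & f) < min_shared:  # pruning step: skip clearly unrelated graphs
            continue
        ranked.append((name, jaccard(q, f)))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy database with made-up recipes, for illustration only.
EXAMPLE_DB: Dict[str, CookingGraph] = {
    "tomato_egg": {("egg", "beat"), ("beat", "stir-fry"), ("tomato", "cut"), ("cut", "stir-fry")},
    "egg_fried_rice": {("egg", "beat"), ("beat", "stir-fry"), ("rice", "stir-fry")},
    "steamed_fish": {("fish", "clean"), ("clean", "steam"), ("ginger", "slice"), ("slice", "steam")},
    "tomato_egg_soup": {("tomato", "cut"), ("cut", "boil"), ("egg", "beat"), ("beat", "boil")},
}

if __name__ == "__main__":
    patterns = frequent_edges(EXAMPLE_DB, min_support=2)
    query = {("egg", "beat"), ("beat", "stir-fry"), ("scallion", "chop")}
    print(search(query, EXAMPLE_DB, patterns))
```

With `min_support=2`, the toy database yields three frequent edges: egg→beat, beat→stir-fry, and tomato→cut. For the sample query, `steamed_fish` shares no pattern and is pruned before scoring, and `egg_fried_rice` ranks first with a score of 1.0, ahead of `tomato_egg` and `tomato_egg_soup`. Representing a graph as a set of label pairs is enough for containment tests of single-edge patterns, but it discards node identity, so larger patterns would need a real graph structure and subgraph isomorphism checks.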


Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction
  2. Information systems applications
    1. Data mining


Published in
WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN:9781605580852
DOI:10.1145/1367497
General Chairs: Jinpeng Huai (Beihang University, China), Robin Chen (AT&T Labs, USA), Hsiao-Wuen Hon (Microsoft Research Asia, China), Yunhao Liu (HK University of Science and Technology, Hong Kong)
Program Chairs: Wei-Ying Ma (Microsoft Research Asia, China), Andrew Tomkins (Yahoo! Research, USA), Xiaodong Zhang (The Ohio State University, USA)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cooking graph
filtering
recipes
similarity measurement
subgraph mining
Acceptance Rates
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total citations: 62
- Total downloads: 734
- Downloads (last 12 months): 28
- Downloads (last 6 weeks): 7