Building a large-scale testing dataset for conceptual semantic annotation of text

One major obstacle facing research on semantic annotation is the lack of large-scale testing datasets. In this paper, we develop a systematic approach to constructing such datasets. The approach is based on guided automatic ontology construction and annotation methods that require little a priori domain knowledge and little user knowledge of the documents. We demonstrate the efficacy of the proposed approach by developing a large-scale testing dataset from information available in MeSH and PubMed. The resulting dataset consists of a large-scale ontology, a large-scale set of annotated documents, and baselines for evaluating a target algorithm; it can be used to evaluate both ontology construction algorithms and semantic annotation algorithms.
