Web pages increasingly use AJAX (Asynchronous JavaScript and XML) for its rich interactivity and incremental communication. Observation shows that AJAX content, which traditional crawlers cannot see, is generally well structured and belongs to one specific domain. Extracting structured data from AJAX content and annotating its semantics are therefore important for further applications. This paper proposes an ontology-based method for extracting structured AJAX data in the agricultural domain. First, Crawljax, an open-source AJAX crawling tool, is customized to explore and retrieve AJAX content. Second, HTML tags and punctuation are used to segment the retrieved content into entity items, which are then classified with the help of an agricultural ontology. Finally, the entity items are clustered, and semantic annotations are assigned to the clusters according to the agricultural ontology. Experimental evaluation shows that the approach is effective in resource exploration, entity extraction, and semantic annotation.
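The segmentation and annotation steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy ontology, the sample fragment, and the function names `segment` and `annotate` are all assumptions made for illustration, and a plain dictionary lookup stands in for the paper's clustering step.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: concept label -> known terms (not from the paper).
TOY_ONTOLOGY = {
    "Crop": {"wheat", "rice", "maize"},
    "Pest": {"aphid", "locust"},
}

def segment(fragment: str) -> list[str]:
    """Split an HTML fragment into candidate entity items on tags and punctuation."""
    text = re.sub(r"<[^>]+>", "|", fragment)    # every tag becomes a boundary
    parts = re.split(r"[|,;:，；：。]+", text)    # punctuation is a boundary too
    return [p.strip() for p in parts if p.strip()]

def annotate(items: list[str]) -> dict[str, list[str]]:
    """Group items under the first ontology concept whose term set contains them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for item in items:
        label = next(
            (c for c, terms in TOY_ONTOLOGY.items() if item.lower() in terms),
            "Unknown",  # items the toy ontology does not cover
        )
        groups[label].append(item)
    return dict(groups)

if __name__ == "__main__":
    fragment = "<ul><li>Wheat, rice</li><li>aphid; price 12</li></ul>"
    print(annotate(segment(fragment)))
```

A real system would replace the lookup with similarity-based clustering and would match items against ontology labels and synonyms rather than exact strings.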
The Hidden Web provides a large amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related work does not address how to make the extracted information available to a knowledge system. This paper describes a novel approach to building a domain-specific knowledge service from data retrieved from the Hidden Web. An ontology models the domain knowledge. The query forms of different Web sites are translated into a machine-understandable format, expressed as defined knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized as ontology-format knowledge. Experiments show that the algorithm achieves high accuracy and that the system greatly facilitates the construction of knowledge services.
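To address the problem of semantic association between the query expression and target documents in existing mathematical expression retrieval systems, this paper proposes an ontology-based mathematical expression retrieval method that builds on parsing LaTeX expressions with a serialized feature extraction method. An ontology is used to establish relationships between mathematical expressions and their concepts and to construct a semantic ontology library of mathematical expressions, so that entering keywords, concepts, phrases, or mathematical terms retrieves documents semantically related to mathematical expressions. Experimental results show that the method uses ontology concepts to expand the query result set, improving recall, precision, and expansion rate to some degree.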
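Ontology-based expansion of a query result set can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's system: the concept links in `RELATED_CONCEPTS`, the inverted index `INDEX`, and the document identifiers are all hypothetical.

```python
# Hypothetical concept links and inverted index; none of this data is from the paper.
RELATED_CONCEPTS = {
    "quadratic equation": ["quadratic formula", "discriminant"],
}
INDEX = {
    "quadratic equation": {"doc1"},
    "quadratic formula": {"doc2"},
    "discriminant": {"doc2", "doc3"},
    "matrix": {"doc4"},
}

def expand_query(term: str) -> list[str]:
    """Return the query term followed by the concepts the ontology links to it."""
    return [term] + RELATED_CONCEPTS.get(term, [])

def search(term: str, use_ontology: bool = True) -> set[str]:
    """Collect documents indexed under the term and, optionally, its related concepts."""
    terms = expand_query(term) if use_ontology else [term]
    hits: set[str] = set()
    for t in terms:
        hits |= INDEX.get(t, set())
    return hits

if __name__ == "__main__":
    print(sorted(search("quadratic equation", use_ontology=False)))
    print(sorted(search("quadratic equation", use_ontology=True)))
```

In this toy setup the expanded query reaches documents that never mention the original term, which is the effect the abstract describes as expanding the query result set.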
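Using a geographic Feature Type Thesaurus (FTT) as a case study, this paper proposes a method for automatically enriching a domain ontology. The FTT describes more than 200 geographic feature types, organized hierarchically, and is used to index and organize the 6 million place names in the Alexandria Digital Library Gazetteer (ADL Gazetteer) in the United States. To enrich the FTT automatically, (1) generic words that denote geographic feature types and have retrieval value are first extracted and discovered from the place names; (2) based on their co-occurrence with the indexing descriptors, the corresponding descriptor is determined while clustering words of the same word family, and the extracted generic words are then placed in the FTT hierarchy. The core of the work is the placement analysis of generic words, which makes full use of the large existing indexed corpus; experiments show an effectiveness of 82.7%. In essence, this research automatically extracts domain knowledge and indexing knowledge from an ontology-indexed corpus in order to enrich the ontology automatically. The method can be applied to similar corpora and knowledge bases to support new-term discovery, ontology self-enrichment, and ontology interoperability.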
A new ontology-based question expansion (OBQE) method is proposed for question similarity calculation in a frequently asked question (FAQ) answering system. Traditional question similarity calculation methods compose the question vector from words, so the semantic relations between words are ignored. OBQE treats these relations as an important component. The new system works in three steps: ① build a two-layered domain ontology with reference to WordNet and a domain corpus; ② expand question trunks into domain cases; ③ use vectors composed of domain cases to calculate question similarity. Experimental results show that OBQE improves the performance of question similarity calculation.
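The contrast between word vectors and expanded vectors can be sketched with cosine similarity. This is a simplified stand-in rather than OBQE itself: the word-to-concept table `CONCEPT_OF` is hypothetical, and mapping each word to a single concept replaces the paper's expansion of question trunks into domain cases.

```python
import math
from collections import Counter

# Hypothetical word -> domain concept table (a stand-in for a domain ontology).
CONCEPT_OF = {
    "laptop": "computer", "notebook": "computer",
    "fix": "repair", "mend": "repair",
}

def to_vector(question: str, expand: bool) -> Counter:
    """Build a bag-of-words vector, optionally mapping words to ontology concepts."""
    tokens = question.lower().split()
    if expand:
        tokens = [CONCEPT_OF.get(t, t) for t in tokens]
    return Counter(tokens)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    q1, q2 = "fix laptop", "mend notebook"
    print(cosine(to_vector(q1, False), to_vector(q2, False)))  # no shared words
    print(cosine(to_vector(q1, True), to_vector(q2, True)))    # shared concepts
```

The two sample questions share no surface words, so the plain word vectors are orthogonal, while the concept-mapped vectors coincide.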
An information integration method for the semantic web based on agent ontology (the SWAO method) is proposed to address problems in the current network environment; it integrates, analyzes, and processes large volumes of web information and extracts answers on the basis of semantics. Following the SWAO method, three techniques were studied: concept extraction based on semantic term mining, multi-point agent ontology construction, and answer extraction based on semantic inference. A structural model of an ontology-based question answering (QA) system is also presented; it uses the OWL language to describe domain knowledge, from which the QA system infers and extracts answers with the Jena inference engine. In system testing, precision reaches 86% and recall reaches 93%. The experimental results show that it is feasible to use the method to develop a question answering system and that the method merits further study.
Funding (AJAX data extraction paper): Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences and the National High-Tech R&D Program of China (2008BAK49B05).
Funding (Hidden Web knowledge service paper): Supported by the Major International Cooperation Program of NSFC, Grant 60221120145, Chinese Folk Music Digital Library.
Funding (SWAO question answering paper): Projects (60773462, 60672171) supported by the National Natural Science Foundation of China; Projects (2009AA12143, 2009AA012136) supported by the National High-Tech Research and Development Program of China; Project (20080430250) supported by the Foundation of Post-Doctor in China.