Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences and the National High-Tech R&D Program of China (2008BAK49B05).
Abstract: Web pages increasingly use AJAX (Asynchronous JavaScript and XML) for its rich interactivity and incremental communication. Observation shows that AJAX content, which traditional crawlers cannot see, is generally well structured and belongs to one specific domain. Extracting structured data from AJAX content and annotating its semantics is therefore important for further applications. This paper proposes a method, based on an agricultural ontology, for extracting structured AJAX data in the agricultural domain. First, Crawljax, an open AJAX crawling tool, was extended to explore and retrieve AJAX content. Second, the retrieved content was partitioned into items, which were then classified with the help of the agricultural ontology; HTML tags and punctuation were used to segment the retrieved content into entity items. Finally, the entity items were clustered, and semantic annotations were assigned to the clustering results according to the agricultural ontology. Experimental evaluation showed the proposed approach to be effective in resource exploration, entity extraction, and semantic annotation.
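The segmentation and annotation steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the sample markup, and the function names `segment_items` and `annotate` are all hypothetical, and a plain dictionary lookup stands in for the clustering and ontology matching that the paper performs.

```python
import re

# Hypothetical toy ontology mapping terms to concepts (not taken from the paper).
TOY_ONTOLOGY = {
    "wheat": "Crop",
    "maize": "Crop",
    "aphid": "Pest",
    "urea": "Fertilizer",
}


def segment_items(html):
    """Split retrieved markup into candidate entity items.

    HTML tags and punctuation both act as item boundaries.
    """
    text = re.sub(r"<[^>]+>", "\n", html)  # every tag becomes a boundary
    parts = re.split(r"[\n,;:，；：、。]+", text)  # punctuation also splits
    return [p.strip() for p in parts if p.strip()]


def annotate(items):
    """Group items under the ontology concept they match.

    Simplification: exact, case-insensitive lookup replaces clustering.
    """
    groups = {}
    for item in items:
        concept = TOY_ONTOLOGY.get(item.lower(), "Unknown")
        groups.setdefault(concept, []).append(item)
    return groups


# Hypothetical fragment standing in for content retrieved by the crawler.
html = "<ul><li>Wheat, Maize</li><li>Aphid; Urea</li><li>Tractor</li></ul>"
items = segment_items(html)
groups = annotate(items)
print(groups)
```

Items with no ontology match are kept under "Unknown" rather than dropped, so that unrecognized terms remain visible for later review.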
Funding: An outcome of the project "The computing method of subject centrality of texts based on language network" (No. 61075047), supported by the National Natural Science Foundation of China.
Abstract: Owing to its openness and timeliness, science and technology (S&T) Web information has become one of the most important resources for strategic intelligence monitoring. However, because S&T Web information is unstructured and lacks semantic description, transforming it into structured semantic knowledge is a challenge. To solve this problem, the authors propose a method for structural monitoring of S&T Web information resources. Using knowledge extraction technologies, the authors first extract knowledge objects and the relationships between them from Web resources, converting free text into computable structured knowledge units. Based on the extracted structured information, the authors build several kinds of monitoring models to carry out research profiling for specific research fields. Building on these ideas, the authors implement an automated Web information monitoring system suited to research field monitoring. A research profiling experiment is also carried out on the semantic resources converted from the monitored Web data.
Abstract: With the tremendous amount of information available on the Web, obtaining information quickly has become a crucial problem. Web information retrieval technology alone is not enough for acquiring information, so more and more attention is being paid to Web information extraction technology. This paper first introduces some concepts of information extraction technology, then introduces and analyzes several typical Web information extraction methods, grouped by the differences in their extraction patterns.