Funding: The National Natural Science Foundation of China (No. 60373099); the Natural Science Foundation for Young Scholars of Northeast Normal University (No. 20061005).
Abstract: To improve clustering results and the selection of results, ontology semantics are combined with document clustering, and a new WordNet-based document clustering algorithm for the document-processing phase is proposed. First, the documents are represented with tf-idf weights, and each word vector is then extended with new entities. Next, a feature extraction algorithm is applied to the documents. Finally, an ontology aggregation clustering (OAC) algorithm is proposed to improve the document clustering result. Experiments are conducted on the Reuters 20 News Group data set, and the results are compared with those obtained using mutual information (MI). The results show that the proposed ontology-based document clustering algorithm outperforms existing clustering algorithms such as MNB, CLUTO, and co-clustering.
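The pipeline above can be sketched in Python as tf-idf representation, then concept extension, then clustering. This is a minimal sketch under strong simplifying assumptions:

- A hand-written `HYPERNYMS` table (hypothetical) stands in for WordNet.
- The feature extraction step is omitted.
- Plain single-link agglomerative clustering replaces OAC, whose details the abstract does not give.

The example shows how two documents with no words in common ("cat" and "dog") become similar once both vectors gain the shared concept feature "animal".

```python
import math
from collections import Counter

# Hypothetical toy ontology standing in for WordNet: word -> broader concept (hypernym).
HYPERNYMS = {
    "cat": "animal", "dog": "animal", "horse": "animal",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
}

def tfidf(docs):
    """Sparse tf-idf vectors (smoothed idf) for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def extend(vec):
    """Add a concept feature for every word that has a hypernym, carrying the word's weight."""
    out = dict(vec)
    for term, weight in vec.items():
        concept = HYPERNYMS.get(term)
        if concept is not None:
            out[concept] = out.get(concept, 0.0) + weight
    return out

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, k):
    """Single-link agglomerative clustering down to k clusters (a stand-in, not OAC)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(vectors[a], vectors[b])
                          for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)

docs = [["cat", "sat", "mat"], ["dog", "ran", "park"],
        ["car", "road", "fast"], ["truck", "road", "load"]]
vectors = [extend(v) for v in tfidf(docs)]
print(cluster(vectors, 2))  # [[0, 1], [2, 3]]
```

Without `extend`, documents 0 and 1 have cosine similarity 0. With it, they share the "animal" feature and end up in the same cluster.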
Funding: The National Natural Science Foundation of China (No. 60273072); the National High Technology Research and Development Program of China (863 Program) (No. 2002AA423450).
Abstract: To address the problem of information retrieval on the semantic web, a new semantic information retrieval (SIR) model for searching ontologies on the semantic web is proposed. First, SIR transforms domain ontologies into global ontologies. Semantic index terms are then extracted from these global ontologies. Based on these index terms, logical inferences can be performed to obtain logical views of a concept, which represent the concept's expanded meaning. Using logical views, SIR can perform retrieval and inference based on the semantic relationships in documents, rather than only on their syntactic analysis. Through semantic inference, SIR can significantly improve both the recall and the precision of information retrieval. Finally, the practicability of the SIR model is analyzed.
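One way to picture a "logical view" is sketched below in Python. Here the logical view is taken as a concept plus every concept inferred to fall under it through the transitive closure of subclass links in a global ontology. This is an assumption: the abstract does not say which inferences SIR performs. `SUBCLASS_OF`, the document index, and the concept names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical global ontology after merging domain ontologies: (subclass, superclass) pairs.
SUBCLASS_OF = [("Sedan", "Car"), ("SUV", "Car"), ("Car", "Vehicle"), ("Truck", "Vehicle")]

def logical_view(concept, edges):
    """The concept plus every concept inferred to fall under it (transitive subClassOf)."""
    children = defaultdict(set)
    for sub, sup in edges:
        children[sup].add(sub)
    view, stack = {concept}, [concept]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in view:
                view.add(sub)
                stack.append(sub)
    return view

def retrieve(concept, index, edges):
    """Ids of documents annotated with any concept in the query concept's logical view."""
    view = logical_view(concept, edges)
    return sorted(doc for doc, concepts in index.items() if concepts & view)

# Hypothetical semantic index: document id -> concepts it is annotated with.
index = {"d1": {"Sedan"}, "d2": {"Truck"}, "d3": {"Bicycle"}}
print(retrieve("Car", index, SUBCLASS_OF))      # ['d1']
print(retrieve("Vehicle", index, SUBCLASS_OF))  # ['d1', 'd2']
```

A purely syntactic match on "Car" would return nothing here, because no document contains that term. The inferred view also finds d1, which illustrates the recall gain the abstract attributes to semantic inference.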
Abstract: To reduce the work involved in constructing a domain ontology, it is better to start from an existing terminology-rich thesaurus than from scratch. Through a case study that reengineers the Defense Science and Technology Thesaurus into a prototype military aircraft ontology, a four-phase thesaurus-based methodology is introduced and investigated. Its phases are identifying the application purpose, overall design, detailed design, and evaluation. Detailed design is the core phase: it converts the terms and semantic relationships of the thesaurus into an ontology and supplements them with richer semantic relationships. The resulting prototype ontology contains 87 concepts and 34 relationships, and it can be extended and scaled up to a full-fledged domain ontology in the future. Eight general types of relationships in this ontology are preliminarily summarized and analyzed, including equivalence, approximate, generic/abstract, part/whole, cause/effect, and entity/location relationships. Normalizing these semantic relationships is critical for subsequently merging and reusing multiple ontologies.
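The detailed-design phase can be sketched in Python as two passes: a mechanical mapping of thesaurus relations onto ontology relations, followed by expert refinement of the generic links. The sketch assumes the common thesaurus tags BT (broader term), NT (narrower term), UF (used for) and RT (related term). The property names, the `RELATION_MAP` table, and the sample entries are hypothetical and are not taken from the Defense Science and Technology Thesaurus.

```python
# Hypothetical mapping from common thesaurus relation tags to ontology relations.
RELATION_MAP = {
    "BT": "subClassOf",    # broader term -> generic relationship
    "UF": "equivalentTo",  # used for (non-preferred term) -> equivalence
    "RT": "relatedTo",     # related term -> generic link, refined later
}

def thesaurus_to_triples(entries):
    """Convert {term: {tag: [targets]}} entries into sorted (subject, relation, object) triples."""
    triples = []
    for term, relations in entries.items():
        for tag, targets in relations.items():
            prop = RELATION_MAP.get(tag)
            if prop is None:
                continue  # NT and other unmapped tags are skipped; NT is assumed to mirror a BT entry
            triples.extend((term, prop, target) for target in targets)
    return sorted(triples)

def refine(triples, overrides):
    """Replace generic relatedTo links with richer relations chosen by a domain expert."""
    return [(s, (overrides.get((s, o), p) if p == "relatedTo" else p), o)
            for s, p, o in triples]

entries = {
    "Fighter aircraft": {"BT": ["Military aircraft"], "UF": ["Fighter planes"],
                         "RT": ["Air-to-air missiles"]},
    "Military aircraft": {"NT": ["Fighter aircraft"]},
    "Wing": {"RT": ["Military aircraft"]},
}
triples = thesaurus_to_triples(entries)
refined = refine(triples, {("Wing", "Military aircraft"): "partOf"})
for t in refined:
    print(t)
```

In this sketch, the thesaurus's flat RT links are where the richer relationship types are introduced, such as part/whole in the example. This matches the abstract's point that detailed design supplements the thesaurus relationships.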
Funding: Project supported by the National Natural Science Foundation of China (No. 60373080) and the 985 Project of Zhejiang University, China.
Abstract: This paper describes a data representation of WordNet 2.1 based on the Web Ontology Language (OWL). The main components of the WordNet database are transformed into OWL classes, and the relations between synsets or lexical words are transformed into OWL properties. The conversion is based on the WordNet data files rather than the Prolog database. This work can contribute to the ongoing W3C effort to define a standard conversion of WordNet to an RDF/OWL representation.
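A conversion that reads WordNet data files rather than the Prolog database can be sketched in Python. The sketch parses one synset line and emits Turtle in which `wn:Synset` is an OWL class and pointer types become `owl:ObjectProperty` declarations. The namespace, the property names in `POINTER_PROPS`, and the choice to model each synset as an instance of `wn:Synset` are assumptions, not the paper's schema. The sample line follows the data-file layout, but its offsets are illustrative and not taken from WordNet 2.1.

```python
# Hypothetical namespace; the W3C effort defines its own URIs.
WN = "http://example.org/wordnet#"

# Subset of WordNet pointer symbols mapped to OWL object properties (names are illustrative).
POINTER_PROPS = {"@": "hypernym", "~": "hyponym", "#p": "partHolonym", "%p": "partMeronym"}

def parse_data_line(line):
    """Parse one synset line of a WordNet data.* file."""
    fields, _, gloss = line.partition(" | ")
    tok = fields.split()
    offset, pos = tok[0], tok[2]
    w_cnt = int(tok[3], 16)              # word count: two-digit hexadecimal
    words = tok[4:4 + 2 * w_cnt:2]       # each word is followed by a lex_id
    i = 4 + 2 * w_cnt
    p_cnt = int(tok[i])                  # pointer count: three-digit decimal
    pointers = []
    for k in range(p_cnt):
        # Each pointer is: symbol, target offset, target pos, source/target (ignored here).
        sym, target, target_pos = tok[i + 1 + 4 * k: i + 4 + 4 * k]
        pointers.append((sym, target, target_pos))
    return {"offset": offset, "pos": pos, "words": words,
            "pointers": pointers, "gloss": gloss.strip()}

def turtle_header():
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix wn: <{WN}> .",
        "",
        "wn:Synset a owl:Class .",
    ]
    lines += [f"wn:{p} a owl:ObjectProperty ." for p in POINTER_PROPS.values()]
    return "\n".join(lines)

def synset_to_turtle(s):
    subj = f"wn:synset-{s['offset']}-{s['pos']}"
    labels = ", ".join(f'"{w}"' for w in s["words"])
    gloss = s["gloss"].replace("\\", "\\\\").replace('"', '\\"')  # escape for a Turtle string
    lines = [f"{subj} a wn:Synset ;", f"    rdfs:label {labels} ;"]
    for sym, target, target_pos in s["pointers"]:
        prop = POINTER_PROPS.get(sym)
        if prop:  # unmapped pointer types are skipped in this sketch
            lines.append(f"    wn:{prop} wn:synset-{target}-{target_pos} ;")
    lines.append(f'    rdfs:comment "{gloss}" .')
    return "\n".join(lines)

line = ("02084071 05 n 03 dog 0 domestic_dog 0 Canis_familiaris 0 "
        "001 @ 02083346 n 0000 | a member of the genus Canis")
print(turtle_header())
print()
print(synset_to_turtle(parse_data_line(line)))
```

A full converter would also need to handle verb frames, lexical (word-to-word) pointers, and the remaining pointer symbols, all of which this sketch ignores.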
Funding: Sponsored by the National Natural Science Foundation of China (Grant Nos. 60496326 and 10671045).
Abstract: A general, domain-independent, bottom-up, multi-level model and an algorithm for establishing the taxonomic relations of a Chinese ontology are proposed. The model consists of extracting domain vocabularies and establishing taxonomic relations, taking into account characteristics unique to the Chinese language. The proposed algorithm first establishes semantic forests of the domain vocabularies and then uses an existing semantic dictionary or machine-readable dictionary (MRD) to integrate these forests into a single taxonomy. Experimental results show that the algorithm is feasible and effective in establishing integrated taxonomic relations among domain vocabularies and concepts.
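The two-step idea of first building semantic forests and then integrating them with a dictionary can be sketched in Python. This sketch swaps in a specific heuristic for forest construction: it uses the head-final structure of many Chinese compound nouns, so that 空空导弹 (air-to-air missile) is placed under 导弹 (missile) because it ends with that term. It is a hypothetical stand-in for the paper's method. The dictionary entries are also hypothetical and only illustrate the role a semantic dictionary or MRD plays in joining the separate trees.

```python
def suffix_forest(terms):
    """Link each term to the longest other domain term that is its suffix (its assumed head)."""
    parent = {}
    for t in terms:
        heads = [h for h in terms if h != t and t.endswith(h)]
        if heads:
            parent[t] = max(heads, key=len)  # longest head = closest hypernym
    return parent

def integrate(terms, parent, dictionary):
    """Attach forest roots to hypernyms from a semantic dictionary, climbing until none is found."""
    merged = dict(parent)
    pending = [t for t in terms if t not in merged]
    while pending:
        t = pending.pop()
        sup = dictionary.get(t)
        if sup is not None and sup != t and t not in merged:
            merged[t] = sup
            pending.append(sup)  # the new node may itself have a hypernym
    return merged

def roots(merged, terms):
    nodes = set(terms) | set(merged) | set(merged.values())
    return sorted(n for n in nodes if n not in merged)

# weapon, missile, air-to-air missile, cruise missile, rifle, automatic rifle
terms = ["武器", "导弹", "空空导弹", "巡航导弹", "步枪", "自动步枪"]
# Hypothetical dictionary entries: missile -> weapon, rifle -> weapon
dictionary = {"导弹": "武器", "步枪": "武器"}

forest = suffix_forest(terms)
taxonomy = integrate(terms, forest, dictionary)
print(taxonomy)
print(roots(taxonomy, terms))  # ['武器']
```

The suffix step alone yields three separate trees, rooted at 武器, 导弹 and 步枪. The dictionary step joins them under the single root 武器 (weapon). The suffix heuristic fails for compounds whose final morpheme is not the semantic head, which is one reason an external dictionary is needed.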