Abstract
This paper proposes a method for constructing Chinese domain ontologies using text mining. Part-of-speech tagging, dependency parsing, and pattern matching are used to automatically extract terms and relations from unstructured text. Experimental results show that the method effectively reduces the complexity of ontology construction, extracts terms more comprehensively, and saves labor and time. Semantic relations are extracted automatically through morphological rule matching and syntactic analysis; they include whole-part relations, generalization relations, association relations, and syntagmatic relations signaled by connective words. The resulting ontology is semantically richer and better reflects the knowledge structure of the domain.
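The rule-matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the paper's actual rules, so the two patterns below ("X是一种Y" for a generalization relation, "X由Y组成" for a whole-part relation), the labels `is_a` and `has_part`, and the function name `extract_relations` are all hypothetical. The sketch also works on raw characters only, whereas the described method relies on part-of-speech tags and dependency parses to delimit terms.

```python
import re

# Hypothetical lexical patterns for illustration only; the paper's real
# rule set is not given in the abstract.
HAN = r"[\u4e00-\u9fff]"  # one CJK unified ideograph
PATTERNS = [
    # "X是一种Y" -> (X, is_a, Y): generalization relation
    ("is_a", re.compile(rf"(?P<a>{HAN}+?)是一种(?P<b>{HAN}+)")),
    # "X由Y组成" -> (X, has_part, Y): whole-part relation
    ("has_part", re.compile(rf"(?P<a>{HAN}+?)由(?P<b>{HAN}+?)组成")),
]


def extract_relations(sentence):
    """Return (head, relation, tail) triples matched by the patterns above."""
    triples = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append((match.group("a"), label, match.group("b")))
    return triples


if __name__ == "__main__":
    for text in ("本体是一种知识模型。", "句子由词语组成。"):
        print(extract_relations(text))
```

Pure surface patterns like these over-capture when the term is preceded by other characters in the same clause, which is one reason a pipeline of this kind would normally filter candidates with syntactic information before accepting a triple.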
Source
《情报科学》 (Information Science)
CSSCI
Peking University Core Journals (北大核心)
2015, Issue 6, pp. 3-10 (8 pages)
Funding
National Natural Science Foundation of China (71173121)
Major Program of the National Social Science Fund of China (14ZDA063)
Keywords
domain ontology
text mining
ontology construction
pattern matching
syntactic analysis