Abstract
To improve the efficiency and accuracy of extracting domain ontology concepts and the relations between them, we propose a domain ontology learning model for unstructured Chinese text. When extracting candidate concepts, compound words are handled with a modified version of the frequent-itemset computation used in association rule mining, combined with a bitmap that stores the physical adjacency of terms after word segmentation. Candidate concepts are then filtered by computing their domain relevance and domain consensus. Finally, association rule confidence is used to extract non-taxonomic relations between concepts, and hierarchical clustering is used to extract taxonomic relations. Experimental results show that the model is sound for domain ontology learning and that, compared with ontology learning based on mutual information, the proposed algorithm extracts concepts and relations with higher accuracy.
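The abstract does not give the modified frequent-itemset formula, so the following is only a minimal Python sketch of the bitmap idea, assuming the bitmap records term positions in the segmented corpus and that a compound-word candidate is any pair of adjacent terms whose adjacency count reaches a minimum support. Each term gets one bitmap (a Python int) with bit i set when the term occurs at token position i; shifting the first term's bitmap left by one and ANDing it with the second term's bitmap leaves exactly the positions where the second term directly follows the first. The function names (`build_position_bitmaps`, `adjacent_support`, `frequent_compounds`) and the `min_support` threshold are hypothetical.

```python
from collections import defaultdict


def build_position_bitmaps(segmented_docs):
    """Map each term to an int bitmap: bit i is set when the term occupies
    global token position i. One unused bit separates consecutive documents,
    so the last term of one document is never adjacent to the next document."""
    bitmaps = defaultdict(int)
    pos = 0
    for doc in segmented_docs:
        for term in doc:
            bitmaps[term] |= 1 << pos
            pos += 1
        pos += 1  # gap bit between documents
    return dict(bitmaps)


def adjacent_support(bitmaps, first, second):
    """Number of positions where `second` directly follows `first`."""
    shifted = bitmaps.get(first, 0) << 1  # bit i becomes bit i + 1 (next token position)
    return bin(shifted & bitmaps.get(second, 0)).count("1")


def frequent_compounds(segmented_docs, min_support):
    """Hypothetical candidate-compound step: keep adjacent term pairs whose
    adjacency count reaches `min_support`, joined without a separator
    (suitable for Chinese, which is written without spaces)."""
    bitmaps = build_position_bitmaps(segmented_docs)
    pairs = {pair for doc in segmented_docs for pair in zip(doc, doc[1:])}
    compounds = {}
    for first, second in sorted(pairs):
        support = adjacent_support(bitmaps, first, second)
        if support >= min_support:
            compounds[first + second] = support
    return compounds
```

For example, `frequent_compounds([["本体", "学习", "模型"], ["本体", "学习", "方法"], ["领域", "本体"]], 2)` returns `{"本体学习": 2}`. The gap bit between documents keeps a term at the end of one document from counting as adjacent to the first term of the next.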
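The abstract names domain relevance and domain consensus but not their formulas. The sketch below assumes definitions in the style of Navigli and Velardi's OntoLearn: relevance is a term's relative frequency in the target-domain corpus divided by its highest relative frequency across the target and contrast corpora, and consensus is the entropy of the term's distribution over the target domain's documents. How the paper combines or thresholds the two scores is not stated; `dr_min` and `dc_min` are hypothetical parameters, and each corpus is assumed to be a list of segmented documents.

```python
import math
from collections import Counter


def relative_frequency(term, corpus):
    """P(t | corpus): occurrences of `term` divided by all term occurrences."""
    counts = Counter(t for doc in corpus for t in doc)
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def domain_relevance(term, target, contrasts):
    """P(t | target) divided by the largest P(t | corpus) over target and contrasts."""
    probs = [relative_frequency(term, c) for c in [target, *contrasts]]
    peak = max(probs)
    return probs[0] / peak if peak else 0.0


def domain_consensus(term, target):
    """Entropy (natural log) of the term's distribution over target documents."""
    freqs = [doc.count(term) for doc in target]
    total = sum(freqs)
    return -sum((f / total) * math.log(f / total) for f in freqs if f)


def filter_candidates(candidates, target, contrasts, dr_min, dc_min):
    """Keep candidates whose relevance and consensus both reach the thresholds."""
    return [t for t in candidates
            if domain_relevance(t, target, contrasts) >= dr_min
            and domain_consensus(t, target) >= dc_min]
```

With a target corpus `[["ontology", "concept", "learning"], ["ontology", "relation", "concept"], ["ontology", "clustering"]]` and one general corpus `[["weather", "news", "concept"], ["weather", "sports", "news"]]`, thresholds of 0.5 keep `ontology` and `concept`: `learning` is relevant but appears in only one document (consensus 0), and `news` never appears in the target domain (relevance 0).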
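For the non-taxonomic step, a minimal sketch of rule confidence between concept pairs follows, assuming each transaction is the set of concepts that co-occur in one sentence (the abstract does not say which text unit the paper uses). Confidence of a rule a → b is the number of transactions containing both concepts divided by the number containing a; a rule that reaches both thresholds is treated as a candidate non-taxonomic relation. The function `concept_rules` and its thresholds are hypothetical, and labeling the relation (for example with a linking verb) is not shown.

```python
from itertools import permutations


def concept_rules(transactions, concepts, min_support, min_confidence):
    """Return (a, b, support, confidence) for every rule a -> b between two
    concepts that reaches both thresholds. Each transaction is one text unit
    (assumed here to be a sentence) given as a list of terms."""
    concept_set = set(concepts)
    baskets = [set(t) & concept_set for t in transactions]
    n = len(baskets)
    count = {c: sum(c in b for b in baskets) for c in concepts}
    rules = []
    for a, b in permutations(concepts, 2):
        if count[a] == 0:
            continue
        both = sum(a in basket and b in basket for basket in baskets)
        support = both / n            # fraction of transactions containing a and b
        confidence = both / count[a]  # estimate of P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules
```

Because confidence is asymmetric, `relation -> ontology` can pass with confidence 1.0 while `ontology -> relation` passes with a lower value, which is one reason to evaluate both directions of each pair.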
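The abstract does not specify the features, similarity measure, or linkage used for hierarchical clustering. This sketch assumes each concept is described by a context vector (for example, co-occurrence counts against a fixed feature list), uses cosine similarity with average linkage, and returns the merge tree as nested tuples. Concepts merged early are candidates for sharing a parent concept; choosing or naming that parent is not shown. The functions `cosine`, `average_linkage`, and `agglomerate` and the example vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_linkage(vectors, members_a, members_b):
    """Mean pairwise cosine similarity between two clusters of concept names."""
    sims = [cosine(vectors[a], vectors[b]) for a in members_a for b in members_b]
    return sum(sims) / len(sims)


def agglomerate(vectors):
    """Bottom-up clustering of a non-empty {concept: vector} dict.
    Returns the final tree as nested tuples plus the merge history."""
    clusters = [((name,), name) for name in sorted(vectors)]  # (members, subtree)
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = average_linkage(vectors, clusters[i][0], clusters[j][0])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        (members_i, tree_i), (members_j, tree_j) = clusters[i], clusters[j]
        history.append((tree_i, tree_j, sim))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members_i + members_j, (tree_i, tree_j)))
    return clusters[0][1], history
```

With vectors over the features (clustering, distance, itemset, support) such as `kmeans: [3, 2, 0, 0]`, `hierarchical_clustering: [2, 3, 0, 0]`, `apriori: [0, 0, 3, 2]`, and `association_rule: [0, 0, 2, 3]`, the tree is `(("apriori", "association_rule"), ("hierarchical_clustering", "kmeans"))`, grouping each pair under a shared, still unnamed parent.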
Source
Journal of Jilin University (Information Science Edition) (《吉林大学学报(信息科学版)》)
CAS
2014, No. 1, pp. 76-81 (6 pages)
Funding
Natural Science Foundation of the Department of Science and Technology of Jilin Province (Grant No. 20130101060JC)
Keywords
ontology learning
unstructured data
association rules
bitmap
hierarchical clustering