Abstract
[Objective] This paper aims to discover important semantic relations that are implicit in the subject indexing results of papers. [Methods] Based on subject-indexed biomedical papers from the MEDLINE database, we propose a semantic relation discovery algorithm that combines subject heading coordination principles, subject indexing rules, and optimization methods based on weighted indexing terms and relation occurrence frequency. [Results] The algorithm was tested on experimental data about diseases and symptoms, and the discovered relations were reviewed by domain experts; the precision of the discovered semantic relations exceeded 95%. [Limitations] The proposed algorithm applies only to papers that have subject indexing results. [Conclusions] Discovering semantic relations in both Chinese and English from large-scale subject-indexed biomedical literature is effective and feasible, and the approach can serve as a useful reference for discovering semantic relations in other domains.
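The abstract names the ingredients of the method (weighted indexing terms, relation occurrence frequency and a threshold) but not the algorithm itself. The Python sketch below illustrates only the general idea of threshold-filtered, weighted co-occurrence counting between disease and symptom headings; it is not the authors' algorithm. The `IndexTerm` type, the major/minor weights, the product scoring and the threshold value are hypothetical assumptions, and the paper's coordination and indexing rules are not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical weights (not taken from the paper): a major-topic heading
# counts twice as much as a minor one.
MAJOR_WEIGHT = 2.0
MINOR_WEIGHT = 1.0


@dataclass(frozen=True)
class IndexTerm:
    descriptor: str  # subject heading assigned to the paper
    major: bool      # True if the heading is flagged as a major topic


def best_weights(terms: Iterable[IndexTerm]) -> Dict[str, float]:
    """Collapse repeated headings within one paper, keeping the highest weight."""
    best: Dict[str, float] = {}
    for t in terms:
        w = MAJOR_WEIGHT if t.major else MINOR_WEIGHT
        best[t.descriptor] = max(best.get(t.descriptor, 0.0), w)
    return best


def discover_relations(
    papers: Iterable[List[IndexTerm]],
    disease_terms: Set[str],
    symptom_terms: Set[str],
    threshold: float,
) -> Dict[Tuple[str, str], float]:
    """Score disease-symptom pairs indexed in the same paper and keep the
    pairs whose accumulated weighted frequency reaches the threshold."""
    scores: Counter = Counter()
    for terms in papers:
        weights = best_weights(terms)
        diseases = [d for d in weights if d in disease_terms]
        symptoms = [s for s in weights if s in symptom_terms]
        for d, s in product(diseases, symptoms):
            # Evidence contributed by this paper: product of the two weights.
            scores[(d, s)] += weights[d] * weights[s]
    return {pair: score for pair, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    papers = [
        [IndexTerm("Influenza, Human", True), IndexTerm("Fever", False)],
        [IndexTerm("Influenza, Human", True), IndexTerm("Fever", True),
         IndexTerm("Cough", False)],
        [IndexTerm("Asthma", False), IndexTerm("Cough", False)],
    ]
    print(discover_relations(papers, {"Influenza, Human", "Asthma"},
                             {"Fever", "Cough"}, threshold=2.0))
```

Collapsing repeated headings per paper keeps a single paper from inflating a pair's frequency, so the threshold reflects evidence across papers rather than within one record.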
Source
New Technology of Library and Information Service (《现代图书情报技术》)
Indexed in: CSSCI
2016, No. 7, pp. 87-93 (7 pages)
Funding
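National Social Science Fund of China project "Research on the Construction of a Public Health Knowledge Network Based on Complex Networks" (Project No. 15CTQ020)
Fundamental Research Funds for Central Public Welfare Research Institutes project "Research on Key Issues in Building a Biomedical Terminology Service System" (Project No. 15R0109)
This paper is one of the research outputs of the above projects.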
Keywords
Semantic relation discovery
Indexed papers
Coordination rules
Threshold