Abstract
This paper examines how to (semi-)automatically extract non-taxonomic (related) semantic relations between discipline terms from unstructured Chinese text, with the aim of finding a feasible and effective extraction method. Papers in the "Digital Library" subject area are first retrieved from CNKI. Terms and transitive verbs are then extracted, vector space models are constructed, and a dual association rule analysis is carried out and its rules evaluated. This yields term pairs with strong associations, together with the verbs that serve as relation labels, and thus the non-taxonomic relations among the discipline terms. The paper finds the method more feasible and effective than other methods, and reports that evaluating the effectiveness and practicality of the extracted relations improves extraction accuracy. The study also has limitations: when the effectiveness and practicality of the relations are evaluated, the indicators and thresholds are chosen with manual intervention, which introduces a degree of subjectivity.
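The association-rule step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentences, the function name `mine_term_pairs`, the thresholds, and the choice of support, confidence, and lift as evaluation indicators are all assumptions made for illustration, and the paper's dual analysis is collapsed here into a single pass that scores term pairs and then picks the most frequent co-occurring verb as the relation label.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one entry per sentence, holding the terms and the
# verbs already extracted from it. Real input would come from the corpus.
SENTENCES = [
    ({"digital library", "metadata"}, ["use"]),
    ({"digital library", "metadata"}, ["use"]),
    ({"digital library", "metadata"}, ["manage"]),
    ({"ontology", "semantic web"}, ["support"]),
    ({"ontology", "semantic web"}, ["support"]),
    ({"digital library", "ontology"}, ["adopt"]),
]


def mine_term_pairs(sentences, min_support=0.3, min_confidence=0.6):
    """Return term pairs that pass the (assumed) support/confidence thresholds."""
    n = len(sentences)
    term_count = Counter()   # sentences containing each term
    pair_count = Counter()   # sentences containing both terms of a pair
    pair_verbs = {}          # verbs seen alongside each pair
    for terms, verbs in sentences:
        term_count.update(terms)
        for pair in combinations(sorted(terms), 2):
            pair_count[pair] += 1
            pair_verbs.setdefault(pair, Counter()).update(verbs)

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                      # P(a and b)
        confidence = count / term_count[a]       # P(b | a)
        lift = confidence / (term_count[b] / n)  # P(b | a) / P(b)
        if support >= min_support and confidence >= min_confidence:
            top = pair_verbs[(a, b)].most_common(1)
            rules.append({
                "pair": (a, b),
                "support": support,
                "confidence": confidence,
                "lift": lift,
                "label": top[0][0] if top else None,  # most frequent verb
            })
    return rules


for rule in mine_term_pairs(SENTENCES):
    print(rule["pair"], rule["label"], round(rule["lift"], 2))
```

The thresholds are passed as parameters because, as the abstract notes, their values are set by hand; changing `min_support` or `min_confidence` changes which pairs survive, which is exactly the subjectivity the authors acknowledge.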
Source
《图书与情报》
CSSCI
Peking University Core Journal (北大核心)
2017, No. 2, pp. 125-132, 5 (9 pages in total)
Library & Information
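Funding
Jiangsu Provincial Social Science Fund General Project "Research on the Automatic Acquisition of Semantic Relations between Domain Terms" (Project No. 15TQB009)
One of the research outputs of the National Natural Science Foundation of China Youth Project "Measurement and Analysis of TSD and TDC for Academic Resources" (Project No. 71503121)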
Keywords
discipline terms
non-taxonomic relation
data mining
association rules
rule evaluation