Abstract
[Objective] This paper explores how to obtain hierarchical (taxonomic) relations between terms from Chinese domain-specific unstructured text. [Methods] Using Digital Library literature retrieved from CNKI, we construct a semantic hierarchy of terms in four steps: term extraction, construction of a term vector space model, BIRCH clustering, and cluster label determination. [Results] We build a term hierarchy for the Digital Library domain and validate it: clustering accuracy reaches 80.88%, and cluster label extraction accuracy reaches 89.71%. [Limitations] The results are validated only on a random sample, and the method is empirically compared with only one alternative construction method. [Conclusions] Building term hierarchies with BIRCH clustering shows clear advantages over K-means clustering, offering higher execution efficiency and clustering effectiveness.
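The abstract names the four pipeline steps but does not specify the term weighting, the distance measure, or the labelling rule. The following Python sketch illustrates the pipeline under loudly stated assumptions: terms are already extracted; each term is represented by a unit-normalised document-occurrence vector; clustering is a simplified BIRCH Phase 1 that keeps every clustering feature (CF) in a single leaf with no node splitting; and the member nearest each cluster centroid is taken as the cluster label (parent term). The toy corpus, `threshold=0.5`, and all function names are hypothetical and are not the authors' implementation.

```python
import math

# Hypothetical toy corpus: each document is the list of terms already
# extracted from it (term extraction itself is not shown here).
DOCS = [
    ["digital library", "metadata", "digital resource"],
    ["digital library", "metadata"],
    ["digital library", "digital resource"],
    ["information retrieval", "search engine", "query"],
    ["information retrieval", "query"],
    ["information retrieval", "search engine"],
]


def term_vectors(docs):
    """Assumed VSM: each term is a unit-length vector of document occurrences."""
    terms = []
    for doc in docs:
        for t in doc:
            if t not in terms:
                terms.append(t)
    vectors = {}
    for t in terms:
        v = [1.0 if t in doc else 0.0 for doc in docs]
        norm = math.sqrt(sum(x * x for x in v))
        vectors[t] = [x / norm for x in v]
    return terms, vectors


class CF:
    """BIRCH clustering feature: count N, linear sum LS, squared sum SS, plus member terms."""

    def __init__(self, dim):
        self.n, self.ls, self.ss, self.members = 0, [0.0] * dim, 0.0, []

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius_if_added(self, v):
        # Radius R = sqrt(SS/N - ||LS/N||^2) of the entry after absorbing v.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, v)]
        ss = self.ss + sum(x * x for x in v)
        return math.sqrt(max(0.0, ss / n - sum((x / n) ** 2 for x in ls)))

    def add(self, term, v):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, v)]
        self.ss += sum(x * x for x in v)
        self.members.append(term)


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def birch_leaf(terms, vectors, threshold):
    """Simplified BIRCH Phase 1: a single leaf node with unbounded capacity (no splitting)."""
    entries = []
    for t in terms:
        v = vectors[t]
        # Find the CF entry whose centroid is closest to the new term vector.
        closest = min(entries, key=lambda e: dist(e.centroid(), v), default=None)
        if closest is not None and closest.radius_if_added(v) <= threshold:
            closest.add(t, v)  # absorb: the entry stays within the threshold
        else:
            entry = CF(len(v))  # otherwise start a new CF entry
            entry.add(t, v)
            entries.append(entry)
    return entries


def label_clusters(entries, vectors):
    """Assumed labelling rule: the member term nearest the centroid becomes the parent."""
    hierarchy = {}
    for e in entries:
        c = e.centroid()
        label = min(e.members, key=lambda t: dist(vectors[t], c))
        hierarchy[label] = [t for t in e.members if t != label]
    return hierarchy


terms, vectors = term_vectors(DOCS)
print(label_clusters(birch_leaf(terms, vectors, threshold=0.5), vectors))
```

With `threshold=0.5` the sketch prints `{'digital library': ['metadata', 'digital resource'], 'information retrieval': ['search engine', 'query']}`; a smaller threshold yields more, tighter clusters. Unlike K-means, this pass needs no cluster count k in advance and reads each term once, which is part of why BIRCH scales well. A full implementation would also bound node capacity and split nodes, as scikit-learn's `sklearn.cluster.Birch` does.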
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2016, No. 1, pp. 73-80 (8 pages)
Funding
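Natural Science Foundation of Jiangsu Province project "Chinese Ontology Learning for Patent Early Warning" (Grant No. BK20130587)
Fundamental Research Funds for the Central Universities project "Knowledge Structure and Evolution Dynamics of Library and Information Science in China" (Grant No. 20620140645)
This paper is one of the research outputs of the above projects.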
Keywords
Terminology
Taxonomic relation
Ontology
Ontology learning
Clustering