Abstract
Using professional literature from Wanfang Data covering 1987-2009, the paper extracts all documents in the 16 second-level categories of industrial technology and, by random sampling, computes the within-category frequency of author keywords and the standard deviation of each keyword's relative frequency across categories. The results show that more than 50% of keywords can be assigned to a single category, and nearly 90% can be assigned to one to three categories. For keywords that belong to three or more categories, 16% can be categorized when the word frequency is below 11, and 49% when the word frequency is 11 or above. The paper concludes that word-frequency statistics combined with standard-deviation calculation enable fast, machine-aided keyword classification and significantly reduce the workload of traditional manual classification.
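The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: a keyword's relative frequency in a category is taken to be the share of its total occurrences that fall in that category, and the spread is the population standard deviation of those shares. The function names, the `min_share` cutoff, and the category labels are hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def relative_frequencies(counts):
    """Share of a keyword's total occurrences that falls in each category.

    `counts` maps category label -> occurrence count and should include
    zero entries for categories where the keyword never appears.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("keyword has no occurrences")
    return {cat: n / total for cat, n in counts.items()}


def spread(counts):
    """Population standard deviation of the relative frequencies.

    A high value means the keyword is concentrated in few categories;
    a value of zero means it is spread evenly across all of them.
    """
    return pstdev(list(relative_frequencies(counts).values()))


def candidate_categories(counts, min_share=0.25):
    """Categories holding at least `min_share` of the occurrences.

    The cutoff is an assumption made for illustration only.
    """
    rel = relative_frequencies(counts)
    picked = [cat for cat, share in rel.items() if share >= min_share]
    return sorted(picked, key=lambda cat: -rel[cat])


# Placeholder category labels and counts, invented for this example.
concentrated = {"A": 18, "B": 2, "C": 0, "D": 0}
even = {"A": 5, "B": 5, "C": 5, "D": 5}

print(spread(concentrated), candidate_categories(concentrated))
print(spread(even), candidate_categories(even))
```

Under these assumptions, a keyword whose occurrences cluster in one category yields a larger standard deviation than one spread evenly, which is the kind of signal that could flag it for automatic assignment while leaving evenly spread keywords for manual review.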
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals (北大核心)
2011, No. 10, pp. 34-39 (6 pages)
New Technology of Library and Information Service
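Funding
One of the research outcomes of the National Social Science Fund of China project "Research on the Compilation Model and Application of Thesauri in the Network Environment" (Project No. 10BTQ048)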
Keywords
Thesaurus
Ontology
Concept
Classification
Word frequency