Abstract
This paper applies Chinese text mining to an unstructured dataset of major programs and their training objectives collected from the web pages of Chinese universities and colleges. Starting from K-means text clustering and the skill keywords summarized for each category of majors from the clustering results, we integrate the proprietary features of every major field, expert review, and a frequency calculation method to define the skill indicators and their degree of importance for each major. We then build a knowledge base linking majors and skills, which lays the foundation for a skill model of networked innovation outsourcing talent. Experimental evaluation shows that, compared with word segmentation based only on a basic Chinese corpus, introducing major-specific proprietary features into Chinese word segmentation yields more accurate and reasonable clustering results. The proposed method can therefore construct a major-skill correlation knowledge base efficiently.
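The abstract does not state the exact frequency formula used to weight skills, so the following is only a minimal sketch of one plausible reading: the importance of a skill for a major is taken as the share of that major's program documents that mention the skill. The function name `skill_importance`, the toy documents, and the English tokens are all hypothetical; the texts are assumed to be already segmented, and the skill keyword set is assumed to come from the clustering and expert review steps.

```python
from collections import Counter, defaultdict


def skill_importance(docs, skill_keywords):
    """Return {major: {skill: share of that major's documents mentioning the skill}}.

    docs: iterable of (major, tokens) pairs, where tokens is a list of
          already-segmented words from one program description.
    skill_keywords: set of skill terms to look for (assumed given).
    """
    doc_counts = Counter()          # number of documents per major
    hits = defaultdict(Counter)     # per major: documents mentioning each skill
    for major, tokens in docs:
        doc_counts[major] += 1
        # Count each skill at most once per document (document frequency).
        for skill in set(tokens) & skill_keywords:
            hits[major][skill] += 1
    return {
        major: {skill: hits[major][skill] / n for skill in sorted(hits[major])}
        for major, n in doc_counts.items()
    }


# Hypothetical toy data, not taken from the paper.
docs = [
    ("Computer Science", ["programming", "algorithms", "teamwork"]),
    ("Computer Science", ["programming", "databases"]),
    ("Accounting", ["auditing", "teamwork"]),
]
skills = {"programming", "algorithms", "databases", "auditing", "teamwork"}
knowledge_base = skill_importance(docs, skills)
```

In this toy knowledge base, a skill mentioned in every document of a major receives weight 1.0 and a skill mentioned in half of them receives 0.5. A real pipeline would first segment the Chinese text with a dictionary extended by major-specific terms and cluster it with K-means before this weighting step.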
Source
《系统管理学报》
CSSCI
CSCD
Peking University Core Journals
2017, Issue 6, pp. 1007-1014, 1021 (9 pages in total)
Journal of Systems & Management
Funding
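National Natural Science Foundation of China, Young Scientists Fund (71301102)
National Natural Science Foundation of China (71171131)
National Natural Science Foundation of China, Science Fund for Creative Research Groups (71421002)
Program for Changjiang Scholars and Innovative Research Team in University (IRT13030)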