Abstract
[Objective] This paper uses online job postings to analyze employers' demands accurately, providing technical support for addressing the mismatch between labor supply and demand. [Methods] We propose a professional skill word recognition method based on cross-domain transfer learning (CDTL-PSE). CDTL-PSE treats the recognition of professional skill words as a sequence tagging task, similar to named entity recognition or term extraction. First, the SIGHAN corpus is decomposed into three source domains, and a domain adaptation layer inserted between the Bi-LSTM and CRF layers enables cross-domain transfer learning from each source domain to the target domain. Then, each sub-model is trained with a parameter transfer approach. Finally, the predicted label sequence is obtained by majority vote. [Results] On a self-built dataset of online job postings, CDTL-PSE with a Bi-LSTM domain adaptation layer and alternating training improved the F1 value by 0.91% over the baseline method and reduced the number of labeled samples needed by about 50%. [Limitations] The interpretability of the model needs further improvement. [Conclusions] CDTL-PSE can automatically extract professional skill words and effectively alleviates the shortage of labeled samples in the target domain.
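The final step of the method, combining the label sequences predicted by the sub-models through majority vote, can be illustrated with a minimal sketch. The abstract does not specify the tagging scheme or how ties are resolved, so the `B-SKILL`/`I-SKILL`/`O` tags, the function name `majority_vote`, and the tie-break rule (prefer the earliest sub-model's label) are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several sub-models by majority vote.

    predictions holds one label sequence per sub-model; all sequences
    must tag the same tokens and therefore have the same length.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all label sequences must have the same length")

    voted = []
    for labels in zip(*predictions):  # labels for one token, one per sub-model
        counts = Counter(labels)
        top = max(counts.values())
        # Assumed tie-break: keep the label of the earliest sub-model
        # among those that share the highest count.
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three sub-models (one per source domain)
# for the same four-token sentence.
sub_model_outputs = [
    ["B-SKILL", "I-SKILL", "O", "O"],
    ["B-SKILL", "O", "O", "B-SKILL"],
    ["B-SKILL", "I-SKILL", "O", "O"],
]
print(majority_vote(sub_model_outputs))
# → ['B-SKILL', 'I-SKILL', 'O', 'O']
```

Voting token by token can in principle yield a sequence that violates the tagging scheme (for example, an `I-SKILL` without a preceding `B-SKILL`), so a real system would likely need a repair or validation step after the vote.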
Authors
Yi Xinhe
Yang Peng
Wen Yimin
Affiliations: Library of Guilin University of Electronic Technology, Guilin 541004, China; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
Peking University Core Journals (北大核心)
2022, Issue 2, pp. 274-288 (15 pages)
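Funding
This work is one of the research outcomes of:
Special Task Project of Humanities and Social Sciences Research of the Ministry of Education (Grant No. 17JDGC022)
Guangxi Degree and Postgraduate Education Reform Project (Grant No. JGY2017055)
Guangxi Natural Science Foundation (Grant No. 2018GXNSFDA138006)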
Keywords
Professional Skill Words
Cross Domain Transfer Learning
Domain Adaptation