Research on Construction and Application of Professional Corpus in the Field of Hazardous Chemicals (危化品领域专业分词库构建与应用研究) Cited by: 1
Abstract Existing general-purpose word segmentation lexicons do not segment professional texts in the hazardous chemicals field well. To address this, domain-specific terms were extracted by combining machine-learning segmentation results with manual screening and curation. A hierarchical lexicon architecture comprising word tables, entries, and entry attributes was designed, and a feature vector indexing method for the lexicon, based on an identification tree structure, was proposed, yielding a professional word segmentation lexicon for the hazardous chemicals field. The lexicon was then applied as a custom dictionary in a word segmentation model for verification. The results show that, compared with a general-purpose lexicon, the professional lexicon improves segmentation accuracy on hazardous chemicals texts and supports deeper analysis of document data in this field.
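The abstract describes a lexicon organized into word tables, entries, and entry attributes, which is then plugged into a segmentation model as a custom dictionary. The Python below is a minimal sketch of that idea, not the paper's implementation. The class names `WordTable` and `Entry`, the attribute keys, the sample vocabulary, and the choice of forward maximum matching as the segmenter are all assumptions made for illustration; the abstract does not specify the actual segmentation model, and the identification-tree feature vector index is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One lexicon entry: a term plus free-form attributes (hypothetical keys)."""
    term: str
    attributes: dict = field(default_factory=dict)


@dataclass
class WordTable:
    """A named word table that groups related entries."""
    name: str
    entries: list = field(default_factory=list)


def build_vocabulary(tables):
    """Flatten a list of word tables into a set of terms for fast lookup."""
    return {entry.term for table in tables for entry in table.entries}


def forward_max_match(text, vocabulary, max_len=8):
    """Greedy longest-match segmentation.

    At each position, take the longest substring (up to max_len characters)
    found in the vocabulary; otherwise emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: single-character token
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Illustrative general-purpose vocabulary (assumed, not from the paper).
general_vocab = {"液化", "石油", "储罐", "发生", "泄漏"}

# Illustrative domain word table with one entry and made-up attributes.
domain_tables = [
    WordTable(
        name="substances",
        entries=[Entry("液化石油气", {"category": "flammable gas"})],
    )
]
domain_vocab = general_vocab | build_vocabulary(domain_tables)

text = "液化石油气储罐发生泄漏"
general_tokens = forward_max_match(text, general_vocab)
domain_tokens = forward_max_match(text, domain_vocab)

print(general_tokens)  # ['液化', '石油', '气', '储罐', '发生', '泄漏']
print(domain_tokens)   # ['液化石油气', '储罐', '发生', '泄漏']
```

With only the general vocabulary, the substance name is split into three fragments; adding the domain word table keeps it as a single token, which is the kind of improvement the abstract reports for domain text.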
Author Jiang Han (蒋瀚) (SINOPEC Research Institute of Safety Engineering Co., Ltd., Qingdao, Shandong 266104)
Source Safety Health & Environment (《安全、健康和环境》), 2022, No. 6, pp. 66-70 (5 pages)
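Funding SINOPEC Science and Technology Department project (A-539), "Research and development of Handle identifier resolution technology and equipment based on electronic tags".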
Keywords hazardous chemicals; word segmentation lexicon (corpus); machine learning; natural language processing; text mining