Abstract
Machine-learning-based word segmentation models can adapt better to the science and technology (S&T) domain if they draw on an analysis of how S&T terms are formed. This paper summarizes the theories of syntactic, prosodic, and semantic word formation from traditional linguistics and combines them with findings from terminology research. With the goal of improving segmentation accuracy, it extracts a set of tags, proposes a word-formation feature labeling system suited to S&T vocabulary, and outlines the structure of that system. This work is a preliminary exploration of word-formation feature labeling for S&T terms. It lays the groundwork for later large-scale labeling and for using these features to assist word segmentation.
Authors
ZHOU Lei (周雷)
LI Ying (李颖)
SHI Chongde (石崇德)
Institute of Scientific and Technical Information of China, Beijing 100038, China; Wanfang Data Co., Ltd., Beijing 100038, China
Source
Technology Intelligence Engineering (《情报工程》), 2015, No. 3, pp. 64-75 (12 pages)
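Funding
Supported by the National Natural Science Foundation of China project "Research on Entity Recognition and Relation Extraction for Science and Technology Monitoring" (Grant No. 71403257)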
Keywords
Chinese science and technology terms
word formation
word tagging