2 articles found
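Research and Application of Word Segmentation Technology: A Simple Method for Extracting New Words (cited by: 3)
1
Author: Wu Hongzhou (吴宏洲), 《软件工程师》 (Software Engineer), 2015, No. 12, pp. 64-68 (5 pages)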
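A minimal method for extracting new words that requires neither a corpus nor a complex mathematical model. The method scans the text stream of a document, removes stop characters and stop words, splits the text into unit clauses, enumerates the possible candidate terms in each clause, counts candidate frequencies, and computes a confidence value between candidates where a longer candidate contains a shorter one. Short words are eliminated solely on the basis of confidence values above 90%, which yields the candidate keywords; filtering these against an existing lexicon leaves the new words. The method can serve as an auxiliary tool for information processing.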
Keywords: stop words, candidate segmentation, confidence, new word extraction
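The pipeline described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the stop-character list, the candidate length limits, the minimum frequency, and in particular the confidence formula (frequency of the longer candidate divided by frequency of the shorter one it contains) are all assumptions, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical stop characters; the paper's actual stop list is not given.
STOP_CHARS = frozenset("的了是在和与及对为")


def split_clauses(text, stop_chars=STOP_CHARS):
    """Cut text into clauses at stop characters and at any non-CJK character."""
    clauses, buf = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff" and ch not in stop_chars:
            buf.append(ch)
        elif buf:
            clauses.append("".join(buf))
            buf = []
    if buf:
        clauses.append("".join(buf))
    return clauses


def count_candidates(clauses, min_len=2, max_len=4):
    """Enumerate every substring of length min_len..max_len and count it."""
    counts = Counter()
    for clause in clauses:
        for n in range(min_len, max_len + 1):
            for i in range(len(clause) - n + 1):
                counts[clause[i:i + n]] += 1
    return counts


def extract_new_words(text, known, min_freq=2, threshold=0.9):
    """Return frequent candidates that survive short-word elimination
    and are absent from the known lexicon."""
    counts = count_candidates(split_clauses(text))
    frequent = {w: c for w, c in counts.items() if c >= min_freq}
    absorbed = set()
    for long_w, long_c in frequent.items():
        for short_w, short_c in frequent.items():
            if len(short_w) < len(long_w) and short_w in long_w:
                # Assumed confidence: how often the short word occurs
                # as part of the long one.
                if long_c / short_c > threshold:
                    absorbed.add(short_w)
    return sorted(w for w in frequent if w not in absorbed and w not in known)


text = "区块链的应用。区块链是热点。区块链技术。技术进步。"
print(extract_new_words(text, known={"技术"}))
```

In this toy input, the fragments 区块 and 块链 occur only inside 区块链, so their confidence is 1.0 and they are dropped; 技术 is removed by the lexicon filter, leaving 区块链 as the only new word.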
Unified Framework of Performing Chinese Word Segmentation and Part-of-Speech Tagging (cited by: 3)
2
Authors: Zhang Kaixu, Sun Maosong, 《China Communications》, SCIE, CSCD, 2012, No. 3, pp. 1-9 (9 pages)
The paper proposes a unified framework to combine the advantages of the fast one-at-a-time approach and the high-performance all-at-once approach to perform Chinese Word Segmentation (CWS) and Part-of-Speech (PoS) tagging. In this framework, the input of the PoS tagger is a candidate set of several CWS results provided by the CWS model. The widely used one-at-a-time approach and all-at-once approach are two extreme cases of the proposed candidate-based approaches. Experiments on Penn Chinese Treebank 5 and Tsinghua Chinese Treebank show that the generalized candidate-based approach outperforms the one-at-a-time approach and even the all-at-once approach. The candidate-based approach is also faster than the time-consuming all-at-once approach. The authors compare three different methods, based on sentences, words, and character intervals, to generate the candidate set. It turns out that the word-based method has the best performance.
Keywords: natural language processing, Chinese word segmentation, PoS tagging, candidate, word lattice
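The candidate-based idea in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the score tables `WORD_LOGP` and `TAG_LOGP` are invented, the tagger is context-free, and candidates are passed as a simple k-best list rather than the lattice-style candidate sets the paper compares. With `k=1` the sketch behaves like a one-at-a-time pipeline; with `k` large enough to cover every segmentation it approximates an all-at-once search.

```python
# Hypothetical segmentation model: word -> log-probability.
WORD_LOGP = {"研究生": -2.0, "命": -3.0, "研究": -2.5, "生命": -2.6}

# Hypothetical tagger: word -> {tag: log-probability}.
TAG_LOGP = {
    "研究生": {"NN": -0.5},
    "命": {"NN": -2.0},
    "研究": {"VV": -0.4, "NN": -1.0},
    "生命": {"NN": -0.3},
}


def segmentations(sentence, lexicon):
    """Yield every way to split sentence into words found in lexicon."""
    if not sentence:
        yield []
        return
    for end in range(1, len(sentence) + 1):
        word = sentence[:end]
        if word in lexicon:
            for rest in segmentations(sentence[end:], lexicon):
                yield [word] + rest


def seg_score(words):
    """Score a segmentation with the toy segmentation model."""
    return sum(WORD_LOGP[w] for w in words)


def best_tags(words):
    """Pick the best tag for each word independently (context-free toy tagger)."""
    tags, score = [], 0.0
    for w in words:
        tag, logp = max(TAG_LOGP[w].items(), key=lambda kv: kv[1])
        tags.append(tag)
        score += logp
    return tags, score


def candidate_based(sentence, k):
    """Tag the k best segmentations and return the best joint result."""
    candidates = sorted(segmentations(sentence, WORD_LOGP),
                        key=seg_score, reverse=True)[:k]
    best = None
    for words in candidates:
        tags, tag_score = best_tags(words)
        total = seg_score(words) + tag_score
        if best is None or total > best[0]:
            best = (total, list(zip(words, tags)))
    return best[1]


print(candidate_based("研究生命", k=1))
print(candidate_based("研究生命", k=2))
```

With these invented scores, the segmenter alone slightly prefers 研究生 / 命, so `k=1` commits to that split. Once the second candidate is also passed to the tagger, the joint score favors 研究 / 生命, which shows how a small candidate set lets the tagger correct a segmentation error without searching the full joint space.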