4 articles found
Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node
1
Authors: Nuo Qun, Hang Yan, +1 more author, Xi-Peng Qiu, Xuan-Jing Huang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 5, pp. 1115-1126 (12 pages).
Semi-Markov conditional random fields (Semi-CRFs) have been successfully utilized in many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences. Despite its theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework to help Semi-CRF achieve performance comparable to CRF-based models under similar computational complexity. Specifically, we first adopt a bi-directional long short-term memory (BiLSTM) network at the character level to model the context information, and then use a simple but effective fusion layer to represent the segment information. Besides, to model arbitrarily long segments within linear time complexity, we also propose a new model named Semi-CRF-Relay. The direct modeling of segments makes the combination with word features easy, and the CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of our proposed methods. The source code and pre-trained embeddings of this paper are available at https://github.com/fastnlp/fastNLP/.
Keywords: Semi-Markov conditional random field (Semi-CRF); Chinese word segmentation; bi-directional long short-term memory; deep learning
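The cost argument in the abstract above can be made concrete with a small segment-level decoder. The sketch below is not the authors' Semi-CRF-Relay model: it is a plain zero-order semi-Markov Viterbi search, and the names `semi_markov_decode`, `toy_score`, and `LEXICON` are hypothetical. The scores come from a toy lexicon rather than a trained BiLSTM. It shows that every end position must consider up to `max_len` start positions, so the search is O(n * max_len), which becomes quadratic when `max_len` is allowed to reach the sentence length.

```python
from typing import Callable, List, Tuple


def semi_markov_decode(
    n: int, score: Callable[[int, int], float], max_len: int
) -> List[Tuple[int, int]]:
    """Best segmentation of positions 0..n-1 as half-open (start, end) spans."""
    best = [float("-inf")] * (n + 1)  # best[i]: best score of a segmentation of the first i items
    back = [0] * (n + 1)              # back[i]: start of the last segment in that segmentation
    best[0] = 0.0
    for end in range(1, n + 1):
        # Each end position tries up to max_len candidate segments.
        for start in range(max(0, end - max_len), end):
            cand = best[start] + score(start, end)
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    spans = []
    end = n
    while end > 0:  # follow back-pointers from the end of the sentence
        spans.append((back[end], end))
        end = back[end]
    return spans[::-1]


# Toy segment scores (an assumption for illustration, not learned weights).
LEXICON = {"研究": 2.0, "研究生": 2.5, "生命": 2.0, "起源": 2.0}
sentence = "研究生命起源"


def toy_score(start: int, end: int) -> float:
    word = sentence[start:end]
    # Unknown spans are penalised in proportion to their length.
    return LEXICON.get(word, -2.0 * len(word))


spans = semi_markov_decode(len(sentence), toy_score, max_len=3)
words = [sentence[s:e] for s, e in spans]
print(words)
```

Capping `max_len` keeps decoding linear in the sentence length but forbids longer words, which is the trade-off the abstract says the relay-node model is meant to avoid.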
Construction of Word Segmentation Model Based on HMM+BI-LSTM
2
Authors: Hang Zhang, Bin Wen. 《国际计算机前沿大会会议论文集》 (proceedings of the International Computer Frontier Conference), 2020, Issue 2, pp. 47-61 (15 pages).
Chinese word segmentation plays an important role in search engines, artificial intelligence, machine translation, and so on. There are currently three main types of word segmentation algorithms: dictionary-based, statistics-based, and understanding-based. However, few people combine all three methods or two of them. Therefore, a Chinese word segmentation model is proposed based on a combination of a statistical word segmentation algorithm and an understanding-based word segmentation algorithm. It combines Hidden Markov Model (HMM) word segmentation and Bi-LSTM word segmentation to improve accuracy. The main method is to compute lexical statistics on the results of the two segmenters, choose the best results based on those statistics, and then combine them into the final word segmentation results. This combined word segmentation model is applied to perform experiments on the MSRA corpus provided by Bakeoff. Experiments show that the accuracy of the word segmentation results is 12.52% higher than that of the traditional HMM model and 0.19% higher than that of the Bi-LSTM model.
Keywords: Chinese word segmentation; HMM; BI-LSTM; sequence tagging
Scaling Conditional Random Fields by One-Against-the-Other Decomposition (cited by: 1)
3
Authors: 赵海 (Hai Zhao), 揭春雨 (Chunyu Kit). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2008, Issue 4, pp. 612-619 (8 pages).
As a powerful sequence labeling model, conditional random fields (CRFs) have had successful applications in many natural language processing (NLP) tasks. However, the high complexity of CRF training only allows a very small tag (or label) set, because the training becomes intractable as the tag set enlarges. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag. An optimal tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all sub-CRFs involved. To test its effectiveness, we apply this approach to tackling Chinese word segmentation (CWS) as a sequence labeling problem. Our evaluation shows that it can reduce the computational cost of this language processing task by 40-50% without any significant performance loss on various large-scale data sets.
Keywords: natural language processing; machine learning; conditional random fields; Chinese word segmentation
Resolution of overlapping ambiguity strings based on maximum entropy model (cited by: 1)
4
Authors: ZHANG Feng, FAN Xiao-zhong. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2006, Issue 3, pp. 273-276 (4 pages).
The resolution of overlapping ambiguity strings (OAS) is studied based on the maximum entropy model. There are two model outputs: either the first two characters form a word or the last two characters form a word. The features of the model include one word in the context of the OAS, the current OAS, and the word probability relation of the two kinds of segmentation results. OASs in the training text are found by combining the FMM and BMM segmentation methods. After feature tagging, they are used to train the maximum entropy model. The People's Daily corpus of January 1998 is used in training and testing. Experimental results show a closed test precision of 98.64% and an open test precision of 95.01%. The open test precision is 3.76% higher than that of the common word probability method.
Keywords: Chinese information processing; Chinese automatic word segmentation; overlapping ambiguity strings; maximum entropy model
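The abstract above does not spell out how FMM and BMM are combined, so the following is only a sketch under one common assumption: a sentence is flagged as containing a candidate ambiguity when forward maximum matching (FMM) and backward maximum matching (BMM) produce different segmentations. The function names `fmm` and `bmm` and the toy lexicon are hypothetical, not taken from the paper.

```python
from typing import List, Set


def fmm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: scan left to right, taking the longest lexicon word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def bmm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Backward maximum matching: scan right to left, taking the longest lexicon word."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # words were collected from the end of the sentence


# Toy lexicon (an assumption for illustration only).
lexicon = {"研究", "研究生", "生命", "起源"}
text = "研究生命起源"

forward = fmm(text, lexicon, max_len=3)
backward = bmm(text, lexicon, max_len=3)
has_candidate = forward != backward  # disagreement marks a candidate ambiguity
print(forward, backward, has_candidate)
```

Here the two scans disagree on "研究生命", where "研究生" and "生命" overlap on "生"; a span like this is what a classifier such as the maximum entropy model would then be asked to resolve.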