Journal Articles
4 articles found
1. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node
Authors: Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 5, pp. 1115-1126 (12 pages).
Semi-Markov conditional random fields (Semi-CRFs) have been successfully applied to many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable to CRF-based models at similar computational cost. Specifically, we first adopt a character-level bi-directional long short-term memory (BiLSTM) network to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments in linear time, we propose a new model named Semi-CRF-Relay. The direct modeling of segments makes it easy to incorporate word features, and CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings are available at https://github.com/fastnlp/fastNLP/.
Keywords: Semi-Markov conditional random field (Semi-CRF); Chinese word segmentation; bi-directional long short-term memory; deep learning
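The abstract above describes encoding characters with a BiLSTM and representing each candidate segment through a simple fusion of encoder states before Semi-CRF scoring. The following minimal PyTorch sketch illustrates one such fusion, concatenating a segment's start and end hidden states; the class, parameter names, and scoring head are illustrative assumptions, not the authors' fastNLP implementation.

```python
# Minimal sketch: character-level BiLSTM + start/end fusion to score candidate segments.
import torch
import torch.nn as nn

class SegmentScorer(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128, num_labels=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Character-level BiLSTM models the context of each character.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Simple fusion: concatenate the hidden states at a segment's start and end.
        self.score = nn.Linear(4 * hidden_dim, num_labels)

    def forward(self, char_ids, segments):
        # char_ids: (1, seq_len); segments: list of (start, end) character index pairs.
        h, _ = self.bilstm(self.embed(char_ids))        # (1, seq_len, 2*hidden_dim)
        feats = [torch.cat([h[0, s], h[0, e]]) for s, e in segments]
        return self.score(torch.stack(feats))           # one score per candidate segment

model = SegmentScorer(vocab_size=5000)
chars = torch.randint(0, 5000, (1, 6))                  # a 6-character sentence
print(model(chars, segments=[(0, 1), (2, 2), (3, 5)]))  # scores for 3 candidate words
```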
2. Construction of Word Segmentation Model Based on HMM+BI-LSTM
Authors: Hang Zhang, Bin Wen. 国际计算机前沿大会会议论文集 (International Computer Frontiers Conference Proceedings), 2020, Issue 2, pp. 47-61 (15 pages).
Chinese word segmentation plays an important role in search engines, artificial intelligence, machine translation, and so on. There are currently three main families of word segmentation algorithms: dictionary-based, statistics-based, and understanding-based. However, few approaches combine two or all three of these methods. Therefore, a Chinese word segmentation model is proposed that combines a statistics-based algorithm with an understanding-based one: it integrates Hidden Markov Model (HMM) segmentation and Bi-LSTM segmentation to improve accuracy. The main idea is to collect lexical statistics over the outputs of the two segmenters, choose the better result according to those statistics, and merge the choices into the final segmentation. The combined model is evaluated on the MSRA corpus provided by Bakeoff. Experiments show that the accuracy of its segmentation results is 12.52% higher than that of the traditional HMM model and 0.19% higher than that of the Bi-LSTM model.
Keywords: Chinese word segmentation; HMM; Bi-LSTM; sequence tagging
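The combination strategy described in this entry, running both segmenters and choosing between their outputs by lexical statistics, can be sketched as follows. The helper names, the average-frequency score, and the toy frequency table are illustrative assumptions; the paper's exact statistics and selection rule may differ.

```python
# Minimal sketch: pick between an HMM and a Bi-LSTM segmentation using lexical statistics.
from collections import Counter

def score_by_lexicon(words, word_freq):
    """Lexical statistic: average corpus frequency of the proposed words."""
    if not words:
        return 0.0
    return sum(word_freq.get(w, 0) for w in words) / len(words)

def combine(sentence, hmm_segment, bilstm_segment, word_freq):
    """Return whichever segmenter's output scores higher for this sentence."""
    cand_hmm = hmm_segment(sentence)
    cand_lstm = bilstm_segment(sentence)
    if score_by_lexicon(cand_hmm, word_freq) >= score_by_lexicon(cand_lstm, word_freq):
        return cand_hmm
    return cand_lstm

# Toy usage with stand-in segmenters and a tiny frequency table.
word_freq = Counter({"研究": 120, "生命": 80, "研究生": 40, "命": 5})
hmm_seg = lambda s: ["研究生", "命"]
lstm_seg = lambda s: ["研究", "生命"]
print(combine("研究生命", hmm_seg, lstm_seg, word_freq))  # -> ['研究', '生命']
```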
3. Resolution of overlapping ambiguity strings based on maximum entropy model (cited by 1)
Authors: ZHANG Feng, FAN Xiao-zhong. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2006, Issue 3, pp. 273-276 (4 pages).
The resolution of overlapping ambiguity strings (OAS) is studied based on the maximum entropy model. The model has two outputs: either the first two characters form a word or the last two characters form a word. The features of the model include the word in the context of the OAS, the current OAS itself, and the word-probability relation of the two candidate segmentations. OAS in the training text are found by combining the forward maximum matching (FMM) and backward maximum matching (BMM) segmentation methods. After feature tagging, they are used to train the maximum entropy model. The People's Daily corpus of January 1998 is used for training and testing. Experimental results show a closed-test precision of 98.64% and an open-test precision of 95.01%. The open-test precision is 3.76% higher than that of the common word-probability method.
Keywords: Chinese information processing; Chinese automatic word segmentation; overlapping ambiguity strings; maximum entropy model
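This entry locates OAS in the training text where FMM and BMM segmentations disagree. A minimal sketch of that detection step is shown below, assuming a toy dictionary; the maximum entropy classification itself is not reproduced, and the function names are illustrative.

```python
# Minimal sketch: detect an overlapping ambiguity string where FMM and BMM disagree.
def fmm(text, lexicon, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word from the left."""
    words, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + l] in lexicon or l == 1:
                words.append(text[i:i + l])
                i += l
                break
    return words

def bmm(text, lexicon, max_len=4):
    """Backward maximum matching: greedily take the longest dictionary word from the right."""
    words, j = [], len(text)
    while j > 0:
        for l in range(min(max_len, j), 0, -1):
            if text[j - l:j] in lexicon or l == 1:
                words.insert(0, text[j - l:j])
                j -= l
                break
    return words

lexicon = {"结合", "合成", "结", "合", "成"}
text = "结合成"
f, b = fmm(text, lexicon), bmm(text, lexicon)
# Disagreement between FMM and BMM marks a candidate overlapping ambiguity string.
print(f, b, "OAS candidate" if f != b else "no ambiguity")  # ['结合', '成'] ['结', '合成']
```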
4. Pretrained Models and Evaluation Data for the Khmer Language
Authors: Shengyi Jiang, Sihui Fu, Nankai Lin, Yingwen Fu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 4, pp. 709-718 (10 pages).
Trained on large corpora, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages such as English and Chinese. However, scarce resources have discouraged the progress of PTMs for low-resource languages. This work presents transformer-based PTMs for the Khmer language for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization; the dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that current Khmer word segmentation technology does not aid performance improvement. We aim to release our models and datasets to the community in the hope of facilitating the future development of Khmer NLP applications.
Keywords: pretrained models; Khmer language; word segmentation; part-of-speech (POS) tagging; news categorization
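A released Khmer PTM of the kind described in this entry could be evaluated on news categorization through the Hugging Face transformers API, roughly as sketched below. The checkpoint name and label count are placeholders, since the paper's model identifiers are not given here; this is an assumption-laden illustration of the downstream task, not the authors' evaluation code.

```python
# Hedged sketch: classify a Khmer news snippet with a transformer PTM.
# "khmer-ptm-placeholder" is NOT a real checkpoint; substitute the released model's identifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "khmer-ptm-placeholder"          # hypothetical model identifier
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=5)

text = "..."                                   # a Khmer news snippet goes here
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits            # shape (1, num_labels)
print(logits.argmax(dim=-1).item())            # predicted news category index
```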