期刊文献+
共找到8篇文章
< 1 >
每页显示 20 50 100
Chinese word segmentation with local and global context representation learning 被引量:2
1
作者 李岩 Zhang Yinghua +2 位作者 Huang Xiaoping Yin Xucheng Hao Hongwei 《High Technology Letters》 EI CAS 2015年第1期71-77,共7页
A local and global context representation learning model for Chinese characters is designed and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chin... A local and global context representation learning model for Chinese characters is designed and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chinese character learning model uses the semanties of loeal context and global context to learn the representation of Chinese characters. Then, Chinese word segmentation model is built by a neural network, while the segmentation model is trained with the eharaeter representations as its input features. Finally, experimental results show that Chinese charaeter representations can effectively learn the semantic information. Characters with similar semantics cluster together in the visualize space. Moreover, the proposed Chinese word segmentation model also achieves a pretty good improvement on precision, recall and f-measure. 展开更多
关键词 local and global context representation learning chinese character representa- tion chinese word segmentation
下载PDF
Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node 被引量:2
2
作者 Nuo Qun Hang Yan +1 位作者 Xi-Peng Qiu Xuan-Jing Huang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2020年第5期1115-1126,共12页
Semi-Markov conditional random fields(Semi-CRFs)have been successfully utilized in many segmentation problems,including Chinese word segmentation(CWS).The advantage of Semi-CRF lies in its inherent ability to exploit ... Semi-Markov conditional random fields(Semi-CRFs)have been successfully utilized in many segmentation problems,including Chinese word segmentation(CWS).The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences.Despite its theoretical advantage,Semi-CRF is still not the best choice for CWS because its computation complexity is quadratic to the sentenced length.In this paper,we propose a simple yet effective framework to help Semi-CRF achieve comparable performance with CRF-based models under similar computation complexity.Specifically,we first adopt a bi-directional long short-term memory(BiLSTM)on character level to model the context information,and then use simple but effective fusion layer to represent the segment information.Besides,to model arbitrarily long segments within linear time complexity,we also propose a new model named Semi-CRF-Relay.The direct modeling of segments makes the combination with word features easy and the CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings.Experiments on four popular CWS datasets show the effectiveness of our proposed methods.The source codes and pre-trained embeddings of this paper are available on https://github.com/fastnlp/fastNLP/. 展开更多
关键词 Semi-Markov conditional random field(Semi-CRF) chinese word segmentation bi-directional long short-term memory deep learning
原文传递
A New Word Detection Method for Chinese Based on Local Context Information 被引量:1
3
作者 曾华琳 周昌乐 郑旭玲 《Journal of Donghua University(English Edition)》 EI CAS 2010年第2期189-192,共4页
Finding out out-of-vocabulary words is an urgent and difficult task in Chinese words segmentation. To avoid the defect causing by offline training in the traditional method, the paper proposes an improved prediction b... Finding out out-of-vocabulary words is an urgent and difficult task in Chinese words segmentation. To avoid the defect causing by offline training in the traditional method, the paper proposes an improved prediction by partical match (PPM) segmenting algorithm for Chinese words based on extracting local context information, which adds the context information of the testing text into the local PPM statistical model so as to guide the detection of new words. The algorithm focuses on the process of online segmentatien and new word detection which achieves a good effect in the close or opening test, and outperforms some well-known Chinese segmentation system to a certain extent. 展开更多
关键词 new word detection improved PPM model context information chinese words segmentation
下载PDF
Feature study for improving Chinese overlapping ambiguity resolution based on SVM 被引量:1
4
作者 熊英 朱杰 《Journal of Southeast University(English Edition)》 EI CAS 2007年第2期179-184,共6页
In order to improve Chinese overlapping ambiguity resolution based on a support vector machine, statistical features are studied for representing the feature vectors. First, four statistical parameters-mutual informat... In order to improve Chinese overlapping ambiguity resolution based on a support vector machine, statistical features are studied for representing the feature vectors. First, four statistical parameters-mutual information, accessor variety, two-character word frequency and single-character word frequency are used to describe the feature vectors respectively. Then other parameters are tried to add as complementary features to the parameters which obtain the best results for further improving the classification performance. Experimental results show that features represented by mutual information, single-character word frequency and accessor variety can obtain an optimum result of 94. 39%. Compared with a commonly used word probability model, the accuracy has been improved by 6. 62%. Such comparative results confirm that the classification performance can be improved by feature selection and representation. 展开更多
关键词 support vector machine chinese overlapping ambiguity chinese word segmentation word probability model
下载PDF
Apriori and N-gram Based Chinese Text Feature Extraction Method 被引量:4
5
作者 王晔 黄上腾 《Journal of Shanghai Jiaotong university(Science)》 EI 2004年第4期11-14,20,共5页
A feature extraction, which means extracting the representative words from a text, is an important issue in text mining field. This paper presented a new Apriori and N-gram based Chinese text feature extraction method... A feature extraction, which means extracting the representative words from a text, is an important issue in text mining field. This paper presented a new Apriori and N-gram based Chinese text feature extraction method, and analyzed its correctness and performance. Our method solves the question that the exist extraction methods cannot find the frequent words with arbitrary length in Chinese texts. The experimental results show this method is feasible. 展开更多
关键词 Apriori algorithm N-GRAM chinese words segmentation feature extraction
下载PDF
Construction of Word Segmentation Model Based on HMM+BI-LSTM
6
作者 Hang Zhang Bin Wen 《国际计算机前沿大会会议论文集》 2020年第2期47-61,共15页
Chinese word segmentation plays an important role in search engine,artificial intelligence,machine translation and so on.There are currently three main word segmentation algorithms:dictionary-based word segmentation a... Chinese word segmentation plays an important role in search engine,artificial intelligence,machine translation and so on.There are currently three main word segmentation algorithms:dictionary-based word segmentation algorithms,statistics-based word segmentation algorithms,and understandingbased word segmentation algorithms.However,few people combine these three methods or two of them.Therefore,a Chinese word segmentation model is proposed based on a combination of statistical word segmentation algorithm and understanding-based word segmentation algorithm.It combines Hidden Markov Model(HMM)word segmentation and Bi-LSTM word segmentation to improve accuracy.The main method is to make lexical statistics on the results of the two participles,and to choose the best results based on the statistical results,and then to combine them into the final word segmentation results.This combined word segmentation model is applied to perform experiments on the MSRA corpus provided by Bakeoff.Experiments show that the accuracy of word segmentation results is 12.52%higher than that of traditional HMM model and 0.19%higher than that of BI-LSTM model. 展开更多
关键词 chinese word segmentation HMM BI-LSTM Sequence tagging
原文传递
Scaling Conditional Random Fields by One-Against-the-Other Decomposition 被引量:1
7
作者 赵海 揭春雨 《Journal of Computer Science & Technology》 SCIE EI CSCD 2008年第4期612-619,共8页
As a powerful sequence labeling model, conditional random fields (CRFs) have had successful applications in many natural language processing (NLP) tasks. However, the high complexity of CRFs training only allows a... As a powerful sequence labeling model, conditional random fields (CRFs) have had successful applications in many natural language processing (NLP) tasks. However, the high complexity of CRFs training only allows a very small tag (or label) set, because the training becomes intractable as the tag set enlarges. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag. An optimal tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all sub-CRFs involved. To test its effectiveness, we apply this approach to tackling Chinese word segmentation (CWS) as a sequence labeling problem. Our evaluation shows that it can reduce the computational cost of this language processing task by 40-50% without any significant performance loss on various large-scale data sets. 展开更多
关键词 natural language processing machine learning conditional random fields chinese word segmentation
原文传递
Resolution of overlapping ambiguity strings based on maximum entropy model 被引量:1
8
作者 ZHANG Feng FAN Xiao-zhong 《Frontiers of Electrical and Electronic Engineering in China》 CSCD 2006年第3期273-276,共4页
The resolution of overlapping ambiguity strings(OAS)is studied based on the maximum entropy model.There are two model outputs,where either the first two characters form a word or the last two characters form a word.Th... The resolution of overlapping ambiguity strings(OAS)is studied based on the maximum entropy model.There are two model outputs,where either the first two characters form a word or the last two characters form a word.The features of the model include one word in con-text of OAS,the current OAS and word probability relation of two kinds of segmentation results.OAS in training text is found by the combination of the FMM and BMM segmen-tation method.After feature tagging they are used to train the maximum entropy model.The People Daily corpus of January 1998 is used in training and testing.Experimental results show a closed test precision of 98.64%and an open test precision of 95.01%.The open test precision is 3.76%better compared with that of the precision of common word probability method. 展开更多
关键词 chinese information processing chinese auto-matic word segmentation overlapping ambiguity strings maximum entropy model
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部