Abstract: A method of part-of-speech tagging of English text based on closed words, word forms and rules is presented, together with its abstract model and a formal description of its realization procedure. Finally, an experimental example is given to illustrate the application of this method.
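The closed-word and word-form idea can be illustrated with a minimal sketch (the lexicon, suffix rules, and tag names below are illustrative stand-ins, not the paper's actual rule set):

```python
# Illustrative sketch: tag closed-class words from a fixed lexicon first,
# then fall back to word-form (suffix) rules, defaulting to NOUN for
# unknown open-class words.
CLOSED_WORDS = {"the": "DET", "a": "DET", "of": "PREP", "and": "CONJ", "he": "PRON"}

SUFFIX_RULES = [("ing", "VERB"), ("ed", "VERB"), ("ly", "ADV")]

def tag(word):
    w = word.lower()
    if w in CLOSED_WORDS:                  # closed-word lookup
        return CLOSED_WORDS[w]
    for suffix, pos in SUFFIX_RULES:       # word-form rules
        if w.endswith(suffix):
            return pos
    return "NOUN"                          # default open-class guess

def tag_sentence(sentence):
    return [(w, tag(w)) for w in sentence.split()]
```

Closed-class words are tagged unambiguously by lookup; only open-class words fall through to the suffix rules, which keeps the rule set small.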
Funding: This work is supported by the Key Scientific Research Projects of Colleges and Universities in Henan Province (Grant No. 20A520007) and the National Natural Science Foundation of China (Grant No. 61402149).
Abstract: Previous studies have shown that there is a potential semantic dependency between part of speech and semantic roles. At the same time, the predicate-argument structure in a sentence is important information for the semantic role labeling task. In this work, we introduce an auxiliary deep neural network model, which models the semantic dependency between part of speech and semantic roles and incorporates predicate-argument information into semantic role labeling. Within a joint-learning framework, part-of-speech tagging is used as an auxiliary task to improve the results of semantic role labeling. In addition, we introduce an argument recognition layer in the training process of the main task (semantic role labeling), so that the argument-related structural information selected by the predicate through the attention mechanism is used to assist the main task. Because the model makes full use of the semantic dependency between part of speech and semantic roles and the structural information of the predicate-argument structure, it achieves an F1 value of 89.0% on the WSJ test set of CoNLL-2005, outperforming the existing state-of-the-art model by about 0.8%.
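The joint-learning setup described above amounts to optimizing the main-task loss plus a down-weighted auxiliary loss. A minimal numerical sketch (the weighting scheme and `pos_weight` value are assumptions for illustration, not the paper's reported configuration):

```python
import numpy as np

def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold labels under predicted distributions."""
    return -float(np.mean(np.log(probs[np.arange(len(gold)), gold])))

def joint_loss(srl_probs, srl_gold, pos_probs, pos_gold, pos_weight=0.5):
    """Main-task (SRL) loss plus a weighted auxiliary POS-tagging loss,
    as in standard auxiliary-task joint learning."""
    return cross_entropy(srl_probs, srl_gold) + pos_weight * cross_entropy(pos_probs, pos_gold)
```

Both tasks share the encoder; gradients from the POS term regularize the shared representation while the SRL term remains dominant.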
Funding: Project 60763001 supported by the National Natural Science Foundation of China; Projects 2009GZS0027 and 2010GZS0072 supported by the Natural Science Foundation of Jiangxi Province, China.
Abstract: In order to overcome defects of the classical hidden Markov model (HMM), the Markov family model (MFM), a new statistical model, was proposed. The Markov family model was applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model has higher performance than the hidden Markov model. The precision is improved from 94.642% to 96.214% in the part-of-speech tagging experiments, and the error rate is reduced by 11.9% in the speech recognition experiments with respect to the HMM baseline system.
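The MFM's internal details are not given in this abstract, but the HMM baseline it is compared against decodes tag sequences with the Viterbi algorithm. A self-contained sketch (the toy probabilities below are invented for illustration):

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state (tag) sequence for observation indices `obs`
    under a first-order HMM, computed in log space."""
    T, n = len(obs), len(start_p)
    score = np.zeros((T, n))            # best log-prob ending in each state
    back = np.zeros((T, n), dtype=int)  # backpointers for path recovery
    score[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n):
            cand = score[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(cand))
            score[t, s] = cand[back[t, s]] + np.log(emit_p[s, obs[t]])
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With two states (0 = noun-like, 1 = verb-like) and suitable emission probabilities, the decoder recovers the intuitively correct tag sequence.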
Funding: The National High Technology Research and Development Program of China (863 Program) and the National Natural Science Foundation of China.
Abstract: The Hidden Markov Model (HMM) is a main solution to ambiguities in Chinese segmentation and POS (part-of-speech) tagging. While most previous works on HMM-based Chinese segmentation and POS tagging consult POS information in contexts, they do not utilize lexical information, which is crucial for resolving certain morphological ambiguities. This paper proposes a method which incorporates lexical information and wider context information into the HMM. Model induction and the related smoothing technique are presented in detail. Experiments indicate that this technique improves the segmentation and tagging accuracy by nearly 1%.
Abstract: Part-of-speech (POS) tagging determines the attributes of each word, and it is fundamental work in machine translation, speech recognition, information retrieval and other fields. For Tibetan part-of-speech (TPOS) tagging, a tagging method is proposed based on bidirectional long short-term memory with a conditional random field model (BiLSTM_CRF). Firstly, the designed TPOS tagging set and a manually tagged corpus were used to obtain word vectors by embedding Tibetan words and the corresponding TPOS tags in a continuous bag-of-words (CBOW) model. Secondly, the word vectors were input into the BiLSTM_CRF model: past and future input features are learned by the forward and backward long short-term memory (LSTM) networks respectively, and non-linear operations on the softmax layer yield the predictive score matrix. The prediction score matrix was then input into the CRF model to judge the threshold value and calculate the sequence score error. Lastly, a Tibetan part-of-speech tagging model was obtained based on the BiLSTM_CRF model. The experimental results indicate that the accuracy of the TPOS tagging model based on the BiLSTM_CRF model can reach 92.7%.
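The CRF layer on top of the BiLSTM scores a whole tag sequence as the sum of per-token emission scores and tag-to-tag transition scores. A small sketch of that scoring (the toy score matrices are invented; a real CRF layer would use Viterbi decoding rather than brute force):

```python
import numpy as np
from itertools import product

def sequence_score(emissions, transitions, tags):
    """CRF score of one tag sequence: per-token emission scores (which the
    BiLSTM would supply) plus tag-to-tag transition scores."""
    score = float(emissions[0, tags[0]])
    for t in range(1, len(tags)):
        score += float(transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]])
    return score

def best_sequence(emissions, transitions):
    """Brute-force search over all tag sequences (fine for tiny examples;
    Viterbi decoding does this efficiently in practice)."""
    n_tokens, n_tags = emissions.shape
    seqs = product(range(n_tags), repeat=n_tokens)
    return max(seqs, key=lambda s: sequence_score(emissions, transitions, s))
```

The transition matrix is what lets the CRF veto tag bigrams that the per-token softmax alone would happily emit.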
Abstract: The fantasy novel is a kind of novel literature that differs from other fiction. With the passage of time, fantasy novels have developed greatly alongside human society. There are differences and similarities between Chinese and Western fantasy novels, which have made these literary works widely spread and popular. This study uses corpus linguistics software (TreeTagger, Range) to analyze four famous Chinese and English fantasy novels and their English translations, observing the differences in the proportions of nouns, verbs, adverbs and adjectives and in the lexical difficulty of the novels.
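The POS-proportion comparison at the core of such a corpus study reduces to counting tags over tagger output. A minimal sketch (the tag names are illustrative; TreeTagger's own tag set differs):

```python
from collections import Counter

def pos_proportions(tagged_tokens):
    """Proportion of each POS tag in a list of (word, tag) pairs,
    as produced by a tagger such as TreeTagger."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}
```

Running this over each novel's tagged tokens gives directly comparable noun/verb/adverb/adjective ratios across the corpora.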
Funding: Supported by the Chongqing Education Committee (20SKGH059).
Abstract: In order to improve the accuracy of text similarity calculation, this paper presents a text similarity function, part-of-speech and word order-smooth inverse frequency (PO-SIF), based on sentence vectors, which optimizes the classical SIF calculation method in two aspects: part of speech and word order. The classical SIF algorithm calculates sentence similarity by obtaining a sentence vector through weighting and noise reduction. However, different methods of weighting or noise reduction affect the efficiency and accuracy of the similarity calculation. In our proposed PO-SIF, the weight parameters of the SIF sentence vector are first updated by a part-of-speech subtraction factor to determine the most crucial words. Furthermore, PO-SIF calculates sentence-vector similarity taking word order into account, which overcomes the drawback of similarity analysis that is mostly based on word frequency. The experimental results validate the performance of our proposed PO-SIF in improving the accuracy of text similarity calculation.
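The classical SIF core that PO-SIF builds on can be sketched briefly. Note the assumptions: the POS subtraction factor and word-order term of PO-SIF are omitted, as is full SIF's common-component removal step; the frequencies and vectors below are toy values:

```python
import numpy as np

def sif_sentence_vector(word_vectors, word_freqs, a=1e-3):
    """Classical SIF weighting: scale each word vector by a/(a + p(w)) so
    that frequent words contribute less, then average the result."""
    weights = np.array([a / (a + p) for p in word_freqs])
    return (weights[:, None] * np.asarray(word_vectors)).mean(axis=0)

def cosine_similarity(u, v):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Because the weight a/(a + p(w)) decays with word frequency, a rare content word dominates the sentence vector over a frequent function word.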
Funding: Supported by the Major Projects of the Guangdong Education Department for Foundation Research and Applied Research (No. 2017KZDXM031) and the Guangzhou Science and Technology Plan Project (No. 202009010021).
Abstract: Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages such as English and Chinese. However, scarce resources have discouraged the progress of PTMs for low-resource languages. Transformer-based PTMs for the Khmer language are presented in this work for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization. The dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that the current Khmer word segmentation technology does not aid performance improvement. We aim to release our models and datasets to the community in hopes of facilitating the future development of Khmer NLP applications.