Journal Articles
5 articles found
MT-Oriented English PoS Tagging and Its Application to Noun Phrase Chunking
1
Authors: Ma Jianjun, Huang Degen, Liu Haixia, Sheng Wenfeng; China Communications (SCIE, CSCD), 2012, No. 3, pp. 58-67 (10 pages)
A hybrid approach to English Part-of-Speech (PoS) tagging is presented, with English-Chinese machine translation in the business domain as its target application. It demonstrates how an existing tagger can be adapted to learn from a small amount of data and to handle unknown words for the purpose of machine translation. A small annotated English corpus of 998K in the business domain is built semi-automatically based on a new tagset; a maximum entropy model is adopted, and a rule-based approach is used for post-processing. The tagger is further applied to Noun Phrase (NP) chunking. Experiments show that the tagger achieves an accuracy of 98.14%, which is quite satisfactory. In the NP chunking application, the tagger yields a 2.21% increase in F-score compared with the results obtained using the Stanford tagger.
Keywords: English PoS tagging; maximum entropy; rule-based approach; machine translation; NP chunking
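The NP chunking result above is reported as an F-score. As a minimal sketch, assuming chunks are encoded as BIO tags (an assumption; the abstract does not state the encoding), span-level precision, recall, and F1 can be computed as below. The function names `bio_to_spans` and `chunk_f1` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect NP chunk spans from a BIO tag sequence (e.g. B-NP, I-NP, O)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # A B- tag, or an I- tag with no open chunk, starts a new chunk.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O":
            # An O tag closes any open chunk.
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def chunk_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F1: a chunk counts only if its boundaries match exactly."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["B-NP", "I-NP", "O", "B-NP", "I-NP"]
pred = ["B-NP", "I-NP", "O", "B-NP", "O"]
print(chunk_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Because only exact span matches count, a tagging error that truncates a chunk (the second NP above) costs both precision and recall, which is one reason tagger accuracy shows up directly in chunking F-score.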
Unified Framework of Performing Chinese Word Segmentation and Part-of-Speech Tagging (Citations: 3)
2
Authors: Zhang Kaixu, Sun Maosong; China Communications (SCIE, CSCD), 2012, No. 3, pp. 1-9 (9 pages)
The paper proposes a unified framework that combines the advantages of the fast one-at-a-time approach and the high-performance all-at-once approach to perform Chinese Word Segmentation (CWS) and Part-of-Speech (PoS) tagging. In this framework, the input to the PoS tagger is a candidate set of several CWS results provided by the CWS model. The widely used one-at-a-time and all-at-once approaches are two extreme cases of the proposed candidate-based approach. Experiments on Penn Chinese Treebank 5 and Tsinghua Chinese Treebank show that the generalized candidate-based approach outperforms the one-at-a-time approach and even the all-at-once approach. The candidate-based approach is also faster than the time-consuming all-at-once approach. The authors compare three methods for generating the candidate set, based on sentences, words, and character intervals respectively, and find that the word-based method performs best.
Keywords: natural language processing; Chinese word segmentation; PoS tagging; candidate; word lattice
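The candidate-based idea above can be illustrated with a toy decoder that keeps the top-k segmentations from a CWS model and lets a PoS tagger choose among them. This is a sketch under stated assumptions, not the authors' model: the candidates are whole-sentence segmentations with made-up scores, the tagger is a hypothetical lexicon lookup that penalizes unknown words, and the two scores are simply added. With `k=1` it behaves like a one-at-a-time pipeline; enlarging the candidate set toward all segmentations moves it toward joint (all-at-once) decoding.

```python
from typing import Callable, List, Sequence, Tuple

Segmentation = Tuple[str, ...]

# Hypothetical toy lexicon standing in for a trained PoS tagger.
LEXICON = {"研究": "VV", "生命": "NN", "起源": "NN", "研究生": "NN"}


def toy_tagger(words: Segmentation) -> Tuple[List[str], float]:
    """Tag each word from the lexicon; unknown words are guessed as NN and penalized."""
    tags, score = [], 0.0
    for w in words:
        if w in LEXICON:
            tags.append(LEXICON[w])
        else:
            tags.append("NN")
            score -= 2.0
    return tags, score


def decode_with_candidates(
    candidates: Sequence[Tuple[Segmentation, float]],
    tag_score: Callable[[Segmentation], Tuple[List[str], float]],
    k: int,
) -> Tuple[Segmentation, List[str]]:
    """Return the segmentation/tag pair with the best combined score among the top-k CWS candidates."""
    # Keep only the k highest-scoring segmentations from the CWS model.
    top_k = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    best = None
    for words, seg_score in top_k:
        tags, pos_score = tag_score(words)
        total = seg_score + pos_score  # simple additive combination (assumed)
        if best is None or total > best[0]:
            best = (total, words, tags)
    return best[1], best[2]


cands = [(("研究生", "命", "起源"), -1.0), (("研究", "生命", "起源"), -1.2)]
print(decode_with_candidates(cands, toy_tagger, k=1))  # (('研究生', '命', '起源'), ['NN', 'NN', 'NN'])
print(decode_with_candidates(cands, toy_tagger, k=2))  # (('研究', '生命', '起源'), ['VV', 'NN', 'NN'])
```

Here the segmenter slightly prefers the wrong split, but once the second candidate is visible, the tagger's evidence (no unknown words) corrects it, which is the kind of gain a candidate set can provide over a strict pipeline.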
A Semi-Supervised Approach for Aspect Category Detection and Aspect Term Extraction from Opinionated Text (Citations: 1)
3
Authors: Bishrul Haq, Sher Muhammad Daudpota, Ali Shariq Imran, Zenun Kastrati, Waheed Noor; Computers, Materials & Continua (SCIE, EI), 2023, No. 10, pp. 115-137 (23 pages)
The Internet has become one of the most significant sources for sharing information and for expressing users' opinions about products, their interests, and the associated aspects. It is essential to learn from product reviews; however, to react to such reviews, extracting the aspects of the entity to which the reviews refer is equally important. Aspect-based Sentiment Analysis (ABSA) analyzes sentiment with respect to aspects extracted from opinionated text. The literature proposes different approaches to ABSA, but most research focuses on supervised approaches, which require labeled datasets with manual sentiment polarity labeling and aspect tagging. This study proposes a semi-supervised approach with minimal human supervision that extracts aspect terms by first detecting aspect categories. The study thus addresses two main sub-tasks of ABSA: Aspect Category Detection (ACD) and Aspect Term Extraction (ATE). In the first sub-task, aspect categories are extracted using topic modeling and further filtered by an oracle; they are then fed to zero-shot learning as prompts together with the augmented text. The predicted categories are used to find similar phrases among meaningful phrases extracted from the text (e.g., nouns, proper nouns, and entities identified by Named Entity Recognition (NER)) in order to detect the aspect terms. The study establishes baseline accuracy for both sub-tasks on the Multi-Aspect Multi-Sentiment (MAMS) dataset and on SemEval-2014 Task 4 Subtask 1, showing that the proposed approach helps detect aspect terms via aspect categories.
Keywords: natural language processing; sentiment analysis; aspect-based sentiment analysis; topic modeling; PoS tagging; zero-shot learning
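The ATE step above starts from meaningful phrases such as nouns and proper nouns. A minimal sketch of that candidate-extraction step, assuming the text has already been PoS-tagged with Universal POS tags (`NOUN`, `PROPN`; an assumed tagset), is to keep maximal runs of consecutive noun tokens. The function name is hypothetical, NER entities are not handled, and the similarity step that links candidates to the predicted categories is omitted.

```python
from typing import List, Tuple

NOUN_TAGS = {"NOUN", "PROPN"}  # Universal POS tags (assumed tagset)


def noun_phrase_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return maximal runs of consecutive NOUN/PROPN tokens as candidate aspect phrases."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(token)
        elif current:
            # A non-noun token ends the current run.
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


review = [
    ("The", "DET"), ("battery", "NOUN"), ("life", "NOUN"), ("is", "AUX"),
    ("great", "ADJ"), ("but", "CCONJ"), ("the", "DET"), ("screen", "NOUN"),
    ("scratches", "VERB"),
]
print(noun_phrase_candidates(review))  # ['battery life', 'screen']
```

Grouping adjacent nouns keeps multi-word aspects such as "battery life" intact, so a later similarity comparison against a category (for example, a hypothetical "battery" category) operates on whole phrases rather than isolated tokens.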
Chinese New Word Identification: A Latent Discriminative Model with Global Features (Citations: 11)
4
Authors: 孙晓 (Sun Xiao), 黄德根 (Huang Degen), 宋海玉 (Song Haiyu), 任福继 (Ren Fuji); Journal of Computer Science & Technology (SCIE, EI, CSCD), 2011, No. 1, pp. 14-24 (11 pages)
Chinese new words are particularly problematic in Chinese natural language processing. With the rapid development of the Internet and the information explosion, it is impossible to build a complete system lexicon for Chinese natural language processing applications, because new words not found in dictionaries are constantly being created. New word identification and POS tagging are usually performed as separate procedures, so lexical features cannot be fully exploited. A latent discriminative model, which combines the strengths of the Latent Dynamic Conditional Random Field (LDCRF) and the semi-CRF, is proposed to detect new words of any type, together with their POS tags, simultaneously from Chinese text that has not been pre-segmented. Unlike a plain semi-CRF, the proposed latent discriminative model applies LDCRF to generate candidate entities, which accelerates training and reduces computational cost. The complexity of the proposed hidden semi-CRF can be further adjusted by tuning the number of hidden variables and the number of candidate entities taken from the N-best outputs of the LDCRF model. A new-word-generating framework is proposed for model training and testing, under which the definitions and distributions of new words conform to those in real text. A global feature, called "Global Fragment Features", is adopted for new word identification. We tested our model on the SIGHAN-6 corpus. Experimental results show that the proposed method can detect even low-frequency new words together with their POS tags with satisfactory results, and that it performs competitively with state-of-the-art models.
Keywords: new word identification; new words; POS tagging; conditional random fields; hidden semi-CRF; global fragment features
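The abstract above describes restricting the semi-CRF to candidate entities drawn from the N-best outputs of a first-stage LDCRF. A minimal sketch of that candidate-generation step, assuming each N-best output is a character-level BIO label sequence whose suffix is the POS tag (an assumed encoding; the LDCRF itself and its latent variables are not modeled here), is to take the union of labeled spans across the N-best sequences. Function names are hypothetical.

```python
from typing import List, Set, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans_from_bio(labels: List[str]) -> Set[Segment]:
    """Extract labeled spans from one BIO sequence such as ["B-NR", "I-NR", "B-VV"]."""
    spans: Set[Segment] = set()
    start, cur = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel "O" flushes the last span
        begins_new = lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != cur)
        if begins_new or lab == "O":
            if start is not None:
                spans.add((start, i, cur))
            start, cur = (i, lab[2:]) if lab != "O" else (None, None)
    return spans


def candidate_segments(nbest: List[List[str]]) -> Set[Segment]:
    """Union of labeled spans over the N-best label sequences."""
    cands: Set[Segment] = set()
    for seq in nbest:
        cands |= spans_from_bio(seq)
    return cands


nbest = [
    ["B-NR", "I-NR", "B-VV", "I-VV"],
    ["B-NR", "I-NR", "I-NR", "B-VV"],
]
print(sorted(candidate_segments(nbest)))
# [(0, 2, 'NR'), (0, 3, 'NR'), (2, 4, 'VV'), (3, 4, 'VV')]
```

A semi-CRF over a 4-character sentence would otherwise consider 10 spans for every label; scoring only the 4 candidates is the kind of saving that tuning N trades against coverage.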
Pretrained Models and Evaluation Data for the Khmer Language
5
Authors: Shengyi Jiang, Sihui Fu, Nankai Lin, Yingwen Fu; Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 4, pp. 709-718 (10 pages)
Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages such as English and Chinese. However, scarce resources have hindered the progress of PTMs for low-resource languages. This work presents Transformer-based PTMs for the Khmer language for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization. The dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that current Khmer word segmentation technology does not improve performance. We aim to release our models and datasets to the community in the hope of facilitating the future development of Khmer NLP applications.
Keywords: pretrained models; Khmer language; word segmentation; part-of-speech (PoS) tagging; news categorization