2 articles found
Automatically building large-scale named entity recognition corpora from Chinese Wikipedia
1
Authors: Jie ZHOU, Bi-cheng LI, Gang CHEN. Frontiers of Information Technology & Electronic Engineering (indexed in SCIE, EI, CSCD), 2015, Issue 11, pp. 940-956, 17 pages.
Named entity recognition (NER) is a core component in many natural language processing applications. Most NER systems rely on supervised machine learning methods, which depend on time-consuming and expensive annotations in different languages and domains. This paper presents a method for automatically building silver-standard NER corpora from Chinese Wikipedia. We refine novel and language-dependent features by exploiting the text and structure of Chinese Wikipedia. To reduce tagging errors caused by entity classification, we design four types of heuristic rules based on the characteristics of Chinese Wikipedia and train a supervised NE classifier, then combine the two to improve precision and coverage. Next, we identify the types of implicit mentions by using the boundary information of outgoing links. By selecting sentences related to the domains of the test data, we can train better NER models. In the experiments, large-scale NER corpora containing 2.3 million sentences are built from Chinese Wikipedia. The results show the effectiveness of the automatically annotated corpora, and the trained NER models achieve the best performance when our silver-standard corpora are combined with gold-standard corpora.
Keywords: NER corpora; Chinese Wikipedia; Entity classification; Domain adaptation; Corpus selection
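The core idea of a silver-standard corpus is to project Wikipedia's outgoing links onto NE tags. The sketch below is a minimal illustration of that idea under several assumptions. The input is assumed to be raw wikitext using `[[Target]]` / `[[Target|anchor]]` link syntax. The hypothetical `entity_type` dictionary stands in for the paper's combined rule-plus-classifier output, mapping article titles to NE types. "Implicit mentions" are approximated by tagging unlinked repeats of an already-linked anchor string. That string match is a simplification, not the boundary-based method the paper describes. All function names are illustrative.

```python
import re
from typing import Dict, List, Tuple

# Wikitext internal links: [[Target]] or [[Target|anchor text]]
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def strip_links(wikitext: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Remove link markup; return plain text and (start, end, target) spans."""
    parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    pos = 0     # read position in the wikitext
    length = 0  # length of the plain text built so far
    for m in LINK_RE.finditer(wikitext):
        before = wikitext[pos:m.start()]
        parts.append(before)
        length += len(before)
        target = m.group(1).strip()
        anchor = m.group(2) if m.group(2) is not None else m.group(1)
        spans.append((length, length + len(anchor), target))
        parts.append(anchor)
        length += len(anchor)
        pos = m.end()
    parts.append(wikitext[pos:])
    return "".join(parts), spans


def silver_bio(wikitext: str, entity_type: Dict[str, str]) -> List[Tuple[str, str]]:
    """Character-level BIO tags projected from links (hypothetical heuristic)."""
    text, spans = strip_links(wikitext)
    tags = ["O"] * len(text)

    def label_span(start: int, end: int, label: str) -> None:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"

    # 1) Explicit mentions: anchors of links whose target has a known NE type.
    anchors: Dict[str, str] = {}
    for start, end, target in spans:
        label = entity_type.get(target)
        if label is None:
            continue  # non-entity or unclassified article: leave as O
        anchors[text[start:end]] = label
        label_span(start, end, label)

    # 2) Implicit mentions (assumed heuristic): other unlinked occurrences of
    #    an anchor string, longest anchors first so shorter ones cannot split them.
    for anchor in sorted(anchors, key=len, reverse=True):
        start = text.find(anchor)
        while start != -1:
            end = start + len(anchor)
            if all(t == "O" for t in tags[start:end]):
                label_span(start, end, anchors[anchor])
            start = text.find(anchor, end)
    return list(zip(text, tags))


if __name__ == "__main__":
    page = "[[北京大学|北大]]位于[[北京]]。北大创建于1898年。"
    types = {"北京大学": "ORG", "北京": "LOC"}
    for char, tag in silver_bio(page, types):
        print(char, tag)
```

In the example, the second, unlinked "北大" is tagged as ORG only because it repeats a linked anchor. Character-level tagging is used because Chinese text has no whitespace word boundaries. The paper's actual features, rules and tagging granularity may differ.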
Entity generation algorithm based on reference expansion
2
Authors: Jia-Jia Ruan, Xi-Xu He, Min Zhang, Yuan Gao. Journal of Electronic Science and Technology (indexed in EI, CAS, CSCD), 2023, Issue 3, pp. 63-72, 10 pages.
The extraction and understanding of textual knowledge are becoming increasingly crucial in the age of big data. Because Chinese vocabulary is diverse and ambiguous, accurately understanding text and collecting accurate linguistic information is one of the current research areas in natural language processing (NLP). This paper mainly studies the candidate entity generation module of an entity linking system. This module uses an entity reference expansion algorithm to improve the recall rate of candidate entities. To improve the efficiency of the whole system's linking algorithm while maintaining the recall rate of candidate entities, we design a graph model filtering algorithm that fuses shallow semantic information to filter the list of candidate entities. We then verify and analyze the algorithm's efficiency through experiments. By analyzing the techniques behind entity linking algorithms, we study candidate entity generation and entity disambiguation, improve the traditional entity linking algorithm, and propose an innovative and practical entity linking model. The recall rate exceeds 82%, and the linking accuracy exceeds 73%. Efficient and accurate entity linking can help machines better understand text semantics, further promote the development of NLP, and improve users' experience of acquiring knowledge from text.
Keywords: Chinese Wikipedia; Entity reference expansion; Graph model
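The abstract does not spell out the reference expansion algorithm. The sketch below illustrates one common way expansion can raise candidate recall, under stated assumptions. A short mention is expanded to any longer mention in the same document that contains it, for example a surname expanding to a full name. Every expanded form is then looked up in an alias dictionary. If no alias matches, the sketch falls back to KB titles that contain the form. The `alias_index` and `kb_titles` inputs and all function names are hypothetical. The paper's graph-model filtering step is not shown.

```python
from typing import Dict, Iterable, List, Set


def expand_reference(mention: str, doc_mentions: Iterable[str]) -> Set[str]:
    """Hypothetical expansion: the mention plus every longer mention in the
    same document that contains it (e.g. a surname -> the full name)."""
    forms = {mention}
    for other in doc_mentions:
        if other != mention and mention in other:
            forms.add(other)
    return forms


def generate_candidates(
    mention: str,
    doc_mentions: Iterable[str],
    alias_index: Dict[str, List[str]],
    kb_titles: Iterable[str],
) -> Set[str]:
    """Collect KB entries for the mention and all of its expanded forms."""
    titles = list(kb_titles)
    candidates: Set[str] = set()
    for form in expand_reference(mention, doc_mentions):
        hits = alias_index.get(form)
        if hits:
            candidates.update(hits)  # exact alias-dictionary match
        else:
            # Fallback for recall: KB titles that contain the surface form.
            candidates.update(t for t in titles if form in t)
    return candidates


if __name__ == "__main__":
    aliases = {"乔丹": ["乔丹 (国家)"], "迈克尔·乔丹": ["迈克尔·乔丹 (篮球运动员)"]}
    titles = ["乔丹 (国家)", "迈克尔·乔丹 (篮球运动员)", "芝加哥公牛队"]
    mentions = ["迈克尔·乔丹", "乔丹", "公牛"]
    print(sorted(generate_candidates("乔丹", mentions, aliases, titles)))
```

Without expansion, the mention "乔丹" would retrieve only the country entry. Expanding it to "迈克尔·乔丹" from the same document adds the basketball player as a candidate. Expansion of this kind enlarges the candidate list, which is why a later filtering step (the paper's graph model) is needed to keep linking efficient.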