Abstract
Using a corpus of emergency-event news as the research setting, this paper mines Wikipedia in depth as a source of background semantic knowledge for coreference resolution and derives four categories of Wikipedia-based features: explanatory page content features, synonym features, link graph features, and category graph features. The system was trained and tested on an annotated corpus of 200,000 Chinese characters. The experiments show that introducing Wikipedia into coreference resolution for emergency events is effective: the system achieves an F-value of 66.7%, with the features based on the Wikipedia link graph contributing the most. Feature selection with the SBS algorithm, a hill-climbing method, removed seven features and raised the system's F-value by 3.58%.
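The feature-selection step mentioned in the abstract can be illustrated with a minimal sketch, assuming SBS refers to sequential backward selection run as a greedy hill climb. The abstract does not give the paper's actual feature set, classifier, or evaluation routine, so the feature names and the `toy_score` function below are hypothetical stand-ins for "train the model on this subset and return its F-value".

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def sequential_backward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination: repeatedly drop the single feature
    whose removal improves the score the most, and stop at the first
    step where no removal improves it (a local optimum)."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        cand_score, cand_feature = max(candidates)
        if cand_score <= best:
            break  # no single removal helps any more
        current = current - {cand_feature}
        best = cand_score
    return current, best


# Hypothetical feature names, used only for illustration.
HELPFUL = frozenset({"link_graph", "category_graph"})
NOISY = frozenset({"noise_a", "noise_b"})


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for an F-value: helpful features add, noisy ones subtract.
    A real scorer would train and evaluate a classifier on the subset."""
    return 0.60 + 0.05 * len(subset & HELPFUL) - 0.02 * len(subset & NOISY)


selected, best_f = sequential_backward_selection(HELPFUL | NOISY, toy_score)
print(sorted(selected), round(best_f, 2))
```

In this toy setting the two noisy features are removed one at a time and the search stops once dropping either remaining feature would lower the score. Because the search is greedy, it can stop at a local optimum rather than the best possible subset.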
Source
Information & Communications (《信息通信》)
2018, No. 1, pp. 4-8 (5 pages)
Keywords
emergency events
coreference resolution
Wikipedia
semantic features
maximum entropy model