
Disambiguation Algorithm Design and Implementation of Food Safety Issues in Network
Abstract: Taking food safety information on the network as its object of study, this paper proposes a disambiguation algorithm that resolves proper nouns in the food safety domain whose referents are ambiguous. The algorithm builds on an improved TF-IDF feature selection algorithm and combines a hidden Markov model (HMM) with an SVM classifier to disambiguate these proper nouns. The paper also proposes LN-TF-IDF, a feature extraction algorithm that adds two weighting factors to traditional TF-IDF. Experiments on 202,831 texts, evaluated by the F1 value (the harmonic mean of precision and recall), show that the proposed food-safety disambiguation algorithm based on improved TF-IDF outperforms a disambiguation algorithm based on traditional TF-IDF by an average of 7.31%, and that its performance remains relatively stable on experimental data sets crawled at different times.
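The abstract does not define LN-TF-IDF's two extra weighting factors, so the sketch below shows only the traditional TF-IDF baseline it is compared against, together with the F1 metric used for evaluation. It assumes one common TF-IDF formulation (term count divided by document length, natural-log IDF with no smoothing; other variants exist) and input that is already word-segmented. The function names `tfidf` and `f1_score` and the toy token lists are hypothetical illustrations, not the paper's code or data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Traditional TF-IDF (one common variant, assumed here).

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) = number of documents containing t
    docs: list of token lists (already word-segmented).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {term: (c / length) * math.log(n_docs / df[term])
             for term, c in counts.items()}
        )
    return weights


def f1_score(precision, recall):
    """F1: harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus of three segmented texts.
docs = [
    ["melamine", "milk_powder", "exceeds"],
    ["milk_powder", "price"],
    ["melamine", "test"],
]
w = tfidf(docs)
print(round(w[0]["exceeds"], 4))   # 0.3662  (ln 3 / 3)
print(round(w[0]["melamine"], 4))  # 0.1352  (ln 1.5 / 3)
print(round(f1_score(0.8, 0.6), 4))  # 0.6857
```

A term that occurs in every document gets an IDF of ln(1) = 0, which is why a term found in one text ("exceeds") outweighs one shared by two texts ("melamine"). LN-TF-IDF adds two weighting factors on top of a baseline of this kind; their definitions are given in the full paper, not in this abstract.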
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University core journal list; 2015, Issue B11, pp. 7-9 and 26 (4 pages).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61303214).
Keywords: food safety; disambiguation; hidden Markov model (HMM); TF-IDF; support vector machine (SVM)