Abstract
Taking online food safety information as its object of study, this paper proposes a disambiguation algorithm that resolves ambiguous references of domain-specific terms in the food safety field. The algorithm builds on an improved TF-IDF feature selection algorithm and combines it with a hidden Markov model (HMM) and an SVM classifier to disambiguate these terms. The paper also proposes a feature extraction algorithm, LN-TFIDF, which adds two weighting factors to traditional TF-IDF. In experiments on 202831 texts, using the F1 value (the harmonic mean of precision and recall) as the evaluation criterion, the proposed disambiguation algorithm based on improved TF-IDF outperforms a disambiguation algorithm based on traditional TF-IDF by 7.31% on average. Its performance also remains relatively stable across experimental data sets crawled at different times.
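For orientation, the two standard quantities the abstract refers to can be sketched as follows. This is only a minimal illustration of the traditional TF-IDF baseline and of the F1 value, not the paper's LN-TFIDF: the abstract does not specify the two additional weighting factors, so they are not modeled here. The function names `tf_idf` and `f1_score` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Traditional TF-IDF weights for a list of tokenized documents.

    Baseline only: the two extra weighting factors of LN-TFIDF are not
    given in the abstract and are therefore not included.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def f1_score(precision, recall):
    """F1 value: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample data, for illustration only.
docs = [["milk", "powder", "melamine"], ["milk", "tea"]]
weights = tf_idf(docs)
```

In this sample, a term that occurs in every document (such as "milk") receives a weight of zero, while a term confined to one document keeps a positive weight; this is the behavior that additional weighting factors would then adjust.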
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2015, Issue B11, pp. 7-9, 26 (4 pages in total)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61303214)