
AN IMPROVED TEXT FEATURE SELECTION METHOD BASED ON CATEGORY INFORMATION
Cited by: 1
Abstract: The information gain method assigns feature weights from the perspective of the whole training set, which makes it unsuitable for constructing category feature vectors. We propose an improved category-based text feature weighting model. First, an improved Naive Bayes method selects category features, which are then used to construct the category vectors. Second, word frequency information is used to improve the information gain model for text feature selection, which mitigates that model's insufficient use of information from medium-frequency words. Subsequent text categorization experiments show that the proposed weighting model achieves better classification results than the conventional information gain method.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2010, No. 6, pp. 8-10, 56 (4 pages)
Funding: National Natural Science Foundation of China (Grant No. 70571087)
Keywords: Text categorization; Feature selection; Naive Bayes; Feature weighting
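
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of conventional information gain for a single term, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. It is not the paper's improved model, whose details are not given on this page. The function names, the toy corpus, and the representation of each document as a set of terms are all assumptions made for illustration. Because only the presence or absence of a term is used, the sketch also shows that this baseline ignores how often a term occurs within a document.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Conventional information gain of `term` over the whole training set.

    `docs` is a list of term sets (hypothetical representation), so only
    term presence/absence is considered, not term frequency.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Toy corpus (invented for illustration only).
docs = [
    {"ball", "team"},
    {"ball", "score"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sports", "sports", "finance", "finance"]

# Rank every term in the vocabulary by its information gain.
vocabulary = {t for d in docs for t in d}
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy corpus, "ball" separates the two classes perfectly and "team" occurs equally in both, so the former scores highest and the latter scores zero. Selecting the top-ranked terms from `ranked` is the usual way such a score is turned into a feature set.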

