
Realizing English Text Classification with Semantic Set Index Method
Abstract: To overcome the limitations that word-form matching imposes on current text classification methods, an English text classification method based on semantic set indexing is proposed, built on the WordNet semantic dictionary and the latent semantic indexing (LSI) model. In the initial stage of classification, the method first uses WordNet to construct a semantic dictionary and uses the semantic sets of words, rather than the words themselves, as the feature terms of the text feature vector. It then applies the LSI model to further mine the deeper relations among the concepts represented by the semantic sets, so that linguistic knowledge and concept indexing are both incorporated into the vector space representation of the text. Experiments with the Naive Bayes and simple vector distance text classification methods show that the classification accuracy of both methods improves gradually as the semantic analysis deepens, which indicates the importance and necessity of semantic mining for text classification.
Source: Journal of Beijing University of Posts and Telecommunications (indexed in EI, CAS, CSCD, PKU Core), 2006, No. 2, pp. 18-21 (4 pages)
Funding: Technology Climbing Project of the General Staff Department (504-4)
Keywords: text classification; semantic set index; latent semantic indexing
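
The first stage described in the abstract, replacing words with their semantic sets as feature terms, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SEMANTIC_SETS` dictionary is a tiny hand-written stand-in for a WordNet-derived semantic dictionary, the set IDs and training texts are invented, and the cosine-to-centroid classifier is only one plausible reading of "simple vector distance". The LSI stage is not shown.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet-derived semantic dictionary:
# each word maps to the ID of the semantic set it belongs to.
SEMANTIC_SETS = {
    "car": "S_vehicle", "automobile": "S_vehicle", "auto": "S_vehicle",
    "engine": "S_engine", "motor": "S_engine",
    "doctor": "S_physician", "physician": "S_physician",
    "hospital": "S_hospital", "clinic": "S_hospital",
}

def to_semantic_vector(text):
    """Count semantic-set IDs instead of surface word forms."""
    tokens = text.lower().split()
    # Words missing from the dictionary are kept as their own feature.
    return Counter(SEMANTIC_SETS.get(tok, tok) for tok in tokens)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train_centroids(labeled_docs):
    """Sum the semantic-set vectors of each class into one centroid."""
    centroids = {}
    for text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(to_semantic_vector(text))
    return centroids

def classify(text, centroids):
    """Assign the class whose centroid is closest by cosine similarity."""
    vec = to_semantic_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Invented toy training data, for illustration only.
TRAIN = [
    ("car engine", "autos"),
    ("doctor hospital", "medicine"),
]
CENTROIDS = train_centroids(TRAIN)
```

In this toy setup, `classify("automobile motor", CENTROIDS)` shares no surface word with the training text "car engine", yet both map to the same semantic sets, which is the kind of match that word-form features miss.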
