
Text Categorization Based on Hidden Markov Models (times cited: 5)

HMM based text categorization
Abstract: This paper introduces the hidden Markov model (HMM), a sequence model, into supervised text categorization. A document to be classified is described as a hidden Markov process of evolving states, in which each state emits, with a certain probability, the feature terms that represent the document. Each text category is described by a sequence pattern of characteristic terms. Matching the document's term sequence against a category's HMM yields the corresponding state sequence and the maximum output probability. Comparing these results across all categories determines the document's category. Finally, the method is evaluated on a real dataset against the simple vector model, KNN and Naive Bayes classifiers, and the comparison indicates that it can be applied successfully to text categorization.
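The abstract does not spell out the matching procedure. A standard way to obtain "the corresponding state sequence and maximum output probability" for a term sequence O under an HMM λ is the Viterbi algorithm. Classification then picks the category c that maximizes max_Q P(O, Q | λ_c).

The Python sketch below illustrates that decision rule. It is not the paper's own implementation, and it rests on several assumptions:

- Each category already has a trained discrete HMM. Training (for example, Baum-Welch) is not shown.
- `DiscreteHMM` is a hypothetical class.
- The `unk` floor probability for unseen terms is an assumption.
- The "sports" and "finance" models use made-up toy parameters, purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    """Hypothetical per-category HMM whose states emit feature terms."""
    start: list[float]            # start[i] = P(first state is i)
    trans: list[list[float]]      # trans[i][j] = P(next state j | state i)
    emit: list[dict[str, float]]  # emit[i][term] = P(term | state i)
    unk: float = 1e-6             # assumed floor for terms a state never emits


def _log(p: float) -> float:
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(hmm: DiscreteHMM, terms: list[str]) -> tuple[float, list[int]]:
    """Return the max log output probability and the state path achieving it."""
    if not terms:
        raise ValueError("need at least one term")
    n = len(hmm.start)

    def log_emit(state: int, term: str) -> float:
        return _log(hmm.emit[state].get(term, hmm.unk))

    # delta[i]: best log score of any state path ending in state i so far
    delta = [_log(hmm.start[i]) + log_emit(i, terms[0]) for i in range(n)]
    back: list[list[int]] = []  # back[t][j]: best predecessor of state j
    for term in terms[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + _log(hmm.trans[i][j]))
            delta.append(prev[best_i] + _log(hmm.trans[best_i][j]) + log_emit(j, term))
            ptr.append(best_i)
        back.append(ptr)

    # Trace the best path backwards from the highest-scoring final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


def classify(models: dict[str, DiscreteHMM], terms: list[str]) -> str:
    """Assign the category whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda c: viterbi(models[c], terms)[0])


# Toy, made-up models over a tiny vocabulary (illustration only)
sports = DiscreteHMM(
    start=[0.8, 0.2],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[{"match": 0.5, "goal": 0.4, "market": 0.1},
          {"team": 0.6, "goal": 0.3, "stock": 0.1}],
)
finance = DiscreteHMM(
    start=[0.6, 0.4],
    trans=[[0.6, 0.4], [0.3, 0.7]],
    emit=[{"market": 0.5, "stock": 0.4, "goal": 0.1},
          {"price": 0.6, "stock": 0.3, "team": 0.1}],
)
models = {"sports": sports, "finance": finance}

print(classify(models, ["match", "team", "goal"]))    # sports
print(classify(models, ["stock", "price", "market"]))  # finance
```

Working in log space keeps long documents from underflowing. The `unk` floor is only a crude stand-in for whatever smoothing the paper used, which the abstract does not specify.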
Source: Computer Engineering and Applications (《计算机工程与应用》), 2007, No. 30, pp. 179-181, 227 (4 pages). Indexed in CSCD and the Peking University Core Journals list.
Keywords: Hidden Markov Models (HMM); text categorization; sequence model
