
A Difference Latent Semantic Indexing   Cited by: 2
Abstract: Based on an analysis of the global and local latent semantic indexing (LSI) models, a new difference LSI model is proposed that integrates class information into the term set. Medical web pages are used as test data: the text is extracted from each page and represented with the global model and with the difference model, respectively. The dimension of the feature space is reduced with either SVD or supervised LSI (SLSI), an SVM is used to classify the feature vectors of the test collection, and the classification accuracy and macro-averaged F1 are computed. The experiments show that, under both dimension-reduction techniques, the difference model yields clearly higher accuracy and macro-averaged F1 than the global model. However, combining the difference model with SLSI brings no further improvement in accuracy or macro-averaged F1.
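To make the evaluation pipeline in the abstract concrete, here is a minimal sketch assuming NumPy: global LSI via a rank-k truncated SVD of a term-by-document matrix, a simplified sprinkling step in the spirit of SLSI, classification, and the two reported metrics (accuracy and macro-averaged F1). It does not reconstruct the difference model itself, since the abstract does not say how that model is built. All function names (`sprinkle`, `lsi_fit`, `lsi_project`, `nearest_centroid`, `accuracy`, `macro_f1`) and the toy data are hypothetical. A cosine nearest-centroid classifier stands in for the SVM used in the paper, and dropping the sprinkled class rows before projecting documents is an assumption about how SLSI is applied, not the authors' stated procedure.

```python
import numpy as np


def sprinkle(X, labels, n_classes, copies=1):
    """Append artificial class-label rows to a term-by-document matrix X.

    Hypothetical helper in the spirit of sprinkling (Chakraborti et al., 2006):
    row c * copies + i is 1.0 for every training document labelled c.
    """
    extra = np.zeros((n_classes * copies, X.shape[1]))
    for j, c in enumerate(labels):
        extra[c * copies:(c + 1) * copies, j] = 1.0
    return np.vstack([X, extra])


def lsi_fit(X, k):
    """Rank-k truncated SVD of X (terms x docs); returns U_k and s_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]


def lsi_project(U_k, s_k, D):
    """Fold the documents in the columns of D into the latent space: S_k^-1 U_k^T d."""
    return (U_k.T @ D) / s_k[:, None]


def nearest_centroid(train_vecs, labels, test_vecs, n_classes):
    """Cosine nearest-centroid classifier (stand-in for the paper's SVM)."""
    labels = np.asarray(labels)
    cents = np.stack([train_vecs[:, labels == c].mean(axis=1) for c in range(n_classes)])
    cents /= np.linalg.norm(cents, axis=1, keepdims=True)
    test = test_vecs / np.linalg.norm(test_vecs, axis=0, keepdims=True)
    return (cents @ test).argmax(axis=0)


def accuracy(y_true, y_pred):
    """Fraction of test documents whose predicted class is correct."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN); a class with no TP scores 0."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


# Toy corpus (hypothetical): 4 terms x 4 training documents, two classes.
X_train = np.array([[2, 1, 0, 0],
                    [1, 2, 0, 0],
                    [0, 0, 2, 1],
                    [0, 0, 1, 2]], dtype=float)
y_train = [0, 0, 1, 1]
X_test = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
y_test = [0, 1]

n_terms = X_train.shape[0]
U_k, s_k = lsi_fit(sprinkle(X_train, y_train, n_classes=2), k=2)
U_words = U_k[:n_terms]  # assumption: drop sprinkled rows so train and test are projected alike
Z_train = lsi_project(U_words, s_k, X_train)
Z_test = lsi_project(U_words, s_k, X_test)
pred = nearest_centroid(Z_train, y_train, Z_test, n_classes=2)
print(accuracy(y_test, pred), macro_f1(y_test, pred, n_classes=2))
```

For plain global LSI, call `lsi_fit` on `X_train` directly instead of on the sprinkled matrix. Replacing `nearest_centroid` with a linear SVM would bring the sketch closer to the setup in the abstract; the difference model would presumably change only how `X_train` and `X_test` are constructed.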
Source: Journal of Yantai University (Natural Science and Engineering Edition), CAS, 2008, No. 2, pp. 125-129 (5 pages)
Funding: National Natural Science Foundation of China (60772028); Natural Science Foundation of Shandong Province (Y2006G22)
Keywords: latent semantic indexing; difference model; text categorization; SVM algorithm

References (8)

  • [1] Berry M W, Dumais S T, O'Brien G W. Using linear algebra for intelligent information retrieval[J]. SIAM Review, 1995, 37(4): 573-595.
  • [2] Deerwester S, Dumais S T, Furnas G W, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
  • [3] Lin Hongfei, Yao Tianshun. A text browsing mechanism based on latent semantic indexing[J]. Journal of Chinese Information Processing, 2000, 14(5): 49-56. (in Chinese)
  • [4] Chakraborti S, Lothian R, Wiratunga N, et al. Sprinkling: supervised latent semantic indexing[C]//28th European Conference on Information Retrieval (ECIR 2006). Imperial College London: Springer-Verlag, 2006: 510-514.
  • [5] Hull D. Improving text retrieval for the routing problem using latent semantic indexing[C]//The 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin: Springer-Verlag, 1994: 282-291.
  • [6] Hara K, Matsumoto Y. Information extraction from MEDLINE abstracts of clinical trials[J]. IEICE Technical Report: Artificial Intelligence and Knowledge-Based Processing, 2004, 104(486): 45-49.
  • [7] Slattery S. Hypertext classification[D]. Pittsburgh: Carnegie Mellon University, 2001.
  • [8] Joachims T. Text categorization with support vector machines: learning with many relevant features[C]//10th European Conference on Machine Learning. Heidelberg: Springer-Verlag, 1998, LNCS 1398: 137-142.
