A Difference Model of Latent Semantic Indexing (cited by: 2)

A Difference Latent Semantic Indexing

Abstract: Based on an analysis of the global and local latent semantic indexing (LSI) models, a new difference LSI model is proposed that reflects class information in the terms. Medical web pages serve as the test collection: the text of each page is extracted and represented with both the global model and the difference model. SVD and SLSI are used to reduce the dimensionality of the feature space, an SVM classifier is applied to the feature vectors of the test collection, and classification accuracy and macro-averaged F1 are computed. The experiments show that, under either dimensionality-reduction technique, the difference model gives clearly higher accuracy and macro-averaged F1 than the global model. However, combining the difference model with SLSI brings no further improvement in accuracy or macro-averaged F1.
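The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the class labels, and the sample predictions are all hypothetical, and the choice to average F1 over every class that appears in either the true or the predicted labels is an assumed convention that the abstract does not specify.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores.

    Assumption: classes are taken from the union of true and predicted labels.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the true class but was missed
    classes = set(y_true) | set(y_pred)
    # Per-class F1 = 2*TP / (2*TP + FP + FN); the denominator is at least 1
    # for every class in the union, so no division by zero can occur.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores)


# Hypothetical labels for six test pages (illustrative data only).
y_true = ["cardiology", "cardiology", "oncology", "oncology", "neurology", "neurology"]
y_pred = ["cardiology", "oncology", "oncology", "oncology", "neurology", "cardiology"]

print("accuracy:", accuracy(y_true, y_pred))
print("macro-F1:", macro_f1(y_true, y_pred))
```

Macro-averaging gives every class the same weight regardless of its size, so a weak result on a small category lowers the score as much as a weak result on a large one.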

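Authors: MI Xiaofang (米晓芳), WANG Lihong (王立宏), SONG Yibin (宋宜斌)

Affiliation: School of Computer Science and Technology, Yantai University

Source: Journal of Yantai University (Natural Science and Engineering Edition) (《烟台大学学报（自然科学与工程版）》), CAS, 2008, No. 2, pp. 125-129 (5 pages)

Funding: National Natural Science Foundation of China (60772028); Natural Science Foundation of Shandong Province (Y2006G22)

Keywords: latent semantic indexing; difference model; text categorization; SVM algorithm

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]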
