Web text clustering method based on topic

Cited by: 3


Abstract: Traditional Web text clustering algorithms ignore the topic information of Web texts, which lowers clustering accuracy on multi-topic Web texts. To address this problem, a topic-based Web text clustering method is proposed. The method clusters multi-topic Web texts in three steps: topic extraction, feature extraction, and text clustering. Unlike traditional Web text clustering algorithms, the proposed method takes the topic information of each Web text fully into account. Experimental results show that, for multi-topic Web text clustering, the proposed method achieves higher accuracy than text clustering methods based on K-means or on HowNet.
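The abstract names the three steps but this record does not describe how each one is carried out. The following is only a minimal sketch of such a pipeline under stated assumptions, not the authors' algorithm: topic extraction is approximated by splitting a page into blank-line-separated paragraphs, feature extraction by term-frequency counts, and clustering by a small k-means-style loop with cosine similarity. All function names and the stopword list are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "as"})


def extract_topic_segments(document):
    """Step 1 (assumed heuristic): treat each blank-line-separated paragraph as one topic."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def extract_features(segment):
    """Step 2 (assumed): term-frequency vector over lowercase words, minus stopwords."""
    words = re.findall(r"[a-z]+", segment.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, k, iterations=10):
    """Step 3 (assumed): k-means-style clustering using cosine similarity."""
    # Deterministic farthest-first seeding: start from the first vector, then
    # repeatedly add the vector least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the sum of its members; cosine similarity
        # ignores scale, so the sum behaves like the mean here.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                total = Counter()
                for member in members:
                    total.update(member)
                centroids[j] = total
    return labels


def cluster_web_texts(documents, k):
    """Run the three steps; return (document index, segment, cluster label) triples."""
    segments = [(doc_id, seg)
                for doc_id, doc in enumerate(documents)
                for seg in extract_topic_segments(doc)]
    vectors = [extract_features(seg) for _, seg in segments]
    labels = cluster(vectors, k)
    return [(doc_id, seg, label) for (doc_id, seg), label in zip(segments, labels)]
```

Because clustering runs on topic segments rather than on whole pages, a page that covers two topics can contribute to two different clusters, which is the situation the abstract says whole-document clustering handles poorly.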

Authors: 张万山, 肖瑶, 梁俊杰, 余敦辉

Affiliation: School of Computer Science and Information Engineering, Hubei University

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 11, pp. 3144-3146, 3151 (4 pages in total)

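Funding: National Natural Science Foundation of China (61272111, 61202031, 61273216, 61202032); Natural Science Foundation of Hubei Province (2013CFB002, 2013CFA115); Wuhan Science and Technology Key Research Program (201210621214, 201210421132)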

Keywords: multi-topic; Web text clustering; characteristic word; accuracy

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

Citation network
Related literature

References (9)

1. LI Y. Text document clustering based on frequent word meaning sequences [J]. Data and Knowledge Engineering, 2008, 64(1): 381-404.
2. YI B, WANG Y, CHEN X, et al. Extracting hot topics from microblogging based on keywords detection and text clustering [J]. Applied Mechanics and Materials, 2013, 303-306: 2289-2293.
3. LI X. A new text clustering algorithm based on improved k_means [J]. Journal of Software, 2012, 7(1): 95-101.
4. GUPTA N, SAXENA P C, GUPTA J P. Automatic generation of initial value k to apply K-means method for text documents clustering [J]. International Journal of Data Mining, Modelling and Management, 2011, 3(1): 18-41.
5. 赵鹏, 蔡庆生. Research on a Chinese text clustering algorithm based on HowNet [J]. Computer Engineering and Applications, 2007, 43(12): 162-163. Cited by: 7
6. ZHENG Y, SHU J, CHUN L, et al. A text hybrid clustering algorithm based on HowNet semantics [J]. Key Engineering Materials, 2011, 474-476: 2071-2078.
7. 赵世奇, 刘挺, 李生. A topic-based text clustering method [J]. Journal of Chinese Information Processing, 2007, 21(2): 58-62. Cited by: 23
8. 袁晓峰. A topic-based Web text clustering algorithm [J]. Journal of Chengdu University (Natural Science Edition), 2010, 29(3): 249-252. Cited by: 1
9. KWALE F M. A critical review of k means text clustering algorithm [J]. International Journal of Advanced Research in Computer Science, 2013, 4(9): 27-34.

Secondary references (19)

1. 刘泉凤, 陆蓓, 王小华. A comparative study of clustering algorithms in text mining [J]. Computer Era, 2005(6): 7-8. Cited by: 8
2. 陈涛, 谢阳群. A survey of feature dimensionality reduction methods in text categorization [J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(6): 690-695. Cited by: 79
3. Yanjun Li. Text Document Clustering Based on Frequent Word Meaning Sequences [J]. Data and Knowledge Engineering, 2008, 64(1): 381-404.
4. ZAMIR O E. Clustering Web Documents: A Phrase-Based Method for Grouping Search Engine Results [D]. Seattle: University of Washington, 1999.
5. Xu D X. Energy, Entropy and Information Potential for Neural Computation [D]. Florida: University of Florida, 1999.
6. Yang Z R, Zwolinski Z. Mutual Information Theory for Adaptive Mixture Models [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(4): 26-32.
7. Hatzivassiloglou V, Gravano L, Maganti A. An Investigation of Linguistic Features and Clustering Algorithms for Topical Document Clustering [A]. In: Proceedings of the 23rd ACM SIGIR Conference, Athens [C]. 2000. 224-231.
8. Zamir O, Etzioni O. Web Document Clustering: A Feasibility Demonstration [A]. In: Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval [C]. 1998. 46-54.
9. Gusfield D. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology [M]. Cambridge, UK: Cambridge University Press, 1997.
10. Lee D-L, Chuang H, Seamons K. Document Ranking and the Vector-Space Model [J]. IEEE Software, 1997, 14(2): 67-75.

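Co-citing literature (27)

1. 吴启明, 易云飞. A survey of text clustering [J]. Journal of Hechi University, 2008, 28(2): 86-91. Cited by: 21
2. 吴柳燕, 覃纪武. Research on content-based fuzzy text retrieval technology [J]. Journal of Intelligence, 2008, 27(5): 121-124.
3. 冯少荣, 肖文俊. An efficient text clustering algorithm based on semantic distance [J]. Journal of South China University of Technology (Natural Science Edition), 2008, 36(5): 30-37. Cited by: 15
4. 时念云, 孔静. Research on a clustering mining method based on semantics and domain relevance [J]. Microcomputer Applications, 2008, 29(11): 25-28.
5. 章成志, 张庆国, 师庆辉. Construction of a topic digital library based on topic clustering [J]. Journal of Library Science in China, 2008(6): 64-69. Cited by: 5
6. 刘泉凤. An automatic classification method for open information based on text clustering [J]. Journal of Intelligence, 2009, 28(6): 177-180. Cited by: 1
7. 宋晓雷, 王素格, 李红霞. Research on automatic identification of product evaluation objects in specific domains [J]. Journal of Chinese Information Processing, 2010, 24(1): 89-93. Cited by: 34
8. 苏冲, 陈清才, 王晓龙, 孟宪军. A search engine result clustering algorithm based on maximal frequent itemsets [J]. Journal of Chinese Information Processing, 2010, 24(2): 58-67. Cited by: 5
9. 张榕. A clustering study of term definitions [J]. China Terminology, 2011, 13(1): 14-18. Cited by: 1
10. 高松, 冯志伟. Text clustering based on a dependency treebank [J]. Journal of Chinese Information Processing, 2011, 25(3): 59-63. Cited by: 3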

Co-cited literature (21)

1. 张云, 冯博琴, 麻首强, 刘连梦. A text clustering algorithm fusing ant colony and genetic algorithms [J]. Journal of Xi'an Jiaotong University, 2007, 41(10): 1146-1150. Cited by: 15
2. Zong Ziliang, Fares R, Romoser B, et al. FastStor: improving the performance of a large scale hybrid storage system via caching and prefetching [J]. Cluster Computing, 2014, 17(2): 593-604.
3. Dr A K, Jayasudha S S. An efficient cluster based web object filters from web pre-fetching and web caching on web user navigation [J]. International Journal of Computer Science Issues, 2012, 9(3): 483-489.
4. Liu Qinghui, Solis-Oba R. Web prefetching with machine learning algorithms [C]// Proc of International Conference on Internet Computing. [S.l.]: [s.n.], 2008: 142-148.
5. Wan Miao, Jönsson A, Wang Cong, et al. Web user clustering and Web prefetching using random indexing with weight functions [J]. Knowledge and Information Systems, 2012, 33(1): 89-115.
6. de la Ossa B A, Sahuquillo J, Pont A, et al. Key factors in web latency savings in an experimental prefetching system [J]. Journal of Intelligent Information Systems, 2012, 39(1): 187-207.
7. Ban Zhijie, Wang Sansan. A framework of online proxy-based web prefetching [J]. Web Information Systems and Mining, Lecture Notes in Computer Science, 2012, 7529: 610-620.
8. Jiang Hua, Yi Shenghe, Li Jing, et al. Ant clustering algorithm with K-harmonic means clustering [J]. Expert Systems with Applications, 2010, 37(12): 8679-8684.
9. Mahdavi M, Abolhassani H. Harmony K-means algorithm for document clustering [J]. Data Mining and Knowledge Discovery, 2009, 38(3): 370-391.
10. Shi Kansheng, Li Leming. High performance genetic algorithm based text clustering using parts of speech and outlier elimination [J]. Applied Intelligence, 2013, 38(4): 511-519.

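Citing literature (3)

1. 姚瑶, 张慧. Research on a Web prefetching model based on ART1 user clustering [J]. Computer Technology and Development, 2015, 25(9): 106-110.
2. 柯钢. A text clustering algorithm based on enhanced bee colony optimization and K-means [J]. Application Research of Computers, 2016, 33(8): 2298-2302. Cited by: 8
3. 郭肇毅. Research and development of a text topic extraction and similarity computation system [J]. Modern Information Technology, 2017, 1(4): 20-22.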

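Secondary citing literature (8)

1. 赵文昌, 李忠木. Image segmentation fusing an improved artificial bee colony and K-means clustering [J]. Chinese Journal of Liquid Crystals and Displays, 2017, 32(9): 726-735. Cited by: 12
2. 朱圣烽. A hybrid video watermarking algorithm fusing artificial bee colony and chaotic mapping [J]. Journal of Graphics, 2018, 39(1): 21-29. Cited by: 1
3. 李海洋, 何红洲. A K-means image segmentation algorithm with improved artificial bee colony optimization [J]. Intelligent Computer and Applications, 2018, 8(3): 45-49. Cited by: 6
4. 沈美英. A clustering algorithm for Chinese Web short texts based on an immune network learning mechanism [J]. Automation & Instrumentation, 2018(10): 185-186.
5. 温廷新, 李洋子, 孙静霜. A news hotspot discovery method based on multi-factor feature selection and AFOA/K-means [J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 97-106. Cited by: 5
6. 田夏利, 熊莹. A text data clustering algorithm incorporating a new feature selection mechanism [J]. Computer Engineering and Design, 2021, 42(3): 734-741. Cited by: 2
7. 王琛, 董永权. Text data clustering fusing chemical reaction optimization and K-means [J]. Computer Engineering and Design, 2021, 42(8): 2248-2256.
8. 菊花. A multi-objective text clustering method based on an improved krill herd algorithm [J]. Computer Engineering and Design, 2022, 43(6): 1694-1703. Cited by: 1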

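1. 李建忠. Research and design of a Web page clustering system [J]. Journal of Hanshan Normal University, 2008, 29(6): 27-30.
2. 陈宇, 王强. Application of clustering algorithms in Web text mining [J]. China Electronic Market (Communications Market), 2009(2): 62-68.
3. 傅华忠, 茅剑. Web text mining based on the DBSCAN clustering algorithm [J]. Science & Technology Information, 2007(1): 55-56. Cited by: 5
4. 贾丙静, 吴长勤, 葛华. Research and implementation of Web text clustering [J]. Journal of Changchun Normal University (Natural Science Edition), 2011, 30(3): 26-29. Cited by: 2
5. 贾丙静, 王传安, 王亚军, 吴长勤. Web text clustering based on attribute importance [J]. Journal of Chongqing University of Arts and Sciences (Natural Science Edition), 2011, 30(3): 49-51.
6. 李云, 田素方, 李拓, 徐涛. Web text clustering based on concept lattices [J]. Computer Engineering and Applications, 2008, 44(23): 169-171. Cited by: 3
7. 王卫玲, 刘培玉, 刘克非. A feature selection method for Web text clustering [J]. Computer Applications and Software, 2007, 24(1): 154-156. Cited by: 2
8. 叶宇飞, 安世全, 代劲. Research on a new clustering method for Chinese Web texts [J]. Computer Applications and Software, 2013, 30(12): 222-225. Cited by: 3
9. 许芳芳, 王新伟. Analysis and comparison of Web text clustering algorithms [J]. Computer Era, 2010(10): 6-9. Cited by: 2
10. 王乐, 田李, 贾焰, 韩伟红. A hybrid Web text clustering algorithm based on frequent word sets and k-Means [J]. Computer Engineering & Science, 2008, 30(8): 92-96. Cited by: 6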
