
Web text clustering method based on topic (cited by: 3)
Abstract: Traditional Web text clustering algorithms ignore the topic information of Web texts, which leads to low accuracy when clustering multi-topic Web texts. To address this problem, a topic-based Web text clustering method is proposed. The method clusters multi-topic Web texts in three steps: topic extraction, feature extraction, and text clustering. Unlike traditional Web text clustering algorithms, the proposed method takes the topic information of Web texts fully into account. Experimental results show that, for multi-topic Web text clustering, the proposed method achieves higher accuracy than the text clustering method based on K-means and the text clustering method based on HowNet.
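The abstract names the three steps but this record does not describe how each one is carried out. The following is only a minimal sketch of such a pipeline under stated assumptions, not the authors' algorithm: topic segments are assumed to be blank-line-separated blocks, characteristic words are assumed to be the most frequent non-stopword terms, and a simple single-pass cosine-threshold clustering is swapped in for whatever clustering the paper uses. All function names, the stopword list, and the `top_k` and `threshold` parameters are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def extract_topics(web_text):
    """Step 1 (assumed): treat each blank-line-separated block as one topic segment."""
    return [seg.strip() for seg in re.split(r"\n\s*\n", web_text) if seg.strip()]


def extract_features(segment, top_k=5):
    """Step 2 (assumed): keep the top_k most frequent non-stopword terms with their counts."""
    words = [w for w in re.findall(r"[a-z]+", segment.lower()) if w not in STOPWORDS]
    return dict(Counter(words).most_common(top_k))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.3):
    """Step 3 (assumed): single-pass clustering.

    Each vector joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    centroids = []  # one summed term-count Counter per cluster
    labels = []
    for vec in vectors:
        best, best_sim = None, threshold
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            centroids.append(Counter(vec))
            best = len(centroids) - 1
        else:
            centroids[best].update(vec)
        labels.append(best)
    return labels


def cluster_web_texts(web_texts):
    """Run the three steps and map each document index to the set of cluster labels it touches."""
    segments, owners = [], []
    for doc_id, text in enumerate(web_texts):
        for seg in extract_topics(text):
            segments.append(seg)
            owners.append(doc_id)
    labels = cluster([extract_features(s) for s in segments])
    result = {}
    for doc_id, label in zip(owners, labels):
        result.setdefault(doc_id, set()).add(label)
    return result
```

Because clustering operates on topic segments rather than whole documents, a multi-topic text can belong to more than one cluster, which is the property the abstract motivates.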
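Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2014, No. 11, pp. 3144-3146, 3151 (4 pages)
Funding: National Natural Science Foundation of China (61272111, 61202031, 61273216, 61202032); Natural Science Foundation of Hubei Province (2013CFB002, 2013CFA115); Wuhan Science and Technology Key Research Program (201210621214, 201210421132)
Keywords: multi-topic; Web text; clustering; characteristic word; accuracy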

References (9)

  • 1. LI Y. Text document clustering based on frequent word meaning sequences [J]. Data and Knowledge Engineering, 2008, 64(1):381-404.
  • 2. YI B, WANG Y, CHEN X, et al. Extracting hot topics from microblogging based on keywords detection and text clustering [J]. Applied Mechanics and Materials, 2013, 303-306:2289-2293.
  • 3. LI X. A new text clustering algorithm based on improved k_means [J]. Journal of Software, 2012, 7(1):95-101.
  • 4. GUPTA N, SAXENA P C, GUPTA J P. Automatic generation of initial value k to apply K-means method for text documents clustering [J]. International Journal of Data Mining, Modelling and Management, 2011, 3(1):18-41.
  • 5. ZHAO Peng (赵鹏), CAI Qingsheng (蔡庆生). Research on a Chinese text clustering algorithm based on HowNet [J]. Computer Engineering and Applications (计算机工程与应用), 2007, 43(12):162-163. (cited by: 7)
  • 6. ZHENG Y, SHU J, CHUN L, et al. A text hybrid clustering algorithm based on HowNet semantics [J]. Key Engineering Materials, 2011, 474-476:2071-2078.
  • 7. ZHAO Shiqi (赵世奇), LIU Ting (刘挺), LI Sheng (李生). A topic-based text clustering method [J]. Journal of Chinese Information Processing (中文信息学报), 2007, 21(2):58-62. (cited by: 23)
  • 8. YUAN Xiaofeng (袁晓峰). A topic-based Web text clustering algorithm [J]. Journal of Chengdu University (Natural Science Edition) (成都大学学报(自然科学版)), 2010, 29(3):249-252. (cited by: 1)
  • 9. KWALE F M. A critical review of k means text clustering algorithm [J]. International Journal of Advanced Research in Computer Science, 2013, 4(9):27-34.


