Text cluster mining algorithm based on density (一种基于密度的文本聚类挖掘算法) Cited by: 4
Abstract: The DBSCAN algorithm requires users to set parameter values and tends to produce biased mining results. To address these shortcomings, this paper proposes an improved algorithm, DBTC (density-based text clustering). DBTC can discover clusters of arbitrary shape, and it also addresses two problems that density-based DBSCAN clustering faces in text mining: parameters are difficult for users to set, and a high-density cluster can be completely contained in a connected low-density cluster. Theoretical analysis and experimental results indicate that the algorithm is effective and feasible.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2009, No. 1, pp. 124-126 (3 pages)
Funding: Natural Science Foundation of Jiangsu Province (BK2006095)
Keywords: word segmentation; text clustering; vector space model; core object
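The abstract does not describe how DBTC works internally, so the following is only a minimal sketch of the baseline DBSCAN procedure that the paper sets out to improve, applied to documents in a vector space model. It is not the authors' DBTC. The toy documents, the use of raw term counts with cosine distance, and the names `cosine_distance`, `dbscan`, `eps`, and `min_pts` are all assumptions made for illustration; word segmentation is assumed to have been done already.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_pts):
    """Baseline DBSCAN: labels 0, 1, ... mark clusters, -1 marks noise."""
    n = len(vectors)
    # eps-neighborhood of every document (each document counts itself).
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # not a core object; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core object: keep expanding
    return labels


# Hypothetical, already segmented toy documents.
docs = [
    "density cluster text mining",
    "density cluster text algorithm",
    "density cluster mining algorithm",
    "football match goal score",
    "football match goal team",
    "football match score team",
    "stock market price",
]
vectors = [Counter(d.split()) for d in docs]

print(dbscan(vectors, eps=0.3, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
print(dbscan(vectors, eps=0.2, min_pts=3))  # [-1, -1, -1, -1, -1, -1, -1]
```

In this toy data every pair of documents within a topic is at cosine distance 0.25, so moving `eps` from 0.3 to 0.2 turns two clean clusters into all noise. That sensitivity to user-chosen parameters is the kind of difficulty the abstract attributes to DBSCAN.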