Chinese Drug Name Cluster Analysis Based on K-means and TF-IDF (基于K-means和TF-IDF的中文药名聚类分析)
Abstract: To address the low accuracy of drug name clustering caused by the peculiarities of drug naming, a drug name clustering method based on TF-IDF (Term Frequency-Inverse Document Frequency) and K-means is proposed. Drug names follow certain regularities, and Chinese and Western medicines are named in different forms, so methods based on character and word co-occurrence frequency struggle to cluster them well. The proposed method therefore uses TF-IDF to compute the similarity between drug names and the K-means algorithm to cluster them. Experimental results show that clustering with TF-IDF weights is more accurate than clustering with TF weights alone, and that splitting names into individual characters is more accurate than clustering after word segmentation. Clustering based on characters and TF-IDF achieves the highest and most stable accuracy, reaching 96.77%.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2014, Issue A01, pp. 173-174, 210 (3 pages)
Funding: National Key Technology R&D Program of China (2012BAH19F01)
Keywords: TF-IDF; K-means; Chinese drug name clustering; drug name analysis; word co-occurrence frequency
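
The pipeline summarized in the abstract (split each drug name into characters, weight the characters with TF-IDF, then cluster with K-means) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the add-one smoothed IDF, the L2 normalization, the Euclidean distance, the deterministic farthest-point initialization, and the six sample drug names are all assumptions made for the example.

```python
import math
from collections import Counter


def char_tfidf(names):
    """Return one L2-normalized TF-IDF vector per name, using characters as terms.

    Assumes every name is a non-empty string.
    """
    vocab = sorted({ch for name in names for ch in name})
    n = len(names)
    # Document frequency: how many names contain each character.
    df = Counter(ch for name in names for ch in set(name))
    vectors = []
    for name in names:
        tf = Counter(name)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vec = [
            (tf[ch] / len(name)) * (math.log((1 + n) / (1 + df[ch])) + 1)
            for ch in vocab
        ]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Plain K-means; returns a cluster label for each input vector."""
    # Farthest-point initialization (an assumption) keeps the run deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no label changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample names (not from the paper): three capsules, three granules.
names = ["阿莫西林胶囊", "头孢拉定胶囊", "氨苄西林胶囊",
         "板蓝根颗粒", "感冒灵颗粒", "小柴胡颗粒"]
labels = kmeans(char_tfidf(names), k=2)
for name, label in zip(names, labels):
    print(label, name)
```

In this toy input the shared suffix characters (胶囊, 颗粒) are what pull names together, which mirrors the abstract's point that character-level features capture naming regularities without relying on a word segmenter.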
