
A Method for Automatic Keyword Extraction and Filtration from Medical Texts (cited by: 3)
Abstract: Because important keywords and key phrases serve as a strong representation of a text, keyword extraction and filtration play an important role in information retrieval, information extraction, and knowledge discovery. After surveying current keyword extraction methods, this paper combines existing thesauri and tools from the medical field with the BM25F weighted term-frequency formula to propose a method for extracting and filtering important keywords from medical texts. The method addresses two key problems: identifying and extracting keywords, and measuring keyword importance in order to filter them. The method is applied to a collection of osteoarthritis literature published from 2001 to 2007 to verify its practical effectiveness, offering a workable approach to extracting important keywords in knowledge discovery.
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2008, No. 8, pp. 31-36 (6 pages)
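Funding: One of the research outcomes of the National Social Science Fund of China project "Research on Theories and Methods of Knowledge Extraction from Digital Information Resources" (Grant No. 05BTQ006)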
Keywords: Keyword extraction; Keyword filtration; BM25F; MMTx; Text mining; Medical data mining
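
The abstract names BM25F as the basis for measuring keyword importance but does not give the exact formula, so the following is only a minimal sketch of one commonly cited BM25F variant with per-field length normalization. It assumes each document has already been reduced to lists of candidate terms per field (for instance, concepts produced by a thesaurus-based tool such as MMTx; that step is not shown). The field names `title` and `abstract`, the field weights, the `b` values, and `K1` are all hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical parameters, chosen only for illustration (not from the paper).
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0}   # relative importance of each field
FIELD_B = {"title": 0.5, "abstract": 0.75}        # per-field length normalization
K1 = 1.2                                          # term-frequency saturation


def bm25f_scores(docs):
    """Score every candidate term of every document with a BM25F-style weight.

    docs: list of dicts mapping a field name to a list of candidate terms.
    Returns a list of dicts (one per document) mapping term -> score.
    """
    n_docs = len(docs)
    avg_len = {
        f: sum(len(d.get(f, [])) for d in docs) / n_docs for f in FIELD_WEIGHTS
    }

    # Document frequency: number of documents containing the term in any field.
    df = Counter()
    for d in docs:
        seen = set()
        for f in FIELD_WEIGHTS:
            seen.update(d.get(f, []))
        df.update(seen)

    results = []
    for d in docs:
        # Combine field-level frequencies into one weighted pseudo-frequency.
        pseudo_tf = Counter()
        for f, weight in FIELD_WEIGHTS.items():
            tokens = d.get(f, [])
            if not tokens or avg_len[f] == 0:
                continue
            norm = 1.0 - FIELD_B[f] + FIELD_B[f] * len(tokens) / avg_len[f]
            for term, tf in Counter(tokens).items():
                pseudo_tf[term] += weight * tf / norm

        scores = {}
        for term, tf in pseudo_tf.items():
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            scores[term] = idf * tf / (K1 + tf)
        results.append(scores)
    return results


def top_keywords(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Calling `bm25f_scores` on a small collection and then `top_keywords` on each document's scores gives a ranked shortlist per document. In this sketch, filtration is a simple top-k cut; a score threshold would be an equally plausible choice, and the paper may use a different criterion.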