Abstract
This paper analyzes the need to make use of medical news information and the current state of automatic indexing, and proposes an automatic controlled indexing method for medical news text. The method takes a word segmentation dictionary as the base vocabulary and adds the Chinese-translated MeSH vocabulary to build the indexing vocabulary. Chinese medical news text is segmented, word frequencies are counted and sorted, high-frequency words that are not in the subject heading vocabulary are filtered out, and the five most frequent MeSH subject headings are selected as indexing terms.
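The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which segmentation algorithm was used, so forward maximum matching is assumed here, and the function names `segment` and `index_terms`, the `max_len` parameter, and the tie-breaking rule (first occurrence wins) are all hypothetical.

```python
from collections import Counter


def segment(text, vocabulary, max_len=8):
    """Split text by forward maximum matching (an assumed algorithm).

    At each position the longest substring found in `vocabulary` is taken;
    a single character is emitted when nothing longer matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def index_terms(text, base_words, mesh_terms, top_n=5):
    """Return up to `top_n` MeSH terms ranked by frequency in `text`."""
    mesh_terms = set(mesh_terms)
    # Indexing vocabulary = base segmentation dictionary + translated MeSH terms.
    vocabulary = set(base_words) | mesh_terms
    counts = Counter(segment(text, vocabulary))
    # Sort by frequency, then drop every word that is not a MeSH heading.
    ranked = [word for word, _ in counts.most_common() if word in mesh_terms]
    return ranked[:top_n]
```

Calling `index_terms(news_text, base_words, mesh_terms)` with a news article, a general segmentation word list, and a set of translated MeSH headings returns at most five headings. A production system would likely replace the toy segmenter with a dedicated Chinese word segmentation tool loaded with the same merged vocabulary.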
Source
《中华医学图书情报杂志》
CAS
2014, No. 8, pp. 7-10 (4 pages)
Chinese Journal of Medical Library and Information Science
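Funding
Subproject of the "Construction of a Military-Wide Medical Information Resource Co-construction and Sharing Service System" program of the General Logistics Department of the People's Liberation Army (司训[2011]116号)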
Keywords
Word frequency statistics
Automatic indexing
Subject heading indexing
Controlled indexing
MeSH