Design and Construction of a Knowledge Base for Indexing Chinese Newspaper Literature (中文报纸文献标引知识库设计与构建) Cited by: 1
Abstract: Subject indexing, classification indexing, and named entity extraction are the main forms of deep content processing for newspaper literature, and knowledge-base-driven automatic indexing is one way to automate newspaper indexing. Building on a survey of current research on automatic indexing of newspaper literature, the paper distills a general workflow for the task and argues that constructing a knowledge base is a prerequisite for automatic indexing. In light of the characteristics of newspaper indexing, it proposes that the knowledge base consist of three parts, a subject indexing base, a classification knowledge base, and an entity indexing base, which together comprise multiple vocabularies and word lists. The resulting knowledge base is characterized by multi-vocabulary integration, large scale, extensibility, and ease of implementation. The paper also discusses key techniques in building the knowledge base: the subject authority list, the concordance table between class numbers and subject terms, and the rule base for named entity extraction.
Author: Xue Chunxiang (薛春香)
Source: Information Science (《情报科学》), CSSCI, Peking University Core Journal, 2013, No. 7, pp. 121-125 (5 pages)
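Funding: Ministry of Education Humanities and Social Sciences Research Fund, Youth Project (09YJC870014); Jiangsu Provincial Social Science Fund, Youth Project (09TQC011)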
Keywords: newspaper literature; automatic indexing; classification indexing (categorization); knowledge base
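
The three-part knowledge base described in the abstract can be pictured with a minimal sketch. This is not the author's implementation: the class `KnowledgeBase`, the function `index_article`, the toy vocabulary, the placeholder class numbers `C1` and `C2`, and the date pattern are all hypothetical, and simple substring and regular-expression matching on English text stands in for whatever matching the paper actually uses on Chinese text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # Subject authority list (hypothetical): variant term -> preferred subject term.
    subject_authority: dict[str, str] = field(default_factory=dict)
    # Class/subject concordance (hypothetical): preferred subject term -> class number.
    class_map: dict[str, str] = field(default_factory=dict)
    # Entity extraction rules (hypothetical): entity type -> compiled pattern.
    entity_rules: dict[str, re.Pattern] = field(default_factory=dict)


def index_article(text: str, kb: KnowledgeBase) -> dict:
    """Return subject terms, class numbers, and entities found in one article."""
    lowered = text.lower()
    # Subject indexing: find known variants and normalize them to preferred terms.
    subjects = sorted(
        {preferred for variant, preferred in kb.subject_authority.items()
         if variant in lowered}
    )
    # Classification indexing: look up class numbers through the concordance table.
    classes = sorted({kb.class_map[s] for s in subjects if s in kb.class_map})
    # Entity extraction: apply each rule to the original text.
    entities = {
        etype: sorted(set(pattern.findall(text)))
        for etype, pattern in kb.entity_rules.items()
    }
    return {"subjects": subjects, "classes": classes, "entities": entities}


# Toy data, invented for illustration only.
kb = KnowledgeBase(
    subject_authority={
        "stock market": "stock markets",
        "stocks": "stock markets",
        "flood": "floods",
    },
    class_map={"stock markets": "C1", "floods": "C2"},
    entity_rules={"date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b")},
)

result = index_article("Stocks fell on 2013-07-01 as the stock market reacted.", kb)
print(result)
```

Keeping the three tables as separate fields mirrors the abstract's point that the knowledge base is a fusion of several vocabularies that can each be extended independently.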