Abstract
Based on the characteristics of English-language scientific literature, this paper proposes a key-content recognition method that combines rules with statistics. The method first marks the features of the source document, converting it into an intermediate document that is easier to process. Then, through feature recovery, clue word matching, topic recognition, and proximal analysis, it extracts the main information representing the text from the intermediate document and generates the target document. The method can effectively help researchers read large volumes of English scientific literature and improve their reading efficiency.
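As a rough illustration of how clue word matching, topic recognition, and proximal analysis might fit together, the following Python sketch selects key sentences from plain text. It is not the paper's implementation: the abstract does not give the clue word list, the topic-scoring rule, or the window size, so `CLUE_WORDS`, `STOPWORDS`, `topic_terms`, `key_sentences`, and the parameters `window` and `k` are all hypothetical, and the feature marking and recovery steps are omitted.

```python
import re
from collections import Counter

# Hypothetical clue phrases; the paper's actual list is not given in the abstract.
CLUE_WORDS = ("we propose", "this paper", "results show", "in conclusion")
# Hypothetical minimal stopword list, for illustration only.
STOPWORDS = {"this", "that", "with", "from", "which", "were", "have"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words_of(sentence):
    """Lowercased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def topic_terms(sentences, k=3):
    """Assumed topic recognition: the k most frequent content words."""
    counts = Counter(
        w
        for s in sentences
        for w in words_of(s)
        if len(w) > 3 and w not in STOPWORDS
    )
    return {w for w, _ in counts.most_common(k)}


def key_sentences(text, window=1, k=3):
    """Keep sentences with a clue phrase, plus neighbours within
    `window` sentences that mention a topic term."""
    sentences = split_sentences(text)
    topics = topic_terms(sentences, k)
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(clue in sentence.lower() for clue in CLUE_WORDS):
            keep.add(i)  # rule-based hit: clue word matching
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            for j in range(lo, hi):  # assumed form of proximal analysis
                if topics & set(words_of(sentences[j])):
                    keep.add(j)
    return [sentences[i] for i in sorted(keep)]
```

In this sketch the clue phrases play the role of the rules and the word frequencies play the role of the statistics. A neighbouring sentence is kept only when it lies near a clue-word hit and also mentions a frequent term, which is one plausible reading of combining the two.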
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
PKU Core Journal (北大核心)
2012, No. 9, pp. 112-116 (5 pages)
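Funding
National Natural Science Foundation of China project "Research on Theories and Methods for Analyzing the Evolution of Scientific and Technological Innovation" (Project No. 70873123)
An outcome of the Chinese Academy of Sciences project on new library and information capabilities, "Research on Analysis Methods and Tools for 'Future Scientific and Technological Competitiveness'"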
Keywords
feature recognition
clue word matching
topic recognition
proximal analysis