Abstract
Automatic abstracting is an important and practical branch of natural language processing, and it is becoming a key research topic in application areas such as Internet information retrieval. This paper proposes a corpus-based approach to abstracting that attempts to combine the strengths of traditional methods based on linguistic analysis with those of methods based on statistics. In essence, the corpus-based method trades analysis cost outside the system for algorithmic efficiency inside the system. The algorithm described in the paper implements both keyword extraction, which is based on a hierarchical dictionary, and automatic abstracting, which is based on the corpus.
Source
Journal of Software (《软件学报》)
Indexed in: EI, CSCD, PKU Core (北大核心)
2000, Issue 3, pp. 308-314 (7 pages)
Keywords
Automatic abstracting, corpus, keywording, hierarchical dictionary, natural language processing.