Abstract
Based on a relatively limited number of documents, this paper constructs undirected basic element networks (UBENs), whose nodes are the heads and modifiers of basic elements. It investigates the connectivity of UBENs built from topic-related documents, identifies the characteristic properties of such networks, and discusses the influence of stopwords on connectivity. It then compares the connectivity of UBENs built from topic-related document sets with that of UBENs built from randomly selected document sets. The comparison indicates that the connectivity of a UBEN built from topic-related documents results, to some degree, from the fusion of information across documents with related content, rather than from properties of the language alone. This conclusion has some significance for natural language processing tasks such as multi-document automatic summarization and information retrieval.
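The construction described in the abstract can be illustrated with a minimal Python sketch. It assumes the basic elements have already been extracted as (head, modifier) pairs. The function names, the toy pairs, and the stopword list are hypothetical and are not taken from the paper. The share of nodes in the largest connected component is used here as one possible connectivity measure; the abstract does not state which measure the authors used.

```python
from collections import deque

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "a", "in"}


def build_uben(pairs, stopwords=frozenset()):
    """Build an undirected network (node -> set of neighbors) from
    (head, modifier) pairs, skipping pairs that contain a stopword."""
    graph = {}
    for head, modifier in pairs:
        if head in stopwords or modifier in stopwords:
            continue  # drop basic elements that involve a stopword
        graph.setdefault(head, set())
        graph.setdefault(modifier, set())
        if head != modifier:  # avoid self-loops
            graph[head].add(modifier)
            graph[modifier].add(head)
    return graph


def connected_components(graph):
    """Return the connected components as a list of node sets (BFS)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def largest_component_ratio(graph):
    """Fraction of all nodes that lie in the largest connected component."""
    if not graph:
        return 0.0
    largest = max(len(c) for c in connected_components(graph))
    return largest / len(graph)


# Toy (head, modifier) pairs standing in for two topic-related documents.
doc_a = [("earthquake", "strong"), ("city", "the"), ("city", "earthquake")]
doc_b = [("team", "rescue"), ("team", "the")]
pairs = doc_a + doc_b

with_stopwords = largest_component_ratio(build_uben(pairs))
without_stopwords = largest_component_ratio(build_uben(pairs, STOPWORDS))
print(with_stopwords, without_stopwords)
```

In this toy data the two documents are linked only through the stopword "the", so removing stopwords splits the network into two components. Running the same measurement on a randomly selected document set would give the baseline against which a topic-related set can be compared.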
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2015, Issue 4, pp. 103-110 (8 pages)
Journal of Chinese Information Processing
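Funding
National Natural Science Foundation of China (61070243)
Major Program of the National Social Science Foundation of China (11&ZD189)
Guizhou Province High-Level Talent Research Project (TZJF-2010年048号)
Guizhou Province Science and Education Young Talents Training Project ("黔省专合字(2012)155号")
Guizhou Normal University Doctoral Research Start-up Fund (11904-05032110011)
China Postdoctoral Science Foundation (2013M531730)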
Keywords
topic-related document set
automatic summarization
complex network
connectivity
information fusion
information retrieval