Abstract
To address the uneven quality of content structure across texts, this paper proposes a method for evaluating text content structure based on complex network theory. Each sentence is represented as a node, and the nouns shared by two sentences define the edge between them, yielding a complex network of the text. The structural characteristics of the text are then analyzed through the topological properties of this network. A complex network is built from a sample news article, and its degree, strength, shortest path, and weighted clustering coefficient are calculated. The results show that these measures evaluate the quality of a text's content structure effectively, and that they provide an important reference for understanding and extracting the main idea of a text, generating summaries, and text retrieval and filtering.
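The construction described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the record: the nouns of each sentence are already extracted and passed in as sets, the edge weight is the number of shared nouns, shortest paths are counted in hops, and the weighted clustering coefficient follows the definition of Barrat et al., which may differ from the one used in the paper. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_network(sentence_nouns):
    """Build a weighted adjacency dict from per-sentence noun sets.

    Assumption: weights[i][j] is the number of nouns that sentences
    i and j have in common; sentences with no shared noun are not linked.
    """
    weights = {i: {} for i in range(len(sentence_nouns))}
    for i, j in combinations(range(len(sentence_nouns)), 2):
        shared = len(sentence_nouns[i] & sentence_nouns[j])
        if shared:
            weights[i][j] = shared
            weights[j][i] = shared
    return weights


def degree(weights, i):
    """Number of sentences linked to sentence i."""
    return len(weights[i])


def strength(weights, i):
    """Sum of the edge weights incident to sentence i."""
    return sum(weights[i].values())


def shortest_path_length(weights, source, target):
    """Hop count between two sentences via breadth-first search.

    Returns None when the target cannot be reached from the source.
    """
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in weights[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def weighted_clustering(weights, i):
    """Weighted clustering coefficient (assumed Barrat et al. definition)."""
    k = degree(weights, i)
    if k < 2:
        return 0.0
    total = 0.0
    for j, h in combinations(weights[i], 2):
        if h in weights[j]:
            # Each unordered neighbor pair stands for two ordered pairs,
            # each contributing (w_ij + w_ih) / 2.
            total += weights[i][j] + weights[i][h]
    return total / (strength(weights, i) * (k - 1))
```

A short usage example with made-up noun sets: `build_network([{"network", "text"}, {"text", "sentence"}, {"sentence", "network", "text"}])` links all three sentences, and `strength` for the first sentence is then 3 (one shared noun with the second sentence, two with the third).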
Source
《现代图书情报技术》
CSSCI
Peking University Core Journal
2011, No. 1, pp. 69-73 (5 pages)
New Technology of Library and Information Service
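Funding
One of the research outcomes of the Ministry of Education Humanities and Social Sciences Research General Project (Planning Fund Project) "Research on a Socialized Service Model for Geological Data Information: Based on Complex Network Analysis" (Project No. 10YJA630001)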
Keywords
Complex network of text
Content structure
Shortest path
Clustering coefficient