Abstract
To explore how closely high-frequency words are related through context, this paper studies an algorithmic pipeline for visualizing the high-frequency words of an English text and implements the visualization. We first use a statistical algorithm to extract the high-frequency words and their contexts from the text. We then define three types of connection between words, compute a relation degree for each pair of words that share a context, and cluster the relation degrees with the k-means algorithm so that the closeness of the relations becomes visible. Finally, we visualize the clustering results with a radial tree layout. This visual form lets readers quickly grasp the content of an English text.
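The first three stages of the pipeline (frequency extraction, relation scoring, k-means clustering) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not specify the three connection types, so the relation degree here is assumed to be a simple count of sentences in which two words co-occur. The names `STOPWORDS`, `top_words`, `relation_degrees`, and `kmeans_1d` are hypothetical, and the radial tree layout step is left out.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; the abstract does not say how function words are handled.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "with"}


def top_words(text, n):
    """Return the n most frequent non-stop-word tokens of the text."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(n)]


def relation_degrees(text, vocab):
    """Assumed relation degree: number of sentences in which both words occur."""
    vocab = set(vocab)
    degrees = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        present = sorted(vocab & set(re.findall(r"[a-z]+", sentence)))
        for pair in combinations(present, 2):
            degrees[pair] += 1
    return degrees


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values; returns (centers, labels)."""
    distinct = sorted(set(values))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic initialisation: spread the centers over the sorted distinct values.
    centers = [distinct[i * (len(distinct) - 1) // max(k - 1, 1)] for i in range(k)]

    def assign(current):
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda c: abs(v - current[c])) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(centers)
```

A typical use would pass the output of `top_words` to `relation_degrees` and then cluster the resulting degree values with `kmeans_1d`, so that word pairs falling in the cluster with the highest center are treated as the most closely related. The cluster labels could then drive the depth or grouping of nodes in a radial layout.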
Source
《现代情报》
CSSCI
2011, No. 8, pp. 21-24 (4 pages)
Journal of Modern Information