Abstract
In fields such as relevance judgment for search engine results, text-to-speech conversion, and speech recognition, accurately analyzing the collocation relationships between words is one of the main research problems. Drawing on the massive amount of information available on the Internet, this work statistically analyzes a large number of English web pages and uses the occurrence frequency of individual words and the co-occurrence frequency of word pairs to derive empirical conclusions for judging the degree of relatedness between words in unclassified web pages. Based on these conclusions, it proposes a method and formula for ranking word relatedness through statistical analysis of a document set, and implements a prototype of a distributed system for mining relatedness between English words.
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2009, No. 5, pp. 151-153, 163 (4 pages in total)
Keywords
data mining
web-page classification
association rules
sorting algorithm
text representation