Abstract
With the explosive growth of information on the Internet, automatic classification of Web pages has become an important problem for information retrieval and information search. This paper proposes a balance difference method for automatically classifying Chinese Web pages. The method uses the hierarchical structure of topic concepts in an ontology, together with the various semantic relations between topic words and feature items, to reduce the complexity and computational cost of the classification algorithm. Experiments show that the method achieves a Web page classification accuracy of over 85%.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2003, Issue 11, pp. 95-97 (3 pages)
Keywords
Ontology (本体)
Topic identification (主题识别)
Semantics (语义)
Hierarchical structure (层次结构)