Abstract
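Borrowing the idea of tf-idf weighting, this paper uses news titles as the basis for automatically classifying Chinese news Web pages, constructs title-based classification methods for Chinese news, and designs several experiments to evaluate them. The experimental results show that classifying Chinese news Web pages by title greatly shortens processing time and saves storage space while keeping accuracy relatively high; among the methods tested, the improved category weighting method gives the best classification results.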
This paper describes the automatic classification of Chinese news Web pages by news title, based on the tf-idf weighting scheme, and constructs a correlation degree for news titles that determines the appropriate category for each news Web page. The performance of the proposed method is evaluated in terms of top-one, top-two, and top-three scores. The experimental evaluation demonstrates that the improved tf-idf weighting scheme with categories classifies Chinese news Web pages with high accuracy.
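The abstract does not give the weighting formulas, so the following is only a minimal sketch of what tf-idf-style category scoring over titles could look like, not the paper's method. It assumes titles are already segmented into word tokens, treats each category as one "document" when computing the idf term, and reads the top-one/top-two/top-three scores as top-k accuracy. The names `train`, `rank`, and `top_k_accuracy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labelled_titles):
    """Build per-category term weights from (tokens, category) pairs.

    Assumption: weight = (term frequency within the category) *
    (log(number of categories / categories containing the term) + 1).
    """
    term_counts = defaultdict(Counter)  # category -> term -> count
    for tokens, category in labelled_titles:
        term_counts[category].update(tokens)

    n_categories = len(term_counts)
    # For each term, the number of categories in which it occurs.
    category_freq = Counter(
        term for counts in term_counts.values() for term in counts
    )

    weights = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        weights[category] = {
            term: (count / total)
            * (math.log(n_categories / category_freq[term]) + 1.0)
            for term, count in counts.items()
        }
    return weights


def rank(tokens, weights, k=3):
    """Return the k categories whose summed term weights are highest."""
    scores = {
        category: sum(term_weights.get(t, 0.0) for t in tokens)
        for category, term_weights in weights.items()
    }
    ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
    return ordered[:k]


def top_k_accuracy(labelled_titles, weights, k):
    """Fraction of titles whose true category is among the top k ranked."""
    hits = sum(
        1 for tokens, category in labelled_titles
        if category in rank(tokens, weights, k)
    )
    return hits / len(labelled_titles)
```

Because only the title tokens are stored and scored, a sketch like this touches far less text per page than a full-text classifier, which is consistent with the time and storage savings the abstract reports.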
Source
《现代图书情报技术》
CSSCI
Peking University Core Journal (北大核心)
2008, No. 10, pp. 59-68 (10 pages)
New Technology of Library and Information Service