A news classification model based on the similarity curve (基于相似度曲线的新闻网页分类模型研究)
Abstract: With the rapid growth of the Internet, the web has become an important means of finding useful data. Because much of the information on the WWW is stored in HTML pages, web page classification is necessary. Using a range of open-source software, this paper designs and implements a Chinese web page classification model. Data is collected with meta-search technology, which effectively increases the breadth and depth of acquisition. A specialized lexicon is used during Chinese word segmentation, which improves segmentation accuracy. When building the VSM, the paper proposes a web page feature extraction method based on the similarity curve. This method effectively addresses the high dimensionality of feature extraction, improves feature discrimination, and reduces the amount of computation.
Source: Information Technology (《信息技术》), 2008, No. 2, pp. 15-18 (4 pages)
Funding: National Natural Science Foundation of China (60673160)
Keywords: similarity curve; VSM model; feature extraction; TF-IDF formula
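
For readers unfamiliar with the VSM and TF-IDF terms in the keywords, the following is a minimal sketch of the standard textbook formulation: each document becomes a sparse vector of TF-IDF weights, and two documents are compared by cosine similarity. It is not the paper's similarity-curve feature extraction method, which this record does not describe. The function names `tf_idf_vectors` and `cosine` and the sample token lists are assumptions made for illustration, and documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse {term: weight} vector per document.

    docs is a list of token lists (already word-segmented).
    Weight = (term count / document length) * log(N / document frequency).
    This is the plain textbook TF-IDF, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, pre-segmented sample documents.
sample_docs = [
    ["stock", "rises", "market"],
    ["stock", "falls", "market"],
    ["team", "wins", "match"],
]
sample_vectors = tf_idf_vectors(sample_docs)
```

In this sketch the two finance-like documents share weighted terms and so have a positive similarity, while the sports-like document shares no terms with them. A term that occurs in every document gets weight zero because log(N / df) is zero, which is one reason TF-IDF weights are often used as a first filter on a high-dimensional vocabulary.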


