Research on Feature Extraction Methods for Domain-Specific Text Based on Web Mining (cited by: 1)

Study on Features Selection Algorithms of Topic Pages Automatic Classification Based on Web Mining
Abstract: This paper analyzes text feature extraction methods for the automatic classification of domain-specific (topic) information. It proposes that, during text analysis, feature terms be extracted using Web content mining and Web structure mining together, and that these terms be used to build the text feature space, i.e. a vector space model (VSM). It further argues that topic category vectors and a topic dictionary can effectively address the high-dimensionality problem of that space.
Source: Journal of Lanzhou Petrochemical Polytechnic (《兰州石化职业技术学院学报》), 2007, Issue 3, pp. 33-35 (3 pages)
Funding: 2005 Natural Science Foundation of Gansu Province project (3ZS051-A25-047)
Keywords: Web mining; topic information; automatic classification of Web pages; feature selection
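
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It combines content mining (counting terms in the page text) with structure mining (weighting a term by the HTML tag it appears in), and it limits the feature space to the entries of a topic dictionary to keep the dimensionality low. The tag weights, the dictionary entries, the sample page, and the category vector are all hypothetical values chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re

# Hypothetical weights for structural tags; the abstract gives no values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5}

# Hypothetical topic dictionary; only these terms become dimensions,
# which keeps the feature space small.
TOPIC_DICTIONARY = ("catalyst", "polymer", "refinery", "distillation")


class WeightedTermParser(HTMLParser):
    """Accumulates dictionary terms, weighted by the enclosing known tag."""

    def __init__(self, dictionary, tag_weights):
        super().__init__()
        self.dictionary = set(dictionary)
        self.tag_weights = tag_weights
        self.open_tags = []       # stack of currently open weighted tags
        self.weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in self.tag_weights:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        # Text outside any weighted tag counts with weight 1.0.
        weight = self.tag_weights[self.open_tags[-1]] if self.open_tags else 1.0
        for term in re.findall(r"[a-z]+", data.lower()):
            if term in self.dictionary:
                self.weights[term] += weight


def page_vector(html, dictionary=TOPIC_DICTIONARY, tag_weights=TAG_WEIGHTS):
    """Returns one weight per dictionary term, in dictionary order."""
    parser = WeightedTermParser(dictionary, tag_weights)
    parser.feed(html)
    parser.close()
    return [float(parser.weights[term]) for term in dictionary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


SAMPLE_PAGE = (
    "<html><head><title>Polymer catalyst</title></head><body>"
    "<h1>Catalyst design</h1>"
    "<p>A polymer catalyst for the refinery. "
    "See <a href='x.html'>distillation</a> notes.</p>"
    "</body></html>"
)

# Hypothetical category vector over the same four dictionary terms.
CATEGORY_VECTOR = [1.0, 1.0, 0.0, 0.0]

if __name__ == "__main__":
    vector = page_vector(SAMPLE_PAGE)
    print(vector)  # [6.0, 4.0, 1.0, 1.5]
    print(cosine(vector, CATEGORY_VECTOR))
```

With this design, a page is scored against each topic category by cosine similarity, and the vector length always equals the dictionary size regardless of how many distinct words the page contains.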
