Abstract
By analyzing text feature extraction methods for the automatic classification of topic-specific Web pages, this paper proposes that, during page analysis, feature terms be extracted with Web content mining and Web structure mining together to build the text feature space (a VSM, Vector Space Model). It further shows that topic category vectors and a topic dictionary are practical, effective ways to solve the high-dimensionality problem of that feature space.
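The abstract does not spell out the paper's actual algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. "Structure mining" is approximated here by giving terms extra weight when they appear in a page's title or anchor text (the weights in `FIELD_WEIGHTS` are hypothetical). The topic dictionary is modeled as a plain set of allowed terms, so any word outside it never enters the vector, which keeps the feature space small. Topic category vectors are hand-built term-weight maps, and a page is assigned to the category with the highest cosine similarity. All function names and data are illustrative, not the author's.

```python
import math
import re
from collections import Counter

# Hypothetical structural weights (an assumption standing in for structure
# mining): terms in the title or anchor text count more than body text.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase word tokens; a stand-in for real (e.g. Chinese) word segmentation."""
    return re.findall(r"[a-z]+", text.lower())


def page_vector(page, dictionary):
    """Build a sparse VSM vector for one page.

    page: dict mapping a field name ("title", "anchor", "body") to its text.
    dictionary: set of topic terms; words outside it are dropped, which is
    what bounds the dimensionality of the feature space.
    """
    vec = Counter()
    for field, text in page.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in tokenize(text):
            if term in dictionary:
                vec[term] += weight
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(page, dictionary, category_vectors):
    """Return the category whose vector is most similar to the page's vector."""
    vec = page_vector(page, dictionary)
    return max(category_vectors, key=lambda c: cosine(vec, category_vectors[c]))


if __name__ == "__main__":
    # Toy topic dictionary and category vectors (illustrative only).
    dictionary = {"oil", "refinery", "catalyst", "network", "router", "protocol"}
    categories = {
        "petrochemical": Counter({"oil": 1, "refinery": 1, "catalyst": 1}),
        "networking": Counter({"network": 1, "router": 1, "protocol": 1}),
    }
    page = {"title": "Catalyst selection in a refinery", "body": "The oil feed"}
    print(page_vector(page, dictionary))
    print(classify(page, dictionary, categories))
```

In this toy run the vector only has three dimensions (`catalyst`, `refinery`, `oil`) even though the page contains more words, which illustrates how a topic dictionary limits dimensionality. If a page shares no terms with any category, every similarity is 0.0 and `max` simply returns the first category, so a real system would want an explicit "unclassified" threshold.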
Source
Journal of Lanzhou Petrochemical Polytechnic
2007, No. 3, pp. 33-35 (3 pages)
Funding
2005 Natural Science Foundation of Gansu Province (project no. 3ZS051-A25-047)
Keywords
Web mining
topic information
automatic classification of Web pages
feature selection