Study on Talents Description Web Page Automatic Recognition System (人才网页自动识别系统研究). Cited by: 1
Abstract: This paper proposes the design of an automatic recognition system for talent description Web pages. The system identifies talent description pages among the university website pages crawled by a Nutch-based focused crawling system. Recognition uses four automatically extracted features: the page URL, the content of the Title tag, the link (anchor) text, and the page text content. Each feature is matched against a personal-name list, a positive feature word list, and a negative feature word list to compute its feature value. The open-source software LibSVM then performs the recognition based on these multiple feature values.
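Authors: 徐健 (Xu Jian), 温浩胜 (Wen Haosheng)
Source: 《现代图书情报技术》 (New Technology of Library and Information Service), indexed in CSSCI and the Peking University Core Journals list, 2011, Issue 6, pp. 20-26 (7 pages)
Funding: One of the research outputs of the project "Information Mining and Evaluation Methods and System for High-Level Science and Technology Talents", funded under the 2010 Sun Yat-sen University program for cultivating major projects and emerging interdisciplinary fields
Keywords: LibSVM; Talents description Web page; Automatic classification; Classification feature extraction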
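
The feature-matching step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word lists, the scoring rule (positive and name matches minus negative matches), and the names `Page`, `field_score`, `feature_vector`, and `to_libsvm_line` are all assumptions made for illustration, since the abstract does not give the actual lists or formulas. The sketch computes one value per feature (URL, Title tag, anchor text, page text) and writes it as a line in the sparse text format that LibSVM reads for training data (`label index:value ...`).

```python
from dataclasses import dataclass

# Hypothetical word lists; the paper's actual lists are not given in the abstract.
NAME_LIST = {"张伟", "李娜"}
POSITIVE_WORDS = {"教授", "简历", "个人主页", "faculty", "profile"}
NEGATIVE_WORDS = {"新闻", "通知", "招聘", "news", "notice"}


@dataclass
class Page:
    url: str
    title: str
    anchor_text: str
    body: str


def count_matches(text: str, words: set) -> int:
    """Count case-insensitive substring occurrences of every word in the list."""
    lowered = text.lower()
    return sum(lowered.count(word.lower()) for word in words)


def field_score(text: str) -> float:
    """Assumed scoring rule: positive and name matches add, negative matches subtract."""
    positive = count_matches(text, POSITIVE_WORDS)
    names = count_matches(text, NAME_LIST)
    negative = count_matches(text, NEGATIVE_WORDS)
    return float(positive + names - negative)


def feature_vector(page: Page) -> list:
    """One feature value per source: URL, Title tag, anchor text, page text."""
    return [
        field_score(page.url),
        field_score(page.title),
        field_score(page.anchor_text),
        field_score(page.body),
    ]


def to_libsvm_line(label: int, features: list) -> str:
    """Format a sample as 'label index:value ...' with 1-based indices, omitting zeros."""
    parts = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + parts)
```

Lines produced this way could be collected into a training file for LibSVM's tools, with label +1 for talent description pages and -1 for other pages. In practice the feature values would likely need scaling before training.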