Research on Cross-Language Automatic Web Page Classification Based on Frequently Co-Occurring Entropy (cited by: 3)

Web Pages Auto Classification Based on Frequently Co-Occurring Entropy
Abstract: This paper studies cross-language automatic web page classification based on frequently co-occurring entropy (FCE). The method first translates all Chinese web pages into English with simple translation software. Second, it computes the FCE of the features that co-occur in the Chinese and English web pages. Third, it uses the FCE ranking to select the part common to the Chinese and English pages. Finally, it trains a Chinese classification model from the English pages combined with that common part. Experimental results on the ODP corpus show that the method performs well compared with the naive Bayes (NB), SVM, and information bottleneck (IB) models.
Source: Journal of Jiangxi Normal University (Natural Science Edition), 2011, No. 3, pp. 240-245 (6 pages). Indexed in CAS and the Peking University Core Journals list.
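Funding: National Natural Science Foundation of China (60963014); Youth Science Foundation of the Jiangxi Provincial Department of Education (GJJ10116); Science and Technology Project of the Jiangxi Provincial Department of Education (2007-129); Natural Science Foundation of Jiangxi Province (2008GZS0052); Key Science and Technology Project of Jiangxi Province (2006-184).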
Keywords: cross-language; web page classification; frequently co-occurring entropy; naive Bayes; adapted naive Bayes
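The selection step described in the abstract (score the features shared by the English pages and the translated Chinese pages, then keep the top-ranked ones) might be sketched as follows. This is only an illustration under stated assumptions: the record does not give the paper's FCE formula, so the score used here, log(p_s * p_t / (|p_s - p_t| + beta)) with smoothed document frequencies, is an assumed form, and the function names, the whitespace tokenizer, and the `alpha` and `beta` values are all hypothetical.

```python
import math
from collections import Counter


def doc_counts(docs):
    """Count, for each term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # Naive whitespace tokenization; a real system would do more.
        counts.update(set(doc.lower().split()))
    return counts, len(docs)


def fce_scores(source_docs, target_docs, alpha=1e-4, beta=1e-4):
    """Score each term by how frequent AND how similarly frequent it is
    in both collections.

    ASSUMPTION: the score log(p_s * p_t / (|p_s - p_t| + beta)) and the
    smoothing p = (n + alpha) / (N + 2 * alpha) are illustrative choices,
    not formulas taken from the paper's record.
    """
    s_counts, s_n = doc_counts(source_docs)
    t_counts, t_n = doc_counts(target_docs)
    scores = {}
    for term in set(s_counts) | set(t_counts):
        p_s = (s_counts[term] + alpha) / (s_n + 2 * alpha)
        p_t = (t_counts[term] + alpha) / (t_n + 2 * alpha)
        scores[term] = math.log(p_s * p_t / (abs(p_s - p_t) + beta))
    return scores


def top_common_terms(source_docs, target_docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = fce_scores(source_docs, target_docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy data: English pages and Chinese pages already translated to English.
    english_pages = ["football match score", "stock market price", "football team win"]
    translated_pages = ["football game score", "market price rise"]
    print(top_common_terms(english_pages, translated_pages, k=4))
```

Terms that appear in only one collection get a smoothed probability near zero in the other, so their score is strongly negative and they fall to the bottom of the ranking. The selected terms would then serve as the shared feature set when training a classifier on the English pages.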