Electronic Document Classification Based on Wavelet Analysis
Abstract: Automatic classification of document data will play an increasingly important role in digital libraries (DL). The common approach applies a kernel method based on the support vector machine (SVM) to standard test collections. This approach has several drawbacks: the document vectors are very large, the kernel function is non-orthogonal and polysemous, and computing recurrence rates is time-consuming. Because it is not tested on real digital library data, its results are also less convincing in practice. To address these problems, term expansion is used to preprocess the document vectors, yielding new vectors that are fewer but better, orthogonal, and unambiguous. The document vectors are then ordered semantically to speed up access and computation, and a wavelet kernel maps the documents onto L2 space for classification. The method is validated on real classification records from the China National Knowledge Infrastructure (CNKI), using both abstracts and full text. The experimental results show that it outperforms the kernel method and has value for both theoretical research and practical application.
Authors: 张开选 (Zhang Kaixuan), 夏旭 (Xia Xu)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2013, No. 9, pp. 1000-1008 (9 pages)
Keywords: electronic document classification, machine learning, support vector machine, L2 space, wavelet analysis
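
The abstract does not state which wavelet kernel the authors use. As a rough illustration of the general idea only, the sketch below assumes a widely cited translation-invariant wavelet kernel built from the mother wavelet h(u) = cos(1.75u)·exp(-u²/2). The function names, the dilation parameter `a`, and the toy document vectors are all hypothetical and are not taken from the paper.

```python
import math


def wavelet_kernel(x, y, a=1.0):
    """Translation-invariant wavelet kernel between two document vectors.

    Assumption: the mother wavelet is h(u) = cos(1.75 u) * exp(-u^2 / 2);
    the paper's abstract does not say which wavelet it uses.
    `a` is a dilation (scale) parameter and must be non-zero.
    """
    if len(x) != len(y):
        raise ValueError("document vectors must have the same length")
    k = 1.0
    for xi, yi in zip(x, y):
        u = (xi - yi) / a
        # Product of the mother wavelet evaluated on each coordinate difference.
        k *= math.cos(1.75 * u) * math.exp(-0.5 * u * u)
    return k


def gram_matrix(docs, a=1.0):
    """Pairwise kernel values for a list of equal-length document vectors."""
    return [[wavelet_kernel(p, q, a) for q in docs] for p in docs]


if __name__ == "__main__":
    # Toy term-weight vectors standing in for preprocessed document vectors.
    docs = [[0.2, 0.0, 0.5], [0.1, 0.3, 0.0], [0.2, 0.0, 0.4]]
    for row in gram_matrix(docs, a=1.0):
        print(["%.4f" % v for v in row])
```

A Gram matrix of this kind could in principle be passed to an SVM implementation that accepts precomputed kernels. The term expansion and semantic ordering steps described in the abstract are not covered by this sketch.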