
Application of an LSA-Based Secondary Dimensionality Reduction Method to the Classification of Chinese Legal Case Texts (cited by: 8)
Abstract: When text mining is used to represent text features, the very high dimensionality of text makes processing computationally expensive, so the text should first undergo dimensionality reduction. Latent semantic analysis (LSA), treated here as a text clustering method, has distinctive advantages in extracting textual information effectively and has been applied in many fields. This paper builds a classification system for Chinese legal case texts and introduces LSA to perform a secondary dimensionality reduction of the text vector space. The feature-document matrix produced by LSA replaces the original matrix, which removes further noise and speeds up the classification system. The implementation process and experimental data are given, and the experiments show that the method achieves good results.
Affiliation: Jiangxi Blue Sky University (江西蓝天学院)
Source: Electronic Measurement Technology (《电子测量技术》), 2007, No. 10, pp. 111-114 (4 pages)
Keywords: text classification; secondary dimensionality reduction; legal text
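
The two-stage reduction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the first-stage method, so document-frequency selection is assumed here purely for illustration. The toy matrix and the names `first_stage_select`, `lsa_reduce`, and `fold_in` are hypothetical. The second stage is a standard truncated SVD, which is the usual way LSA is computed.

```python
import numpy as np

def first_stage_select(X, keep):
    """First-stage reduction (assumed): keep the `keep` terms (rows)
    that occur in the most documents."""
    df = (X > 0).sum(axis=1)                       # document frequency per term
    idx = np.sort(np.argsort(-df, kind="stable")[:keep])
    return X[idx], idx

def lsa_reduce(X, k):
    """Second-stage reduction: truncated SVD of a terms x docs matrix.
    Returns U_k, the top-k singular values, and the k x docs matrix
    that stands in for the original matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]
    docs_k = s_k[:, None] * Vt_k                   # equals U_k.T @ X
    return U_k, s_k, docs_k

def fold_in(U_k, x):
    """Project a document's term vector into the same k-dimensional space."""
    return U_k.T @ x

# Toy term-document count matrix: 5 terms (rows) x 4 documents (columns).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 0],
], dtype=float)

X_sel, kept = first_stage_select(X, keep=4)        # 5 terms -> 4 terms
U_k, s_k, docs_k = lsa_reduce(X_sel, k=2)          # 4 dims -> 2 latent dims
```

A classifier would then be trained on the columns of `docs_k` instead of the columns of `X`. A document to be classified is mapped into the same space with `fold_in(U_k, x)`, after its term vector has been restricted to the rows in `kept`.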

