基于交叉熵与困惑度的LDA-SVM主题研究
Research on LDA-SVM subject based on cross entropy and perplexity
Cited by: 2
Abstract: Chinese film and television scripts are currently classified mainly by hand, relying on human experience, which is costly and inefficient. There is currently no research on automatically classifying the topics of Chinese film and television scripts, so this paper studies topic extraction for such scripts. Traditional topic generation models rely on the similarity between documents and paragraphs, between paragraphs and sentences, and between sentences and words, but ignore the similarity between sentences themselves. First, the ISOMAP method is used to reduce the dimensionality of the sample set's vector space. Second, an algorithmic model that combines cross entropy with perplexity is proposed to determine the optimal number of topics for LDA to extract. Finally, following a script-topic approach, the LDA algorithm mines the latent topic words of each script, and an SVM is used to further classify these topic words.
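The abstract does not state how cross entropy and perplexity are combined to fix the number of topics K. As background only, the standard relation is that perplexity is the exponential of the per-word cross entropy on held-out text, perplexity(D) = exp(-Σ_d log p(w_d) / Σ_d N_d), where p(w | d) mixes the topic-word distributions with the document's topic weights. The Python sketch below computes both quantities for an LDA-style model given document-topic weights theta and topic-word weights phi. The function names, the toy model, and the lowest-perplexity rule in the hypothetical pick_num_topics are illustrative assumptions, not the authors' criterion.

```python
import math


def cross_entropy(docs, theta, phi):
    """Per-word cross entropy (nats) of tokenised documents under an LDA-style model.

    docs  -- list of documents, each a list of integer word ids
    theta -- theta[d][k] = p(topic k | document d)
    phi   -- phi[k][w]   = p(word w | topic k)
    """
    total_log_p = 0.0
    n_words = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) is a mixture over topics: sum_k p(k | d) * p(w | k).
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            total_log_p += math.log(p_w)
            n_words += 1
    return -total_log_p / n_words


def perplexity(docs, theta, phi):
    """Perplexity = exp(per-word cross entropy); lower means a better fit."""
    return math.exp(cross_entropy(docs, theta, phi))


def pick_num_topics(perplexity_by_k):
    """Hypothetical rule: return the candidate K with the lowest held-out perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


if __name__ == "__main__":
    # Toy model: 2 topics over a 4-word vocabulary, all distributions uniform.
    theta = [[0.5, 0.5], [0.5, 0.5]]
    phi = [[0.25] * 4, [0.25] * 4]
    docs = [[0, 1, 2], [3, 3]]
    print(round(cross_entropy(docs, theta, phi), 4))  # 1.3863, i.e. ln 4
    print(round(perplexity(docs, theta, phi), 4))     # 4.0, a uniform guess over 4 words
    # Made-up perplexities for illustration only, not results from the paper.
    print(pick_num_topics({5: 812.0, 10: 640.5, 15: 701.2}))  # 10
```

In practice theta and phi would come from an LDA model fitted with each candidate K, and perplexity would be measured on held-out scripts; the uniform toy model simply shows that a model guessing evenly among 4 words has perplexity 4.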
Authors: XUE Jiaqi (薛佳奇); YANG Fan (杨凡)
Affiliations: School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; School of Science, Xi'an University of Architecture and Technology, Xi'an 710055, China
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2019, No. 4, pp. 45-50 (6 pages)
Keywords: Chinese film and television script; ISOMAP dimension reduction; LDA; cross entropy; perplexity; SVM