基于分类算法的专利摘要文本分割技术 (cited by: 3)

Text segmentation of patent summary based on a classification algorithm
Abstract: A patent summary is a condensed description of a patent, and segmenting summaries by content makes it possible to locate the corresponding patents more accurately. Because patent summaries are short and carry no explicit markers between different kinds of content, traditional text segmentation methods cannot be applied to them. This paper recasts the segmentation of a patent summary as a sentence classification problem and attempts to solve it with classification algorithms. By analyzing how different classification algorithms and different features perform on this problem, the paper verifies that segmenting patent summaries by sentence classification is feasible.
Source: Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), indexed in CAS, CSCD and the Peking University Core list; 2012, No. 5, pp. 68-72, 77 (6 pages)
Keywords: patent summary; text segmentation; sentence unit; classification algorithm; part of speech
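
The idea in the abstract, treating segmentation as sentence classification, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which classifiers or features the paper evaluates, so a small bag-of-words Naive Bayes classifier stands in for them, and the labels `field`, `structure` and `effect`, the toy training sentences, and all function names are hypothetical. Each sentence is classified on its own, and consecutive sentences with the same label are then merged into one segment.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import groupby


def split_sentences(summary):
    """Split a summary into sentence units after ., ;, 。 or ; (deliberately naive)."""
    parts = re.split(r"(?<=[.;。;])\s*", summary.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lower-cased alphabetic tokens; a stand-in for richer features such as part of speech."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.label_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(sentence):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def segment_summary(summary, classifier):
    """Label each sentence, then merge consecutive sentences that share a label."""
    labelled = [(classifier.predict(s), s) for s in split_sentences(summary)]
    return [
        (label, [sentence for _, sentence in group])
        for label, group in groupby(labelled, key=lambda pair: pair[0])
    ]


# Hypothetical toy training data; a real system would need annotated summaries.
TRAINING = [
    ("The invention relates to a device for cleaning water.", "field"),
    ("The utility model relates to a bicycle brake.", "field"),
    ("The device comprises a filter and a pump connected to a tank.", "structure"),
    ("The brake comprises a lever connected to a cable.", "structure"),
    ("The invention has the advantages of low cost and simple structure.", "effect"),
    ("The utility model has the advantages of high safety.", "effect"),
]

classifier = NaiveBayesSentenceClassifier().fit(
    [sentence for sentence, _ in TRAINING], [label for _, label in TRAINING]
)

summary = (
    "The invention relates to a lamp. "
    "The lamp comprises a shade connected to a base. "
    "The base comprises a switch connected to a cable. "
    "The lamp has the advantages of low cost."
)
segments = segment_summary(summary, classifier)
for label, sentences in segments:
    print(label, sentences)
```

Because segment boundaries fall wherever the predicted label changes, this approach does not depend on topic-shift cues across long passages, which fits the short, unmarked summaries described in the abstract. The sentence splitter is intentionally simple and would, for example, break on decimal numbers.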