New Words Extraction Based on Feature Filter (基于特征过滤的新词语提取)
Cited by: 1
Abstract: This paper proposes a feature-filter-based method for automatically extracting new words. The range of the feature-fragment set is determined by analyzing new words from the past five years: their formation traits, their distribution and frequency in the corpus, the frequency of the characters they use, and the probability of their structural patterns, together with background knowledge. Feature fragments are first removed from the marked target sentences, and the remaining pieces form a set of strings. This set is then filtered by the word-formation traits and structural types of new words and by single-character probabilities, yielding the final set of new-word candidates. The method produces fewer strings while maintaining relatively high recall, which makes filtering out junk strings more efficient and in turn improves accuracy.
Authors: Zhu Bo (朱波), Hou Min (侯敏)
Source: Journal of Beihua University (Social Sciences) (《北华大学学报(社会科学版)》), 2012, No. 5, pp. 18-22 (5 pages)
Keywords: new word; feature fragment (feature segmentation); feature filter; auto-extraction
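The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the tiny `FEATURE_FRAGMENTS` list stands in for the feature fragments (特征碎片) that the paper derives from corpus statistics, and the second-stage filter uses simple length, frequency, and known-word checks in place of the paper's word-formation and structural-type rules. The function names `split_on_features` and `extract_candidates` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical feature fragments: frequent function words that are unlikely
# to appear inside a new word. Illustrative only; the paper builds its set
# from corpus analysis.
FEATURE_FRAGMENTS = ["的", "了", "是", "在", "很", "这个"]

# Punctuation marks also end a candidate string.
PUNCTUATION = ",。!?;:、,.!?;:"


def split_on_features(sentence, fragments=FEATURE_FRAGMENTS):
    """Stage 1: cut a sentence at every feature fragment or punctuation mark."""
    # Try longer fragments first so multi-character fragments take priority.
    alternatives = sorted(fragments, key=len, reverse=True)
    pattern = "|".join(re.escape(f) for f in alternatives)
    pattern += "|[" + re.escape(PUNCTUATION) + r"\s]"
    return [piece for piece in re.split(pattern, sentence) if piece]


def extract_candidates(sentences, known_words, min_len=2, max_len=4, min_freq=2):
    """Stage 2: keep strings that pass simple stand-in filters.

    The length bounds, frequency threshold, and known-word lookup are
    assumptions replacing the paper's word-formation and structure filters.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(split_on_features(sentence))
    return {
        s: n
        for s, n in counts.items()
        if min_len <= len(s) <= max_len and n >= min_freq and s not in known_words
    }


if __name__ == "__main__":
    corpus = ["今天的比赛很给力", "这个手机是给力的"]
    lexicon = {"今天", "比赛", "手机"}
    print(extract_candidates(corpus, lexicon))
```

Splitting at feature fragments first keeps the string set small, which is the point the abstract makes: fewer strings reach the second-stage filter than if every n-gram in the corpus were enumerated.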