Word Segmentation Based on Suffix Arrays (基于后缀数组的分词技术)

Word Segment Based on Suffix Array
Abstract: Chinese word segmentation underpins machine translation, text classification, search engines, and information retrieval. However, new words that keep appearing on the Internet seriously degrade segmentation performance. To improve the recognition rate of new words, this paper builds a suffix array over the text to be segmented, counts how often each common prefix of the sorted suffixes occurs, and filters those counts against a threshold to select candidate words. Experimental results show that the method has some advantage in new-word recognition.
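Source: Computer Systems & Applications (《计算机系统应用》), 2010, No. 8, pp. 229-230, 211 (3 pages in total)
Funding: Qujing Normal University Fund (2008QN007); Yunnan Provincial Department of Education Research Project (09C0188)
Keywords: suffix array; word segmentation; length of common prefix (LCP)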
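
The pipeline outlined in the abstract (build a suffix array, count shared prefixes, filter by a threshold) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the naive suffix sort, the length bounds `min_len`/`max_len`, and the threshold `min_count` are all hypothetical choices, since the abstract gives no concrete values or construction algorithm.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of `text`, sorted lexicographically.

    Naive construction that compares whole suffixes; fine for short inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_length(a: str, b: str) -> int:
    """Return the length of the longest common prefix of `a` and `b`."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidate_words(text: str, min_count: int = 2,
                    min_len: int = 2, max_len: int = 4) -> dict[str, int]:
    """Map each substring of length min_len..max_len to its occurrence count,
    keeping only substrings that occur at least `min_count` times.

    Assumption: `min_count` stands in for the threshold mentioned in the
    abstract; the length bounds are illustrative and not taken from the paper.
    """
    sa = build_suffix_array(text)
    # lcp[i] = common prefix length of the suffixes at sa[i - 1] and sa[i].
    lcp = [0] + [lcp_length(text[sa[i - 1]:], text[sa[i]:])
                 for i in range(1, len(sa))]
    counts: dict[str, int] = {}
    for k in range(min_len, max_len + 1):
        i = 0
        while i < len(sa):
            # Extend the run while neighbouring suffixes share >= k characters;
            # every suffix in sa[i:j] then starts with the same k-character prefix.
            j = i + 1
            while j < len(sa) and lcp[j] >= k:
                j += 1
            # Skip suffixes shorter than k and runs below the threshold.
            if j - i >= min_count and len(text) - sa[i] >= k:
                counts[text[sa[i]:sa[i] + k]] = j - i
            i = j
    return counts


if __name__ == "__main__":
    print(candidate_words("banana"))  # {'an': 2, 'na': 2, 'ana': 2}
```

Because sorted suffixes that begin with the same string sit next to each other, a substring's frequency can be read off as the length of a run in the LCP values, with no dictionary required. On real text the input would likely need to be split at punctuation first, and overlapping candidates such as a repeated phrase and its own substrings would need further filtering; the abstract does not describe either step, so they are omitted here.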