
Applying rough sets in word segmentation disambiguation based on maximum entropy model

Abstract: To solve the complicated feature extraction and long-distance dependency problems in Word Segmentation Disambiguation (WSD), this paper proposes applying rough sets to WSD based on the Maximum Entropy model. Firstly, rough set theory is applied to extract complicated features and long-distance features, even from noisy or inconsistent corpora. Secondly, these features are added to the Maximum Entropy model, so that the feature weights can be assigned according to the performance of the whole disambiguation model. Finally, a semantic lexicon is adopted to build class-based rough set features to overcome data sparseness. Experiments indicated that our method performed better than previous models, which had achieved the top rank in WSD in the 863 Evaluation in 2003. The system ranked first and second, respectively, in the MSR and PKU open tests of the Second International Chinese Word Segmentation Bakeoff held in 2005.
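For reference, the conditional Maximum Entropy model named in the title has the standard textbook form

    P(y | x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
    Z(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)

where y is a segmentation decision, x is its context, the f_i are binary feature functions (here instantiated by the rough-set-extracted features the abstract describes), and the lambda_i are the weights assigned when the whole disambiguation model is trained. This is the standard definition, not a formula reproduced from the paper.

The rough-set feature extraction step rests on the classical lower/upper approximation construction. The Python sketch below is ours and purely illustrative: it computes both approximations of a target concept over a toy decision table whose attribute names and values are invented; the paper itself uses a variable precision rough set model over a large corpus.

    from collections import defaultdict

    def approximations(table, condition_attrs, decision_attr, target_value):
        """Return (lower, upper) approximations of the set of objects whose
        decision attribute equals target_value, with respect to the
        indiscernibility relation induced by condition_attrs."""
        # Partition objects into equivalence classes: two objects are
        # indiscernible when they agree on every condition attribute.
        classes = defaultdict(set)
        for i, row in enumerate(table):
            key = tuple(row[a] for a in condition_attrs)
            classes[key].add(i)

        # Target concept: all objects labeled with target_value.
        target = {i for i, row in enumerate(table)
                  if row[decision_attr] == target_value}

        lower, upper = set(), set()
        for members in classes.values():
            if members <= target:   # class wholly inside the concept: certain
                lower |= members
            if members & target:    # class overlaps the concept: possible
                upper |= members
        return lower, upper

    # Toy decision table: context attributes of an ambiguous string and the
    # chosen segmentation (attribute names and values are invented).
    rows = [
        {"prev_pos": "v", "next_pos": "n", "seg": "A"},
        {"prev_pos": "v", "next_pos": "n", "seg": "A"},
        {"prev_pos": "n", "next_pos": "u", "seg": "B"},
        {"prev_pos": "n", "next_pos": "u", "seg": "A"},  # inconsistent class
    ]
    low, up = approximations(rows, ["prev_pos", "next_pos"], "seg", "A")
    print(low, up)  # lower = {0, 1}; upper = {0, 1, 2, 3}

Equivalence classes wholly contained in the concept yield certain rules (the lower approximation), while classes that merely overlap it, like the inconsistent class in the toy table, yield only possible rules; this distinction is what lets rough-set feature extraction tolerate noisy or inconsistent training data.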
Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报(英文版)), indexed in EI and CAS, 2006, Issue 1, pp. 94-98 (5 pages).
Funding: Sponsored by the Key Program Projects of the National Natural Science Foundation of China (Grant No. 60435020) and the National 863 Program (Grant No. 2002AA11701090).
Keywords: word segmentation, disambiguation, feature extraction, rough sets, maximum entropy model
