
A Semi-automatic Method Based on Statistic for Mandarin Semantic Structures Extraction in Specific Domains (Cited by: 1)

Abstract: This paper proposes a new semi-automatic method for extracting semantic structures from unlabelled corpora in specific domains. The approach is statistical in nature. The extracted structures can be used for shallow parsing and semantic labeling. By iteratively extracting new words and clustering words, we obtain an initial semantic lexicon that groups words with the same semantic meaning together as a class. A bootstrapping algorithm is then adopted to extract semantic structures. The semantic structures are in turn used to extract new [...] and augment the semantic lexicon. The resultant semantic structures can be interpreted by humans and are amenable to hand-editing for refinement. In this experiment, the semi-automatically extracted structures S_SA provide a recall rate of 84. [...]
Source: Journal of Shanghai Jiaotong University (Science) (上海交通大学学报(英文版)), EI, 2004, No. 4, pp. 25-29 (5 pages)
Funding: Foundation Research Program, Science & Technology Committee of Shanghai Municipality (No. 01JC14033)
Keywords: semantic structure; language model; semi-automatic extraction; semantic grouping; NLU


