Natural language parsing is a task of great importance and extreme difficulty. In this paper, we present a full Chinese parsing system based on a two-stage approach. Rather than identifying all phrases with a uniform model, we use a divide-and-conquer strategy. We propose an effective and fast method based on a Markov model to identify the base phrases. We then make the first attempt to extend one of the best English parsing models, the head-driven model, to recognize Chinese complex phrases. Our two-stage approach is superior to the uniform approach in two respects. First, it creates synergy between the Markov model and the head-driven model. Second, it reduces the complexity of full Chinese parsing and makes the parsing system efficient in both space and time. We evaluate our approach with PARSEVAL measures on the open test set; the parsing system achieves 87.53% precision and 87.95% recall.
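The PARSEVAL measures cited in the abstract above compare the labeled brackets of a predicted parse against those of a gold-standard parse. The following is a minimal sketch of labeled precision and recall, assuming each constituent is represented as a `(label, start, end)` tuple; the function name `parseval` and the toy brackets are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def parseval(gold, predicted):
    """Return (precision, recall) over labeled brackets.

    Each bracket is a (label, start, end) tuple. Counter intersection
    handles the rare case of duplicate brackets correctly.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Number of predicted brackets that also appear in the gold parse.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical five-word sentence: the predicted parse gets one NP span wrong.
gold = [("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5), ("S", 0, 5)]
pred = [("NP", 0, 2), ("VP", 2, 5), ("NP", 4, 5), ("S", 0, 5)]
precision, recall = parseval(gold, pred)
```

Here three of the four predicted brackets match the gold parse, so both precision and recall come out to 0.75. Real PARSEVAL scoring tools apply further conventions (for example, how punctuation is treated) that this sketch leaves out.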
In this paper, we study an automatic pattern abstraction and recognition method for large-scale database systems based on natural language processing. In a distributed database, the network connections between nodes allow data distributed across different nodes, and even across regions, to be recognized. Because a database model designed to reduce data redundancy usually contains a large number of forms, we apply NLP theory to optimize the traditional method. The experimental analysis and simulation demonstrate the correctness of our method.
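In the more than 50 years since the computer was born, Western languages based on the Latin alphabet have largely been typed directly on the keyboard. Chinese characters, however, are an organic combination of form, sound, and meaning; they are numerous, structurally complex, and highly variable in typeface and glyph shape. Although there are various Chinese input methods, such as 'Wangma Wubi', 'Quanpin' (full pinyin), 'Shuangpin' (double pinyin), and 'Ziranma', most Chinese people still cannot communicate freely with the computer, a high-tech product. This is precisely what has created the special market for 'speech input'. IBM, which invested heavily and devoted 26 years to research on speech recognition systems, released its Chinese speech recognition product ViaVoice in the Chinese market in September 1997.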
In this paper the authors examine the three problems of Hidden Markov Models (HMMs): the evaluation problem, the decoding problem, and the learning problem. The authors explore an approach to increasing the effectiveness of HMMs in speech recognition. Although hidden Markov modeling has significantly improved the performance of current speech recognition systems, the general problem of completely fluent, speaker-independent speech recognition is still far from solved. For example, no system can reliably recognize unconstrained conversational speech, and there is no good statistical way to infer language structure from a limited corpus of spoken sentences. The authors therefore provide an overview of HMM theory, discuss the role of statistical methods, and point out a range of theoretical and practical issues that deserve attention and must be understood in order to advance research in speech recognition.
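Two of the three HMM problems named in the abstract above can be illustrated on a toy discrete model: evaluation (the probability of an observation sequence, via the forward algorithm) and decoding (the most probable state sequence, via the Viterbi algorithm). The sketch below is only an illustration; the state names, symbols, and probabilities are hypothetical and do not come from the paper, and the learning problem (for example Baum-Welch re-estimation) is omitted.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Evaluation problem: P(obs | model) via the forward algorithm."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Sum over all predecessor states, then emit the current symbol.
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Decoding problem: return (probability, path) of the best state sequence."""
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Keep only the most probable predecessor for each state.
            prob, path = max(
                ((best[r][0] * trans_p[r][s], best[r][1]) for r in states),
                key=lambda t: t[0],
            )
            new_best[s] = (prob * emit_p[s][o], path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])


# Hypothetical two-state model over three observation symbols.
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {
    "S1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "S2": {"a": 0.1, "b": 0.3, "c": 0.6},
}
obs = ["a", "b", "c"]

likelihood = forward(obs, states, start_p, trans_p, emit_p)
best_prob, best_path = viterbi(obs, states, start_p, trans_p, emit_p)
```

For this toy model the best path is `S1, S1, S2`. Practical speech recognizers work with log probabilities or scaling to avoid numerical underflow on long sequences, which this sketch does not do.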
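Y99-61970-33 2003302 Large vocabulary recognition (8 papers) = SP2: large vocabulary recognition [conference, English] // 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I of VI. pp. 33-64 (HC). This part contains 8 papers covering the development of broadcast news transcription in a television-guided torpedo system, improvements to the IBM speech recognition system for automatic broadcast news transcription, an experimental study of large vocabulary conversational speech recognition, French large vocabulary speech recognition, the Cambridge University spoken document retrieval system, improvements in telephone speech recognition, the 1998 HTK system for telephone speech transcription, and real-time telephone-based speech recognition in the Jupiter domain.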
Y90-62009-67 0005018 Stochastic method for automatic recognition of topics [conference, English] / Scheffler, K. & du Preez, J.A. // Proceedings of the 1998 South African Symposium on Communications and Signal Processing (COMSIG'98). pp. 67-72 (PC)
Y90-62009-193 0005019 Language identification system for South African languages [conference, English] / Mashao, D.J. // Proceedings of the 1998 South African Symposium on Communications and Signal Processing (COMSIG'98). pp. 193-196 (PC)
Funding: National High Technology Research and Development Program of China (863 Program); the National Natural Science Foundation of China