Funding: Supported by the National Natural Science Foundation of China under Grants No. 61005004, No. 61175011, and No. 61171193; the Next-Generation Broadband Wireless Mobile Communications Network Technology Key Project under Grant No. 2011ZX03002-005-01; the 111 Project under Grant No. B08004; the Key Project of the Ministry of Science and Technology of China under Grant No. 2012ZX03002019-002; and the National High Technology Research and Development Program of China (863 Program) under Grant No. 2011AA01A205.
Abstract: Including more potentially correct words in the candidate sets is important for improving the accuracy of Large Vocabulary Continuous Speech Recognition (LVCSR). A candidate expansion algorithm based on a Weighted Syllable Confusion Matrix (WSCM) is proposed. First, the WSCM is derived from a confusion network. Then, the recognised candidates in the confusion network are used to conjecture the most likely correct words based on the WSCM, after which the conjectured words are combined with the recognised candidates to produce an expanded candidate set. Finally, a combined model of mutual information and a trigram language model is used to rerank the candidates. Experiments on Mandarin film data show an improvement of 9.57% in the character correction rate over the initial recognition performance on lightly erroneous utterances.
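The expansion step can be pictured with a small sketch. It is not the authors' implementation: it assumes the WSCM is estimated from aligned (recognised, reference, weight) syllable triples, where the weight is, for example, the confusion-network posterior of the recognised arc, and it scores a conjectured syllable s in a slot as the sum over recognised syllables r of posterior(r) times P(s | r). The names `build_wscm`, `expand_slot`, and `top_k` are hypothetical, the mapping from syllables back to words is omitted, and the mutual-information plus trigram reranking stage is not shown.

```python
from collections import defaultdict


def build_wscm(aligned_pairs):
    """Estimate a weighted syllable confusion matrix P(reference | recognised).

    aligned_pairs: iterable of (recognised, reference, weight) triples.
    Assumption for this sketch: weight is the confusion-network posterior
    of the recognised arc, so confident confusions count more.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for rec, ref, weight in aligned_pairs:
        counts[rec][ref] += weight
    wscm = {}
    for rec, row in counts.items():
        total = sum(row.values())
        wscm[rec] = {ref: w / total for ref, w in row.items()}
    return wscm


def expand_slot(slot, wscm, top_k=2):
    """Expand one confusion-network slot with conjectured syllables.

    slot: list of (syllable, posterior) candidates recognised in the slot.
    Each unseen syllable s gets score sum_r posterior(r) * P(s | r);
    the top_k highest-scoring ones are appended to the original candidates.
    """
    scores = defaultdict(float)
    for rec, post in slot:
        for ref, p in wscm.get(rec, {}).items():
            scores[ref] += post * p
    present = {syl for syl, _ in slot}
    conjectured = sorted(
        (s for s in scores if s not in present),
        key=lambda s: (-scores[s], s),  # highest score first, ties by name
    )[:top_k]
    return [syl for syl, _ in slot] + conjectured
```

For instance, if "shi" is often confused with "si" and occasionally with "xi", a slot that only contains "shi" is expanded to `["shi", "si"]` with `top_k=1`. Keeping the original candidates first leaves the recogniser's own hypotheses untouched and only adds alternatives for the reranker to consider.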
Funding: Project (No. 2006CB303000) supported in part by the National Basic Research Program (973) of China
Abstract: This paper explores the application of term dependency in information retrieval (IR) and proposes a novel dependency retrieval model. The model extends the existing language modeling (LM) approach to IR by introducing dependency models for both the query and the document. Relevance between a document and a query is then evaluated by the Kullback-Leibler divergence between their dependency models. The paper introduces a novel hybrid dependency structure that allows various forms of dependency to be integrated within a single framework. A pseudo-relevance-feedback-based method is also introduced for constructing the query dependency model. The basic idea is to use query-relevant, top-ranking sentences extracted from the top documents at retrieval time as an augmented representation of the query, from which the relationships between query terms are identified. A Markov Random Field (MRF) based approach is presented to ensure the relevance of the extracted sentences; it uses association features between query terms within a sentence to evaluate each sentence's relevance. The dependency retrieval model was compared with other traditional retrieval models, and experiments indicated that it significantly improves retrieval effectiveness.
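The KL-divergence ranking step can be illustrated with a minimal sketch under stated assumptions. As a stand-in for the paper's hybrid dependency structure, every term and every adjacent term pair is treated as a "dependency unit"; the query model is a plain maximum-likelihood estimate (the pseudo-relevance-feedback sentences and the MRF sentence filter are omitted); and the document model uses Dirichlet smoothing, a common choice in LM retrieval but not one the abstract specifies. All function names and the `mu` and `floor` parameters are hypothetical.

```python
import math
from collections import Counter


def units(tokens):
    """Hypothetical dependency units: every term plus every adjacent term pair."""
    return list(tokens) + list(zip(tokens, tokens[1:]))


def collection_model(docs):
    """Maximum-likelihood collection model P(u | C) over all documents' units."""
    counts = Counter(u for d in docs for u in units(d))
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}


def query_model(query_tokens):
    """Maximum-likelihood query model P(u | Q) (no feedback expansion here)."""
    counts = Counter(units(query_tokens))
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}


def neg_kl_score(q_model, doc_tokens, coll, mu=10.0, floor=1e-9):
    """Ranking score -KL(Q || D) with a Dirichlet-smoothed document model.

    `floor` keeps P(u | D) positive for units never seen in the collection.
    """
    counts = Counter(units(doc_tokens))
    length = sum(counts.values())
    score = 0.0
    for u, q in q_model.items():
        p_d = (counts[u] + mu * coll.get(u, floor)) / (length + mu)
        score -= q * math.log(q / p_d)
    return score
```

A higher (less negative) score means the document's dependency model is closer to the query's, so documents are sorted by this value in descending order. Because pairs are units alongside single terms, a document that contains "information retrieval" as an adjacent pair is rewarded beyond one that merely contains both words.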
Funding: the High Technology Research and Development Program of China (No. 2006AA01Z150) and the National Natural Science Foundation of China (No. 60435020)
Abstract: To eliminate the mismatch between the words of relevant documents and those of the user's query, together with the serious negative effects this mismatch has on information retrieval performance, a query expansion method based on a new representation of term co-occurrence was put forward by analyzing how queries are produced. Expansion terms were selected according to their correlation with the whole query, and positional information between terms was also considered. Experimental results on the Text REtrieval Conference (TREC) data collection show that the proposed method consistently outperforms the language modeling method without expansion, by 5% to 19%. Compared with pseudo feedback, a popular query expansion approach, the proposed method achieves competitive precision.
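The idea of selecting terms by their correlation with the whole query, rather than with any single query term, can be sketched as follows. The exact co-occurrence representation in the paper is not given, so two choices here are assumptions: co-occurrences within a `window` are weighted by 1/distance as a simple form of positional information, and the whole-query score is a product over query terms smoothed by a hypothetical constant `delta`, so a candidate must be associated with every query term to rank highly. The function name `expansion_terms` and its parameters are illustrative only.

```python
from collections import defaultdict


def expansion_terms(docs, query, window=5, k=3, delta=0.1):
    """Rank candidate expansion terms by association with the whole query.

    docs: list of token lists. Each co-occurrence of a candidate t with a
    query term q within `window` positions adds 1/distance, so closer
    pairs count more. The whole-query score is prod_q (delta + assoc[t][q]).
    """
    query = list(dict.fromkeys(query))  # de-duplicate, keep order
    qset = set(query)
    assoc = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, q in enumerate(tokens):
            if q not in qset:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                t = tokens[j]
                if j != i and t not in qset:
                    assoc[t][q] += 1.0 / abs(i - j)
    scores = {}
    for t, row in assoc.items():
        s = 1.0
        for q in query:
            s *= delta + row[q]  # missing pairs contribute only delta
        scores[t] = s
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With the query ["apple", "orange"], a term such as "juice" that appears near both query words outranks "computer", which co-occurs closely with "apple" alone. The product is what encodes "correlation to the whole query"; a sum would instead let a term strongly tied to one query word dominate.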
Abstract: This paper proposes a new statistical method for semi-automatically extracting semantic structures from unlabelled corpora in specific domains. The extracted structures can be used for shallow parsing and semantic labeling. By iteratively extracting new words and clustering words, we obtain an initial semantic lexicon that groups words with the same meaning into a class. After that, a bootstrapping algorithm is adopted to extract semantic structures. Then the semantic structures are used to extract new
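The abstract does not detail its bootstrapping algorithm, so the following is only a generic pattern-based bootstrapping loop of the kind commonly used to grow a semantic class from seed words. Assumptions: the input is already word-segmented, a "pattern" is simply the (left word, right word) context of a token, and a pattern is trusted once it surrounds at least `min_support` distinct known class members. The function name `bootstrap` and its parameters are hypothetical.

```python
from collections import defaultdict


def _contexts(sentences):
    """Yield (left, word, right) triples with sentence-boundary padding."""
    for toks in sentences:
        padded = ["<s>"] + list(toks) + ["</s>"]
        for i in range(1, len(padded) - 1):
            yield padded[i - 1], padded[i], padded[i + 1]


def bootstrap(sentences, seeds, rounds=3, min_support=2):
    """Grow a semantic class from seed words via (left, right) context patterns."""
    lexicon = set(seeds)
    for _ in range(rounds):
        # 1. Collect the class members each context pattern surrounds.
        support = defaultdict(set)
        for left, word, right in _contexts(sentences):
            if word in lexicon:
                support[(left, right)].add(word)
        trusted = {p for p, words in support.items() if len(words) >= min_support}
        # 2. Every word found inside a trusted pattern joins the class.
        new_words = {w for l, w, r in _contexts(sentences) if (l, r) in trusted}
        if new_words <= lexicon:  # nothing new: the class has converged
            break
        lexicon |= new_words
    return lexicon
```

Seeded with {"beijing", "shanghai"}, the pattern ("to", "today") in sentences like "fly to beijing today" becomes trusted, so "xian" in "fly to xian today" is added, while "ticket" in "book a ticket today" is not. Requiring support from several distinct seeds is a simple guard against the semantic drift that single-instance patterns tend to cause.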
Funding: Postdoctoral Fund of China (No. 2003034518) and Fund of the Health Bureau of Zhejiang Province (No. 2004B042), China
Abstract: This research studies the process of 3D reconstruction and dynamic concision based on 2D medical digital images using the Virtual Reality Modeling Language (VRML) and JavaScript, focusing on how to realize the dynamic concision of 3D medical models with Script and sensor nodes in VRML. The resulting 3D reconstructions and concisions of internal organs are of higher quality than those obtained with traditional methods. With dynamic concision, the VRML browser offers a better window for real-time human-computer interaction than was previously available. 3D reconstruction and dynamic concision with VRML can meet the requirements of medical observation of 3D reconstructions and have promising prospects in medical imaging.
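How a sensor node and a Script node can be wired together in VRML97 is shown by the small Python generator below. It is a sketch, not the authors' method: it assumes the "concision" is precomputed as a separate cut model, and clicking the organ simply toggles a `Switch` between the full and the cut versions through a `TouchSensor`, a JavaScript `Script`, and two `ROUTE` statements. The file names passed in, the DEF names (`Model`, `Clicker`, `Cutter`), and the helper `vrml_cut_scene` are all hypothetical.

```python
def vrml_cut_scene(full_url, cut_url):
    """Return a VRML97 scene whose model toggles between full and cut versions.

    full_url / cut_url: hypothetical .wrl files holding the reconstructed organ
    and a precomputed cut of it. Clicking the model sends TouchSensor.touchTime
    to the Script, which flips its `current` field and routes the new index
    to Switch.whichChoice.
    """
    return f'''#VRML V2.0 utf8
Group {{
  children [
    DEF Model Switch {{
      whichChoice 0
      choice [
        Inline {{ url "{full_url}" }}
        Inline {{ url "{cut_url}" }}
      ]
    }}
    DEF Clicker TouchSensor {{}}
  ]
}}
DEF Cutter Script {{
  eventIn SFTime clicked
  eventOut SFInt32 choice
  field SFInt32 current 0
  url "javascript:
    function clicked(value, timestamp) {{
      current = 1 - current;
      choice = current;
    }}"
}}
ROUTE Clicker.touchTime TO Cutter.clicked
ROUTE Cutter.choice TO Model.set_whichChoice
'''
```

The `TouchSensor` is placed in the same `Group` as the `Switch` because a pointing-device sensor only reacts to geometry among its siblings' descendants. Writing the result of `vrml_cut_scene("organ_full.wrl", "organ_cut.wrl")` to a `.wrl` file would give a scene that a VRML97 browser can load, provided the two inlined files exist.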