Abstract
Among current keyword spotting systems, one-stage systems are slow to search, while two-stage systems based on large vocabulary continuous speech recognition (LVCSR) are not robust enough. This paper proposes a new two-stage keyword spotting system based on syllable (pinyin) graphs to provide fast, robust retrieval from speech data. The two stages are preprocessing and search. The preprocessing stage recognizes the speech data into a syllable graph with high coverage of syllable candidates. The search stage answers frequent user queries by finding syllable strings in the graph that match the keyword's syllables, and uses a forward-backward algorithm based on a syllable N-gram model to compute confidence measures for filtering the search results. Experiments show that the system achieves a recall of 72.19% and an accuracy of 72.68% for two-syllable words, and a recall of 73.51% and an accuracy of 82.98% for three-syllable words, all better than the LVCSR system. The search stage runs at only 0.01 times real time, which makes the system practical.
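The search stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the edge format, the function names `forward_backward` and `spot_keyword`, and the toy graph `EDGES` are all hypothetical, and the syllable N-gram model is left out, so each edge score is simply assumed to be a probability. The sketch finds paths in a syllable graph whose labels match the keyword's syllables and scores each match by its posterior probability, computed from forward and backward sums.

```python
from collections import defaultdict

# Hypothetical toy syllable graph: (from_node, to_node, syllable, probability).
# Node ids are assumed to be integers in topological order (from_node < to_node).
EDGES = [
    (0, 1, "qing", 0.6), (0, 1, "jing", 0.4),
    (1, 2, "hua", 0.7), (1, 2, "fa", 0.3),
    (2, 3, "da", 1.0),
]

def forward_backward(edges, start, end):
    """Sum path scores from start to each node (alpha) and from each node to end (beta)."""
    alpha, beta = defaultdict(float), defaultdict(float)
    alpha[start] = 1.0
    beta[end] = 1.0
    # Forward pass: visit edges by increasing source node.
    for u, v, _, p in sorted(edges, key=lambda e: e[0]):
        alpha[v] += alpha[u] * p
    # Backward pass: visit edges by decreasing destination node.
    for u, v, _, p in sorted(edges, key=lambda e: e[1], reverse=True):
        beta[u] += beta[v] * p
    return alpha, beta

def spot_keyword(edges, start, end, keyword):
    """Return (first_node, last_node, confidence) for each path matching the keyword syllables."""
    alpha, beta = forward_backward(edges, start, end)
    total = alpha[end]  # summed score of all complete paths; assumed non-zero
    out = defaultdict(list)
    for u, v, syl, p in edges:
        out[u].append((v, syl, p))
    hits = []

    def extend(node, i, score, first):
        if i == len(keyword):
            # Posterior of the matched segment given the whole graph.
            hits.append((first, node, alpha[first] * score * beta[node] / total))
            return
        for v, syl, p in out[node]:
            if syl == keyword[i]:
                extend(v, i + 1, score * p, first)

    for node in list(out):
        extend(node, 0, 1.0, node)
    return hits

# Keep only hits whose confidence passes a threshold (0.3 is an arbitrary choice).
accepted = [h for h in spot_keyword(EDGES, 0, 3, ["qing", "hua"]) if h[2] >= 0.3]
```

Because the graph is built once in preprocessing, each query only needs this lightweight traversal, which is consistent with the abstract's point that repeated searches are cheap compared with recognition.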
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journal
2005, No. 10, pp. 1356-1359 (4 pages)
Journal of Tsinghua University (Science and Technology)
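Funding
Supported by the National Network and Information Security Assurance Sustainable Development Program (Special Project 917)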
Keywords
information retrieval
keyword spotting
syllable graph
confidence measure