Visual information from a speaker's mouth has proved very useful for improving speech recognition, especially in noisy environments. This paper first introduces one of the main components of an audio-visual speech recognition (AVSR) system, the visual front-end design, and then presents a machine learning method for mouth region detection that processes images rapidly with high detection rates. The approach comprises rotated Haar-like features computed on the integral image, a learning algorithm based on AdaBoost with sign-value trees as base classifiers, a cascade combination of complex classifiers, and regionalization of the face area. Finally, applying this scheme in an AVSR system yields high detection rates at a speed that essentially meets real-time requirements.
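As a rough illustration of why integral-image features are fast to evaluate, the following minimal sketch builds a summed-area table and computes one upright two-rectangle Haar-like feature. It is not the paper's implementation: the function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the toy image are assumptions made for this example, and the rotated features mentioned above are not covered.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h.

    Uses four table lookups, regardless of the rectangle size.
    """
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Upright two-rectangle feature: upper half minus lower half (h must be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# Hypothetical 4x4 toy image: bright upper half, dark lower half.
img = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
ii = integral_image(img)
feature_value = two_rect_feature(ii, 0, 0, 4, 4)
```

Because each rectangle sum costs a constant number of lookups, a detector can evaluate many such features per window, which is what makes a boosted cascade practical at near real-time speeds.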
Computer Technology and Development