Abstract
This paper proposes a document segmentation method based on the wavelet-domain multistate hidden Markov tree (HMT) model. Building on the work of H. Choi et al. (2001), the method classifies document regions by texture into three types (background, text, and image) and builds a multistate HMT model for each type. The method is then further improved by using the smoothed image, which yields a multistate IHMT segmentation method. Finally, examples illustrate that the proposed methods are more effective than the two-state HMT model proposed by H. Choi et al.
Source
《电子与信息学报》
EI
CSCD
Peking University Core Journal
2002, No. 12, pp. 1885-1891 (7 pages)
Journal of Electronics & Information Technology