This paper proposes an enhanced automatic text recognition system for extracting information from the front side of the Vietnamese citizen identity (CID) card. First, we apply Mask-RCNN to segment the CID card from the background and align it. Next, we present two approaches to detecting the CID card's text lines, one based on traditional image processing techniques and one based on the EAST detector, and compare them. Finally, we introduce a new end-to-end Convolutional Recurrent Neural Network (CRNN) model for Vietnamese text recognition that combines Connectionist Temporal Classification (CTC) with an attention mechanism by jointly training the CTC and attention objective functions. The length of the CTC's output label sequence is applied to the attention-based decoder's prediction to produce the final label sequence. This helps reduce irregular alignments and speeds up label sequence estimation during training and inference, compared with relying solely on a data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. The proposed model can be learned directly from word sequences without detailed annotations. We evaluate the proposed system on a real collected Vietnamese CID card dataset and find that our method yields 4.28% in word error rate (WER) and outperforms common techniques.
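To make the recognition step more concrete, the following is a minimal sketch of one way the CTC output length could guide the attention decoder, together with a jointly weighted objective. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two losses are combined or exactly how the CTC length is applied, so the interpolation weight `lam`, the blank index `BLANK`, the `eos` marker, and the helper names `ctc_greedy_decode`, `length_guided_output`, and `joint_loss` are all hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Standard CTC best-path collapse: merge repeated labels, then drop blanks."""
    decoded: List[int] = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def length_guided_output(attention_labels: Sequence[int], ctc_length: int, eos: int) -> List[int]:
    """Assumed reading of the abstract: stop the attention decoder's prediction
    at its end-of-sequence marker or at the CTC output length, whichever comes first."""
    output: List[int] = []
    for label in attention_labels:
        if label == eos or len(output) >= ctc_length:
            break
        output.append(label)
    return output


def joint_loss(ctc_loss: float, attention_loss: float, lam: float = 0.5) -> float:
    """Assumed joint objective: a weighted sum of the CTC and attention losses."""
    return lam * ctc_loss + (1.0 - lam) * attention_loss


if __name__ == "__main__":
    frames = [0, 1, 1, 0, 1, 2, 2, 0]      # per-frame argmax labels from the CTC branch
    ctc_labels = ctc_greedy_decode(frames)
    attention = [1, 1, 2, 2, 3, 9]         # attention branch output that runs too long
    final = length_guided_output(attention, len(ctc_labels), eos=9)
    print(ctc_labels, final, joint_loss(2.0, 4.0, lam=0.25))
```

In this sketch the CTC branch only supplies a length bound, which keeps the attention decoder from running on after the text line has ended. Other ways of applying the length, such as rescoring hypotheses, would be equally consistent with the abstract.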
Funding: Supported by Sai Gon University under Fund (Grant No. TD2020-11).