Offline Urdu Nastaleeq text recognition has long been a difficult problem because the script is highly cursive. To avoid character segmentation problems, many researchers are shifting towards segmentation-free, ligature-based recognition approaches. The majority of prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are error-prone and may often lose useful information that manual features can hardly recover later. Most prevalent Urdu Nastaleeq text recognition systems were also trained and tested on small data sets. This paper proposes the use of stacked denoising autoencoders for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders were trained on 178,573 ligatures in 3,732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) data set. The trained networks were subsequently validated and tested on degraded versions of the UPTI data set. The experimental results show accuracies in the range of 93% to 96%, which are better than those of existing Urdu OCR systems for such a large data set of ligatures.
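The idea of learning features from raw pixels with stacked denoising autoencoders can be illustrated with a minimal sketch. It is not the paper's implementation: the tied weights, masking noise, cross-entropy loss, layer sizes, learning rate, and every name below (`corrupt`, `DenoisingAutoencoder`, `pretrain_stack`, `extract_features`) are assumptions chosen for illustration, and ligature images are stood in for by short lists of pixel values in [0, 1].

```python
import math
import random


def corrupt(pixels, drop_prob, rng):
    """Masking noise: set each pixel to 0 independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else p for p in pixels]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingAutoencoder:
    """One tied-weight layer: h = sigmoid(W x + b), x_hat = sigmoid(W^T h + c)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.c = [0.0] * n_visible

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + ci)
                for i, ci in enumerate(self.c)]

    def loss(self, x):
        """Cross-entropy between a clean input and its reconstruction."""
        x_hat = self.decode(self.encode(x))
        eps = 1e-12  # guards against log(0)
        return -sum(xi * math.log(xh + eps) + (1 - xi) * math.log(1 - xh + eps)
                    for xi, xh in zip(x, x_hat))

    def train_step(self, x, drop_prob, lr, rng):
        """One SGD step: encode a corrupted copy, reconstruct the clean input."""
        x_noisy = corrupt(x, drop_prob, rng)
        h = self.encode(x_noisy)
        x_hat = self.decode(h)
        # Gradient of the cross-entropy w.r.t. the output pre-activations.
        d_out = [xh - xi for xh, xi in zip(x_hat, x)]
        # Backpropagate through W^T and the hidden sigmoid.
        d_hid = [sum(w * d for w, d in zip(row, d_out)) * hj * (1 - hj)
                 for row, hj in zip(self.W, h)]
        for j, row in enumerate(self.W):
            for i in range(len(row)):
                # Tied weights receive gradient from both encoder and decoder.
                row[i] -= lr * (d_hid[j] * x_noisy[i] + h[j] * d_out[i])
            self.b[j] -= lr * d_hid[j]
        for i, d in enumerate(d_out):
            self.c[i] -= lr * d


def pretrain_stack(data, hidden_sizes, drop_prob=0.3, lr=0.1, epochs=200, seed=0):
    """Greedy layer-wise pretraining: each layer trains on the previous layer's codes."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in hidden_sizes:
        layer = DenoisingAutoencoder(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for x in inputs:
                layer.train_step(x, drop_prob, lr, rng)
        layers.append(layer)
        inputs = [layer.encode(x) for x in inputs]
    return layers


def extract_features(layers, x):
    """Pass an input through every trained encoder to get its feature vector."""
    for layer in layers:
        x = layer.encode(x)
    return x
```

In a sketch like this, `pretrain_stack(images, [4, 2])` would return two trained layers, and `extract_features(layers, image)` would yield the learned feature vector that a separate classifier could consume. Training on clean images while feeding the encoder corrupted copies is what is meant to make the features tolerant of degraded inputs.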
Handwriting recognition is one of the most significant problems in pattern recognition, and many studies have been proposed to improve the recognition of handwritten text in different languages. Yet fewer studies have addressed the Arabic language, and the processing of its texts remains a particularly distinctive problem due to the variability of writing styles and the nature of Arabic script compared with other scripts. The present paper suggests a feature extraction technique for offline Arabic handwriting recognition. A handwriting recognition system for Arabic words is proposed that uses a few important structural features and is based on a Radial Basis Function (RBF) neural network. Feature extraction methods are central to achieving high recognition performance. The proposed methodology relies on a feature extraction technique based on several structural characteristics extracted from the word skeleton (subwords, diacritics, loops, ascenders, and descenders). For this purpose, we built our own word database, and the proposed system has been successfully tested on a handwriting database of Algerian city names (wilayas). Finally, a simple classifier based on the radial basis function neural network is presented to recognize certain words and to verify the reliability of the proposed feature extraction. Experiments on some images from the benchmark IFN/ENIT database show that the proposed system improves recognition, and the results obtained indicate the efficiency of our technique.
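As a rough illustration of how a radial basis function network can classify words from structural feature vectors, consider the sketch below. It makes several assumptions that the abstract does not state: each word is described by five hypothetical counts (subwords, diacritics, loops, ascenders, descenders), the Gaussian centers are simply the training vectors, the width `sigma` is fixed by hand, and the output weights are fitted with a plain delta rule. The names `rbf_activations` and `RBFNetwork` are illustrative only.

```python
import math


def rbf_activations(x, centers, sigma):
    """Gaussian response of each hidden unit to the feature vector x."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * sigma ** 2))
            for c in centers]


class RBFNetwork:
    """Gaussian hidden layer followed by a linear output layer with a bias."""

    def __init__(self, centers, n_classes, sigma=1.5):
        self.centers = centers
        self.sigma = sigma
        # One weight row per class; the extra column is the bias weight.
        self.weights = [[0.0] * (len(centers) + 1) for _ in range(n_classes)]

    def _hidden(self, x):
        return rbf_activations(x, self.centers, self.sigma) + [1.0]

    def scores(self, x):
        phi = self._hidden(x)
        return [sum(w * p for w, p in zip(row, phi)) for row in self.weights]

    def predict(self, x):
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def fit(self, samples, labels, lr=0.2, epochs=200):
        """Delta-rule training of the output weights towards one-hot targets."""
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                phi = self._hidden(x)
                for k, row in enumerate(self.weights):
                    target = 1.0 if k == y else 0.0
                    err = target - sum(w * p for w, p in zip(row, phi))
                    for j, p in enumerate(phi):
                        row[j] += lr * err * p


# Hypothetical feature vectors: (subwords, diacritics, loops, ascenders, descenders).
WORD_FEATURES = [
    [2, 3, 1, 2, 1],
    [3, 1, 0, 1, 2],
    [1, 0, 2, 3, 0],
]
WORD_LABELS = [0, 1, 2]
```

A usage sketch: build `RBFNetwork(WORD_FEATURES, n_classes=3)`, call `fit(WORD_FEATURES, WORD_LABELS)`, and then `predict` on a new feature vector returns the index of the best-scoring word class. Because each hidden unit responds only near its center, a vector that differs from a training word by a single count still tends to activate the same unit most strongly.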
Funding: The National Natural Science Foundation of China (Project No. 61273365) and the 111 Project (No. B08004) are gratefully acknowledged.