Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results, and incorrect detections might lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better handle table areas of different shapes and sizes, we add a dual-branch context content attention module (DCCAM) to the high-dimensional features to extract context information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replace the original 3×3 convolution with a multilayer residual module, whose enhanced gradient flow improves the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieves state-of-the-art results on evaluation metrics such as recall and F1-score. Code: https://github.com/YongZ-Lee/TD-DCCAM
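The abstract does not describe the DCCAM's internals; as a loose illustration only, the following PyTorch sketch shows one way a dual-branch attention module over high-dimensional features can be built, with a channel branch, a spatial branch, and a residual path. The branch contents, `reduction` factor, and kernel sizes are assumptions, not the paper's design.

```python
# Hypothetical sketch of a dual-branch attention module applied to
# high-dimensional backbone features; the actual DCCAM design is not
# specified in the abstract, so both branches here are illustrative.
import torch
import torch.nn as nn

class DualBranchAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Branch 1: channel attention from globally pooled context.
        self.channel_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Branch 2: spatial attention over local content.
        self.spatial_branch = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight features by both branches and keep a residual path.
        return x + x * self.channel_branch(x) * self.spatial_branch(x)

feats = torch.randn(1, 256, 32, 32)     # high-dimensional features
out = DualBranchAttention(256)(feats)   # same shape: (1, 256, 32, 32)
```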
Achieving a good recognition rate for degraded document images is difficult because such images suffer from low contrast, bleed-through, and nonuniform illumination. Unlike existing baseline thresholding techniques that use fixed thresholds and windows, the proposed method derives dynamic windows from the image content to achieve better binarization. To enhance a low-contrast image, we propose a new mean histogram stretching method that suppresses noisy pixels in the background while simultaneously increasing pixel contrast at or near edges, yielding an enhanced image. For the enhanced image, we propose a new method for deriving adaptive local thresholds for the dynamic windows, which are obtained by exploiting the advantages of Otsu thresholding. To assess performance, we experimented on standard databases, namely those of the Document Image Binarization Contest (DIBCO). A comparative study with well-known existing methods indicates that the proposed method outperforms them in terms of quality and recognition rate.
Funding: Ministry of Higher Education, Malaysia, under the Long Research Grant Scheme LRGS-1-2019-UKM-UKM-2-7.
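As an illustration of windowed Otsu-style binarization in the spirit of the method above, here is a minimal Python sketch; a fixed grid stands in for the paper's content-derived dynamic windows, and the window size and fallback rule are assumptions.

```python
# Minimal sketch of Otsu-guided local thresholding; the paper's rule
# for deriving dynamic windows from image content is not given in the
# abstract, so a fixed grid stands in for the dynamic windows here.
import numpy as np
from skimage.filters import threshold_otsu

def local_otsu_binarize(gray: np.ndarray, win: int = 64) -> np.ndarray:
    out = np.zeros_like(gray, dtype=bool)
    global_t = threshold_otsu(gray)          # fallback for flat windows
    for y in range(0, gray.shape[0], win):
        for x in range(0, gray.shape[1], win):
            block = gray[y:y + win, x:x + win]
            # Otsu needs at least two distinct grey levels in the block.
            t = threshold_otsu(block) if block.min() < block.max() else global_t
            out[y:y + win, x:x + win] = block > t
    return out
```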
Semantic segmentation is a crucial step for document understanding. In this paper, an NVIDIA Jetson Nano-based platform is applied to implement semantic segmentation for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution. Our model combines dilated convolutions with a residual network to represent image features and predict pixel labels. The convolutional part works as a feature extractor that obtains multidimensional, hierarchical image features; consecutive deconvolutions then produce a full-resolution segmentation prediction, and the probability at each pixel determines its predefined semantic class label. To understand segmentation granularity, we compare performance at three levels, from fine-grained to coarse class labels, evaluating the proposed architecture on three document datasets. The experimental results show that both semantic data distribution imbalance and network depth are important factors influencing document semantic segmentation performance. The research aims to offer an educational resource for teaching artificial intelligence concepts and techniques.
Funding: National Natural Science Foundation of China (Grant No. 61806107); Shandong Key Laboratory of Wisdom Mine Information Technology; Opening Project of the State Key Laboratory of Digital Publishing Technology.
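A dilated residual block of the kind the abstract describes can be sketched in a few lines of PyTorch; the channel count, dilation rates, and layer arrangement below are illustrative assumptions rather than the paper's exact architecture.

```python
# Illustrative dilated residual block; the abstract does not fix layer
# counts or dilation rates, so the values here are assumptions.
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # Dilated 3x3 convs enlarge the receptive field without
        # downsampling, so spatial resolution is preserved.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.body(x))

x = torch.randn(1, 64, 128, 128)
for d in (1, 2, 4):                      # growing multi-scale context
    x = DilatedResidualBlock(64, d)(x)   # shape stays (1, 64, 128, 128)
```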
Background: Contrast enhancement plays an important role in image processing. Contrast correction adjusts the darkness or brightness of the input image and increases its quality. Objective: This paper proposes a novel method based on statistics of the local mean and local standard deviation. Method: The proposed method modifies the mean and standard deviation of the neighbourhood at each pixel and divides the image into three categories: background, foreground, and problematic (contrast and luminosity) regions. Experimental results, both visual and objective, show that the proposed method normalizes the contrast variation problem more effectively than Histogram Equalization (HE), Difference of Gaussian (DoG), and Butterworth Homomorphic Filtering (BHF). Seven binarization methods were tested on the corrected images and produced positive, impressive results. Result: Finally, a comparison in terms of Signal-to-Noise Ratio (SNR), Misclassification Error (ME), F-measure, Peak Signal-to-Noise Ratio (PSNR), Misclassification Penalty Metric (MPM), and accuracy was made. Every binarization method improves when applied to the corrected image rather than the original. The SNR of the proposed method is 9.350, higher than that of the three other methods. The average increments over five types of evaluation are: Otsu 41.64%, Local Adaptive 7.05%, Niblack 30.28%, Bernsen 25%, Bradley 3.54%, Nick 1.59%, and Gradient-Based 14.6%. Conclusion: The presented method effectively solves the contrast problem and produces better-quality images.
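The local statistics the method relies on can be computed with box filters, as in this hedged sketch; the category thresholds `t_mean` and `t_std` are placeholder values, since the paper's actual decision rules are not given in the abstract.

```python
# Sketch of local-statistics region classification; the paper's real
# thresholds for the three categories are not stated in the abstract,
# so t_mean and t_std below are placeholder values only.
import numpy as np
from scipy.ndimage import uniform_filter

def classify_regions(gray: np.ndarray, win: int = 15,
                     t_mean: float = 0.5, t_std: float = 0.1) -> np.ndarray:
    g = gray.astype(float) / 255.0
    mean = uniform_filter(g, size=win)                   # local mean
    sq_mean = uniform_filter(g * g, size=win)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))  # local std
    labels = np.zeros(g.shape, dtype=np.uint8)           # 0 = background
    labels[(mean < t_mean) & (std >= t_std)] = 1         # 1 = foreground
    labels[(mean < t_mean) & (std < t_std)] = 2          # 2 = problematic
    return labels
```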
Document image segmentation is very useful for printing, faxing, and data processing. An algorithm is developed for segmenting and classifying document images. The feature used for classification is based on the histogram distribution patterns of the different image classes. A key attribute of the algorithm is the use of a wavelet correlation image to enhance the raw image's pattern, which improves classification accuracy. In this paper, a document image is divided into four types: background, photo, text, and graph. First, the document background is distinguished easily by a conventional method; second, the three remaining types are distinguished by their typical histograms, and to make the histogram features clearer, the HH wavelet subimage at each resolution is added to the raw image at that resolution. Finally, photo, text, and graph regions are divided according to how well the feature fits the Laplacian distribution by 2 and L. Simulations show that classification accuracy is significantly improved, and comparison with related work shows that the algorithm provides both lower classification error rates and better visual results.
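The HH-subband enhancement step can be sketched with PyWavelets as follows; resizing the detail band back to the raw image size is one plausible reading of adding the subimage "at its resolution", not necessarily the paper's exact procedure.

```python
# Sketch of HH-subband enhancement with PyWavelets; the HH (diagonal
# detail) band at each decomposition level is added back to the raw
# image to emphasise texture before histogram features are computed.
import numpy as np
import pywt
from skimage.transform import resize

def hh_enhance(gray: np.ndarray, levels: int = 2) -> np.ndarray:
    enhanced = gray.astype(float)
    coeffs = gray.astype(float)
    for _ in range(levels):
        coeffs, (_, _, hh) = pywt.dwt2(coeffs, 'haar')  # hh = HH band
        enhanced += resize(np.abs(hh), gray.shape)      # add detail back
    return enhanced
```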
This paper presents progress on document image point spread function (PSF) estimation. It begins with an overview of PSF estimation methods and explains why the knife-edge input method was chosen; the following section then details that method. A simulation experiment is performed to verify the implemented PSF estimation method, and based on it we propose a procedure that makes automatic PSF estimation possible. A real document image is first taken as an example to illustrate the procedure and then restored with the estimated PSF and Lucy-Richardson deconvolution; its OCR accuracy before and after deconvolution is compared. Finally, we conclude the paper with an outlook on future work.
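The restoration step is readily reproduced with scikit-image's Lucy-Richardson implementation; in this sketch a Gaussian kernel stands in for the knife-edge-estimated PSF, since PSF estimation itself is the paper's contribution.

```python
# Sketch of the restoration step: Lucy-Richardson deconvolution with an
# estimated PSF. A Gaussian kernel stands in for the knife-edge estimate,
# and a stock test image stands in for the document image.
import numpy as np
from scipy.ndimage import convolve
from skimage import data, restoration

def gaussian_psf(size: int = 9, sigma: float = 1.5) -> np.ndarray:
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return psf / psf.sum()                  # normalise to unit energy

image = data.camera() / 255.0               # stand-in document image
psf = gaussian_psf()
blurred = convolve(image, psf)              # simulated degradation
restored = restoration.richardson_lucy(blurred, psf, 30)  # 30 iterations
```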
The growth of document image databases is becoming a challenge for document image retrieval techniques. Traditional layout-reconstruction-based methods rely on high-quality document images as well as high optical character recognition (OCR) precision, and can only handle a few widely used languages, while the complexity of document layouts greatly hinders layout-analysis-based approaches. This paper describes a multi-density-feature-based algorithm for binary document images that is independent of OCR and layout analysis. The text area is extracted after preprocessing such as skew correction and marginal noise removal; the aspect ratio and multi-density features are then extracted from the text area to select the best candidates from the document image database. Experimental results show that this approach is simple, with loss rates below 3%, and can efficiently analyze images with different resolutions and input systems. The system is also robust to noise such as annotations and complex layouts.
Funding: National Natural Science Foundation of China (Grant No. 60472028); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20040003015).
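A multi-density descriptor of the kind described can be sketched as black-pixel densities over grids of several sizes, plus the aspect ratio; the grid sizes and the distance-based ranking suggested in the final comment are assumptions.

```python
# Sketch of a multi-density descriptor for a binary document image:
# ink density over grids of several sizes plus the text-area aspect
# ratio. Grid sizes are assumptions; the paper's layout is not given.
import numpy as np

def multi_density_features(binary: np.ndarray, grids=(2, 4, 8)) -> np.ndarray:
    # binary: 2D array with text pixels = 1, background = 0.
    h, w = binary.shape
    feats = [h / w]                          # aspect ratio of text area
    for g in grids:
        for i in range(g):
            for j in range(g):
                cell = binary[i * h // g:(i + 1) * h // g,
                              j * w // g:(j + 1) * w // g]
                feats.append(cell.mean())    # ink density of the cell
    return np.asarray(feats)

# Retrieval: rank database images by distance between feature vectors,
# e.g. np.linalg.norm(query_feats - multi_density_features(candidate)).
```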
In the digital world, a wide range of handwritten and printed documents must be converted to digital format using a variety of tools, including mobile phones and scanners. Unfortunately, this is not an optimal procedure, and the entire document image may be degraded. Imperfect conversion effects due to noise, motion blur, and skew distortion can significantly impact the accuracy and effectiveness of document image segmentation and analysis in Optical Character Recognition (OCR) systems. In Document Image Analysis Systems (DIAS), skew estimation is a crucial step. In this paper, a novel, fast, and reliable skew detection algorithm based on the Radon transform and a Curve Length Fitness function (CLF), called Radon CLF, is proposed. The Radon CLF model exploits the properties of Radon space: thanks to an innovative fitness function formulated for the projected signal, it explores the dominant angle more effectively on a 1D signal than on the 2D input image. Several performance indicators, including Mean Square Error (MSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Measure (SSIM), accuracy, and run-time, were considered when assessing the model. In addition, a new dataset named DSI5000 was constructed to assess the accuracy of the CLF model, and both the two-dimensional image signal and the Radon space were used in our simulations to compare the noise effect. The results show that the proposed method is more effective than existing approaches, with an accuracy of roughly 99.87% and a run-time of 0.048 s; it is far more accurate and time-efficient than current approaches to image skew detection.
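The following sketch shows Radon-based skew estimation in outline; a projection-variance score stands in for the paper's Curve Length Fitness function, which the abstract does not define, and the angle-convention offset should be verified against the data.

```python
# Sketch of Radon-based skew detection; a projection-variance criterion
# stands in for the paper's Curve Length Fitness function.
import numpy as np
from skimage.transform import radon

def estimate_skew(binary: np.ndarray, max_skew: float = 15.0) -> float:
    angles = np.arange(-max_skew, max_skew + 0.1, 0.1)
    # The +90 offset maps radon()'s angle convention to text-line skew;
    # verify the sign and offset for your own orientation convention.
    sinogram = radon(binary.astype(float), theta=angles + 90.0, circle=False)
    scores = sinogram.var(axis=0)    # sharp text-line peaks -> high variance
    return float(angles[int(np.argmax(scores))])
```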
Rule selection has long been a challenging problem that must be solved when developing a rule-based knowledge learning system. Many methods have been proposed to evaluate the eligibility of a single rule based on some criteria. However, a knowledge learning system usually contains a set of rules that are not independent but interactive: they tend to affect each other and form a rule system. In such a case, it is no longer reasonable to isolate each rule from the others for evaluation; the best rule according to a certain criterion is not always the best one for the whole system. Furthermore, the real-world data from which people want to create their learning systems are often ill-defined and inconsistent, so the completeness and consistency criteria for rule selection are no longer essential. In this paper, some ideas on solving the rule-selection problem in a systematic way are proposed. These ideas have been applied in the design of a Chinese business card layout analysis system and achieved good results on a training data set of 425 images. The implementation of the system and the results are presented in this paper.
This paper proposes a new approach to the water flow algorithm for text line segmentation. In the basic method, hypothetical water flows under a few specified angles, defined by the water flow angle parameter, and is applied to the document image frame from left to right and vice versa. As a result, unwetted and wetted areas are established; these areas separate text from non-text elements in each text line and thus represent the control areas of major importance for text line segmentation. The extended approach first extracts connected components as bounding boxes over the text, so that each connected component is mutually separated; the water flow angle, which defines the unwetted areas, is then determined adaptively. Choosing an appropriate water flow angle lengthens the unwetted areas, which leads to better text line segmentation. The results of this approach are encouraging, improving text line segmentation, the most challenging step in document image processing.
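The first step of the extended approach, isolating connected components by bounding boxes, can be sketched with OpenCV; the area filter used to drop speckle is an assumption.

```python
# Sketch of the extended approach's first step: isolating connected
# components with bounding boxes so each component is mutually separated
# before the water flow angle is chosen adaptively per component.
import cv2
import numpy as np

def component_boxes(binary: np.ndarray) -> list[tuple[int, int, int, int]]:
    # binary: uint8 image, text pixels = 255, background = 0.
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):                 # label 0 is the background
        x, y, w, h, area = stats[i]
        if area > 10:                     # drop speckle noise (assumed)
            boxes.append((x, y, w, h))
    return boxes
```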