In the digital world, a wide range of handwritten and printed documents must be converted to digital format using a variety of tools, including mobile phones and scanners. Unfortunately, this conversion is imperfect, and the entire document image may be degraded. Noise, motion blur, and skew distortion can significantly reduce the accuracy and effectiveness of document image segmentation and analysis in Optical Character Recognition (OCR) systems. In Document Image Analysis Systems (DIAS), skew estimation is a crucial step. This paper proposes a novel, fast, and reliable skew detection algorithm based on the Radon Transform and a Curve Length Fitness Function (CLF), called Radon CLF. The Radon CLF model takes advantage of the properties of Radon space: thanks to a new fitness function formulated for the projected signal in Radon space, it searches for the dominant angle more effectively on a 1D signal than on the 2D input image. The model was assessed using several performance indicators: Mean Square Error (MSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Measure (SSIM), accuracy, and run-time. In addition, a new dataset named DSI5000 was constructed to assess the accuracy of the CLF model. Both the two-dimensional image signal and the Radon space were used in the simulations to compare the effect of noise. The results show that the proposed method is more accurate and time-efficient than existing approaches to image skew detection, with an accuracy of roughly 99.87% and a run-time of 0.048 s.
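The idea of scoring a 1D projection instead of the 2D image can be illustrated with a minimal sketch. The abstract does not give the CLF formula, so the code below assumes the fitness is the arc length of the projection profile, which peaks when text lines collapse into sharp spikes. The names `projection_profile`, `curve_length`, `estimate_skew`, and `make_skewed_lines`, the search range, and the step size are all hypothetical, and the projection is a simple binned sum over foreground pixels, not the authors' Radon implementation.

```python
import math


def projection_profile(pixels, angle_deg):
    """Bin foreground pixels (x, y) along the axis normal to lines tilted by angle_deg.

    This is a discrete stand-in for one column of the Radon transform.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    bins = {}
    for x, y in pixels:
        s = round(y * cos_t - x * sin_t)
        bins[s] = bins.get(s, 0) + 1
    lo, hi = min(bins), max(bins)
    # Pad with a zero at each end so edge spikes count on both sides.
    return [0] + [bins.get(s, 0) for s in range(lo, hi + 1)] + [0]


def curve_length(profile):
    """Assumed fitness: arc length of the 1D profile with unit bin spacing."""
    return sum(math.hypot(1.0, b - a) for a, b in zip(profile, profile[1:]))


def estimate_skew(pixels, max_angle=15.0, step=0.5):
    """Return the candidate angle (degrees) whose profile has the longest curve."""
    n = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n + 1)]
    return max(candidates,
               key=lambda a: curve_length(projection_profile(pixels, a)))


def make_skewed_lines(angle_deg, n_lines=5, width=200, spacing=30):
    """Hypothetical test helper: thin synthetic 'text lines' tilted by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(spacing * (k + 1) + x * slope))
            for k in range(n_lines) for x in range(width)]


if __name__ == "__main__":
    print(estimate_skew(make_skewed_lines(5.0)))
```

An exhaustive search over candidate angles is used here only for clarity; the abstract does not say how the paper searches the angle space.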
This paper proposes a new approach to the water flow algorithm for text line segmentation. In the basic method, hypothetical water flows across the document image frame from left to right and from right to left under a few specified angles, set by the water flow angle parameter. As a result, unwetted and wetted areas are established. The unwetted areas mark text and the wetted areas mark non-text elements in each text line. Hence, they are the control areas of major importance for text line segmentation. The extended approach first extracts the connected components by placing bounding boxes over the text, so that each connected component is separated from the others. The water flow angle, which defines the unwetted areas, is then determined adaptively. Choosing an appropriate water flow angle lengthens the unwetted areas, which leads to better text line segmentation. The results are encouraging because they improve text line segmentation, the most challenging step in document image processing.
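The effect of the water flow angle on the unwetted areas can be sketched with a simplified geometric model. The code below is an assumption, not the paper's procedure: each vertical run of text pixels blocks the flow and leaves a wedge-shaped dry region behind it whose edges close at the flow angle, in both flow directions. The function name `unwetted_mask` and the list-of-rows image format (1 = text, 0 = background) are hypothetical, and the adaptive choice of angle from bounding boxes is not modelled because the abstract does not specify it.

```python
import math


def unwetted_mask(image, angle_deg):
    """Return a mask (1 = unwetted) for water flowing left-to-right and right-to-left.

    image: list of equal-length rows, 1 = text pixel, 0 = background.
    angle_deg: assumed water flow angle; smaller angles give longer dry wedges.
    """
    rows, cols = len(image), len(image[0])
    slope = math.tan(math.radians(angle_deg))
    mask = [row[:] for row in image]  # text pixels are unwetted by definition
    for direction in (1, -1):  # 1: flow to the right, -1: flow to the left
        for c in range(cols):
            r = 0
            while r < rows:
                if image[r][c] == 0:
                    r += 1
                    continue
                # Find the vertical run of text pixels [top, bottom] in column c.
                top = r
                while r < rows and image[r][c] == 1:
                    r += 1
                bottom = r - 1
                # Cast the dry wedge behind the run, one column at a time.
                d = 1
                while 0 <= c + direction * d < cols:
                    lo = math.ceil(top + d * slope)
                    hi = math.floor(bottom - d * slope)
                    if lo > hi:  # wedge has closed
                        break
                    for rr in range(lo, hi + 1):
                        mask[rr][c + direction * d] = 1
                    d += 1
    return mask


if __name__ == "__main__":
    # A single 5-pixel-tall stroke in a 9 x 30 frame.
    img = [[0] * 30 for _ in range(9)]
    for row in range(2, 7):
        img[row][10] = 1
    for angle in (30, 15):
        print(angle, sum(unwetted_mask(img, angle)[4]))
```

In this model, halving the angle from 30 to 15 degrees roughly doubles the length of the dry wedge along the stroke's middle row, which matches the abstract's remark that a suitable angle lengthens the unwetted areas and helps neighbouring components in a line merge.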