Abstract: This study reviews the contributions to Arabic Optical Character Recognition (OCR) over the last decade, to help interested researchers learn the existing techniques and extend or adapt them. The study describes the characteristics of the Arabic language, the different types of OCR systems, the stages of an Arabic OCR system, researchers' contributions at each stage, and the evaluation metrics for OCR. It reviews the existing datasets for Arabic OCR and their characteristics. Additionally, the study implements some preprocessing and segmentation stages of Arabic OCR. It compares the performance of the existing methods in terms of recognition accuracy; in addition to researchers' OCR methods, commercial and open-source systems are included in the comparison. The Arabic language is morphologically rich and written cursively, with dots and diacritics above and below the characters. Most existing approaches in the literature were evaluated on isolated characters or isolated words under controlled conditions, and few were tested on page-level scripts. Some comparative studies show that the accuracy of existing commercial Arabic OCR systems is low, under 75% for printed text, and that further improvement is needed. Moreover, most current approaches are offline OCR systems, and there is no remarkable contribution to online OCR systems.
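The abstract refers to evaluation metrics for OCR without naming them. As an illustration only, the sketch below computes character error rate (CER), a commonly used OCR metric, from the Levenshtein edit distance. That the survey uses CER is an assumption, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One missing letter out of four reference characters.
    print(character_error_rate("كتاب", "كتب"))
```

The comparison is made per Unicode code point, so Arabic diacritics count as separate characters; whether to strip them before scoring is a choice the evaluator has to make.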
Funding: Supported by science and technology projects of Gansu State Grid Corporation of China (52272220002U).
Abstract: Optical Character Recognition (OCR) refers to a technology that uses image processing and character recognition algorithms to identify characters in an image. This paper is an in-depth study of the recognition performance of OCR based on Artificial Intelligence (AI) algorithms, in which the different AI algorithms for OCR are classified and reviewed. First, the mechanisms and characteristics of artificial neural network-based OCR are summarized. Second, the paper explores machine learning-based OCR and concludes that the algorithms available for this form of OCR are still in their infancy, with low generalization and fixed recognition errors, albeit with better recognition effect and higher recognition accuracy. Finally, the paper explores several of the latest algorithms, such as deep learning and pattern recognition algorithms. It concludes that OCR requires algorithms with higher recognition accuracy.
Abstract: The purpose of the paper is to develop a mobile Android application, "Car Log", that lets users track all the costs of a vehicle and add fuel cost data by taking a photo of the cash receipt from the gas station where the charging was performed. OCR (optical character recognition) is the conversion of images of typed, handwritten, or printed text into machine-encoded text. Once the text is machine-encoded, it can be used in further machine processes, such as translation or text-to-speech conversion, helping people with simple everyday tasks. Users of the application can also enter other, entirely different costs, grouped into categories, and other charges. The Car Log application can quickly and easily visualize, edit, and add different costs for a car. It also supports multiple profiles, so that data can be entered for all the cars in a single family, for example, or in a small business. The test results are positive, so we intend to develop the application further into a cloud-ready one.
Abstract: Handwritten character recognition is considered more challenging than recognition of machine-printed characters because human writing styles differ. Arabic is morphologically rich, and its characters are highly similar to one another. The Arabic language includes 28 characters, each with up to four shapes depending on its position in the word (beginning, middle, end, or isolated). This paper proposed 12 CNN architectures for recognizing handwritten Arabic characters. The proposed architectures were derived from popular CNN architectures, such as VGG, ResNet, and Inception, and adapted to character-size images. Experimental results on three well-known datasets showed that the proposed architectures significantly improved the recognition rate compared with the baseline models, and that data augmentation improved the models' accuracy on all tested datasets. The proposed model outperformed most of the existing approaches. The best results achieved were 93.05%, 98.30%, and 96.88% on the HIJJA, AHCD, and AIA9K datasets, respectively.
Funding: This project is supported by the Municipal Science Foundation of Wuhan (No. T20001101005).
Abstract: An optical imaging system and a configuration characteristic algorithm are presented to reduce the difficulty of extracting intact, low-contrast character images and of recognizing characters on fast-moving beer bottles. The system consists of a hardware subsystem, comprising a rotating device, a CCD, a 16 mm focal-length lens, a frame grabber card, penetrating lighting, and a computer, and a software subsystem. The software subsystem performs pretreatment, character segmentation, and character recognition. In the pretreatment, the original image is filtered with a preset threshold to remove isolated spots. The horizontal and vertical projections are then used in turn to segment the characters. Subsequently, the configuration characteristic algorithm is applied to recognize the characters. The experimental results demonstrate that the system recognizes the characters on beer bottles accurately and effectively; the algorithm proved fast, stable, and robust, making it suitable for industrial environments.
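The projection step described above can be sketched as follows. This is a minimal illustration of generic projection-profile segmentation, not the authors' implementation. It assumes the image is already binarized (1 = character pixel, 0 = background), and all function names are hypothetical.

```python
def horizontal_projection(binary):
    """Number of foreground pixels in each row."""
    return [sum(row) for row in binary]


def vertical_projection(binary):
    """Number of foreground pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def runs(profile, threshold=0):
    """(start, end) index pairs, end exclusive, where profile > threshold."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Split rows into text lines, then each line into character boxes."""
    boxes = []
    for top, bottom in runs(horizontal_projection(binary)):
        line = binary[top:bottom]
        for left, right in runs(vertical_projection(line)):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    print(segment_characters(image))
```

Splitting at empty rows and columns only works when characters do not touch; the `threshold` parameter of `runs` is a hypothetical knob for tolerating a few noise pixels in the gaps.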
Abstract: In today's digital era, text often appears in the form of images. This research addresses the problem by recognizing such text using a support vector machine (SVM). Much work has been done on handwritten character recognition for English, but very little for the under-resourced Hindi language. A method is developed for identifying Hindi characters that uses morphology, edge detection, histograms of oriented gradients (HOG), and SVM classes for summary creation. SVM rank employs the summary to extract essential phrases based on paragraph position, phrase position, numerical data, inverted commas, sentence length, and keyword features. The primary goal of the SVM optimization function is to reduce the number of features by eliminating unnecessary and redundant ones; the second goal is to maintain or improve the classification system's performance. The experiment included news articles from various genres, such as Bollywood, politics, and sports. The proposed method's accuracy for Hindi character recognition is 96.97%, which compares well with baseline approaches. System-generated summaries are compared with human summaries: the evaluation shows a precision of 72% at a compression ratio of 50% and a precision of 60% at a compression ratio of 25%, a decent result in comparison with state-of-the-art methods.
Funding: The results and knowledge included herein were obtained with support from the following institutional grant: Internal Grant Agency of the Faculty of Economics and Management, Czech University of Life Sciences Prague, Grant No. 2023A0004, "Text Segmentation Methods of Historical Alphabets in OCR Development". https://iga.pef.czu.cz/. Funds were granted to T. Novák, A. Hamplová, O. Svojše, and A. Veselý from the author team.
Abstract: This study presents a single-class and multi-class instance segmentation approach applied to ancient Palmyrene inscriptions, employing two state-of-the-art deep learning algorithms, namely YOLOv8 and Roboflow 3.0. The goal is to contribute to the preservation and understanding of historical texts, showcasing the potential of modern deep learning methods in archaeological research. Our research culminates in several key findings and scientific contributions. We comprehensively compare the performance of YOLOv8 and Roboflow 3.0 for Palmyrene character segmentation, focusing mainly on the strengths and weaknesses of each algorithm in this context. We also created and annotated an extensive dataset of Palmyrene inscriptions, a crucial resource for further research in the field; the dataset is used to train and evaluate the segmentation models. We employ comparative evaluation metrics to assess the segmentation results quantitatively, ensuring the reliability and reproducibility of our findings, and we present custom visualization tools for the predicted segmentation masks. Our study advances the state of the art in semi-automatic reading of Palmyrene inscriptions and establishes a benchmark for future research. The availability of the Palmyrene dataset and the insights into algorithm performance contribute to the broader understanding of historical text analysis.
Abstract: License plate recognition (LPR) applies image processing and character recognition technology to identify vehicles by automatically reading their license plates. The work presented in this paper aims to create a computer vision system that takes a real-time input image from a static camera and identifies the license plate in the extracted image. The problem is examined in two stages: first, detection of the license plate region, its extraction from the background, and segmentation of the plate into sub-images; second, character recognition. The method used for plate region detection is based on the assumption that the license plate area contains a high concentration of small details, making it a region of high edge intensity. A Sobel filter and vertical and horizontal projections are used to identify the plate region. Testing this stage gave an accuracy of 67.5%. The final stage of the LPR system is optical character recognition (OCR). The method adopted for this stage is template matching using correlation. Testing the OCR stage gave an overall recognition rate of 87.76%.
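Template matching by correlation, as named in the abstract, can be illustrated with the sketch below. It assumes each segmented character has already been scaled to the same size as the templates, and it uses zero-mean normalized correlation, which is one common choice and not necessarily the exact measure used in the paper. The tiny 3×3 templates and function names are hypothetical.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equally sized 2-D images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs)
               * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0


def classify(glyph, templates):
    """Return the label of the template that correlates best with the glyph."""
    return max(templates,
               key=lambda label: normalized_correlation(glyph, templates[label]))


if __name__ == "__main__":
    templates = {
        "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    }
    noisy_glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # an "I" with one stray pixel
    print(classify(noisy_glyph, templates))
```

Subtracting the mean makes the score insensitive to uniform brightness changes, which matters for camera images taken under varying light.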
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 61702046) and the National Key R&D Program of China (Grant Nos. 2017YFB1401500 and 2017YFB1400800).
Abstract: Cards Recognition Systems (CRSs) are representative computer vision-based applications with a broad range of usage scenarios. For example, they can be used to recognize images containing business cards, personal identification cards, and bank cards. Even though CRSs have been studied for many years, it is still difficult to recognize cards in camera-based images taken by ordinary devices such as mobile phones. The diversity of viewpoints and the complex backgrounds in these images make the recognition task challenging. Existing systems that employ traditional image processing schemes are not robust to varied environments and are inefficient at dealing with natural images. To tackle the problem, we propose a novel framework for card recognition based on a Convolutional Neural Network (CNN) approach. The system localizes the foreground of the image using a Fully Convolutional Network (FCN). With the help of the foreground map, the system localizes the corners of the card region and applies a perspective transformation to alleviate the effects of distortion. Text lines in the card region are detected and recognized using a CNN and Long Short-Term Memory (LSTM). To evaluate the proposed scheme, we collected a large dataset containing 4,065 images in a variety of shooting scenarios. Experimental results demonstrate the efficacy of the proposed scheme. Specifically, it achieves an accuracy of 90.62% in the end-to-end test, outperforming the state of the art.
Abstract: Optical character recognition for right-to-left, cursive languages such as Arabic is challenging and has received little attention from researchers compared with Latin-script languages. Moreover, the absence of a standard, publicly available dataset for several low-resource languages, including Pashto, has remained a hurdle to progress in language processing. Recognizing that a clean dataset is the fundamental requirement of character recognition, this research begins with dataset generation and aims at a system capable of complete language understanding. With a view to complete, fully autonomous recognition of the cursive Pashto script, the first achievement of this research is a clean, standard dataset of isolated Pashto characters. The paper introduces a database of isolated Pashto characters covering the forty-four letters of the alphabet in various font styles. To overcome the shortage of font styles, the graphics software Inkscape was used to generate sufficient image samples for each character. The dataset has been preprocessed and reduced to 32×32 pixels, then converted to a binary format with a black background and white text so that it resembles the Modified National Institute of Standards and Technology (MNIST) database. The benchmark database is publicly available for further research on GitHub and Kaggle, in both pixel (image) and comma-separated values (CSV) formats.
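The preprocessing described above (resize, binarize to white text on a black background, export as CSV) might look roughly like the sketch below. It is not the authors' pipeline. It assumes 8-bit grayscale input with dark glyphs on a light background, uses nearest-neighbour resizing, a fixed threshold of 128, and a label-first CSV row layout; all of these, and the function names, are illustrative assumptions.

```python
import csv
import io


def resize_nearest(img, size=32):
    """Nearest-neighbour resize of a 2-D list to size x size pixels."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_white_on_black(img, threshold=128):
    """Dark ink becomes 1 (white text); light paper becomes 0 (black background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in img]


def to_csv_row(label, img):
    """Flatten the image row by row and prefix the class label."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([label] + [p for row in img for p in row])
    return buffer.getvalue().strip()


if __name__ == "__main__":
    # A 2x2 "image": light, dark / dark, light.
    tiny = [[255, 0], [0, 255]]
    print(to_csv_row(7, to_white_on_black(resize_nearest(tiny, size=4))))
```

With the default `size=32`, each row holds one label followed by 1,024 pixel values.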
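Abstract: With the continuing development of recycling technologies for urban mineral resources, the recycling of used mobile phones has become a research hotspot. Because computing and data resources are relatively scarce, the recognition accuracy for used mobile phones achieved by offline intelligent recycling equipment is currently insufficient for practical application. To address this problem, an image recognition model based on multi-feature heterogeneous ensemble deep learning is proposed. First, the character-level text detection algorithm CRAFT (character region awareness for text detection) is used to extract the character regions on the back of the phone, a VGG19 model pretrained on ImageNet serves as the image feature embedding model, and, following the idea of transfer learning, the local character features and global image features of the phone to be recycled are extracted. Then, the local features are used to build a neural-network-based optical character recognition (OCR) model, and the global and local features are used to build a non-neural-network deep forest classification (DFC) model. Finally, the outputs of the heterogeneous OCR and DFC recognition models are combined with vectors and fed into Softmax for ensembling, and the final recognition result is obtained by the criterion of the maximum weight-vector score. The effectiveness of the proposed method is verified on real images from used mobile phone recycling equipment.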
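Abstract: Objective: To design a data acquisition platform for medical treatment equipment based on an optical character recognition (OCR) model, in order to automate the collection of medical data under emergency disaster rescue conditions. Methods: The platform is built on the "perception-network-platform" architecture of the medical Internet of Things. First, a Raspberry Pi 4B was selected as the edge node, and the hardware environment was assembled from a video capture card, a camera, a tablet computer, and other components. Second, the OCR model was optimized on the basis of a convolutional recurrent neural network (CRNN), and video stream processing and data extraction for medical terminals were implemented through hardware-software co-design. Finally, the FineBI tool was used to design the interactive interface and connect to the database. Results: Experiments verified that the platform's hardware environment is reliable and stable, that the text recognition accuracy of the optimized OCR model improved, and that the platform enables fast, automated collection of medical device data. Conclusion: The platform can provide medical staff with comprehensive and accurate data support for medical treatment equipment, which helps improve the efficiency of medical treatment.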
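Abstract: Objective: To rapidly identify and reassemble the key information in imaging data and PDF reports after the database files of a picture archiving and communication system (PACS) are lost or damaged, for use at patients' follow-up visits. Methods: Deep learning-based optical character recognition and Pydicom were used to read the basic information in the PDF and DICOM files, respectively; the links among patients, images, and reports were re-established; and the associated data were written to the database. Results: Sampling verification showed that the method's accuracy, precision, and recall in recognizing images of the same type were all 100%, with an F1 score of 1, and that recognition accuracy was consistent across independent samples from different groups. The average recognition time per report was about 0.14 s (t=-1.005, P=0.315), indicating that recognition time was also consistent across independent samples from different groups. Conclusion: The method can effectively shorten patient waiting times after a database failure and restore normal medical operations within a short time. It can be used for emergency response after PACS database data loss, provides a basis for PACS data integration, and offers a new approach to medical imaging data recovery and data integration.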
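The accuracy, precision, recall, and F1 values reported above follow the standard definitions, which the sketch below computes for a binary labelling. The sample labels are invented for illustration and are not the study's data; the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Every sample recognized correctly gives 1.0 for all four metrics.
    print(binary_metrics([1, 0, 1, 1], [1, 0, 1, 1]))
```

F1 is the harmonic mean of precision and recall, so it equals 1 only when both are 1, which matches the combination of values the abstract reports.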
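Abstract: At present, the archiving of images of communication equipment rooms is dominated by manual work, which is inefficient and error-prone. Against this background, the article proposes an image archiving system for communication equipment rooms based on an optical character recognition (OCR) model. The system automatically recognizes the text in an image, determines which equipment room location the image belongs to, and then classifies and archives the image by cabinet location, achieving automated management. In testing, the system's archiving accuracy exceeded 98%, significantly improving the efficiency of image archiving for communication equipment rooms.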