Abstract: Research on intelligent and robotic excavators has become a focus of attention both in China and abroad, and this type of excavator is becoming increasingly important in practice. In this paper, we develop a control system that enables an intelligent robotic excavator to perform excavating operations autonomously. The excavator can recognize excavating targets by itself, plan the operation automatically from the original parameters, and complete all assigned tasks. Experimental results demonstrate the real-time performance and precision of the control system. The intelligent robotic excavator can remarkably reduce labor intensity and improve working efficiency.
Funding: Supported by the National Natural Science Foundation of China (No. 81570880)
Abstract: AIM: To quantitatively evaluate the effect of a simulated smog environment on human visual function by psychophysical methods. METHODS: The smog environment was simulated in a 40 cm × 40 cm × 60 cm glass chamber filled with a PM2.5 aerosol, and 14 subjects with normal visual function were examined by psychophysical methods with the foggy smog box placed in front of their eyes. The transmission of light through the smog box, an indication of the percentage concentration of smog, was determined with a luminance meter. Visual function under different smog concentrations was evaluated by E-visual acuity, crowded E-visual acuity and contrast sensitivity. RESULTS: E-visual acuity, crowded E-visual acuity and contrast sensitivity were all impaired with a decrease in the transmission rate (TR) according to power functions, with exponents of −1.41, −1.62 and −0.7, respectively, and R² values of 0.99 for E and crowded E-visual acuity and 0.96 for contrast sensitivity. Crowded E-visual acuity decreased faster than E-visual acuity. There was a good correlation between the TR, extinction coefficient and visibility under heavy-smog conditions. CONCLUSION: Increases in smog concentration have a strong effect on visual function.
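The power-law fits reported above can be evaluated directly. The sketch below does so with the published exponents; the scale factors are unit placeholders, since the abstract gives neither the fitted coefficients nor whether the fitted quantity is an acuity score or a resolution threshold.

```python
import numpy as np

# Power law: measure = k * TR**exponent. Exponents come from the abstract;
# the k values are arbitrary placeholders so the example runs.
EXPONENTS = {"E_acuity": -1.41, "crowded_E_acuity": -1.62, "contrast_sensitivity": -0.7}
SCALES = {"E_acuity": 1.0, "crowded_E_acuity": 1.0, "contrast_sensitivity": 1.0}

def fitted_measure(tr, name):
    """Evaluate the reported power law at transmission rate tr (0 < tr <= 1)."""
    return SCALES[name] * tr ** EXPONENTS[name]

for tr in (1.0, 0.5, 0.25):
    print(tr, {m: round(fitted_measure(tr, m), 2) for m in EXPONENTS})
```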
Funding: Supported by the National Natural Science Foundation of China (Nos. 31970427 and 32270526 to W.L.)
Abstract: Brood parasitic birds lay eggs in the nests of other birds, and the parasitized hosts can reduce the cost of raising unrelated offspring by recognizing the parasitic eggs. Hosts can adopt vision-based cognitive mechanisms to recognize foreign eggs by comparing the colors of foreign and host eggs. However, there is currently no consensus on whether this comparison follows a single or a multiple threshold decision rule. In this study, we tested both hypotheses by adding model eggs of different colors to the nests of Barn Swallows (Hirundo rustica) from two geographical populations breeding in Hainan and Heilongjiang Provinces in China. The results showed that Barn Swallows rejected more white model eggs (moderately mimetic to their own eggs) and blue model eggs (highly non-mimetic eggs with a shorter-wavelength reflectance spectrum) than red model eggs (highly non-mimetic eggs with a longer-wavelength reflectance spectrum). There was no difference in the rejection rate of model eggs between the two populations of Barn Swallows, and clutch size was not a factor affecting egg recognition. Our results are consistent with the single rejection threshold model. This study provides strong experimental evidence that the color of model eggs can have an important effect on egg recognition in Barn Swallows, opening up new avenues for uncovering the evolution of cuckoo egg mimicry and exploring the cognitive mechanisms underlying the visual recognition of foreign eggs by hosts.
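A single rejection threshold rule of the kind the study supports can be sketched as: reject an egg when its perceptual color distance from the host's own egg exceeds one fixed cutoff, regardless of the direction of the difference. The distance function, threshold values and color coordinates below are illustrative placeholders, not the visual model or parameters used in the study.

```python
import math

def color_distance(egg_a, egg_b):
    # Placeholder perceptual distance: Euclidean distance in a 3-D color space.
    # A real analysis would use an avian visual model (e.g., receptor-noise JNDs).
    return math.dist(egg_a, egg_b)

def reject_single_threshold(host_egg, foreign_egg, threshold=0.6):
    """Single-threshold rule: reject if the color difference exceeds one cutoff."""
    return color_distance(host_egg, foreign_egg) > threshold

def reject_multiple_threshold(host_egg, foreign_egg, lower=0.4, upper=0.8):
    """Multiple-threshold rule (for contrast): direction-dependent cutoffs,
    e.g. a stricter cutoff for differences toward shorter wavelengths."""
    shorter = foreign_egg[2] > host_egg[2]   # illustrative "bluer than own egg" test
    return color_distance(host_egg, foreign_egg) > (lower if shorter else upper)

host = (0.6, 0.5, 0.4)   # hypothetical host egg color coordinates
for name, egg in {"white": (0.9, 0.9, 0.9), "blue": (0.2, 0.3, 0.9), "red": (0.9, 0.2, 0.2)}.items():
    print(name, "reject:", reject_single_threshold(host, egg))
```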
Funding: Project (LJRC013) supported by the University Innovation Team of Hebei Province Leading Talent Cultivation, China
Abstract: Flatness pattern recognition is the key to flatness control. The accuracy of existing flatness pattern recognition methods is limited, and shape defects cannot be reflected intuitively. To improve on this, a novel method based on a T-S cloud inference network optimized by a genetic algorithm (GA) is proposed. The T-S cloud inference network is constructed from a T-S fuzzy neural network and the cloud model, so that both the speed of fuzzy logic and the cloud model's ability to handle uncertainty in the data are taken into account. Moreover, the GA possesses a good parallel structure and global optimization characteristics. Compared with simulation recognition results obtained with the traditional BP algorithm, the GA-optimized method is more accurate and effective. In addition, virtual reality technology is introduced into the field of shape control through mixed LabVIEW/MATLAB programming, and a virtual flatness pattern recognition interface is designed. The engineering analysis data and the actual model are thereby combined, and the shape defects can be visualized more vividly and intuitively.
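The GA optimization step can be illustrated with a minimal sketch: candidate parameter vectors for the recognition network are evolved against a fitness defined as the recognition error. The toy fitness function, parameter encoding and GA settings below are placeholders; the paper's actual T-S cloud inference network is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def recognition_error(params):
    # Placeholder fitness: error of a toy model on a synthetic target pattern.
    # In the paper this would be the recognition error of the T-S cloud inference network.
    target = np.linspace(-1.0, 1.0, params.size)
    return float(np.mean((params - target) ** 2))

def genetic_algorithm(dim=8, pop_size=30, generations=50, mutation_scale=0.1):
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        fitness = np.array([recognition_error(p) for p in pop])
        order = np.argsort(fitness)                 # lower error is better
        parents = pop[order[: pop_size // 2]]       # truncation selection
        cut = rng.integers(1, dim, size=pop_size)   # one-point crossover positions
        mothers = parents[rng.integers(len(parents), size=pop_size)]
        fathers = parents[rng.integers(len(parents), size=pop_size)]
        children = np.where(np.arange(dim) < cut[:, None], mothers, fathers)
        children += rng.normal(scale=mutation_scale, size=children.shape)  # mutation
        pop = children
    best = min(pop, key=recognition_error)
    return best, recognition_error(best)

best_params, err = genetic_algorithm()
print("best error:", round(err, 4))
```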
Funding: Supported by the National Key R&D Program of China (No. 2018AAA0102600), the Beijing Natural Science Foundation, China (No. JQ21015), the Beijing Academy of Artificial Intelligence (BAAI), China, and Pengcheng Laboratory, China
Abstract: Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence. It has great fundamental importance and strong industrial demand. In particular, modern deep neural networks (DNNs) and some brain-inspired methodologies have largely boosted recognition performance on many concrete tasks, with the help of large amounts of training data and powerful new computation resources. Although recognition accuracy is usually the first concern for new advances, efficiency is actually rather important and sometimes critical for both academic research and industrial applications. Moreover, insightful views on the opportunities and challenges of efficiency are highly needed by the entire community. While general surveys on the efficiency issue have been conducted from various perspectives, as far as we are aware, scarcely any of them have focused on visual recognition systematically, so it remains unclear which advances are applicable to it and what else should be considered. In this survey, we review recent advances and offer suggestions on possible new directions for improving the efficiency of DNN-related and brain-inspired visual recognition approaches, including efficient network compression and dynamic brain-inspired networks. We investigate the problem not only from the model but also from the data point of view (which is not the case in existing surveys) and focus on four typical data types (images, video, points, and events). This survey attempts to provide a systematic summary that can serve as a valuable reference and inspire both researchers and practitioners working on visual recognition problems.
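As one concrete instance of the network compression techniques such a survey covers, the sketch below applies magnitude pruning to a weight matrix: the fraction of weights with the smallest absolute values is zeroed out. This is a generic illustration, not a method attributed to the survey itself.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pw = magnitude_prune(w, sparsity=0.9)
print("remaining nonzero fraction:", np.count_nonzero(pw) / pw.size)
```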
Abstract: Methods for the visual recognition, positioning and orientation of simple 3D geometric workpieces are presented in this paper. The principle and operating process of multiple-orientation run-length coding, based on general orientation run-length coding, and the visual recognition method are described in detail. The method of positioning and orientation based on the moment of inertia of the workpiece's binary image is also described. The methods have been applied in research on a flexible automatic coordinate measuring system formed by integrating computer aided design, computer vision and computer aided inspection planning with a coordinate measuring machine. The results show that integrating computer vision with a measurement system is a feasible and effective approach to improving its flexibility and automation.
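The positioning and orientation step based on the moment of inertia of a binary image can be sketched as computing the centroid and the principal-axis angle from second-order central moments. The formulas below are the standard image-moment expressions, offered as an illustration rather than the paper's exact procedure.

```python
import numpy as np

def position_and_orientation(binary_image):
    """Centroid and principal-axis angle (radians) of a binary workpiece image."""
    ys, xs = np.nonzero(binary_image)
    cx, cy = xs.mean(), ys.mean()                        # centroid (position)
    x, y = xs - cx, ys - cy
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)    # orientation of principal axis
    return (cx, cy), theta

# Tiny example: a diagonal bar of foreground pixels.
img = np.zeros((8, 8), dtype=np.uint8)
for i in range(8):
    img[i, i] = 1
print(position_and_orientation(img))
```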
Funding: Financial support from the State General Administration of the People's Republic of China for Quality Supervision and Inspection and Quarantine (No. 2016QK122), the Shanghai Institute of Quality Inspection and Technical Research, the National Natural Science Foundation of China (Nos. 21572036 and 21861132002), and the Department of Chemistry, Fudan University
Abstract: A series of novel six-coordinated terpyridine zinc complexes, bearing ammonium salts and a thymine fragment at the two terminals, have been designed and synthesized; they can function as highly sensitive visual sensors for melamine detection via selective metallo-hydrogel formation. After full characterization by various techniques, the complementary triple hydrogen bonding between the thymine fragment and melamine, as well as π-π stacking interactions, is proposed to be responsible for the selective metallo-hydrogel formation. In light of possible interference from milk ingredients (proteins, peptides and amino acids) and legal/illegal additives (urea, sugars and vitamins), a series of control experiments were carried out. To our delight, this visual recognition is highly selective: no gelation was observed with the selected milk ingredients or additives. Remarkably, this newly developed protocol enables convenient and highly selective visual recognition of melamine at concentrations as low as 10 ppm in raw milk without any tedious pretreatment.
Abstract: Optical character recognition for right-to-left and cursive languages such as Arabic is challenging and has received little attention from researchers in the past compared with Latin languages. Moreover, the absence of a standard, publicly available dataset for several low-resource languages, including Pashto, has remained a hurdle in the advancement of language processing. Recognizing that a clean dataset is the fundamental and core requirement of character recognition, this research begins with dataset generation and aims at a system capable of complete language understanding, keeping in view the fully autonomous recognition of the cursive Pashto script. The first achievement of this research is a clean and standard dataset for the isolated characters of the Pashto script. In this paper, a database of isolated Pashto characters covering forty-four alphabets in various font styles is introduced. To overcome the shortage of font styles, the graphical software Inkscape was used to generate sufficient image samples for each character. The dataset has been pre-processed and reduced to 32×32 pixels, and further converted into a binary format with a black background and white text so that it resembles the Modified National Institute of Standards and Technology (MNIST) database. The benchmark database is publicly available for further research on the standard GitHub and Kaggle database servers, in both pixel and Comma Separated Values (CSV) formats.
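The preprocessing pipeline described above (resize to 32×32, binarize, white text on a black background, flatten to CSV rows) can be sketched roughly as follows with Pillow and NumPy. The file names, threshold value and the assumption that the source glyphs are dark-on-light are illustrative, not details taken from the paper.

```python
import csv
import numpy as np
from PIL import Image

def to_mnist_like(path, threshold=128):
    """Load a character image, resize to 32x32, binarize, white glyph on black."""
    img = Image.open(path).convert("L").resize((32, 32))
    arr = np.array(img)
    return (arr < threshold).astype(np.uint8) * 255   # assume dark glyph on light paper

def export_csv(samples, labels, out_path="pashto_chars.csv"):
    """Write one row per sample: label followed by 1024 pixel values."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for img, label in zip(samples, labels):
            writer.writerow([label] + img.flatten().tolist())

# Hypothetical usage, given lists glyph_paths and glyph_labels:
# samples = [to_mnist_like(p) for p in glyph_paths]
# export_csv(samples, glyph_labels)
```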
Abstract: Lip-reading technologies are progressing rapidly following the breakthrough of deep learning. Lip reading plays a vital role in many applications, such as human-machine communication and security applications. In this paper, we propose an effective lip-reading recognition model for Arabic visual speech recognition based on deep learning algorithms. The Arabic visual datasets that have been collected contain 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary purpose is to provide a high-performance model by enhancing the preprocessing phase. Firstly, we extract keyframes from our dataset. Secondly, we produce Concatenated Frame Images (CFIs) that represent each utterance sequence in a single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We examined different numbers of keyframes (10, 15, and 20) to compare two variants of the proposed model: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition on the test dataset. Therefore, our proposed model is superior to models based on CFI input.
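The preprocessing described above — selecting keyframes, tiling them into a Concatenated Frame Image, and extracting VGG-19 features — could look roughly like the sketch below using Keras. The uniform keyframe sampling, tiling layout and image sizes are assumptions for illustration, not details from the paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input

def select_keyframes(frames, n_keyframes=10):
    """Uniformly sample n keyframes from a list of HxWx3 uint8 frames (assumed strategy)."""
    idx = np.linspace(0, len(frames) - 1, n_keyframes).astype(int)
    return [frames[i] for i in idx]

def concatenated_frame_image(keyframes, tile_size=(56, 56)):
    """Tile keyframes side by side into one image representing the utterance."""
    resized = [tf.image.resize(f, tile_size).numpy() for f in keyframes]
    return np.concatenate(resized, axis=1)     # horizontal strip; layout is an assumption

# Frozen VGG-19 used purely as a visual feature extractor.
backbone = VGG19(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))

def cfi_features(frames):
    cfi = concatenated_frame_image(select_keyframes(frames))
    cfi = tf.image.resize(cfi, (224, 224)).numpy()
    batch = preprocess_input(np.expand_dims(cfi, axis=0))
    return backbone.predict(batch, verbose=0)   # one 512-d feature vector per clip

# Hypothetical usage with a dummy 30-frame clip:
dummy_frames = [np.random.randint(0, 255, (120, 120, 3), dtype=np.uint8) for _ in range(30)]
print(cfi_features(dummy_frames).shape)
```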
Funding: This research was supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia
Abstract: The continuing advances in deep learning have paved the way for several challenging ideas. One such idea is visual lip reading, which has recently drawn much research interest. Lip reading, often referred to as visual speech recognition, is the ability to understand and predict spoken speech based solely on lip movements, without using sound. Because of the lack of research on visual speech recognition for the Arabic language in general, and its absence in Quranic research, this work aims to fill that gap. This paper introduces a new, publicly available Arabic lip-reading dataset containing 10490 videos captured from multiple viewpoints and comprising data samples at the letter level (i.e., single letters (single alphabets) and Quranic disjoined letters) and at the word level, based on the content and context of the book Al-Qaida Al-Noorania. This research uses visual speech recognition to recognize spoken Arabic letters (Arabic alphabets), Quranic disjoined letters, and Quranic words, mainly as they are phonetically recited in the Holy Quran according to the Quranic study aid entitled Al-Qaida Al-Noorania. This study could further validate the correctness of pronunciation and, subsequently, assist people in correctly reciting the Quran. Furthermore, a detailed description of the created dataset and its construction methodology is provided. This new dataset is used to train an effective pre-trained deep learning CNN model through transfer learning for lip reading, achieving accuracies of 83.3%, 80.5%, and 77.5% on words, disjoined letters, and single letters, respectively, and an extended analysis of the results is provided. Finally, the experimental outcomes, different research aspects, and dataset collection consistency and challenges are discussed, concluding with several new promising directions for future work.
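Transfer learning from a pre-trained CNN, as used above, typically freezes a pretrained backbone and trains a new classification head on the lip-reading classes. The sketch below shows this pattern with a Keras backbone; the choice of MobileNetV2, the input size, the number of classes and the training settings are placeholders, not the configuration reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 28   # placeholder, e.g. the set of Arabic single letters

# Pre-trained backbone, frozen so only the new head is trained at first.
backbone = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                              pooling="avg", input_shape=(224, 224, 3))
backbone.trainable = False

model = models.Sequential([
    backbone,
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),   # new lip-reading head
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical usage, assuming train_ds / val_ds are tf.data.Dataset objects
# yielding (image, label) pairs prepared from the lip-reading videos:
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```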
Abstract: In this study, we propose an incremental learning approach based on machine-machine interaction via relative attribute feedback that exploits comparative relationships among top-level image categories. One machine acts as the 'Student (S)' with initially limited information, and it endeavors to capture the task domain gradually by questioning its mentor on a pool of unlabeled data. The other machine is the 'Teacher (T)', which holds the implicit knowledge for helping S learn the class models. T initiates relative attributes as a communication channel by randomly sorting the classes in attribute space in an unsupervised manner. S starts modeling the categories at this intermediate level using only a limited number of labeled data. Thereafter, it first selects an entropy-based sample from the pool of unlabeled data and triggers the conversation by propagating the selected image together with its believed class in a query. Since T already knows the ground-truth labels, it not only decides whether the belief is true or false, but also provides attribute-based feedback to S in each case, without revealing the true label of the query sample if the belief is false. The number of training data is thus increased virtually by dropping the falsely predicted sample back into the unlabeled pool. Next, S updates the attribute space, which in turn affects T's future responses, and the category models are then updated for the next run. We evaluate the weakly supervised algorithm on real-world datasets of faces and natural scenes in comparison with direct attribute prediction and semi-supervised learning approaches, and a noteworthy performance increase is achieved.
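The entropy-based query selection step described above can be sketched as picking, from the unlabeled pool, the sample whose predicted class distribution has the highest Shannon entropy. The toy probabilities below stand in for the Student's actual category models.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of a predicted class distribution."""
    p = np.clip(probs, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def select_query(unlabeled_probs):
    """Return the index of the most uncertain unlabeled sample and its belief class."""
    entropies = [predictive_entropy(p) for p in unlabeled_probs]
    idx = int(np.argmax(entropies))
    belief = int(np.argmax(unlabeled_probs[idx]))
    return idx, belief

# Toy pool: predicted class distributions for 4 unlabeled images over 3 classes.
pool = np.array([[0.9, 0.05, 0.05],
                 [0.4, 0.35, 0.25],    # most uncertain -> should be queried
                 [0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1]])
print(select_query(pool))   # -> (1, 0)
```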
Abstract: This paper presents a vision-based fingertip-writing character recognition system. The overall system is implemented with a CMOS image camera and an FPGA chip. A blue cover is mounted on the fingertip to simplify fingertip detection and to enhance recognition accuracy. For each character stroke, 8 sample points (including the start and end points) are recorded, and the 7 tangent angles between consecutive sampled points are recorded as features. In addition, 3 feature angles are extracted: the angles of the triangle formed by the start point, the end point, and the average of all 8 sampled points. Based on these key feature angles, a simple template-matching K-nearest-neighbor classifier is applied to distinguish each character stroke. Experimental results showed that the system can successfully recognize fingertip-written strokes of digits and lowercase letters with an accuracy of almost 100%. Overall, the proposed fingertip-writing recognition system provides an easy-to-use and accurate visual character input method.
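The stroke features described above (the 7 tangent angles between the 8 sampled points plus the 3 interior angles of the start/end/mean-point triangle) and the template-matching nearest-neighbor step can be sketched as follows. The plain Euclidean matching over raw angles is an assumption for illustration, not the paper's exact distance measure.

```python
import math

def stroke_features(points):
    """points: list of 8 (x, y) samples of one stroke -> 10 feature angles (radians)."""
    assert len(points) == 8
    tangents = [math.atan2(q[1] - p[1], q[0] - p[0])
                for p, q in zip(points, points[1:])]            # 7 tangent angles
    mean = (sum(x for x, _ in points) / 8, sum(y for _, y in points) / 8)
    tri = [points[0], points[-1], mean]                          # start, end, mean point
    def interior_angle(a, b, c):                                 # angle at vertex a
        v1 = (b[0] - a[0], b[1] - a[1]); v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        return math.acos(dot / (math.hypot(*v1) * math.hypot(*v2)))
    triangle = [interior_angle(tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]) for i in range(3)]
    return tangents + triangle                                    # 10 features in total

def classify(stroke, templates, k=1):
    """Nearest-neighbor match of a stroke against labeled template feature vectors."""
    feats = stroke_features(stroke)
    def dist(t):
        return sum((a - b) ** 2 for a, b in zip(feats, t[0]))
    best = sorted(templates, key=dist)[:k]
    return max(set(lbl for _, lbl in best), key=[lbl for _, lbl in best].count)

# Hypothetical usage with a synthetic L-shaped stroke as its own template:
stroke = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 4), (2, 4), (3, 4)]
templates = [(stroke_features(stroke), "L")]
print(classify(stroke, templates))
```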
Abstract: Garbage incineration is an ideal method for the harmless, resource-oriented treatment of urban domestic waste. However, current domestic waste incineration power plants often face challenges in maintaining consistent steam production, along with high operational costs. This article capitalizes on the technical advantages of big-data artificial intelligence, taking the optimization of the power generation process of domestic waste incineration as its entry point, and adopts four main engine modules: the Alibaba Cloud reinforcement learning algorithm engine, the operating parameter prediction engine, the anomaly recognition engine, and the video visual recognition algorithm engine. The reinforcement learning algorithm extracts the operational parameters of each incinerator to obtain a control benchmark. Through the operating parameter prediction algorithm, prediction models for drum pressure, primary steam flow, NOx, SO2, and HCl are constructed to achieve short-term prediction of operational parameters and ultimately improve control performance. The anomaly recognition algorithm develops a thickness identification model for the material layer in the drying section, allowing rapid and effective assessment of feed material thickness to ensure uniformity control. Meanwhile, the visual recognition algorithm identifies flame images and assesses the combustion status and the location of the combustion fire line within the furnace. This real-time understanding of furnace flame combustion conditions guides adjustments to the grate and air volume. Integrating AI technology into the waste incineration sector empowers the environmental protection industry to leverage big data. This development holds practical significance for optimizing the harmless, resource-oriented treatment of urban domestic waste, reducing operational costs, and increasing efficiency.
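One of the modules above, short-term prediction of operating parameters such as drum pressure, can be illustrated with a simple sliding-window autoregressive fit. The window length, synthetic data and plain least-squares model are illustrative stand-ins for the prediction engine described in the article.

```python
import numpy as np

def sliding_window(series, window=10):
    """Build (X, y) pairs: `window` past values predict the next value."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

# Synthetic "drum pressure" signal standing in for plant measurements.
rng = np.random.default_rng(0)
t = np.arange(500)
pressure = 10 + 0.5 * np.sin(t / 20.0) + rng.normal(scale=0.05, size=t.size)

X, y = sliding_window(pressure, window=10)
X1 = np.hstack([X, np.ones((X.shape[0], 1))])     # add bias column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)      # linear autoregressive fit

last_window = np.append(pressure[-10:], 1.0)
print("next-step prediction:", float(last_window @ coef))
```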
Abstract: Lip reading is typically regarded as visually interpreting a speaker's lip movements while they are speaking; it is the task of decoding text from the speaker's mouth movements. This paper proposes a lip-reading model that helps deaf people and persons with hearing problems understand a speaker by capturing a video of the speaker and feeding it into the proposed model to obtain the corresponding subtitles. Deep learning technologies make it easy to extract a large number of different features, which can then be converted into letter probabilities to obtain accurate results. Recently proposed methods for lip reading are based on sequence-to-sequence architectures designed for neural machine translation and audio speech recognition. In this paper, however, a deep convolutional neural network model called the hybrid lip-reading (HLR-Net) model is developed for lip reading from video. The proposed model includes three stages, namely the preprocessing, encoder, and decoder stages, which produce the output subtitle. Inception, gradient, and bidirectional GRU layers are used to build the encoder, and attention, fully connected, and activation function layers are used to build the decoder, which performs connectionist temporal classification (CTC). In comparison with three recent models, namely the LipNet model, the lip-reading model with cascaded attention (LCANet), and the attention-CTC (A-ACA) model, on the GRID corpus dataset, the proposed HLR-Net model achieves significant improvements: a CER of 4.9%, a WER of 9.7%, and a BLEU score of 92% for unseen speakers, and a CER of 1.4%, a WER of 3.3%, and a BLEU score of 99% for overlapped speakers.
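The bidirectional-GRU layers with a CTC output described above can be sketched in Keras as follows; per-frame visual feature vectors are assumed to be precomputed, and the feature dimension, vocabulary size and layer widths are placeholders rather than the HLR-Net configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

FEATURE_DIM = 256    # assumed size of per-frame visual features
VOCAB_SIZE = 28      # assumed character set; one extra class is reserved for the CTC blank

inputs = layers.Input(shape=(None, FEATURE_DIM))              # (time, features)
x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(inputs)
x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)
logits = layers.Dense(VOCAB_SIZE + 1)(x)                      # +1 for the CTC blank
model = models.Model(inputs, logits)

def ctc_loss(labels, logits, label_len, logit_len):
    """CTC loss over a batch of frame-level logits and character label sequences."""
    return tf.reduce_mean(tf.nn.ctc_loss(
        labels=labels, logits=logits,
        label_length=label_len, logit_length=logit_len,
        logits_time_major=False, blank_index=VOCAB_SIZE))

# Hypothetical usage on a dummy batch of 2 clips, 75 frames each:
feats = tf.random.normal((2, 75, FEATURE_DIM))
labels = tf.constant([[3, 5, 7, 0, 0], [1, 2, 0, 0, 0]], dtype=tf.int32)
loss = ctc_loss(labels, model(feats), tf.constant([3, 2]), tf.constant([75, 75]))
print(float(loss))
```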
Funding: National Natural Science Foundation of China (60675054), National High-Tech Research and Development Program (2006AA04Z228), and the "111" Project (B07018)
Abstract: This article investigates virtual reality (VR)-based teleoperation with robustness against modeling errors. VR technology is an effective way to overcome the large time delays in space robot teleoperation. However, it depends heavily on the accuracy of the model, and model errors between the virtual and real environments inevitably exist. The existing ways of dealing with the problem are either model matching or robot compliance control. As distinct from the existing methods, this article tries to combine m...