Humans, as intricate beings driven by a multitude of emotions, possess a remarkable ability to decipher and respond to socio-affective cues. However, many individuals and machines struggle to interpret such nuanced signals, including variations in tone of voice. This paper explores the potential of intelligent technologies to bridge this gap and improve the quality of conversations. In particular, the authors propose a real-time processing method that captures and evaluates emotions in speech, utilizing a terminal device such as the Raspberry Pi computer. Furthermore, the authors provide an overview of the current research landscape surrounding speech emotion recognition and describe their methodology, which involves analyzing audio files from renowned emotional speech databases. To aid comprehension, the authors present visualizations of these audio files in situ, employing dB-scaled Mel spectrograms generated through TensorFlow and Matplotlib. The authors use a support vector machine kernel and a Convolutional Neural Network with transfer learning to classify emotions. Notably, the classification accuracies achieved are 70% and 77%, respectively, demonstrating the efficacy of the approach when executed on an edge device rather than relying on a server. The system can evaluate pure emotion in speech and provide corresponding visualizations to depict the speaker’s emotional state in less than one second on a Raspberry Pi. These findings pave the way for more effective and emotionally intelligent human-machine interactions in various domains.
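The dB-scaled Mel spectrograms mentioned in this abstract rest on two conversions: mapping frequency in Hz onto the Mel scale, and mapping power onto decibels. The sketch below illustrates both with the Python standard library only. It is not the authors' TensorFlow pipeline; the HTK-style Mel formula, the function names, and the default floor value are assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mels using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_to_db(power: float, ref: float = 1.0, floor: float = 1e-10) -> float:
    """Convert a power value to decibels relative to `ref`.

    Values below `floor` are clamped to avoid log10(0); the floor is a
    hypothetical default, not a value from the paper.
    """
    return 10.0 * math.log10(max(power, floor) / ref)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> list:
    """Return n_mels + 2 band-edge frequencies (Hz), equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]
```

A Mel spectrogram would then sum spectral power inside triangular filters centered on these band edges and pass each band through `power_to_db`; the filterbank itself is omitted here.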
Real-time detection of kiwifruits in natural environments is essential for automated kiwifruit harvesting. In this study, a lightweight convolutional neural network called the YOLOv4-GS algorithm was proposed for kiwifruit detection. The backbone network CSPDarknet-53 of YOLOv4 was replaced with GhostNet to improve accuracy and reduce network computation. To improve the detection accuracy of small targets, the upsampling of feature map fusion was performed for network layers 151 and 154, and the spatial pyramid pooling network was removed to reduce redundant computation. A total of 2766 kiwifruit images from different environments were used as the dataset for training and testing. The experimental results showed that the F1-score, average accuracy, and Intersection over Union (IoU) of YOLOv4-GS were 98.00%, 99.22%, and 88.92%, respectively. The average time taken to detect a 416×416 kiwifruit image was 11.95 ms, and the model’s weight was 28.8 MB. The average detection time of GhostNet was 31.44 ms less than that of CSPDarknet-53. In addition, the model weight of GhostNet was 227.2 MB less than that of CSPDarknet-53. YOLOv4-GS improved the detection accuracy by 8.39% over Faster R-CNN and 8.36% over SSD-300. The detection speed of YOLOv4-GS was 11.3 times and 2.6 times higher than those of Faster R-CNN and SSD-300, respectively. In the indoor picking experiment and the orchard picking experiment, the average speed of YOLOv4-GS when processing video was 28.4 fps, and the recognition accuracy was above 90%. The average time spent on recognition and positioning was 6.09 s, accounting for about 29.03% of the total picking time. The overall results showed that the YOLOv4-GS proposed in this study can be applied to kiwifruit detection in natural environments because it improves the detection speed without compromising detection accuracy.
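Two of the metrics reported in this abstract, Intersection over Union (IoU) and the F1-score, have standard definitions that are easy to state in code. The sketch below uses those textbook definitions; the box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration and are not taken from the YOLOv4-GS implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For instance, a hypothetical detector with 98 true positives, 2 false positives, and 2 false negatives would score `f1_score(98, 2, 2) == 0.98`; these counts are made up and are not the paper's data.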
At present, almost all speech recognition systems and products are designed for quiet environments; their performance degrades, or they fail entirely, when operated in highly noisy environments. In this paper, after analyzing the features of speech and noise, a speech enhancement method based on the LPC autoregressive model is proposed for command-word recognition in noisy environments, and an experimental system is implemented. Under different background noise conditions, we conduct experiments on SNR, basic accuracy, noise resistance, and the system's adaptability to different microphones. The experimental results show that the system has good recognition performance in highly noisy environments. The system can resist many kinds of noise and, on the whole, meets the needs of application areas such as military, traffic, marketplace, and factory settings.
The development of scientific inquiry and research has yielded numerous benefits for intelligent traffic control systems, particularly in automatic license plate recognition for vehicles. The design of license plate recognition algorithms has been digitalized through the use of neural networks. There is a growing demand for vehicle surveillance due to the need for efficient vehicle processing and traffic management. The design, development, and implementation of a license plate recognition system hold significant social, economic, and academic importance. The study aims to present contemporary methodologies and empirical findings pertaining to automated license plate recognition. The automatic license plate recognition algorithm focused primarily on image extraction, character segmentation, and recognition. Based on our observations, character segmentation was the most challenging task. The license plate recognition project that we designed demonstrated the effectiveness of this method across various observed conditions, particularly in low-light environments, such as periods of limited illumination or rainy weather. The method was tested on a sample of fifty images, resulting in a 100% accuracy rate. The findings of this study demonstrate the project’s ability to effectively determine the optimal outcomes of simulations.
With the advancement of technology and the increase in user demands, gesture recognition plays a pivotal role in the field of human-computer interaction. Among various sensing devices, Time-of-Flight (ToF) sensors are widely applied due to their low cost. This paper explores the implementation of a human hand posture recognition system using ToF sensors and residual neural networks. First, the paper reviews the typical applications of human hand recognition. Second, it designs a hand gesture recognition system using a VL53L5 ToF sensor. Subsequently, data preprocessing is conducted, followed by training of the constructed residual neural network. The recognition results are then analyzed, indicating that gesture recognition based on the residual neural network achieves an accuracy of 98.5% in a 5-class classification scenario. Finally, the paper discusses existing issues and future research directions.
The near future has been envisioned as a collaboration between humans and mobile robots that help with day-to-day tasks. In this paper, we present a viable approach for real-time, computer-vision-based object detection and recognition for efficient indoor navigation of a mobile robot. Mobile robotic systems are utilized mainly for home assistance, emergency services, and surveillance, in which critical action needs to be taken within a fraction of a second, that is, in real time. Object detection and recognition is enhanced by the proposed algorithm, a modification of the You Only Look Once (YOLO) algorithm with lower computational requirements and a relatively smaller network weight size. The proposed computer-vision-based algorithm has been compared with other conventional object detection/recognition algorithms in terms of mean Average Precision (mAP) score, mean inference time, weight size, and false positive percentage. The presented framework also uses the detection and recognition results produced by the proposed algorithm to help the mobile robot navigate in an indoor environment. The presented framework can be further utilized for a wide variety of applications involving indoor navigation robots for different services.
The reflective real-time component model is a special component model that can identify the timing constraint characteristics of a component and support dynamic design-time amendment of real-time components according to users' requirements. The reflective real-time component runtime environment is the bearing space and reflective infrastructure for this special component model. It consists of three parts and manages the lifecycle and various relevant services of reflective real-time components. In this paper, its mechanism and the relevant key techniques in design and realization are formally specified with communicating sequential processes (CSP) and the extended timed communicating sequential processes (TCSP). Finally, a prototype is established. An experimental study shows that this runtime environment can introduce a relevant reflective infrastructure that guarantees the dynamic and real-time features of software components.
Distributed speech recognition (DSR) applications have certain Quality of Service (QoS) requirements in terms of latency, packet loss rate, etc. To deliver quality-guaranteed DSR applications over wireline or wireless links, some QoS mechanisms should be provided. We put forward an RTP/RSVP transmission scheme with a DSR-specific payload and QoS parameters by modifying the present WAP protocol stack. The simulation result shows that this scheme provides adequate network bandwidth to maintain the real-time transport of DSR data over either wireline or wireless channels.
Traffic sign recognition (TSR, or Road Sign Recognition, RSR) is one of the Advanced Driver Assistance System (ADAS) devices in modern cars. To address the two most important issues, real-time performance and resource efficiency, we propose a high-efficiency hardware implementation for TSR. We divide the TSR procedure into two stages, detection and recognition. In the detection stage, under the assumption that most German traffic signs are red or blue with a circular, triangular, or rectangular shape, we use a Normalized RGB color transform and Single-Pass Connected Component Labeling (CCL) to find potential traffic signs efficiently. For Single-Pass CCL, our contribution is to eliminate the “merge-stack” operations by recording the connected relations of regions in the scan phase and updating the labels in the iterating phase. In the recognition stage, the Histogram of Oriented Gradient (HOG) is used to generate the descriptor of the signs, and we classify the signs with a Support Vector Machine (SVM). In the HOG module, we analyze the minimum number of bits required for different recognition rates. The proposed method achieves a 96.61% detection rate and a 90.85% recognition rate when tested with the GTSDB dataset. Our hardware implementation reduces the storage of CCL and simplifies the HOG computation. The main CCL storage size is reduced by 20% compared with the most advanced design under typical conditions. Using TSMC 90 nm technology, the proposed design operates at a 105 MHz clock rate and processes 1360 × 800 images at 135 fps. The chip size is about 1 mm² and the power consumption is close to 8 mW. Therefore, this work is resource efficient and meets the real-time requirement.
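The Normalized RGB color transform used in the detection stage divides each channel by the sum of the three channels, which makes the red and blue tests largely independent of brightness. The following is a minimal software sketch of that idea, not the authors' hardware design; the threshold values and function names are hypothetical and chosen only for illustration.

```python
def normalized_rgb(r, g, b):
    """Return (r, g, b) each divided by r + g + b; black maps to (0, 0, 0)."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


def is_sign_color(r, g, b, red_thresh=0.5, blue_thresh=0.5):
    """Flag a pixel as a red or blue traffic-sign candidate.

    The 0.5 thresholds are illustrative assumptions, not values from the paper.
    """
    nr, _, nb = normalized_rgb(r, g, b)
    return nr >= red_thresh or nb >= blue_thresh
```

Pixels flagged by such a test would then be grouped into candidate regions by connected component labeling, which is where the paper's single-pass contribution applies.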
Seabed sediment recognition is vital for the exploitation of marine resources. Side-scan sonar (SSS) is an excellent tool for acquiring imagery of seafloor topography. Combined with ocean surface sampling, it provides detailed and accurate images of marine substrate features. Most processing of SSS imagery relies on a limited number of sampling stations and requires manual interpretation to classify seabed sediment imagery. In complex sea areas, small targets are often lost during manual interpretation because of the large amount of information. To date, studies on the automatic recognition of seabed sediments are still few. This paper proposes a seabed sediment recognition method based on You Only Look Once version 5 and SSS imagery to perform real-time sediment classification and localization with higher accuracy, particularly on small targets, and faster speed. We used methods such as changing the dataset size, number of epochs, and optimizer and adding multiscale training to overcome the challenges of a small sample and low accuracy. With these methods, we improved the mean average precision by 8.98% and the F1 score by 11.12% compared with the original method. In addition, the detection speed was approximately 100 frames per second, which is faster than that of previous methods. This speed enabled us to achieve real-time seabed sediment recognition from SSS imagery.
The gender recognition problem has attracted the attention of the computer vision community due to its importance in many applications (e.g., surveillance and human–computer interaction [HCI]). Images with varying levels of illumination, occlusion, and other factors are captured in uncontrolled environments. Iris and facial recognition technology cannot be used on these images because iris texture is unclear in these instances, and faces may be covered by a scarf, hijab, or mask due to the COVID-19 pandemic. The periocular region is a reliable source of information because it is rich in discriminative biometric features. However, most existing gender classification approaches have been designed based on hand-engineered features or validated in controlled environments. Motivated by the superior performance of deep learning, we propose a new method, PeriGender, inspired by the design principles of the ResNet and DenseNet models, that can classify gender using features from the periocular region. The proposed system utilizes a dense concept in a residual model. Through skip connections, it reuses features on different scales to strengthen discriminative features. Evaluations of the proposed system on challenging datasets indicated that it outperformed state-of-the-art methods. It achieved 87.37%, 94.90%, 94.14%, 99.14%, and 95.17% accuracy on the GROUPS, UFPR-Periocular, Ethnic-Ocular, IMP, and UBIPr datasets, respectively, in the open-world (OW) protocol. It further achieved 97.57% and 93.20% accuracy for adult periocular images from the GROUPS dataset in the closed-world (CW) and OW protocols, respectively. The results showed that the middle region between the eyes plays a crucial role in the recognition of masculine features, and feminine features can be identified through the eyebrow, upper eyelids, and corners of the eyes. Furthermore, using the whole region without cropping enhances PeriGender’s learning capability, improving its understanding of the global structure of both eyes without discontinuity.
To realize the visual navigation of agricultural robots in the complex environment of orchards, this study proposed a method for fruit tree recognition and navigation based on YOLOv5. The YOLOv5s model was selected and trained to identify the trunks of the left and right rows of fruit trees. A quadratic curve was fitted to the bottom centers of the fruit tree recognition boxes, and the identified fruit trees were divided into left and right columns by using the extreme point of the quadratic curve, yielding the left and right rows of fruit trees. The straight-line equations of the left and right fruit tree rows were then solved, the median line of the two straight lines was taken as the expected navigation path of the robot, and a path-tracking navigation experiment was carried out using the improved LQR control algorithm. The experimental results show that, guided by the machine vision system and the improved LQR control algorithm, the lateral error and heading error converge quickly as the robot approaches the desired navigation path from the four initial states of [0 m, −0.34 rad], [0.10 m, 0.34 rad], [0.15 m, 0 rad], and [0.20 m, −0.34 rad]. When the initial speed was 0.5 m/s, the average lateral error was 0.059 m and the average heading error was 0.2787 rad for the navigation trials in the four different initial states. On average, the robot reached the steady state after driving 5.3 m; the average steady-state lateral error was 0.0102 m, the average steady-state heading error was 0.0253 rad, and the average relative error of the robot driving along the desired navigation path was 4.6%. The results indicate that the navigation algorithm proposed in this study has good robustness, meets the operational requirements of autonomous robot navigation in an orchard environment, and improves the reliability of robot driving in orchards.
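The path-extraction step described in this abstract, fitting a straight line to each tree row and taking the median line as the navigation path, can be sketched as follows. This is a simplified illustration under stated assumptions: the rows are already separated (the quadratic-curve split is skipped), each row is fitted by ordinary least squares in the form x = k*y + b, and the median line is taken as the average of the two lines' coefficients. The function names and sample points are hypothetical.

```python
def fit_line(points):
    """Least-squares fit of x = k*y + b through (x, y) points.

    Parameterizing x as a function of y suits near-vertical tree rows
    in an image; this choice is an assumption, not the paper's stated form.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if syy == 0:
        raise ValueError("points must differ in y")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    k = sxy / syy
    return k, mean_x - k * mean_y


def median_line(left, right):
    """Average the (k, b) coefficients of the left and right row lines."""
    return ((left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0)


# Hypothetical bottom-center points of trunk boxes in each row.
left_row = [(-1.0, 0.0), (-1.0, 1.0), (-1.0, 2.0)]
right_row = [(1.0, 0.0), (1.5, 1.0), (2.0, 2.0)]
path_k, path_b = median_line(fit_line(left_row), fit_line(right_row))
```

At every y, the averaged line lies horizontally midway between the two fitted rows, which is one reasonable reading of "median line"; the tracking controller (the improved LQR in the paper) is outside the scope of this sketch.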
As a new technical means of detecting abnormal signs of water inrush in advance and giving an early warning, automatic monitoring and early warning of water inrush in mines has been widely valued in recent years. Because many factors affect water inrush and the water inrush mechanism is complicated, many factors may show precursory abnormal changes shortly before a water inrush. At present, existing monitoring and early warning systems mainly use a few monitoring indicators, such as groundwater level, water influx, and temperature, and issue water inrush warnings based on the abnormal change of a single factor. However, there are relatively few multi-factor comprehensive early warning recognition models. Based on an analysis of the abnormal changes of precursor factors in multiple water inrush cases, 11 measurable and effective indicators covering the groundwater flow field, hydrochemical field, and temperature field are proposed. Finally, taking the Hengyuan coal mine as an example, 6 indicators with long-term monitoring data sequences were selected to establish a single-index hierarchical early-warning recognition model, a multi-factor linear recognition model, and a comprehensive intelligent early-warning recognition model. The results show that the correct rate of early warning can reach 95.2%.
Recognition of dynamic hand gestures in real time is a difficult task because the system cannot know in advance when or where a gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used to extract spectral and spatial features, which are then passed to the LSTM network, which carries out the classification. The proposed model is a lightweight architecture with only 3.7 million trainable parameters. The model has been evaluated on 15 classes from the publicly available 20BN-jester dataset. The model was trained on 2000 video clips per class, which were separated into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results compared with MobileNetv2+LSTM.
As multimedia data sharing increases, data security on mobile devices and its mechanisms have become critical. Biometrics combines the physiological and behavioral qualities of an individual to validate their identity in real time. Physiological attributes include the fingerprint, face, iris, palm print, finger knuckle print, and Deoxyribonucleic Acid (DNA), while behavioral qualities include gait, voice, signature, and keystroke. The main goal of this paper is to design a robust framework for automatic face recognition. Scale Invariant Feature Transform (SIFT) and Speeded-up Robust Features (SURF) are employed for face recognition. In addition, we propose a modified Gabor Wavelet Transform for SIFT/SURF (GWT-SIFT/GWT-SURF) to increase the recognition accuracy of human faces. The proposed scheme is composed of three steps. First, the entropy of the image is removed using the Discrete Wavelet Transform (DWT). Second, the computational complexity of SIFT/SURF is reduced. Third, the authentication accuracy is increased by the proposed GWT-SIFT/GWT-SURF algorithm. A comparative analysis of the proposed scheme is carried out on the real-time Olivetti Research Laboratory (ORL) and Poznan University of Technology (PUT) databases. Compared with the traditional SIFT/SURF methods, GWT-SIFT achieves the better accuracy of 99.32%, while GWT-SURF is the better approach in terms of speed: its run time for 100 images is 3.4 seconds, compared with 4.9 seconds for GWT-SIFT.
To solve the problem of stray light interfering with star point target identification while a star sensor images the sky, a study on the adaptability of a missile-borne star sensor to the space luminous environment was carried out. Using Planck's blackbody radiation law and astronomical knowledge, the irradiances of the stray light at the star sensor working height were estimated. Using relevant astrophysical and mathematical knowledge, the included angles between the pointing of the star sensor optical axis and the stray light at any moment were calculated. The correctness of the calculation was verified with the star map software Stellarium. By combining the above analysis with the baffle suppression effect, a real-time model for the space luminous environment of the missile-borne star sensor was proposed. Using the signal-to-noise ratio (SNR) criterion, the adaptability of the missile-borne star sensor to the space luminous environment was studied. As an example, a certain type of star sensor was considered when imaging the starry sky on June 22, 2011 (the Summer Solstice) and September 20, 2011 (August 23 of the lunar year, last quarter moon) in Beijing. The space luminous environment and the adaptability to it were simulated and analyzed at the star sensor working height. In each period of time, the stray light suppression of the baffle is analyzed by comparing the calculated included angle between the pointing of the star sensor optical axis and the stray light with the shielded angle given by the system index. When the included angle is larger than the shielded angle and less than 90°, the stray light is restrained by the baffle. The effect of stray light on star point target identification is analyzed by comparing the irradiance of a magnitude-6 star with that of the stray light on the photosensitive surface of the star sensor. When the irradiance of a magnitude-6 star is more than 5 times that of the stray light, there is no effect on star point target identification. The simulation results are consistent with the actual situation. The space luminous environment of the missile-borne star sensor can be estimated in real time by this model, and the adaptability of the star sensor to the space luminous environment can be analyzed conveniently. The model can provide a basis for determining the relevant star sensor indexes, the navigation star selection strategy, and the missile launch window.
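The two decision rules stated in this abstract, the baffle angle condition and the 5-times irradiance criterion, can be written as simple predicates. The sketch below encodes only those two stated rules; the function names, argument units, and the strict inequalities at the boundaries are assumptions, and the irradiance and angle calculations themselves are not reproduced.

```python
def baffle_restrains(included_angle_deg, shielded_angle_deg):
    """True when the stray light is restrained by the baffle.

    Encodes the stated rule: the included angle is larger than the
    shielded angle and less than 90 degrees.
    """
    return shielded_angle_deg < included_angle_deg < 90.0


def identification_unaffected(star_irradiance, stray_irradiance, ratio=5.0):
    """True when the magnitude-6 star irradiance exceeds `ratio` times the stray.

    Both irradiances must be in the same unit; the default ratio of 5
    comes from the abstract, the strict comparison is an assumption.
    """
    return star_irradiance > ratio * stray_irradiance
```

With hypothetical values, `baffle_restrains(45.0, 30.0)` holds, while `identification_unaffected(4.0, 1.0)` does not because the star is only 4 times as bright as the stray light.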
This paper translates the recognition of the curvature radius for a robot moving in a bent pipe into a shape-from-shading problem, and introduces genetic algorithms into the optimization process to improve its efficiency. Experiments show that this method can satisfy the autonomous control requirements for a robot moving in a bent pipe in terms of both speed and accuracy.
In this paper, we investigate, both subjectively and objectively, the influences of network delay on QoE (Quality of Experience), such as the operability of the haptic interface device and the fairness between players, for soft objects in a networked real-time game. We handle a networked balloon bursting game in which two players burst balloons (i.e., soft objects) in a 3D virtual space by using haptic interface devices, and the players compete for the number of burst balloons. As a result, we find that the operability depends on the network delay from the local terminal to the other terminal, and the fairness is mainly dependent on the difference in network delay between the players’ terminals. We confirm that there exists a trade-off relationship between the operability and the fairness. We also see that the contribution of the fairness to the comprehensive quality (i.e., the weighted sum of the operability and fairness) is larger than that of the operability. Assessment results further show that the output timing of the terminals should be adjusted to that of the terminal with the latest output timing to maintain the fairness when the difference in network delay between the terminals is large. In this way, the comprehensive quality at each terminal can be kept as high as possible.
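The adjustment described at the end of this abstract, aligning every terminal's output timing with the terminal that outputs latest, amounts to adding buffering time to the faster terminals. The sketch below illustrates that idea along with the weighted-sum definition of comprehensive quality. It is an interpretation under assumptions: network delay is used as a stand-in for output timing, and the function names, terminal labels, and weights are hypothetical rather than taken from the paper.

```python
def added_buffering(delays_ms):
    """Extra delay per terminal so that all outputs align with the latest one.

    `delays_ms` maps a terminal name to its network delay in milliseconds.
    """
    latest = max(delays_ms.values())
    return {terminal: latest - delay for terminal, delay in delays_ms.items()}


def comprehensive_quality(operability, fairness, w_operability, w_fairness):
    """Weighted sum of operability and fairness scores."""
    return w_operability * operability + w_fairness * fairness


# Hypothetical example: terminal A sees 20 ms delay, terminal B sees 80 ms.
buffers = added_buffering({"A": 20, "B": 80})
```

Here terminal A would hold its output for an extra 60 ms and terminal B for none. This trades some operability at A for fairness, in line with the trade-off the abstract reports.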
Automatic Speech Recognition (ASR) is the process of converting an acoustic signal captured by a microphone into written text. The motivation of this paper is to create a speech-based Integrated Development Environment (IDE) for C programs. The paper proposes a technique that enables visually impaired people, or people with arm injuries, who have excellent programming skills to write C programs through voice input. The proposed system accepts a C program as voice input and produces a compiled C program as output. The user utters each line of the C program. First, the voice input is recognized as text. The recognized text is then converted into a C program by using the syntactic constructs of the C language. After conversion, the C program is fed as input to the IDE. Furthermore, IDE commands such as open, save, close, compile, and run are also given through voice input only. If any error occurs during the compilation process, it is corrected through voice input by specifying the line number. The performance of the speech recognition system is analyzed by varying the vocabulary size as well as the number of mixture components in the hidden Markov model (HMM).
Funding: funded by the Jiangsu Province Agricultural Science and Technology Independent Innovation Project (CX(22)3099), the Emergency Science and Technology Project of National Forestry and Grassland Administration (202202-3), the Key R&D Program of Jiangsu Modern Agricultural Machinery Equipment and Technology Promotion Project (Grant NJ2021-18), the Key R&D plan of Jiangsu Province (Grant BE2021016-2), and the 2021 Self-made Experimental Teaching Instrument Project of Nanjing Forestry University (Grant nlzzyq202406).
文摘Real-time detection of kiwifruits in natural environments is essential for automated kiwifruit harvesting. In this study, a lightweight convolutional neural network called the YOLOv4-GS algorithm was proposed for kiwifruit detection. The backbone network CSPDarknet-53 of YOLOv4 was replaced with GhostNet to improve accuracy and reduce network computation. To improve the detection accuracy of small targets, the upsampling of feature map fusion was performed for network layers 151 and 154, and the spatial pyramid pooling network was removed to reduce redundant computation. A total of 2766 kiwifruit images from different environments were used as the dataset for training and testing. The experiment results showed that the F1-score, average accuracy, and Intersection over Union (IoU) of YOLOv4-GS were 98.00%, 99.22%, and 88.92%, respectively. The average time taken to detect a 416×416 kiwifruit image was 11.95 ms, and the model’s weight was 28.8 MB. The average detection time of GhostNet was 31.44 ms less than that of CSPDarknet-53. In addition, the model weight of GhostNet was 227.2 MB less than that of CSPDarknet-53. YOLOv4-GS improved the detection accuracy by 8.39% over Faster R-CNN and 8.36% over SSD-300. The detection speed of YOLOv4-GS was 11.3 times and 2.6 times higher than Faster R-CNN and SSD-300, respectively. In the indoor picking experiment and the orchard picking experiment, the average speed of the YOLOv4-GS processing video was 28.4 fps. The recognition accuracy was above 90%. The average time spent for recognition and positioning was 6.09 s, accounting for about 29.03% of the total picking time. The overall results showed that the YOLOv4-GS proposed in this study can be applied for kiwifruit detection in natural environments because it improves the detection speed without compromising detection accuracy.
Abstract: At present, almost all speech recognition systems and products are designed for quiet environments; their performance degrades, or they fail entirely, when operated in high-noise environments. In this paper, after analyzing the features of speech and noise, we propose a speech enhancement method based on the LPC autoregressive model for command-word recognition in noisy environments, and we implement an experimental system. Under different background noise conditions, we conduct experiments on SNR, baseline accuracy, noise resistance, and the system's environmental adaptability with different microphones. The experimental results show that the system has good recognition performance in high-noise environments. The system can resist many kinds of noise and, on the whole, meets the needs of application areas such as the military, traffic, marketplaces, and factories.
Abstract: Scientific research has yielded numerous benefits for intelligent traffic control systems, particularly in automatic license plate recognition for vehicles. The design of license plate recognition algorithms has been digitalized through the use of neural networks. Demand for vehicle surveillance is growing because of the need for efficient vehicle processing and traffic management. The design, development, and implementation of a license plate recognition system therefore hold significant social, economic, and academic importance. This study aims to present contemporary methodologies and empirical findings pertaining to automated license plate recognition. The automatic license plate recognition algorithm focuses primarily on image extraction, character segmentation, and recognition. Based on our observations, character segmentation is the most challenging of these functions. The license plate recognition project that we designed demonstrated the effectiveness of this method across various observed conditions, particularly in low-light environments, such as periods of limited illumination or inclement weather with precipitation. The method was tested on a sample of fifty images, resulting in a 100% accuracy rate. The findings of this study demonstrate the project’s ability to effectively determine the optimal outcomes of simulations.
Abstract: With the advancement of technology and the increase in user demands, gesture recognition plays a pivotal role in the field of human-computer interaction. Among various sensing devices, Time-of-Flight (ToF) sensors are widely applied due to their low cost. This paper explores the implementation of a human hand posture recognition system using ToF sensors and residual neural networks. First, the paper reviews the typical applications of human hand recognition. Second, it designs a hand gesture recognition system using a VL53L5 ToF sensor. Data preprocessing is then conducted, followed by training of the constructed residual neural network. The recognition results are then analyzed, indicating that gesture recognition based on the residual neural network achieves an accuracy of 98.5% in a 5-class classification scenario. Finally, the paper discusses existing issues and future research directions.
Abstract: The near future has been envisioned as a collaboration of humans with mobile robots to help in day-to-day tasks. In this paper, we present a viable approach to real-time, computer-vision-based object detection and recognition for efficient indoor navigation of a mobile robot. Mobile robotic systems are utilized mainly for home assistance, emergency services, and surveillance, in which critical action needs to be taken within a fraction of a second, that is, in real time. Object detection and recognition is enhanced by the proposed algorithm, which is based on a modification of the You Only Look Once (YOLO) algorithm, with lower computational requirements and a relatively smaller weight size of the network structure. The proposed computer-vision-based algorithm has been compared with other conventional object detection/recognition algorithms in terms of mean Average Precision (mAP) score, mean inference time, weight size, and false positive percentage. The presented framework also uses the detection/recognition results produced by the proposed algorithm to help the mobile robot navigate in an indoor environment. The framework can be further utilized for a wide variety of applications involving indoor navigation robots for different services.
Funding: the National Defence Foundation of China (Grant No. 10104010201)
Abstract: The reflective real-time component model is a special component model that can identify the timing constraint characteristics of a component and support dynamic design-time amendment of a real-time component according to users' requirements. The reflective real-time component runtime environment is a bearing space and reflective infrastructure for this special component model. It consists of three parts and manages the lifecycle and various relevant services of reflective real-time components. In this paper, its mechanism and the relevant key techniques in its design and realization are formally specified with Communicating Sequential Processes (CSP) and extended Timed Communicating Sequential Processes (TCSP). Finally, a prototype is established. An experimental study shows that this runtime environment can introduce a relevant reflective infrastructure guaranteeing the dynamic and real-time features of software components.
Abstract: Distributed speech recognition (DSR) applications have certain Quality of Service (QoS) requirements in terms of latency, packet loss rate, etc. To deliver quality-guaranteed DSR applications over wireline or wireless links, some QoS mechanisms should be provided. We put forward an RTP/RSVP transmission scheme with a DSR-specific payload and QoS parameters by modifying the present WAP protocol stack. The simulation results show that this scheme provides adequate network bandwidth to sustain the real-time transport of DSR data over either wireline or wireless channels.
Abstract: Traffic sign recognition (TSR, or Road Sign Recognition, RSR) is one of the Advanced Driver Assistance System (ADAS) devices in modern cars. To address the most important issues, which are real-time performance and resource efficiency, we propose a high-efficiency hardware implementation for TSR. We divide the TSR procedure into two stages, detection and recognition. In the detection stage, under the assumption that most German traffic signs have red or blue colors with circle, triangle, or rectangle shapes, we use a Normalized RGB color transform and Single-Pass Connected Component Labeling (CCL) to find the potential traffic signs efficiently. For Single-Pass CCL, our contribution is to eliminate the “merge-stack” operations by recording the connected relations of regions in the scan phase and updating the labels in the iterating phase. In the recognition stage, the Histogram of Oriented Gradient (HOG) is used to generate the descriptor of the signs, and we classify the signs with a Support Vector Machine (SVM). In the HOG module, we analyze the minimum number of bits required for different recognition rates. The proposed method achieves a 96.61% detection rate and a 90.85% recognition rate when tested on the GTSDB dataset. Our hardware implementation reduces the storage of CCL and simplifies the HOG computation. The main CCL storage size is reduced by 20% compared with the most advanced design under typical conditions. Using TSMC 90 nm technology, the proposed design operates at a 105 MHz clock rate and processes 135 fps at an image size of 1360 × 800. The chip size is about 1 mm² and the power consumption is close to 8 mW. Therefore, this work is resource efficient and meets the real-time requirement.
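The detection stage described above starts from a Normalized RGB color transform. A minimal software sketch of that transform is given below, under stated assumptions: the thresholds `red_min` and `blue_min` are hypothetical values picked for illustration, and the function names are not from the paper, which describes a hardware implementation whose exact thresholds are not given in the abstract.

```python
def normalized_rgb(pixel):
    """Map an (R, G, B) pixel to chromaticity coordinates (r, g, b).

    Each channel is divided by R + G + B, which reduces the influence
    of overall brightness on the color decision.
    """
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        # Pure black has no defined chromaticity; return zeros.
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


def is_sign_color(pixel, red_min=0.5, blue_min=0.5):
    """Flag pixels that are predominantly red or blue.

    red_min and blue_min are hypothetical thresholds, not the paper's.
    """
    nr, _, nb = normalized_rgb(pixel)
    return nr >= red_min or nb >= blue_min
```

Pixels flagged this way would then form the binary mask on which connected component labeling groups candidate sign regions.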
Funding: funded by the Natural Science Foundation of Fujian Province (No. 2018J01063), the Project of Deep Learning Based Underwater Cultural Relics Recognization (No. 38360041), and the Project of the State Administration of Cultural Relics (No. 2018300).
Abstract: Seabed sediment recognition is vital for the exploitation of marine resources. Side-scan sonar (SSS) is an excellent tool for acquiring imagery of seafloor topography. Combined with ocean surface sampling, it provides detailed and accurate images of marine substrate features. Most processing of SSS imagery works around limited sampling stations and requires manual interpretation to complete the classification of seabed sediment imagery. In complex sea areas, small targets are often lost during manual interpretation because of the large amount of information. To date, studies on the automatic recognition of seabed sediments are still few. This paper proposes a seabed sediment recognition method based on You Only Look Once version 5 and SSS imagery to perform real-time sediment classification and localization with higher accuracy, particularly on small targets, and at faster speeds. We used methods such as changing the dataset size, epoch, and optimizer and adding multiscale training to overcome the challenges of a small sample and low accuracy. With these methods, we improved the mean average precision by 8.98% and the F1 score by 11.12% compared with the original method. In addition, the detection speed was approximately 100 frames per second, which is faster than that of previous methods. This speed enabled us to achieve real-time seabed sediment recognition from SSS imagery.
Funding: The authors are thankful to the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia, for funding this work through the Research Group No. RGP-1439-067.
Abstract: The gender recognition problem has attracted the attention of the computer vision community due to its importance in many applications (e.g., surveillance and human–computer interaction [HCI]). Images of varying levels of illumination, occlusion, and other factors are captured in uncontrolled environments. Iris and facial recognition technology cannot be used on these images because iris texture is unclear in these instances, and faces may be covered by a scarf, hijab, or mask due to the COVID-19 pandemic. The periocular region is a reliable source of information because it features rich discriminative biometric features. However, most existing gender classification approaches have been designed based on hand-engineered features or validated in controlled environments. Motivated by the superior performance of deep learning, we proposed a new method, PeriGender, inspired by the design principles of the ResNet and DenseNet models, that can classify gender using features from the periocular region. The proposed system utilizes a dense concept in a residual model. Through skip connections, it reuses features on different scales to strengthen discriminative features. Evaluations of the proposed system on challenging datasets indicated that it outperformed state-of-the-art methods. It achieved 87.37%, 94.90%, 94.14%, 99.14%, and 95.17% accuracy on the GROUPS, UFPR-Periocular, Ethnic-Ocular, IMP, and UBIPr datasets, respectively, in the open-world (OW) protocol. It further achieved 97.57% and 93.20% accuracy for adult periocular images from the GROUPS dataset in the closed-world (CW) and OW protocols, respectively. The results showed that the middle region between the eyes plays a crucial role in the recognition of masculine features, and feminine features can be identified through the eyebrow, upper eyelids, and corners of the eyes. Furthermore, using a whole region without cropping enhances PeriGender’s learning capability, improving its understanding of both eyes’ global structure without discontinuity.
Funding: funded by the National Key Research and Development Program of China Project (Grant No. 2021YFD2000700), the National Natural Science Funds for Young Scholars of China (Grant No. 51905154), and the Luoyang Public Welfare Special Project (Grant No. 2302031A).
Abstract: To realize the visual navigation of agricultural robots in the complex environment of orchards, this study proposed a method for fruit tree recognition and navigation based on YOLOv5. The YOLOv5s model was selected and trained to identify the trunks of the left and right rows of fruit trees. A quadratic curve was fitted to the bottom centers of the fruit tree recognition boxes, and the identified fruit trees were divided into left and right rows by using the extreme value point of the quadratic curve. The straight-line equations of the left and right fruit tree rows were then solved, the median line of the two straight lines was taken as the expected navigation path of the robot, and a path-tracking navigation experiment was carried out using the improved LQR control algorithm. The experimental results show that, guided by the machine vision system and the improved LQR control algorithm, the lateral error and heading error relative to the desired navigation path converge quickly from the four initial states of [0 m, −0.34 rad], [0.10 m, 0.34 rad], [0.15 m, 0 rad], and [0.20 m, −0.34 rad]. When the initial speed was 0.5 m/s, the average lateral error was 0.059 m and the average heading error was 0.2787 rad for the navigation trials in the four different initial states. On average, the robot reached the steady state after driving 5.3 m; the average steady-state lateral error was 0.0102 m, the average steady-state heading error was 0.0253 rad, and the average relative error of the robot driving along the desired navigation path was 4.6%. The results indicate that the navigation algorithm proposed in this study has good robustness, meets the operational requirements of robot autonomous navigation in orchard environments, and improves the reliability of robot driving in orchards.
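The row-splitting and path-extraction steps described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: it assumes image coordinates `(x, y)` for the bottom centers of the detection boxes, splits rows at the x-coordinate of the fitted parabola's extreme point, fits each row as a line `x = m*y + k`, and interprets the "median line" as the average of the two lines' coefficients. All function names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_vertex_x(points):
    """x of the extreme point of the least-squares parabola through points.

    Fits y = a*u^2 + b*u + c with u = x - mean(x); centering keeps the
    normal equations well scaled for pixel-sized coordinates.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    u = [x - mean_x for x, _ in points]
    y = [p[1] for p in points]
    s1 = sum(u)
    s2 = sum(v ** 2 for v in u)
    s3 = sum(v ** 3 for v in u)
    s4 = sum(v ** 4 for v in u)
    rhs = [sum(v * v * w for v, w in zip(u, y)),
           sum(v * w for v, w in zip(u, y)),
           sum(y)]
    A = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    det = _det3(A)
    if det == 0:
        raise ValueError("need at least three distinct x values")

    def solve(col):
        # Cramer's rule: replace one column of A with the right-hand side.
        m = [[rhs[i] if j == col else A[i][j] for j in range(3)]
             for i in range(3)]
        return _det3(m) / det

    a, b = solve(0), solve(1)
    if a == 0:
        raise ValueError("points are collinear; no extreme point")
    return mean_x - b / (2 * a)


def fit_line_x_of_y(points):
    """Least-squares line x = m*y + k (suits near-vertical tree rows)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    syy = sum((y - my) ** 2 for _, y in points)
    if syy == 0:
        raise ValueError("need at least two distinct y values per row")
    m = sum((y - my) * (x - mx) for x, y in points) / syy
    return m, mx - m * my


def navigation_midline(points):
    """Split points into left/right rows and return the midline (m, k)."""
    vx = quadratic_vertex_x(points)
    left = [p for p in points if p[0] < vx]
    right = [p for p in points if p[0] >= vx]
    ml, kl = fit_line_x_of_y(left)
    mr, kr = fit_line_x_of_y(right)
    return (ml + mr) / 2, (kl + kr) / 2
```

For six symmetric sample points such as `(100, 400), (200, 300), (300, 200)` on the left and `(700, 400), (600, 300), (500, 200)` on the right, this sketch yields a vertical midline at `x = 400`.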
Funding: financially supported by the National Key Research and Development Program of China (No. 2019YFC1805400).
Abstract: As a new technical means of detecting abnormal signs of water inrush in advance and giving an early warning, the automatic monitoring and early warning of water inrush in mines has been widely valued in recent years. Because many factors affect water inrush and the water inrush mechanism is complicated, many factors may show precursory abnormal changes shortly before a water inrush. At present, existing monitoring and early warning systems mainly use a few monitoring indicators, such as groundwater level, water influx, and temperature, and issue water inrush early warnings based on the abnormal change of a single factor. However, there are relatively few multi-factor comprehensive early warning identification models. Based on an analysis of the abnormal changes of precursor factors in multiple water inrush cases, 11 measurable and effective indicators covering the groundwater flow field, hydrochemical field, and temperature field are proposed. Finally, taking the Hengyuan coal mine as an example, 6 indicators with long-term monitoring data sequences were selected to establish a single-index hierarchical early-warning recognition model, a multi-factor linear recognition model, and a comprehensive intelligent early-warning recognition model. The results show that the correct rate of early warning can reach 95.2%.
Abstract: Recognition of dynamic hand gestures in real time is a difficult task because the system can never know when or where a gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features, which are then given to the LSTM network, through which classification is carried out. The proposed model is a lightweight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the publicly available 20BN-jester dataset. The model was trained on 2000 video clips per class, which were separated into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results compared with MobileNetv2+LSTM.
Abstract: As multimedia data sharing increases, data security in mobile devices and its mechanisms become critical. Biometrics combines the physiological and behavioral qualities of an individual to validate their character in real time. Humans have physiological attributes such as fingerprint, face, iris, palm print, finger knuckle print, and Deoxyribonucleic Acid (DNA), and behavioral qualities such as walk, voice, mark, or keystroke. The main goal of this paper is to design a robust framework for automatic face recognition. Scale Invariant Feature Transform (SIFT) and Speeded-up Robust Features (SURF) are employed for face recognition. We also propose a modified Gabor Wavelet Transform for SIFT/SURF (GWT-SIFT/GWT-SURF) to increase the recognition accuracy of human faces. The proposed scheme is composed of three steps. First, the entropy of the image is removed using the Discrete Wavelet Transform (DWT). Second, the computational complexity of SIFT/SURF is reduced. Third, the accuracy of authentication is increased by the proposed GWT-SIFT/GWT-SURF algorithm. A comparative analysis of the proposed scheme is done on the real-time Olivetti Research Laboratory (ORL) and Poznan University of Technology (PUT) databases. Compared with the traditional SIFT/SURF methods, GWT-SIFT achieves the better accuracy of 99.32%, while GWT-SURF is the better approach in terms of run time: it takes 3.4 seconds for 100 images, compared with 4.9 seconds for GWT-SIFT.
Abstract: To solve the problem of stray light interfering with star point target identification while a star sensor images the sky, a study on the space luminous environment adaptability of a missile-borne star sensor was carried out. Using the Planck blackbody radiation law and some astronomical knowledge, the irradiances of the stray light at the star sensor working height were estimated. Using relevant astrophysical and mathematical knowledge, the included angles between the star sensor optical axis and the stray light at any moment were calculated. The correctness of the calculation was verified with the star map software Stellarium. By combining the above analysis with the baffle suppression effect, a real-time model for the space luminous environment of a missile-borne star sensor was proposed. Using the signal-to-noise ratio (SNR) criterion, the adaptability of the missile-borne star sensor to the space luminous environment was studied. As an example, a certain type of star sensor was considered when imaging the starry sky on June 22, 2011 (the Summer Solstice) and September 20, 2011 (August 23 of the lunar year, last quarter moon) in Beijing. The space luminous environment and the adaptability to it were simulated and analyzed at the star sensor working height. In each period of time, the stray light suppression of the baffle is analyzed by comparing the calculated included angle between the star sensor optical axis and the stray light with the shielded angle specified by the system index. When the included angle is larger than the shielded angle and less than 90°, the stray light is restrained by the baffle. The stray light effect on star point target identification is analyzed by comparing the irradiance of a 6th-magnitude star with that of the stray light on the star sensor's photosensitive surface. When the irradiance of a 6th-magnitude star is 5 times more than that of the stray light, there is no effect on star point target identification. The simulation results are identical with the actual situation. The space luminous environment of the missile-borne star sensor can be estimated in real time by this model. The adaptability of the star sensor to the space luminous environment can be analyzed conveniently. The model can provide a basis for determining the relevant star sensor indexes, the navigation star selection strategy, and the missile launch window.
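The abstract above relies on the Planck blackbody radiation law and on two threshold criteria. As a reading aid, the standard form of Planck's law and the two criteria as stated in the abstract can be written as follows. The symbols $E_6$ (irradiance of a 6th-magnitude star), $E_s$ (irradiance of the stray light), $\theta$ (included angle), and $\theta_{sh}$ (shielded angle of the baffle) are notation introduced here for illustration and are not the paper's own.

```latex
% Planck's law: spectral radiance of a blackbody at temperature T
% (h: Planck constant, c: speed of light, k_B: Boltzmann constant)
B_\lambda(\lambda, T) = \frac{2 h c^{2}}{\lambda^{5}}
  \cdot \frac{1}{\exp\!\left(\dfrac{h c}{\lambda k_B T}\right) - 1}

% Baffle criterion stated in the abstract
\theta_{sh} < \theta < 90^{\circ}
  \;\Longrightarrow\; \text{the stray light is restrained by the baffle}

% Identification criterion stated in the abstract
E_6 > 5\,E_s
  \;\Longrightarrow\; \text{no effect on star point target identification}
```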
Abstract: This paper translates the recognition of the curvature radius for a robot moving in a bent pipe into a shape-from-shading problem, and introduces genetic algorithms into the optimization process to improve the efficiency of optimization. Experiments prove that this method can satisfy the autonomous control requirements for a robot moving in a bent pipe in both speed and accuracy.
Abstract: In this paper, we subjectively and objectively investigate the influence of network delay on Quality of Experience (QoE), such as the operability of the haptic interface device and the fairness between players, for soft objects in a networked real-time game. We handle a networked balloon bursting game in which two players burst balloons (i.e., soft objects) in a 3D virtual space by using haptic interface devices, and the players compete for the number of burst balloons. As a result, we find that the operability depends on the network delay from the local terminal to the other terminal, and the fairness mainly depends on the difference in network delay between the players’ terminals. We confirm that there is a trade-off between the operability and the fairness. We also see that the fairness contributes more than the operability to the comprehensive quality (i.e., the weighted sum of the operability and fairness). Assessment results further show that, to maintain fairness when the difference in network delay between the terminals is large, the output timing of the terminals should be adjusted to that of the terminal with the latest output timing. In this way, the comprehensive quality at each terminal can be maintained as high as possible.
Abstract: Automatic Speech Recognition (ASR) is the process that converts an acoustic signal captured by a microphone into written text. The motivation of this paper is to create a speech-based Integrated Development Environment (IDE) for C programs. The paper proposes a technique that enables visually impaired people, or people with arm injuries, who have excellent programming skills to write C programs through voice input. The proposed system accepts a C program as voice input and produces a compiled C program as output. The user utters each line of the C program through voice input. First, the voice input is recognized as text. The recognized text is then converted into a C program by using the syntactic constructs of the C language. After conversion, the C program is fed as input to the IDE. Furthermore, IDE commands such as open, save, close, compile, and run are also given through voice input only. If any error occurs during the compilation process, the error is corrected through voice input by specifying the line number. The performance of the speech recognition system is analyzed by varying the vocabulary size as well as the number of mixture components in the HMM.