This paper focuses on the task of few-shot 3D point cloud semantic segmentation. Despite some progress, this task still encounters many issues due to the insufficient samples given, e.g., incomplete object segmentation and inaccurate semantic discrimination. To tackle these issues, we first leverage part-whole relationships in the task of 3D point cloud semantic segmentation to capture semantic integrity, empowered by dynamic capsule routing with a 3D Capsule Network (CapsNet) module in the embedding network. Concretely, the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations, which capture the relationships between object parts and their wholes. Secondly, we design a multi-prototype enhancement module to improve prototype discriminability. Specifically, the single-prototype enhancement mechanism is expanded into a multi-prototype version to capture rich semantics. Besides, the shot-correlation within each category is calculated via the interaction of different samples to enhance intra-category similarity. Ablation studies show that the part-whole relations and the proposed multi-prototype enhancement module help achieve complete object segmentation and improve semantic discrimination. Moreover, with these two modules integrated, quantitative and qualitative experiments on two public benchmarks, S3DIS and ScanNet, indicate the superior performance of the proposed framework on few-shot 3D point cloud semantic segmentation compared to state-of-the-art methods.
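The multi-prototype enhancement described in this abstract builds on the standard prototype mechanism of few-shot segmentation. As a minimal, hedged sketch (not the paper's actual module): each class prototype is the mean of the support-set embeddings for that class, and query points are labeled by their nearest prototype. All function names here are illustrative, not taken from the paper.

```python
import numpy as np

def class_prototypes(support_feats, support_labels, num_classes):
    """Mean support embedding per class (the single-prototype baseline)."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def segment_by_prototype(query_feats, prototypes):
    """Assign each query point to its nearest prototype (Euclidean distance)."""
    # distances: (num_query_points, num_classes)
    d = np.linalg.norm(query_feats[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy demo: two well-separated classes in an 8-D embedding space.
rng = np.random.default_rng(0)
support = np.concatenate([rng.normal(0.0, 0.1, (20, 8)),
                          rng.normal(3.0, 0.1, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)
protos = class_prototypes(support, labels, 2)
pred = segment_by_prototype(np.array([[0.0] * 8, [3.0] * 8]), protos)
```

A multi-prototype variant would replace the single per-class mean with several cluster centers per class (e.g., via farthest-point sampling or k-means over the support embeddings).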
Face recognition (FR) technology has numerous applications in artificial intelligence, including biometrics, security, authentication, law enforcement, and surveillance. Deep learning (DL) models, notably convolutional neural networks (CNNs), have shown promising results in the field of FR. However, CNNs are easily fooled since they do not encode position and orientation correlations between features. Hinton et al. envisioned Capsule Networks as a more robust design capable of retaining pose information and spatial correlations to recognize objects more like the brain does. Lower-level capsules hold 8-dimensional vectors of attributes such as position, hue, and texture, which are routed to higher-level capsules via a routing-by-agreement algorithm. This provides capsule networks with viewpoint invariance, which has previously evaded CNNs. This research presents an FR model based on capsule networks that was tested on the LFW dataset, the COMSATS face dataset, and our own camera-acquired photos, at resolutions of 128 × 128, 40 × 40, and 30 × 30 pixels. The trained model outperforms state-of-the-art algorithms, achieving 95.82% test accuracy and performing well on unseen faces that have been blurred or rotated. Additionally, the suggested model outperformed recently released approaches on the COMSATS face dataset, achieving a high accuracy of 92.47%. Based on the results of this research as well as previous results, capsule networks perform better than deeper CNNs on unobserved altered data because of their special equivariance properties.
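The routing-by-agreement algorithm mentioned above can be sketched in a few lines. This is a simplified NumPy illustration of the dynamic routing idea from Sabour, Frosst, and Hinton (the `squash` nonlinearity plus iterative coupling updates), not the exact implementation used in this FR model; shapes and iteration count are assumptions.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Capsule nonlinearity: preserves direction, maps vector length into [0, 1)."""
    n2 = (v ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def routing_by_agreement(u_hat, iters=3):
    """u_hat: (num_lower, num_upper, dim) prediction vectors from lower capsules.
    Couplings to an upper capsule grow when its output agrees (dot product)
    with that lower capsule's prediction."""
    b = np.zeros(u_hat.shape[:2])                             # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over upper capsules
        s = (c[..., None] * u_hat).sum(axis=0)                # coupling-weighted sum
        v = squash(s)                                         # upper-capsule outputs
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update
    return v

# Demo: 6 lower capsules routing to 3 upper capsules of dimension 8.
v_out = routing_by_agreement(np.random.default_rng(1).normal(size=(6, 3, 8)))
```

The squashed outputs always have length below 1, so a capsule's vector length can be read as the probability that its entity is present.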
The current advancements in cloud computing, Artificial Intelligence (AI), and the Internet of Things (IoT) have transformed the traditional healthcare system into smart healthcare. Healthcare services can be enhanced by incorporating key techniques like AI and IoT, whose convergence provides distinct opportunities in the medical field. Falls are regarded as a primary cause of death or post-traumatic complications for the ageing population. Therefore, early detection of falls by older persons in smart homes is required to improve survival rates or provide the necessary support. Lately, the emergence of IoT, AI, smartphones, wearables, and related technologies has made it possible to design fall detection (FD) systems for smart home care. This article introduces a new Teamwork Optimization with Deep Learning based Fall Detection for IoT Enabled Smart Healthcare Systems (TWODL-FDSHS). The goal of the TWODL-FDSHS technique is to detect fall events in a smart healthcare system. Initially, the presented TWODL-FDSHS technique exploits IoT devices for data collection. Next, it applies Teamwork Optimization (TWO) with a Capsule Network (CapsNet) model for feature extraction. Finally, a deep random vector functional link network (DRVFLN) with an Adam optimizer is employed for fall event detection. A wide range of simulations was conducted to exhibit the enhanced performance of the presented TWODL-FDSHS technique. The experimental outcomes demonstrate the improvements of the TWODL-FDSHS method over other models, with an increased accuracy of 98.30% on the URFD dataset.
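The random vector functional link (RVFL) network underlying the DRVFLN classifier combines a frozen random hidden layer with direct input-to-output links. The paper's deep, Adam-trained variant is not specified here; the following is a hedged sketch of a shallow RVFL trained in closed form with ridge regression, with all names and hyperparameters chosen for illustration.

```python
import numpy as np

def train_rvfl(X, y, hidden=50, seed=0, reg=1e-3):
    """Random hidden layer (frozen) plus direct input links;
    only the output weights are solved, in closed form (ridge regression)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                   # random nonlinear features
    D = np.hstack([X, H])                    # direct links + random features
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ y)
    return (W, b, beta)

def predict_rvfl(model, X):
    W, b, beta = model
    return np.hstack([X, np.tanh(X @ W + b)]) @ beta

# Demo: recover a linear target; the direct links make this fit nearly exact.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2.0 * X[:, 1]
pred = predict_rvfl(train_rvfl(X, y), X)
```

Because the hidden weights are never trained, fitting reduces to one linear solve, which is why (D)RVFL variants are popular for lightweight edge and IoT deployments.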
The rapid development of deepfake technology has led to the spread of forged audio and video across network platforms, presenting risks for numerous countries, societies, and individuals, and posing a serious threat to cyberspace security. To address the problems of insufficient extraction of spatial features and the neglect of temporal features in deepfake video detection, we propose a detection method based on an improved CapsNet and temporal-spatial features (iCapsNet-TSF). First, the dynamic routing algorithm of CapsNet is improved through weight initialization and updating. Then, an optical flow algorithm is used to extract inter-frame temporal features of the videos to form a dataset of temporal-spatial features. Finally, the iCapsNet model is employed to fully learn the temporal-spatial features of facial videos, and the results are fused. Experimental results show that the detection accuracy of iCapsNet-TSF reaches 94.07%, 98.83%, and 98.50% on the Celeb-DF, FaceSwap, and Deepfakes datasets, respectively, outperforming most existing mainstream algorithms. The iCapsNet-TSF method combines the capsule network and the optical flow algorithm, providing a novel strategy for deepfake detection, which is of great significance for preventing deepfake attacks and preserving cyberspace security.
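The inter-frame temporal features here come from optical flow, which estimates apparent motion from the brightness-constancy constraint Ix·u + Iy·v + It = 0. The paper does not specify its flow algorithm, so as an assumption-laden illustration, here is a minimal global Lucas-Kanade-style least-squares solve over a whole frame pair (real pipelines would use a dense, pyramidal method):

```python
import numpy as np

def lucas_kanade_flow(f0, f1):
    """Single global (u, v) motion estimate between two grayscale frames,
    via least squares on Ix*u + Iy*v + It = 0 over all pixels."""
    Ix = np.gradient(f0, axis=1)   # spatial gradients (central differences)
    Iy = np.gradient(f0, axis=0)
    It = f1 - f0                   # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    (u, v), *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return u, v

# Demo: a smooth blob shifted one pixel to the right should give u ~ 1, v ~ 0.
ax = np.arange(64.0)
X, Y = np.meshgrid(ax, ax)
f0 = np.exp(-((X - 32) ** 2 + (Y - 32) ** 2) / 50.0)
f1 = np.roll(f0, 1, axis=1)
u, v = lucas_kanade_flow(f0, f1)
```

Stacking such per-frame flow fields alongside the RGB frames is one common way to form the temporal-spatial input that a capsule classifier then consumes.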
For human-machine communication to be as effective as human-to-human communication, research on speech emotion recognition is essential. Among the models and classifiers used to recognize emotions, neural networks appear promising due to their ability to learn and the diversity of their configurations. Following the convolutional neural network, a capsule neural network (CapsNet), whose inputs and outputs are vectors rather than scalar quantities, allows the network to determine the part-whole relationships that are specific to an object. This paper performs speech emotion recognition based on CapsNet. The corpora for speech emotion recognition were augmented by adding white noise and changing voices. The feature parameters at the input of the recognition system are mel spectrum images along with characteristics of the sound source, vocal tract, and prosody. For the German emotional corpus EMO-DB, the average accuracy score for four emotions (neutral, boredom, anger, and happiness) is 99.69%. For the Vietnamese emotional corpus BKEmo, this score is 94.23% for four emotions (neutral, sadness, anger, and happiness). The accuracy score is highest when combining all the above feature parameters, and it increases significantly when combining mel spectrum images with features directly related to the fundamental frequency.
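The mel spectrum images used as input features are power spectrograms warped onto the perceptual mel scale. As a hedged from-scratch sketch (the paper's exact frame sizes and filter counts are not given here, so all parameters below are assumptions), one can build them from an STFT and a triangular mel filterbank:

```python
import numpy as np
from scipy.signal import stft

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(y, sr, n_fft=512, n_mels=40):
    """Power STFT projected through a triangular mel filterbank."""
    freqs, _, Z = stft(y, fs=sr, nperseg=n_fft)
    power = np.abs(Z) ** 2
    # Filter edges equally spaced on the mel scale, then mapped back to Hz.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        fb[i] = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                   (hi - freqs) / (hi - mid)), 0.0, None)
    return fb @ power    # shape: (n_mels, n_frames)

# Demo: a 1-second 440 Hz tone should light up the mel band near 440 Hz.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
S = mel_spectrogram(tone, sr)
```

In practice the log of `S` is taken and the resulting image is fed to the network alongside the source/vocal-tract/prosody features the abstract lists.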
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 62001341, the Natural Science Foundation of Jiangsu Province under Grant No. BK20221379, and the Jiangsu Engineering Research Center of Digital Twinning Technology for Key Equipment in Petrochemical Process under Grant No. DTEC202104.
Funding: Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, with Researchers Supporting Project Number PNURSP2024R234.
Funding: The Deanship of Scientific Research (DSR) at King Abdulaziz University (KAU), Jeddah, Saudi Arabia has funded this project under grant no. KEP-4-120-42.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant 2020JKF101 and the Research Funds of Sugon under Grant 2022KY001.