Funding: the National Natural Science Foundation of China (Grant Nos. 61972016 and 62032016) and the Beijing Nova Program (Grant No. 20220484106).
Abstract: We present a novel approach for predicting crystal material properties that is distinct from computationally complex and expensive density functional theory (DFT)-based calculations. Instead, we utilize an attention-based graph neural network that yields high-accuracy predictions. Our approach employs two attention mechanisms for message passing on crystal graphs, which enable the model to selectively attend to pertinent atoms and their local environments, thereby improving performance. Comprehensive experiments validate our approach and demonstrate that it surpasses existing methods in predictive accuracy. Our results suggest that deep learning, particularly attention-based networks, holds significant promise for predicting crystal material properties, with implications for materials discovery and intelligent design systems.
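The abstract does not specify the two attention mechanisms, but the general idea of attention-based message passing on a graph of atoms — each atom aggregating its neighbors' features with learned softmax weights — can be sketched as follows. All names here (`graph_attention_step`, `w_att`) are illustrative, not the paper's actual layers:

```python
import numpy as np

def graph_attention_step(h, neighbors, w_att):
    """One hedged sketch of attention-based message passing on a crystal
    graph: atom i attends over its neighbors j and aggregates their
    features with softmax weights, so pertinent atoms in the local
    environment contribute more to the message."""
    n, d = h.shape
    out = np.zeros_like(h)
    for i in range(n):
        nbrs = neighbors[i]
        # unnormalized score for each neighbor: a(i, j) = w · [h_i ; h_j]
        scores = np.array([w_att @ np.concatenate([h[i], h[j]]) for j in nbrs])
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                     # softmax over the neighborhood
        out[i] = sum(a * h[j] for a, j in zip(alpha, nbrs))  # weighted aggregation
    return out
```

In a full model this step would be stacked and followed by a readout over atoms to predict a scalar property.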
Funding: This work is partially supported by the National Key Research and Development Program of China (2016YFE0204200), the National Natural Science Foundation of China (61503017), the Fundamental Research Funds for the Central Universities (YWF-18-BJ-J-221), the Aeronautical Science Foundation of China (2016ZC51022), and the Platform CAPSEC (capteurs pour la sécurité) funded by Région Champagne-Ardenne and FEDER (fonds européen de développement régional).
Abstract: It has long been a challenging task to detect anomalies in a crowded scene. In this paper, a self-supervised framework called the abnormal event detection network (AED-Net), composed of a principal component analysis network (PCAnet) and kernel principal component analysis (kPCA), is proposed to address this problem. Using surveillance video sequences of different scenes as raw data, the PCAnet is trained to extract high-level semantics of the crowd's situation. Next, kPCA, a one-class classifier, is trained to identify anomalies within the scene. In contrast to some prevailing deep learning methods, this framework is completely self-supervised because it utilizes only video sequences of normal situations. Experiments on global and local abnormal event detection are carried out on the Monitoring Human Activity dataset from the University of Minnesota (UMN dataset) and the Anomaly Detection dataset from the University of California, San Diego (UCSD dataset), and competitive results are observed, with a better equal error rate (EER) and area under the curve (AUC) than other state-of-the-art methods. Furthermore, by adding a local response normalization (LRN) layer, we propose an improvement to the original AED-Net. The results demonstrate that this improved version performs better by strengthening the framework's generalization capacity.
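The one-class idea behind AED-Net — fit a model on normal data only and score new samples by how poorly the model reconstructs them — can be illustrated with a minimal sketch. Plain linear PCA stands in here for the paper's PCAnet + kPCA pipeline, and the function names are hypothetical:

```python
import numpy as np

def fit_pca(X_normal, k):
    """Fit a k-dimensional principal subspace on normal-only data
    (self-supervised: no anomaly labels are needed)."""
    mu = X_normal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_normal - mu, full_matrices=False)
    return mu, Vt[:k]          # mean and top-k principal directions

def anomaly_score(x, mu, components):
    """Score a sample by its reconstruction error, i.e. its distance
    to the subspace learned from normal data."""
    z = components @ (x - mu)              # project onto the subspace
    x_hat = mu + components.T @ z          # reconstruct
    return np.linalg.norm(x - x_hat)       # large error => likely anomalous
```

A threshold on this score (chosen, e.g., at the EER operating point) separates normal from abnormal events.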
Funding: the National Key R&D Program of China (2016YFE0204200), the National Natural Science Foundation of China (Grant Nos. 61503017 and U1435220), the Fundamental Research Funds for the Central Universities (YWF-14-RSC-102), the Aeronautical Science Foundation of China (2016ZC51022), the ANR AutoFerm project, and the Platform CAPSEC funded by Région Champagne-Ardenne and FEDER.
Abstract: Security surveillance of public scenes is closely tied to the everyday safety of individuals. Motivated by this concern, abnormal event detection has become one of the most important tasks in computer vision and video processing. In this paper, we propose a new algorithm to address the visual anomaly detection problem. Our algorithm decouples the problem into a feature-descriptor extraction process followed by an autoencoder-based network called the cascade deep AutoEncoder (CDA). The movement information is represented by a novel descriptor capturing multi-frame optical flow information. The feature descriptors of the normal samples are then fed into the CDA network for training. Finally, abnormal samples are distinguished by the reconstruction error of the CDA during testing. We validate the proposed method on several video surveillance datasets.
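The detection criterion — train an autoencoder on normal descriptors only, then flag test samples whose reconstruction error is large — can be sketched with a tiny single-hidden-layer autoencoder. This is a stand-in for the idea only, not the paper's cascade (CDA) architecture or its optical-flow descriptor; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden, epochs=1000, lr=0.05):
    """Tiny autoencoder trained by gradient descent on normal samples only."""
    d = X.shape[1]
    W1 = rng.normal(0, 0.1, (hidden, d)); b1 = np.zeros(hidden)   # encoder
    W2 = rng.normal(0, 0.1, (d, hidden)); b2 = np.zeros(d)        # decoder
    for _ in range(epochs):
        H = np.tanh(X @ W1.T + b1)             # encode
        Xh = H @ W2.T + b2                     # decode
        E = Xh - X                             # reconstruction residual
        gW2 = E.T @ H / len(X); gb2 = E.mean(0)
        dH = (E @ W2) * (1 - H ** 2)           # backprop through tanh
        gW1 = dH.T @ X / len(X); gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def recon_error(x, W1, b1, W2, b2):
    """Anomaly score: reconstruction error of one descriptor."""
    h = np.tanh(x @ W1.T + b1)
    return np.linalg.norm(h @ W2.T + b2 - x)
```

Because the network only ever sees normal motion descriptors, abnormal inputs fall off the learned manifold and reconstruct poorly.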
Funding: supported in part by the National Key Research and Development Program of China (2018AAA0101400), the National Natural Science Foundation of China (Grant Nos. 61972016, 62032016, and 61866022), and the Natural Science Foundation of Beijing (L191007).
Abstract: Action recognition is an important research topic in video analysis that remains very challenging. Effective recognition relies on learning a good representation of both spatial information (for appearance) and temporal information (for motion). These two kinds of information are highly correlated but have quite different properties, leading to unsatisfying results both from connecting independent models (e.g., CNN-LSTM) and from direct unbiased co-modeling (e.g., 3D CNN). Besides, deep learning models for this task have traditionally used only 8 or 16 consecutive frames as input, making it hard to extract discriminative motion features. In this work, we propose a novel network structure called ResLNet (Deep Residual LSTM network), which can take longer inputs (e.g., 64 frames) and, through the proposed embedded variable stride convolution, lets convolutions collaborate with the LSTM more effectively under the residual structure to learn better spatial-temporal representations without extra computation. The superiority of this proposal and its ablation study are shown on the three most popular benchmark datasets: Kinetics, HMDB51, and UCF101. The proposed network can accept various features, such as RGB and optical flow. Owing to the limited computational power of our experimental equipment and the real-time requirement, the proposed network is tested on RGB input only and shows strong performance.
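The exact form of the embedded variable stride convolution is not given in the abstract, but the trade-off it exploits — a larger temporal stride lets the same small kernel span a much longer clip (e.g., 64 frames) while producing no more outputs, hence no extra computation — can be illustrated with a plain strided temporal convolution. This is a hypothetical helper for intuition, not the paper's layer:

```python
import numpy as np

def temporal_conv1d(x, kernel, stride):
    """Strided temporal convolution over a (frames, features) sequence.
    With stride s, each application of a length-k kernel spans
    (k - 1) * s + 1 frames, so larger strides cover longer clips with
    the same number of multiply-adds per output."""
    t, d = x.shape
    k = len(kernel)
    out = []
    for start in range(0, t - (k - 1) * stride, stride):
        window = x[start:start + k * stride:stride]   # k frames, `stride` apart
        out.append(np.tensordot(kernel, window, axes=1))
    return np.array(out)
```

Varying the stride inside one layer (small strides for fine motion, large ones for long-range context) is one plausible reading of how 64-frame inputs stay affordable.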
Funding: supported by the National Key Research and Development Program of China (2016YFE0204200), the National Natural Science Foundation of China (Grant Nos. 61972016 and 62032016), the Beijing Natural Science Foundation (L191007), the Fundamental Research Funds for the Central Universities (YWF-21-BJ-J-313 and YWF-20-BJ-J-612), and the Open Research Fund of the Digital Fujian Environment Monitoring Internet of Things Laboratory Foundation (202004).
Abstract: Temporal action proposal generation aims to output the starting and ending times of each potential action in long videos and often suffers from high computation cost. To address this issue, we propose a new temporal convolution network called Multipath Temporal ConvNet (MTCN). In our work, a novel high-performance ring parallel architecture is further introduced into temporal action proposal generation to meet the requirements of large memory occupation and large numbers of videos. Remarkably, total data transmission is reduced by adding connections between the multiple computing loads in the newly developed architecture. Compared to the traditional Parameter Server architecture, our parallel architecture achieves higher efficiency on temporal action detection tasks with multiple GPUs. We conduct experiments on ActivityNet-1.3 and THUMOS14, where our method outperforms other state-of-the-art temporal action detection methods with high recall and high temporal precision. In addition, a time metric is proposed to evaluate the speed performance in the distributed training process.
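The bandwidth advantage of a ring over a parameter server can be illustrated with the classic ring all-reduce pattern, simulated below: each of the N workers splits its gradient into N chunks, chunks circulate and accumulate around the ring (scatter-reduce), then the finished chunks circulate once more (all-gather). Every worker ends with the full sum while transmitting only about 2(N-1)/N of its gradient size, instead of shipping the whole gradient to a central server. This is a generic sketch of the ring idea, not the paper's exact architecture:

```python
import numpy as np

def ring_allreduce(grads):
    """Simulated ring all-reduce over per-worker gradient vectors.
    Returns each worker's final buffer; all equal the elementwise sum."""
    n = len(grads)
    chunks = [list(np.array_split(np.asarray(g, dtype=float).copy(), n))
              for g in grads]
    # scatter-reduce: after n-1 steps, worker i holds the fully summed
    # chunk (i + 1) % n
    for step in range(n - 1):
        sends = [((i + 1) % n, (i - step) % n, chunks[i][(i - step) % n])
                 for i in range(n)]           # snapshot: all sends happen "at once"
        for dst, ci, data in sends:
            chunks[dst][ci] = chunks[dst][ci] + data
    # all-gather: each fully reduced chunk travels around the ring
    for step in range(n - 1):
        sends = [((i + 1) % n, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n])
                 for i in range(n)]
        for dst, ci, data in sends:
            chunks[dst][ci] = data
    return [np.concatenate(c) for c in chunks]
```

Each worker sends 2(n-1) chunks of size ~1/n of the gradient, independent of the number of workers, which is why ring architectures scale better than a central parameter server as GPUs are added.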