Abstract: Deep learning, especially through convolutional neural networks (CNNs) such as the U-Net 3D model, has revolutionized fault identification from seismic data, representing a significant leap over traditional methods. Our review traces the evolution of CNNs, emphasizing the adaptation and capabilities of the U-Net 3D model in automating seismic fault delineation with unprecedented accuracy. We find: 1) The transition from basic neural networks to sophisticated CNNs has enabled remarkable advances in image recognition, which are directly applicable to analyzing seismic data. The U-Net 3D model, with its innovative architecture, exemplifies this progress by providing a method for detailed and accurate fault detection with reduced manual interpretation bias. 2) The U-Net 3D model has demonstrated its superiority over traditional fault identification methods in several key areas: it has enhanced interpretation accuracy, increased operational efficiency, and reduced the subjectivity of manual methods. 3) Despite these achievements, challenges remain, such as the need for effective data preprocessing, the acquisition of high-quality annotated datasets, and model generalization across different geological conditions. Future research should therefore focus on developing more complex network architectures and innovative training strategies to further refine fault identification performance. Our findings confirm the transformative potential of deep learning, particularly CNNs like the U-Net 3D model, in geosciences, and advocate for its broader integration into geological exploration and seismic analysis.
Funding: This paper was partially supported by a project of the Shanghai Science and Technology Committee (18510760300), the Anhui Natural Science Foundation (1908085MF178), and the Anhui Excellent Young Talents Support Program Project (gxyqZD2019069).
Abstract: In computer vision, 3D object recognition is one of the most important tasks for many real-world applications. Three-dimensional convolutional neural networks (CNNs) have demonstrated their advantages in 3D object recognition. In this paper, we propose to use the principal curvature directions of 3D objects (computed from a CAD model) to represent the geometric features as inputs for the 3D CNN. Our framework, namely CurveNet, learns perceptually relevant salient features and predicts object class labels. Curvature directions incorporate complex surface information of a 3D object, which helps our framework to produce more precise and discriminative features for object recognition. Multitask learning is motivated by sharing features between two related tasks; we consider pose classification as an auxiliary task to enable CurveNet to better generalize object label classification. Experimental results show that our proposed framework performs better with curvature vectors than with voxels as the input for 3D object classification. We further improved the performance of CurveNet by combining two networks that take the curvature directions and the voxels of a 3D object as their respective inputs. A Cross-Stitch module was adopted to learn effective shared features across multiple representations. We evaluated our methods on three publicly available datasets and achieved competitive performance in the 3D object recognition task.
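The idea of sharing features between two networks through a Cross-Stitch module can be illustrated with a small sketch. This is not CurveNet's implementation: it only shows the generic cross-stitch operation, in which each output is a weighted mix of the two input feature vectors. The function name `cross_stitch`, the 2x2 weight layout `alpha`, and the toy feature values are assumptions made for illustration; in a real network the weights would be learned.

```python
def cross_stitch(feat_a, feat_b, alpha):
    """Mix two equally sized feature vectors with a 2x2 weight matrix.

    alpha = ((a_aa, a_ab), (a_ba, a_bb)) is a hypothetical layout:
    out_a = a_aa * feat_a + a_ab * feat_b
    out_b = a_ba * feat_a + a_bb * feat_b
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    out_a = [a_aa * xa + a_ab * xb for xa, xb in zip(feat_a, feat_b)]
    out_b = [a_ba * xa + a_bb * xb for xa, xb in zip(feat_a, feat_b)]
    return out_a, out_b


# Toy features standing in for the curvature branch and the voxel branch.
curvature_feat = [1.0, 2.0]
voxel_feat = [3.0, 4.0]

# Mostly keep each branch's own features, borrow a little from the other.
mixed_a, mixed_b = cross_stitch(curvature_feat, voxel_feat,
                                ((0.9, 0.1), (0.1, 0.9)))
```

With the identity matrix as `alpha` the two branches stay independent; off-diagonal weights control how much each branch borrows from the other.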
Abstract: Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Although three-dimensional convolutional filters can learn the temporal correlation between frames by extracting the features of multiple frames simultaneously, they greatly increase the number of parameters and the computational cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, calculating the optical flow incurs additional computational cost and requires another model to learn the optical-flow features. We proposed an action recognition framework based on a two-dimensional convolutional neural network, so the lack of temporal relationships had to be addressed. To expand the temporal receptive field, we proposed a multi-scale temporal shift module, which was then combined with a temporal feature difference extraction module to extract the difference between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on the HMDB51 dataset and 96.25% on the UCF-101 dataset. After compression, the accuracy remained high, at 95.57% (UCF-101) and 72.19% (HMDB51). The final model was more compact than most related works.
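A minimal sketch may help make the two temporal operations concrete. It is not the authors' module: it shows a generic single-scale temporal shift (part of the channels is taken from a neighbouring frame, with zero padding at the clip boundaries) and a plain frame-to-frame feature difference. The function names, the `fold_div` and `step` parameters, and the toy data are assumptions for illustration; a multi-scale variant could, for example, apply the shift with several `step` values.

```python
def temporal_shift(frames, fold_div=4, step=1):
    """Shift part of the channels along the time axis.

    frames: list of T per-frame feature vectors, each a list of C numbers.
    The first C // fold_div channels are read from frame t + step,
    the next C // fold_div channels from frame t - step,
    and the remaining channels are left in place.
    Positions that would read outside the clip are zero-filled.
    """
    t_len, c_len = len(frames), len(frames[0])
    fold = c_len // fold_div
    out = [[0.0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for ch in range(c_len):
            if ch < fold:
                src = t + step      # bring in features from a later frame
            elif ch < 2 * fold:
                src = t - step      # bring in features from an earlier frame
            else:
                src = t             # unshifted channels
            if 0 <= src < t_len:
                out[t][ch] = frames[src][ch]
    return out


def temporal_difference(frames):
    """Element-wise difference between consecutive frame features."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]


# Three frames with four channels each (toy values).
clip = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
shifted = temporal_shift(clip)
diffs = temporal_difference(clip)
```

Because the shift only moves existing values, it adds no parameters, which is why this family of methods lets a 2D network see neighbouring frames cheaply.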
Funding: Supported by the Key Project of the National Natural Science Foundation of China-Civil Aviation Joint Fund under Grant No. U2033212.
Abstract: Mortar pumpability is essential in the construction industry, but estimating it manually requires much labor and leads to material waste. This paper proposes an effective method that combines a 3-dimensional convolutional neural network (3D CNN) with a 2-dimensional convolutional long short-term memory network (ConvLSTM2D) to classify mortar pumpability automatically. Experimental results show that the proposed model reaches an accuracy of 100% with fast convergence on a dataset built by collecting the corresponding mortar image sequences. This work demonstrates the feasibility of using computer vision and deep learning for mortar pumpability classification.
Funding: Supported in part by the Fundamental Research Funds for the Central Universities in China (No. 2100219066) and the Key Fundamental Research Funds for the Central Universities in China (No. 0200219153).
Abstract: With the rapid development of Web3D technologies, sketch-based model retrieval has become an increasingly important challenge, while the application of Virtual Reality and 3D technologies has made shape retrieval of furniture over a web browser feasible. In this paper, we propose a learning framework for shape retrieval based on two Siamese VGG-16 Convolutional Neural Networks (CNNs), and a CNN-based hybrid learning algorithm to select the best view for a shape. In this algorithm, the AlexNet and VGG-16 CNN architectures are used to perform classification tasks and to extract features, respectively. In addition, a feature fusion method is used to measure the similarity of the output features from the two Siamese networks. The proposed framework can provide new alternatives for furniture retrieval in the Web3D environment. The primary innovation is the use of deep learning methods to solve the challenge of obtaining the best view of 3D furniture and to address cross-domain feature learning problems. We conduct an experiment to verify the feasibility of the framework, and the results show our approach to be superior to many mainstream state-of-the-art approaches.
Funding: This research was supported by the National Natural Science Foundation of China under Grant Nos. 61573380 and 61672542, and the Fundamental Research Funds for the Central Universities of China under Grant No. 2016zzts055.
Abstract: Block-matching-based 3D filtering methods have achieved great success in image denoising tasks. However, the manually set filtering operation cannot adequately model the transformation from noisy images to clean images. In this paper, we introduce a convolutional neural network (CNN) for the 3D filtering step to learn a well-fitted model for denoising. With a trainable model, prior knowledge is utilized for a better mapping from noisy images to clean images. This block matching and CNN joint model (BMCNN) can denoise images of different sizes and noise intensities well, especially images with high noise levels. The experimental results demonstrate that, among all competing methods, this method achieves the highest peak signal-to-noise ratio (PSNR) when denoising images with high noise levels (σ > 40), and the best visual quality at all tested noise levels.
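Since the comparison above is reported in PSNR, a short sketch of the standard definition may be useful: PSNR = 10 · log10(peak² / MSE), where MSE is the mean squared error between the clean and the denoised image. This is the textbook formula, not code from the paper; the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math


def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized images.

    Images are given as flat lists of pixel intensities; `peak` is the
    maximum possible pixel value (255 assumed for 8-bit images).
    """
    if len(clean) != len(denoised) or not clean:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    if mse == 0:
        return math.inf  # identical images: no error
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by 10 intensity levels, so MSE = 100.
score = psnr([100, 100], [110, 90])
```

Higher values mean the denoised image is closer to the clean reference; identical images give infinite PSNR.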
Abstract: Purpose: With more and more digital collections of various information resources becoming available, the challenge of assigning subject index terms and classes from quality knowledge organization systems is also increasing. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC. Design/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research involved 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Findings: Evaluation shows that a Support Vector Machine with a linear kernel outperforms the other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than the Support Vector Machine, but came close, with the benefit of a smaller representation size. Analysis of the impact of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available in the training set, and 66.13% when too few records (often fewer than 100 per class) are available on which to train; these figures hold only for the top 3 hierarchical levels (803 instead of 14,413 classes). Research limitations: Having to reduce the number of hierarchical levels to the top three levels of DDC, because of the lack of training data for all classes, skews the results so that they work in experimental conditions but barely for end users in operational retrieval systems. Practical implications: In conclusion, for operative information retrieval systems, applying purely automatic DDC does not work, either using machine learning (because of the lack of training data for the large number of DDC classes) or using the string-matching algorithm (because DDC characteristics perform well for automatic classification only in a small number of classes). Over time, more training examples may become available, and DDC may be enriched with synonyms in order to enhance the accuracy of automatic classification, which may also benefit information retrieval performance based on DDC. In order for quality information services to reach the objective of the highest possible precision and recall, automatic classification should never be implemented on its own; instead, machine-aided indexing that combines the efficiency of automatic suggestions with the quality of human decisions at the final stage should be the way for the future. Originality/value: The study explored machine learning on a large classification system of over 14,000 classes which is used in operational information retrieval systems. Due to the lack of sufficient training data across the entire set of classes, an approach complementing machine learning, that of string matching, was applied. This combination should be explored further since it provides the potential for real-life applications with large target classification systems.
Funding: Supported by the National Basic Research Program of China (No. 2013CB329502), the National Natural Science Foundation of China (No. 61035003, 61202212), and the National Science and Technology Support Program (No. 2012BA107B02).
Abstract: The cognitive model ABGP is a special model for agents, which consists of awareness, beliefs, goals and plans. Like most agent architectures, ABGP agents obtain knowledge directly from natural scenes only through simple pre-established rules. Inspired by the biological visual cortex (V1) and the higher brain areas that perceive visual features, deep convolutional neural networks (CNNs) are introduced as a visual pathway into ABGP to build a novel visual awareness module. A rat-robot maze search simulation platform is then constructed to validate that a CNN can be used for the awareness module of ABGP. According to the simulation results, the rat-robot implemented by ABGP with the CNN awareness module performs excellently at recognizing guideposts. This directly enhances communication between the agent and natural scenes, improves the agent's ability to recognize the real world, and demonstrates that an agent can independently plan its path based on natural scenes.