Visual object tracking plays a crucial role in computer vision. In recent years, researchers have proposed various methods to achieve high-performance object tracking. Among these, Transformer-based methods have become a research hotspot because of their ability to model global context. However, current Transformer-based tracking methods still face challenges such as low tracking accuracy and redundant feature information. In this paper, we introduce the self-calibration multi-head self-attention Transformer (SMSTracker) as a solution to these challenges. It employs a hybrid tensor decomposition self-organizing multi-head self-attention Transformer mechanism, which not only compresses and accelerates Transformer operations but also significantly reduces redundant data, thereby enhancing the accuracy and efficiency of tracking. Additionally, we introduce a self-calibration attention fusion block to resolve the attention ambiguities and inconsistencies common in traditional tracking methods, ensuring stable and reliable tracking performance across various scenarios. By integrating a hybrid tensor decomposition approach with a self-organizing multi-head self-attention Transformer mechanism, SMSTracker enhances the efficiency and accuracy of the tracking process. Experimental results show that SMSTracker achieves competitive performance in visual object tracking, demonstrating its potential to provide more robust and efficient tracking solutions in real-world applications.
As a branch of computer science, information visualization aims to help users understand and analyze complex data through graphical interfaces and interactive technologies. Information visualization primarily includes visual structures such as time-series structures, spatial relationship structures, statistical distribution structures, and geographic map structures, each with unique functions and application scenarios. To better explain the cognitive process of visualization, researchers have proposed various cognitive models based on interaction mechanisms, visual perception steps, and novice use of visualization. These models help explain user cognition in information visualization, enhancing the effectiveness of data analysis and decision-making.
Background: With the rapid development of Web3D technologies, online Web3D visualization, particularly of complex models or scenes, has been in great demand. Owing to the conflict between Web3D system load and resource consumption when processing huge models, this paper reviews lightweighting methods for huge 3D models in online Web3D visualization. Methods: Starting from the geometric redundancy introduced by man-made operations in the modeling procedure, several categories of lightweighting work that aim to reduce the amount of data and resource consumption are elaborated for Web3D visualization. Results: The characteristics of each method are summarized by comparison from several perspectives. Among the reviewed methods, geometric redundancy removal, which achieves the lightweight goal by detecting and removing repeated components, is an appropriate method for current online Web3D visualization. Meanwhile, learning algorithms, which are still being improved, are our expected future research topic. Conclusions: An efficient lightweight method for online Web3D visualization should consider various aspects, such as the characteristics of the original data, the combination or extension of existing methods, scheduling strategy, cache management, and rendering mechanism. Innovative methods, particularly learning algorithms, are also worth exploring.
With the development of the short video industry, videos and bullet screens have become important channels for spreading public opinion. Public attitudes can be obtained in a timely manner through emotion analysis of bullet screens, which can also ease the management of online public opinion. A convolutional neural network model based on multi-head attention is proposed to address how to effectively model relations among words and identify key words in emotion classification tasks where the text is short and complete context information is lacking. Firstly, word positions are encoded so that the model can use the order information of input sequences. Secondly, a multi-head attention mechanism is used to obtain semantic representations in different subspaces, effectively capture internal relevance, strengthen dependencies among words, and highlight the emotional weights of key emotional words. Then a dilated convolution is used to enlarge the receptive field and extract more features. On this basis, the multi-head attention mechanism is combined with a convolutional neural network to model and analyze seven emotional categories of bullet screens. Experiments conducted from the perspectives of both model and dataset validate the effectiveness of our approach. Finally, the emotions of bullet screens are visualized to provide data support for hot event control and other fields.
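The position-encoding and multi-head attention steps described in the bullet-screen abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' model: the sinusoidal encoding is the standard Transformer formulation, the projections are assumed to be identity matrices (a trained model would learn them), and the function names `positional_encoding`, `self_attention`, and `multi_head_attention` are hypothetical.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position codes: sine on even dimensions, cosine on odd ones."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    d = len(x[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(r) for r in scores]
    out = [[sum(w * v[j] for w, v in zip(wr, x)) for j in range(d)] for wr in weights]
    return weights, out


def multi_head_attention(x, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate."""
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head_dim = d // num_heads
    outputs = [[] for _ in x]
    for h in range(num_heads):
        sub = [row[h * head_dim:(h + 1) * head_dim] for row in x]
        _, head_out = self_attention(sub)
        for merged, part in zip(outputs, head_out):
            merged.extend(part)
    return outputs
```

In use, the position codes would be added element-wise to the word embeddings before `multi_head_attention` is called, so that each head sees both content and order information in its own subspace.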
Artificial Intelligence (AI) and Computer Vision (CV) advancements have led to many useful methodologies in recent years, particularly to help visually challenged people. Object detection involves a variety of challenges, for example, handling multiple-class images and images that get augmented when captured by a camera. The test images include all these variants as well. Detection models alert visually challenged people about their surroundings when they want to walk independently. This study compares four CNN-based pre-trained models that are predominantly used in image recognition applications: Residual Network (ResNet-50), Inception v3, Dense Convolutional Network (DenseNet-121), and SqueezeNet. Based on the analysis performed on these test images, the study infers that Inception v3 outperformed the other pre-trained models in terms of accuracy and speed. To further improve the performance of the Inception v3 model, the thermal exchange optimization (TEO) algorithm is applied to tune the hyperparameters (number of epochs, batch size, and learning rate), which constitutes the novelty of the work. Better accuracy was achieved owing to the inclusion of an auxiliary classifier as a regularizer, the hyperparameter optimizer, and the factorization approach. Additionally, Inception v3 can handle images of different sizes. This makes Inception v3 the optimum model for assisting visually challenged people in real-world communication when integrated with Internet of Things (IoT)-based devices.
Transformer models have emerged as dominant networks for various computer vision tasks compared to Convolutional Neural Networks (CNNs). Transformers can model long-range dependencies by using a self-attention mechanism. This study aims to provide a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them to this work. Then, we review the main components of a vanilla transformer network, including the self-attention mechanism, feed-forward network, and position encoding. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for downstream tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets. Finally, we explore some future research directions to further improve the field.
The telecommunications industry is becoming increasingly aware of potential subscriber churn as a result of the growing popularity of smartphones in the mobile Internet era, the rapid development of telecommunications services, the implementation of the number portability policy, and the intensifying competition among operators. At the same time, users' consumption preferences and choices are evolving. Because keeping existing customers is far less expensive than acquiring new ones, accurate churn prediction models are needed to predict the churn tendency. However, conventional or learning-based algorithms can only go so far into a single subscriber's data; they cannot take into account changes in a subscriber's subscription, and they ignore the coupling and correlation between features. Additionally, current churn prediction models have a high computational burden, a fuzzy weight distribution, and significant resource and economic costs. The prediction algorithms involving network models currently in use primarily take into account the private information shared between users through text and pictures, ignoring the reference value supplied by other users with the same package. This work proposes a user churn prediction model based on a Graph Attention Convolutional Neural Network (GAT-CNN) to address these issues. The main contributions of this paper are as follows. Firstly, we present a three-tiered hierarchical cloud-edge cooperative framework that increases the volume of user feature input by means of two aggregations at the device, edge, and cloud layers. Secondly, we extend the use of users' own data by introducing self-attention and graph convolution models to track the relative changes of both users and packages simultaneously. Lastly, we build an integrated offline-online system for churn prediction based on the strengths of the two models, and we experimentally validate the efficacy of cloud-side collaborative training and inference. In summary, the churn prediction model based on the Graph Attention Convolutional Neural Network presented in this paper can effectively address the drawbacks of conventional algorithms and offer telecom operators crucial decision support in developing subscriber retention strategies and cutting operational expenses.
This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be broadly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces challenges: the current quantity and quality of data can hardly support the training of large models, research and development costs are high, and algorithm autonomy and control are poor. The application of large models should be guided by the needs of the oil and gas business, taking it as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the construction of “artificial intelligence+energy” composite teams, and boost the autonomy and control of large model technology.
The rapid evolution of wireless communication technologies has underscored the critical role of antennas in ensuring seamless connectivity. Antenna defects, ranging from manufacturing imperfections to environmental wear, pose significant challenges to the reliability and performance of communication systems. This review paper navigates the landscape of antenna defect detection, emphasizing the need for a nuanced understanding of various defect types and the associated challenges in visual detection. It serves as a resource for researchers, engineers, and practitioners engaged in the design and maintenance of communication systems. The insights presented here pave the way for enhanced reliability in antenna systems through targeted defect detection measures. In this study, a comprehensive literature analysis of computer vision algorithms employed in end-of-line visual inspection of antenna parts is presented. The review follows the PRISMA principles, and its goals are to summarize recent research, identify relevant computer vision techniques, and evaluate how effective these techniques are in discovering defects during inspections. It covers articles from scholarly journals as well as conference papers published up to June 2023. Relevant search phrases were used, and papers were selected according to inclusion and exclusion criteria. Several computer vision approaches, such as feature extraction and defect classification, are broken down and analyzed, and their applicability and performance are discussed. The review highlights the significance of using a wide variety of datasets and measurement criteria. The findings of this study add to the existing body of knowledge and point researchers toward promising new areas of investigation, such as real-time inspection systems and multispectral imaging. On the whole, this review offers a complete study of computer vision approaches for quality control of antenna parts, providing helpful insights and drawing attention to areas that require additional exploration.
Brood parasitism and egg mimicry of the Himalayan Cuckoo (Cuculus saturatus) on its host, Blyth's Leaf Warbler (Phylloscopus reguloides), were studied in south-western China from April to July 2009. The cuckoo laid a white egg with fine brown markings on the blunt end. The eggs were conspicuously bigger than the host's own, at 2.06 g in mass and 1.91 cm³ in volume. Visual modeling showed that the cuckoo eggs, which to the human eye appeared to mimic the host eggs to a great extent, were completely different from the host eggs in both hue and chroma. The characters of the Himalayan Cuckoo nestling, reported for the first time, included two triangular black patches on its gape, which appeared from four days old and became darker with age and growth. While this character also exists in nestlings of the Oriental Cuckoo (C. optatus), it has not been found in other Cuculus species. Our results reveal cryptic aspects of cuckoo-host egg color matching that are not visible to the naked human eye, and indicate that highly mimetic cuckoo eggs rejected by hosts, as judged by human observers in previous studies, might not be mimetic as birds see them.
The Hue-Saturation-Intensity (HSI) color model, a psychologically appealing color model, was employed to visualize uncertainty, represented by relative prediction error, using the spatial prediction of topsoil pH in peri-urban Beijing as a case study. A two-dimensional legend was designed to accompany the visualization: the vertical axis (hue) shows the predicted values and the horizontal axis (whiteness) shows the prediction error. Moreover, different ways of visualizing uncertainty are briefly reviewed in this paper. The case study indicated that visualizing both predictions and prediction uncertainty makes it possible to enhance visual exploration of data uncertainty and to compare different prediction methods or predictions of entirely different variables. The whitish regions of the visualization map can be interpreted simply as unsatisfactory prediction results, where additional samples or more suitable prediction models may be needed for better prediction.
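The two-dimensional legend described in the soil pH abstract (hue for the predicted value, whiteness for the prediction error) can be sketched as a single color-mapping function. This is only an illustration under stated assumptions: Python's standard `colorsys` module offers HSV rather than HSI, so HSV with full brightness and reduced saturation stands in for "whiteness"; the blue-to-red hue ramp, the `max_error` scale, and the function name `uncertainty_color` are all hypothetical choices, not taken from the paper.

```python
import colorsys


def uncertainty_color(pred, error, pred_min, pred_max, max_error=1.0):
    """Map a prediction to hue and its relative error to whiteness (RGB 0-255)."""
    # Normalize the predicted value to [0, 1] and clamp out-of-range inputs.
    t = (pred - pred_min) / (pred_max - pred_min)
    t = min(max(t, 0.0), 1.0)
    # Assumed hue ramp: blue (2/3) for low predictions, red (0) for high ones.
    hue = (2.0 / 3.0) * (1.0 - t)
    # Larger error means lower saturation, i.e. a whiter color.
    e = min(max(error / max_error, 0.0), 1.0)
    saturation = 1.0 - e
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

With this mapping, a cell with a confident prediction is drawn in a saturated hue, while a cell whose relative error reaches `max_error` fades to white regardless of its predicted value, which matches the abstract's reading of whitish regions as unsatisfactory predictions.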
Based on the characteristics of borehole data, a 3D geologic body model is constructed with the generalized tri-prism as the primitive modeling element, and the modeling process and key algorithms are presented in detail. In this method, the original borehole data undergo Delaunay triangulation to generate an irregular triangular network on the surface, and stratum segments on adjoining boreholes are then linked in sequence to form tri-prisms, which may be pinched out. Finally, the stratified 3D geologic body model is built by an iterative search for consecutive layers with the same property. The results show that this method can effectively model stratified strata.
Data acquisition and modeling are the two important, difficult, and costly aspects of a Cybercity project. 2D-GIS is mature and can manage large amounts of spatial data, so 3D-GIS should make the best use of the data and technology of 2D-GIS. Constructing a useful synthetic environment requires integrating multiple types of information, such as DEMs, texture images, and 3D representations of objects such as buildings. In this paper, a method for 3D city landscape data modeling and visualization based on integrated databases is presented. Since the volume of raster data is very large, special strategies (for example, the pyramid gridded method) must be adopted to manage raster data efficiently. Three different methods of data acquisition, a suitable data structure, and a simple modeling method are presented as well. Finally, a pilot project, the Shanghai Cybercity, is illustrated.
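The pyramid strategy mentioned in the Cybercity abstract for managing large rasters can be illustrated with a minimal sketch. The abstract does not specify the resampling rule or tiling scheme, so the 2x2 block averaging used here is an assumption, and the function names `downsample` and `build_pyramid` are hypothetical.

```python
def downsample(grid):
    """Halve a raster's resolution by averaging each 2x2 block of cells."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - 1, 2):
        out.append([
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ])
    return out


def build_pyramid(grid):
    """Return raster levels from full resolution down to the coarsest level."""
    levels = [grid]
    # Keep halving while the coarsest level still has at least 2 rows and 2 columns.
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels
```

A viewer would then pick the level whose cell size best matches the current display scale, so that distant terrain or texture is drawn from a small coarse raster instead of the full-resolution data.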
The digital mine is an inevitable outcome of informatization and is also a complex systems engineering effort. Firstly, for 3D visualization in the digital mine, an integrated ground and underground visualization framework model was proposed based on the mine entity database. This resolved the visualization problem effectively and also improved professional analysis capability. Secondly, to handle the irregularity, non-uniformity, and dynamics of mine entities, a mixed modeling method based on entity characteristics was put forward, with which the 3D representation of mine entities was realized. Lastly, a 3D visualization project for a copper mine was studied experimentally. Satisfactory results were obtained, validating the rationality of the visualization model and the feasibility of the 3D modeling.
The mechanisms of seismically induced liquefaction of granular soils under high confining stresses are still not fully understood. Evaluation of these mechanisms is generally based on extrapolation of observed behavior at shallow depths. Three centrifuge model tests were conducted at RPI's experimental facility to investigate the effects of confining stresses on the dynamic response of a deep horizontal deposit of saturated sand. Liquefaction was observed at high confining stresses in each of the tests. A system identification procedure was used to estimate the associated shear strain and stress time histories. These histories revealed a response marked by shear strength degradation and dilative patterns. The recorded accelerations and pore pressures were employed to generate visual animations of the models. These visualizations revealed a liquefaction front traveling downward and leading to large shear strains and isolation of the upper soil layers.
Based on an analysis of the whole mining process in metal mines, it was pointed out that investigating the heavy metal pollution of tailings should be treated as an important project for a metal mine. Combined with an analysis of the characteristics of tailings, three key aspects were identified: the transformation of heavy metals during dissolution, the migration of heavy metal ions with groundwater, and heavy metal transport in porous media. Accordingly, models of heavy metal pollution were established with the corresponding boundary conditions. Using the case of the Tonglushan Copper Mine tailings and the surrounding area from Google Maps, a three-dimensional grid view of the tailings was set up. Using the Fluent software, the contamination process of heavy metal pollutants in the tailings was shown through digital visualization.
Dynamic multichannel binocular visual image modeling is studied based on the Internet of Things (IoT) perception layer, using a mobile robot self-organizing network. Multiple groups of mobile robots equipped with binocular vision systems capture real visual images of the object. The returned images are then synthesized through the mobile self-organizing network to rebuild a three-dimensional model. On this basis, we formalize a novel algorithm for multichannel binocular visual three-dimensional imaging based on fast three-dimensional modeling. Compared with the method based on a single binocular vision system, the new algorithm improves the integrity and accuracy of dynamic three-dimensional object modeling. The simulation results show that the new method effectively accelerates modeling and improves similarity without increasing the data size.
Traditional vehicle detection algorithms generate vehicle candidates by traversal search and verify them with classifiers trained on hand-crafted features. These methods generally have long processing times and low detection performance. To address this issue, a vehicle detection algorithm based on visual saliency and a deep sparse convolutional hierarchical model is proposed. A visual saliency calculation is first used to generate a small vehicle candidate area. The candidate sub-images are then fed into a sparse deep convolutional hierarchical model with an SVM-based classifier to perform the final detection. The experimental results demonstrate that the proposed method achieves a 94.81% correct rate and a 0.78% false detection rate on existing datasets and on real road pictures captured by our group, outperforming existing state-of-the-art algorithms. More importantly, the deep sparse convolutional network generates highly discriminative multi-scale features, which have broad application prospects for target recognition in the field of intelligent vehicles.
Funding (SMSTracker study): supported by the National Natural Science Foundation of China under Grant 62177029 and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX21_0740), China.
Funding (bullet-screen emotion analysis study): National Natural Science Foundation of China (No. 61562057); Gansu Science and Technology Plan Project (No. 18JR3RA104).
Funding (Inception v3 assistive object detection study): Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R191), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work under Grant Code 22UQU4310373DSR61. This study is also supported by funding from Prince Sattam bin Abdulaziz University, project number PSAU/2023/R/1444.
Abstract: Advances in Artificial Intelligence (AI) and Computer Vision (CV) have led to many useful methodologies in recent years, particularly for helping visually challenged people. Object detection involves a variety of challenges, for example handling multi-class images and images that are augmented when captured by a camera. The test images include all these variants as well. Detection models alert visually challenged people about their surroundings when they want to walk independently. This study compares four CNN-based pre-trained models predominantly used in image recognition applications: Residual Network (ResNet-50), Inception v3, Dense Convolutional Network (DenseNet-121), and SqueezeNet. Based on the analysis performed on these test images, the study infers that Inception v3 outperformed the other pre-trained models in terms of accuracy and speed. To further improve the performance of the Inception v3 model, the thermal exchange optimization (TEO) algorithm is applied to tune the hyperparameters (number of epochs, batch size, and learning rate), which is the novelty of the work. Better accuracy was achieved owing to the inclusion of an auxiliary classifier as a regularizer, a hyperparameter optimizer, and a factorization approach. Additionally, Inception v3 can handle images of different sizes. This makes Inception v3 the optimum model for assisting visually challenged people in real-world communication when integrated with Internet of Things (IoT)-based devices.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61502162, 61702175, and 61772184; in part by the Fund of the State Key Laboratory of Geo-information Engineering under Grant SKLGIE2016-M-4-2; in part by the Hunan Natural Science Foundation of China under Grant 2018JJ2059; in part by the Key R&D Project of Hunan Province of China under Grant 2018GK2014; in part by the Open Fund of the State Key Laboratory of Integrated Services Networks under Grant ISN17-14; and by the Chinese Scholarship Council (CSC) through the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, with Grant CSC No. 2018GXZ020784.
Abstract: Transformer models have emerged as dominant networks for various computer vision tasks compared to Convolutional Neural Networks (CNNs). Transformers can model long-range dependencies by using a self-attention mechanism. This study aims to provide a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them to this work. Then we review the main components of a vanilla transformer network, including the self-attention mechanism, feed-forward network, position encoding, etc. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for downstream tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets. Finally, we explore some future research directions to further improve the field.
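The self-attention mechanism named in the abstract above can be sketched in a few lines. This is an illustrative single-head version of scaled dot-product attention, not code from the survey; it assumes identity projections (queries, keys, and values are all the raw input vectors), whereas a real transformer layer would apply learned weight matrices first.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of token vectors (lists of floats) of equal length d.
    Assumption for illustration: Q = K = V = x, with no learned projections.
    Returns (attention weights, output vectors).
    """
    d = len(x[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        for q in x
    ]
    weights = [softmax(row) for row in scores]
    out = [
        [sum(w * v[c] for w, v in zip(w_row, x)) for c in range(d)]
        for w_row in weights
    ]
    return weights, out


weights, out = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for row in weights:
    print([round(w, 3) for w in row])
```

Every token attends to every other token in one step, which is how the mechanism captures long-range dependencies regardless of distance in the sequence.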
Funding: Supported by the National Key R&D Program of China (No. 2022YFB3104500); the Natural Science Foundation of Jiangsu Province (No. BK20222013); and the Scientific Research Foundation of Nanjing Institute of Technology (No. 3534113223036).
Abstract: The telecommunications industry is becoming increasingly aware of potential subscriber churn as a result of the growing popularity of smartphones in the mobile Internet era, the rapid development of telecommunications services, the implementation of the number portability policy, and intensifying competition among operators. At the same time, users' consumption preferences and choices are evolving. Because keeping existing customers is far less expensive than acquiring new ones, accurate churn prediction models are needed to predict churn tendency. However, conventional or learning-based algorithms can only go so far into a single subscriber's data; they cannot account for changes in a subscriber's subscription, and they ignore the coupling and correlation between features. Additionally, current churn prediction models have a high computational burden, a fuzzy weight distribution, and significant resource and economic costs. The network-based prediction algorithms currently in use primarily consider the private information shared between users through text and pictures, ignoring the reference value supplied by other users with the same package. This work proposes a user churn prediction model based on a Graph Attention Convolutional Neural Network (GAT-CNN) to address these issues. The main contributions of this paper are as follows. First, we present a three-tiered hierarchical cloud-edge cooperative framework that increases the volume of user feature input by means of two aggregations at the device, edge, and cloud layers. Second, we extend the use of users' own data by introducing self-attention and graph convolution models to track the relative changes of both users and packages simultaneously. Lastly, we build an integrated offline-online system for churn prediction based on the strengths of the two models, and we experimentally validate the efficacy of cloud-side collaborative training and inference. In summary, the churn prediction model based on the Graph Attention Convolutional Neural Network presented in this paper can effectively address the drawbacks of conventional algorithms and offer telecom operators crucial decision support in developing subscriber retention strategies and cutting operational expenses.
Funding: Supported by the National Natural Science Foundation of China (72088101, 42372175) and the PetroChina Science and Technology Innovation Fund Program (2021DQ02-0904).
Abstract: This article elucidates the concept of large model technology, summarizes its research status both domestically and internationally, reviews the application of large models in vertical industries, outlines the challenges of applying large models in the oil and gas sector, and offers prospects for their application in the oil and gas industry. Existing large models can be broadly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces several challenges: current data quantity and quality can hardly support the training of large models, research and development costs are high, and algorithm autonomy and control are poor. The application of large models should be guided by the needs of the oil and gas business, taking it as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the construction of "artificial intelligence + energy" composite teams, and boost the autonomy and control of large model technology.
Abstract: The rapid evolution of wireless communication technologies has underscored the critical role of antennas in ensuring seamless connectivity. Antenna defects, ranging from manufacturing imperfections to environmental wear, pose significant challenges to the reliability and performance of communication systems. This review paper navigates the landscape of antenna defect detection, emphasizing the need for a nuanced understanding of the various defect types and the associated challenges in visual detection. It serves as a resource for researchers, engineers, and practitioners engaged in the design and maintenance of communication systems. The insights presented here pave the way for enhanced reliability in antenna systems through targeted defect detection measures. In this study, a comprehensive literature analysis of computer vision algorithms employed in end-of-line visual inspection of antenna parts is presented. The review follows the PRISMA principles throughout, and its goals are to summarize recent research, identify relevant computer vision techniques, and evaluate how effective these techniques are in discovering defects during inspections. It covers articles from scholarly journals as well as conference papers up to June 2023. Relevant search phrases were used, and papers were chosen according to inclusion and exclusion criteria. Several computer vision approaches, such as feature extraction and defect classification, are broken down and analyzed, and their applicability and performance are discussed. The review highlights the significance of using a wide variety of datasets and measurement criteria. The findings add to the existing body of knowledge and point researchers toward promising new areas of investigation, such as real-time inspection systems and multispectral imaging. On the whole, this review offers a complete study of computer vision approaches for quality control of antenna parts, providing helpful insights and drawing attention to areas that require additional exploration.
Funding: Supported by the National Natural Science Foundation of China (30860044, 31071938); the Program for New Century Excellent Talents in University (NCET-10-0111); and a China Postdoctoral Science Foundation (20110490967) funded project.
Abstract: Brood parasitism and egg mimicry of the Himalayan Cuckoo (Cuculus saturatus) on its host, Blyth's Leaf Warbler (Phylloscopus reguloides), were studied in south-western China from April to July 2009. The cuckoo laid a white egg with fine brown markings on the blunt end. The eggs were conspicuously bigger than the host's own, at 2.06 g in mass and 1.91 cm³ in volume. Visual modeling showed that the cuckoo eggs, which to the human eye appeared to mimic the host eggs to a great extent, were completely different from the host eggs in both hue and chroma. The characters of the Himalayan Cuckoo nestling, reported for the first time, included two triangular black patches on its gape, which appeared from four days of age and became darker with age and growth. While this character also exists in nestlings of the Oriental Cuckoo (C. optatus), it has not been found in other Cuculus species. Our results reveal cryptic aspects of cuckoo-host egg color matching that are not visible to the naked human eye, and they indicate that highly mimetic cuckoo eggs rejected by hosts, as judged by human observers in previous studies, might not be mimetic as birds see them.
Funding: Under the auspices of the Knowledge Innovation Frontier Project of the Institute of Soil Science, Chinese Academy of Sciences (No. ISSASIP0716) and the National Natural Science Foundation of China (Nos. 40701070, 40571065).
Abstract: The Hue-Saturation-Intensity (HSI) color model, a psychologically appealing color model, was employed to visualize uncertainty, represented by relative prediction error, in a case study of the spatial prediction of topsoil pH in peri-urban Beijing. A two-dimensional legend was designed to accompany the visualization: the vertical axis (hues) visualizes the predicted values and the horizontal axis (whiteness) visualizes the prediction error. Different ways of visualizing uncertainty are also briefly reviewed in this paper. The case study indicates that visualizing both predictions and prediction uncertainty makes it possible to enhance visual exploration of data uncertainty and to compare different prediction methods or predictions of entirely different variables. Whitish regions of the visualization map can be interpreted simply as unsatisfactory prediction results, where additional samples or more suitable prediction models may be needed for better predictions.
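The two-dimensional legend described above (hue for the predicted value, whiteness for the prediction error) can be sketched as a color-mapping function. This is a rough illustration, not the paper's method: it substitutes the HSV model from Python's standard `colorsys` module for HSI, and the function name, the blue-to-red hue range, and the clipping of relative error to [0, 1] are all assumptions.

```python
import colorsys


def uncertainty_color(pred, pred_min, pred_max, rel_error):
    """Map a prediction and its relative error to an RGB triple in [0, 1].

    Hue encodes the predicted value (assumed range: blue for pred_min,
    red for pred_max). Whiteness encodes uncertainty: saturation falls
    as relative error rises, so a fully uncertain cell is drawn white.
    HSV stands in for the HSI model named in the abstract.
    """
    t = (pred - pred_min) / (pred_max - pred_min)
    t = min(max(t, 0.0), 1.0)                 # clip to the legend range
    hue = (1.0 - t) * (2.0 / 3.0)             # 2/3 is blue, 0 is red
    err = min(max(rel_error, 0.0), 1.0)       # clip relative error
    return colorsys.hsv_to_rgb(hue, 1.0 - err, 1.0)


def relative_error(predicted, observed):
    """Relative prediction error |predicted - observed| / |observed|."""
    return abs(predicted - observed) / abs(observed)


# A confident high-pH prediction versus an uncertain one (made-up values).
print(uncertainty_color(8.5, 6.0, 8.5, rel_error=0.0))
print(uncertainty_color(8.5, 6.0, 8.5, rel_error=0.8))
```

With this mapping, the second color is the same hue washed out toward white, which matches the abstract's reading of whitish regions as unsatisfactory predictions.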
Abstract: According to the characteristics of borehole data, a 3D geologic body model is constructed with the generalized tri-prism as the primitive modeling element, and the modeling process and key algorithms are presented in detail. In this method, the original borehole data undergo Delaunay triangulation to generate an irregular triangular network on the surface; stratum segments on adjoining boreholes are then linked in sequence to form tri-prisms, which may be pinched out. Finally, a stratified 3D geologic body model is built by an iterated search for consecutive layers of the same property. The results show that this method can effectively model stratified strata.
Abstract: Data acquisition and modeling are two important, difficult, and costly aspects of a Cybercity project. 2D-GIS is mature and can manage large amounts of spatial data, so 3D-GIS should make the best use of the data and technology of 2D-GIS. Constructing a useful synthetic environment requires integrating multiple types of information, such as DEMs, texture images, and 3D representations of objects such as buildings. In this paper, a method for a 3D city landscape data model and its visualization based on integrated databases is presented. Since the data volume of raster data is very large, special strategies (for example, the pyramid gridded method) must be adopted to manage raster data efficiently. Three different methods of data acquisition, a suitable data structure, and a simple modeling method are presented as well. Finally, a pilot project of the Shanghai Cybercity is illustrated.
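The abstract names a pyramid gridded method for managing large rasters but gives no details. The sketch below shows one common form of the idea, a resolution pyramid split into fixed-size tiles; the halving factor, the 256-pixel tile size, and the function name are assumptions, not parameters from the paper.

```python
import math


def pyramid_levels(width, height, tile=256):
    """List the levels of a raster pyramid from full resolution downward.

    Each entry is (width, height, tiles_x, tiles_y). Every level halves
    the previous resolution (assumed factor of 2) until the whole raster
    fits inside a single tile of `tile` x `tile` pixels (assumed size).
    """
    levels = []
    w, h = width, height
    while True:
        levels.append((w, h, math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return levels


for level, (w, h, tx, ty) in enumerate(pyramid_levels(1024, 512)):
    print(f"level {level}: {w}x{h} pixels, {tx}x{ty} tiles")
```

A viewer built on such a structure would load only the tiles of the level that matches the current zoom, so the amount of raster data read stays roughly constant as the scene grows.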
Funding: Project (41061043) supported by the National Natural Science Foundation of China.
Abstract: The digital mine is an inevitable outcome of informatization and is also a complex systems engineering task. First, for 3D visualization applications in the digital mine, an integrated ground and underground visualization framework model was proposed based on the mine entity database. This effectively resolved the visualization problem and improved professional analytical capability. Second, to address the irregularity, non-uniformity, and dynamics of mine entities, a hybrid modeling method based on entity characteristics was put forward, through which the 3D representation of mine entities was realized. Lastly, a 3D visualization project for a copper mine was studied experimentally. Satisfactory results were obtained, validating the rationality of the visualization model and the feasibility of the 3D modeling.
Funding: This research was supported by the National Science Foundation, Grant No. CMS-984754 (Dr. C. Astill, program manager), and the US Army Engineer Research and Development Center.
Abstract: The mechanisms of seismically induced liquefaction of granular soils under high confining stresses are still not fully understood. Evaluation of these mechanisms is generally based on extrapolation of observed behavior at shallow depths. Three centrifuge model tests were conducted at RPI's experimental facility to investigate the effects of confining stresses on the dynamic response of a deep horizontal deposit of saturated sand. Liquefaction was observed at high confining stresses in each of the tests. A system identification procedure was used to estimate the associated shear strain and stress time histories. These histories revealed a response marked by shear strength degradation and dilative patterns. The recorded accelerations and pore pressures were employed to generate visual animations of the models. These visualizations revealed a liquefaction front traveling downward and leading to large shear strains and isolation of the upper soil layers.
Abstract: Based on an analysis of the whole mining process in metal mines, it is pointed out that investigating heavy metal pollution from tailings should be treated as an important project for a metal mine. An analysis of the characteristics of tailings shows that three aspects are key: the transformation in the heavy metal dissolution process, the migration of heavy metal ions with groundwater, and heavy metal transport in porous media. Accordingly, models of heavy metal pollution were established with specified boundary conditions. Using the tailings of the Tonglushan Copper Mine and the surrounding area from Google Maps as a case, a three-dimensional grid view of the tailings was set up. Using Fluent software, the contamination process of the heavy metal pollutants in the tailings was shown through digital visualization.
Funding: Supported by the Hi-Tech Research and Development Program of China under Grant No. 2007AA10Z235.
Abstract: Dynamic multichannel binocular visual image modeling is studied based on the Internet of Things (IoT) perception layer, using a mobile robot self-organizing network. Multiple groups of mobile robots equipped with binocular vision systems capture real visual images of the object. Through the mobile self-organizing network, a three-dimensional model is then rebuilt by synthesizing the returned images. On this basis, we formalize a novel algorithm for multichannel binocular visual three-dimensional images based on fast three-dimensional modeling. Compared with the method based on a single binocular vision system, the new algorithm improves the integrity and accuracy of dynamic three-dimensional object modeling. The simulation results show that the new method can effectively accelerate modeling and improve similarity without increasing the data size.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U1564201, 61573171, 61403172, 51305167); the China Postdoctoral Science Foundation (Grant Nos. 2015T80511, 2014M561592); the Jiangsu Provincial Natural Science Foundation of China (Grant No. BK20140555); the Six Talent Peaks Project of Jiangsu Province, China (Grant Nos. 2015-JXQC-012, 2014-DZXX-040); the Jiangsu Postdoctoral Science Foundation, China (Grant No. 1402097C); and the Jiangsu University Scientific Research Foundation for Senior Professionals, China (Grant No. 14JDG028).
Abstract: Traditional vehicle detection algorithms use traverse-search-based vehicle candidate generation and train classifiers on hand-crafted features for vehicle candidate verification. These methods generally have high processing times and low vehicle detection performance. To address this issue, a vehicle detection algorithm based on visual saliency and a deep sparse convolution hierarchical model is proposed. A visual saliency calculation is first used to generate a small vehicle candidate area. The vehicle candidate sub-images are then loaded into a sparse deep convolution hierarchical model with an SVM-based classifier to perform the final detection. The experimental results demonstrate that the proposed method achieves a 94.81% correct rate and a 0.78% false detection rate on existing datasets and on real road pictures captured by our group, outperforming existing state-of-the-art algorithms. More importantly, highly discriminative multi-scale features are generated by the deep sparse convolution network, which has broad application prospects for target recognition in the field of intelligent vehicles.