Focusing on fast and accurate detection of armored targets on the ground battlefield, a detection method based on a multi-scale representation network (MS-RN) and a shape-fixed Guided Anchor (SF-GA) scheme is proposed. Firstly, considering the large scale variation and camouflage of armored targets, a new MS-RN that integrates contextual information from the battlefield environment is designed. The MS-RN extracts deep features from templates of different scales and strengthens the detection of small targets. Armored targets of different sizes are detected on different representation features. Secondly, to meet the accuracy and real-time requirements, an improved shape-fixed Guided Anchor is applied to feature maps of different scales to propose regions of interest (ROIs). Unlike sliding or random anchors, the SF-GA can filter out 80% of the regions while still improving recall. A dedicated detection dataset for armored targets, named the Armored Target Dataset (ARTD), is constructed, on which comparative experiments with state-of-the-art detection methods are conducted. Experimental results show that the proposed method achieves outstanding detection accuracy and efficiency, especially when small armored targets are involved.
This paper addresses multi-scale representation in urban GIS, presenting a model that dynamically generalizes buildings on the basis of a Delaunay triangulation model. Considering the constraints of positional accuracy, statistical area balance and orthogonal characteristics in building cluster generalization, the paper gives a progressive algorithm for building cluster aggregation, including conflict detection (where), object displacement (who), and geometric combination (how). The algorithm has been implemented in an interactive generalization system, and some experimental illustrations are provided.
This paper reviews the development of research on multiple representations in Geographic Information Systems (GIS), including data structures, formalization and storage, and intelligent zoom. It also summarizes the problems of interconnectivity, consistency maintenance, dynamic query and coexisting updates, and reviews research on multi-scale databases and related studies. Finally, research directions and foci are proposed for the future design and implementation of multi-scale GIS.
The large number of nanopores and complex fracture structures in shale reservoirs results in multi-scale oil flow. As shale oil reservoirs are developed, the permeability of multi-scale media changes due to stress sensitivity, which plays a crucial role in controlling pressure propagation and oil flow. This paper proposes a multi-scale coupled flow mathematical model of matrix nanopores, induced fractures, and hydraulic fractures. The model considers the micro-scale effects of shale oil flow in fractal nanopores, the fractal induced-fracture network, and the stress sensitivity of multi-scale media. We solved the model iteratively using the Pedrosa transform, a semi-analytic segmented Bessel function, and the Laplace transform. The results of the model agree well with the numerical solution and field production data, confirming the high accuracy of the model. The influence of stress sensitivity on permeability, pressure and production is also analyzed. Permeability and production decrease significantly when induced fractures are weakly supported. Closed induced fractures can inhibit interporosity flow in the stimulated reservoir volume (SRV). Sensitivity analysis shows that hydraulic fractures benefit early production, and induced fractures in the SRV benefit middle-stage production. The model can characterize the multi-scale flow of shale oil, providing theoretical guidance for rapid productivity evaluation.
Multi-scale systems remain a classical scientific problem in fluid dynamics, biology, and other fields. In the present study, a scheme of multi-scale physics-informed neural networks (msPINNs) is proposed to solve boundary layer flow at high Reynolds numbers without any data. The flow is divided into several regions with different scales based on Prandtl's boundary layer theory. Different regions are solved with governing equations at different scales. The method of matched asymptotic expansions is used to make the flow field continuous. Flow over a semi-infinite flat plate at a high Reynolds number is treated as a multi-scale problem because the boundary layer scale is much smaller than the outer flow scale. Comparison with reference numerical solutions shows that the msPINNs can solve the multi-scale problem of the boundary layer in high-Reynolds-number flows. This scheme can be extended to more multi-scale problems in the future.
Deep learning has been a catalyst for a transformative revolution in machine learning and computer vision in the past decade. Within these research domains, methods grounded in deep learning have exhibited exceptional performance across a spectrum of tasks. The success of deep learning methods can be attributed to their capability to derive potent representations from data, which are integral to a myriad of downstream applications. These representations encapsulate the intrinsic structure, features, or latent variables characterising the underlying statistics of visual data. Despite these achievements, the challenge persists of effectively conducting representation learning of visual data with deep models, particularly when confronted with vast and noisy datasets. This special issue is a dedicated platform for researchers worldwide to disseminate their latest high-quality articles, aiming to enhance readers' comprehension of the principles, limitations, and diverse applications of representation learning in computer vision.
Because of the structural dependencies among concurrent events in the knowledge graph and the substantial amount of sequential correlation information carried by temporally adjacent events, we propose an Independent Recurrent Temporal Graph Convolution Networks (IndRT-GCNets) framework to efficiently and accurately capture event attribute information. The framework models the knowledge graph sequences to learn the evolutionary representations of entities and relations within each period. Firstly, using the temporal graph convolution module in the evolutionary representation unit, the framework captures the structural dependencies within the knowledge graph in each period. Meanwhile, to achieve better event representation and establish effective correlations, an independent recurrent neural network is employed for auto-regressive modeling. Furthermore, static attributes of entities in the entity-relation events are constrained and merged using a static graph constraint to obtain optimal entity representations. Finally, the evolution of entity and relation representations is used to predict events at the next step. On multiple real-world datasets such as Freebase13 (FB13), Freebase 15k (FB15K), WordNet11 (WN11), WordNet18 (WN18), FB15K-237, WN18RR, YAGO3-10, and Nell-995, results on multiple evaluation metrics show that the proposed IndRT-GCNets framework outperforms most existing models on knowledge reasoning tasks, which validates its effectiveness and robustness.
Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot. However, features of different sizes and directions are not sufficiently captured when extracting features from lung X-ray images. A pneumonia classification model based on multi-scale directional feature enhancement, MSD-Net, is proposed in this paper. The main innovations are as follows. Firstly, the Multi-scale Residual Feature Extraction Module (MRFEM) is designed; it uses dilated convolutions with different dilation rates to enlarge the receptive field and extract multi-scale features effectively. Secondly, the Multi-scale Directional Feature Perception Module (MDFPM) is designed, which uses a three-branch structure of convolutions of different sizes to transmit directional features layer by layer and focuses on the target region to enhance the feature information. Thirdly, the Axial Compression Former Module (ACFM) is designed to perform global calculations that enhance the perception of global features in different directions. To verify the effectiveness of MSD-Net, comparative and ablation experiments are carried out. On the COVID-19 RADIOGRAPHY DATABASE, the Accuracy, Recall, Precision, F1 Score, and Specificity of MSD-Net are 97.76%, 95.57%, 95.52%, 95.52%, and 98.51%, respectively. On the chest X-ray dataset, the Accuracy, Recall, Precision, F1 Score and Specificity of MSD-Net are 97.78%, 95.22%, 96.49%, 95.58%, and 98.11%, respectively. The model effectively improves the accuracy of lung image recognition and provides an important clinical reference for computer-aided diagnosis of pneumonia.
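The receptive-field effect of dilated convolutions mentioned in the MSD-Net abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the 3x3 kernel size and the dilation rates 1, 2 and 4 are assumptions chosen only for illustration, and the function names are hypothetical. It uses the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 conv layers given (kernel_size, dilation) pairs."""
    field = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective kernel size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Assumed example: three parallel 3x3 branches with dilation rates 1, 2 and 4.
assumed_rates = (1, 2, 4)
branch_spans = {rate: effective_kernel_size(3, rate) for rate in assumed_rates}

# Assumed example: the same three layers applied one after another.
stacked_span = stacked_receptive_field([(3, rate) for rate in assumed_rates])
```

With these assumed values, the parallel branches see spans of 3, 5 and 9 pixels with the same number of weights per kernel, which is why mixing dilation rates is a common way to obtain multi-scale features.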
The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods either fail to mine hand and face information in their visual backbones or rely on expensive and time-consuming external extractors to explore this information. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. Our MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information about the hands and face at multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments on three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Remote sensing imagery, because it is captured from high altitude, presents inherent challenges: multiple scales, limited target areas, and intricate backgrounds. These traits often lead to increased missed and false detection rates when object recognition algorithms are applied to remote sensing imagery. They also contribute to inaccurate target localization and hinder precise target categorization. This paper addresses these challenges by proposing the YOLO-MFD model (YOLO-MFD: Remote Sensing Image Object Detection with Multi-scale Fusion Dynamic Head). Before presenting our method, we examine the prevalent issues in remote sensing imagery analysis, emphasizing the difficulty existing object recognition algorithms have in comprehensively capturing critical image features amid varying scales and complex backgrounds. To resolve these issues, we introduce a novel approach. First, we propose a lightweight multi-scale module called CEF. This module significantly improves the model's ability to capture important image features by merging multi-scale feature information, and it effectively addresses the missed detections and false alarms that are common in remote sensing imagery. Second, an additional layer of small-target detection heads is added, and a residual link is established with the higher-level feature extraction module in the backbone. This allows the model to incorporate shallower information, significantly improving the accuracy of target localization in remotely sensed images. Finally, a dynamic head attention mechanism is introduced, which allows the model to recognize shapes and targets of different sizes with greater flexibility and accuracy. Consequently, the precision of object detection is significantly improved. Experimental results show that the YOLO-MFD model improves on the original YOLOv8 model by 6.3%, 3.5%, and 2.5% in Precision, mAP@0.5 and mAP@0.5:0.95, respectively. These results illustrate the clear advantages of the method.
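The mAP@0.5 and mAP@0.5:0.95 figures reported for YOLO-MFD both rest on the intersection-over-union (IoU) between a predicted box and a ground-truth box. The following is a minimal sketch of that matching criterion only, not the authors' evaluation code: the (x1, y1, x2, y2) box format, the example boxes and the function name are assumptions, and the threshold list follows the common convention of 0.50 to 0.95 in steps of 0.05.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds used by the "0.5:0.95" convention: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth box and prediction, for illustration only.
ground_truth = (0, 0, 10, 10)
prediction = (2, 0, 12, 10)

overlap = iou(ground_truth, prediction)  # 80 / 120
matched_at = [t for t in thresholds if overlap >= t]
```

In this made-up case the prediction counts as a true positive at the 0.5 threshold but only at four of the ten stricter thresholds, which is why mAP@0.5:0.95 is the harder of the two metrics and rewards precise localization.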
Rock fracture mechanisms can be inferred from moment tensors (MT) inverted from microseismic events. However, MT can only be inverted for events whose waveforms are acquired across a network of sensors. This is limiting for underground mines, where the microseismic stations often lack azimuthal coverage. Thus, there is a need for a method to invert fracture mechanisms using waveforms acquired by a sparse microseismic network. Here, we present a novel multi-scale framework to classify whether a rock crack contracts or dilates based on a single waveform. The framework consists of a deep learning model that is initially trained on 2,400,000+ manually labelled field-scale seismic and microseismic waveforms acquired across 692 stations. Transfer learning is then applied to fine-tune the model on 300,000+ MT-labelled lab-scale acoustic emission waveforms from 39 individual experiments instrumented with different sensor layouts, loading, and rock types. The optimal model achieves an F-score of over 86% on unseen waveforms at both the lab and field scales. This model outperforms existing empirical methods in classifying rock fracture mechanisms monitored by a sparse microseismic network. This facilitates rapid assessment of, and early warning against, various rock engineering hazards such as induced earthquakes and rock bursts.
User representation learning is crucial for capturing different user preferences, but it is also critically challenging because user intentions are latent and dispersed in complex and varied patterns of user-generated data, and thus cannot be measured directly. Text-based data models can learn user representations by mining latent semantics, which helps enhance the semantic function of user representations. However, these techniques only extract common features from historical records and cannot represent changes in user intentions. Sequential features, by contrast, can express a user's interests and intentions as they change over time, but sequential recommendation results based on item-based user representations lack interpretability in terms of preference factors. To address these issues, we propose a novel model with Dual-Layer User Representation, named DLUR, in which the user's intention is learned from two different layers of representation. Specifically, the latent semantic layer adds a Transformer-based interactive layer to extract keywords and key sentences from the text, which serve as a basis for interpretation. The sequence layer uses the Transformer model to encode the user's preference intention to clarify changes in the user's intention. This dual-layer user mode is therefore more comprehensive than a single text mode or sequence mode and can effectively improve recommendation performance. Our extensive experiments on five benchmark datasets demonstrate DLUR's performance advantage over state-of-the-art recommendation models. In addition, DLUR's ability to explain recommendation results is demonstrated through some specific cases.
Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in the field of small object detection on unmanned aerial vehicles (UAVs). The task is challenging due to variations in UAV flight altitude, differences in object scales, and factors such as flight speed and motion blur. To enhance the detection of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted P2, to improve adaptability to small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of fine feature details related to small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the use of multi-scale feature information. On the Visdrone2023 and UAVDT datasets, MSC-YOLO achieves remarkable results, outperforming the baseline method YOLOv7 by 3.0% in terms of mean average precision (mAP). The proposed MSC-YOLO algorithm demonstrates satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
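The MSC-YOLO abstract does not describe the internals of the CSPDC module, so the sketch below shows only the generic space-to-depth rearrangement that the module's name suggests, as an assumption about the general idea. The function name, the single-channel nested-list image and the block size of 2 are all hypothetical. The point it illustrates is that the spatial resolution is halved without discarding any pixel, unlike strided convolution or pooling.

```python
def space_to_depth(image, block: int = 2):
    """Split an H x W grid into block*block sub-grids of size (H/block) x (W/block).

    Every input value lands in exactly one output channel, so no information
    is dropped while the spatial size shrinks.
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image sides must be divisible by the block size")
    channels = []
    for dy in range(block):
        for dx in range(block):
            channels.append(
                [[image[y][x] for x in range(dx, width, block)]
                 for y in range(dy, height, block)]
            )
    return channels


# Hypothetical 4 x 4 single-channel feature map holding the values 0..15.
feature_map = [[row * 4 + col for col in range(4)] for row in range(4)]
sub_maps = space_to_depth(feature_map)  # four 2 x 2 channels
```

In a detector, a convolution would typically follow the rearrangement to mix the new channels; that part is omitted here because the abstract gives no details.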
Edge devices, because of their limited computational and storage resources, often require compilers for program optimization. Ensuring the security and reliability of these compilers is therefore of paramount importance in the emerging field of edge AI. One widely used testing method for this purpose is fuzz testing, which detects bugs by feeding random test cases to the target program. However, this process consumes significant time and resources. To improve the efficiency of compiler fuzz testing, it is common practice to use test case prioritization techniques. Some researchers use machine learning to predict the code coverage of test cases, aiming to maximize the test capability for the target compiler by increasing the overall predicted coverage of the test cases. Nevertheless, these methods can only forecast the code coverage of the compiler at a specific optimization level, potentially missing many optimization-related bugs. In this paper, we introduce C-CORE (short for Clustering by Code Representation), the first framework to prioritize test cases according to their code representations, which are derived directly from the source code. This approach avoids being limited to specific compiler states and extends to a broader range of compiler bugs. Specifically, we first train a scaled pre-trained programming language model to capture as many common features as possible from the test cases generated by a fuzzer. Using this pre-trained model, we then train two downstream models: one for predicting the likelihood of triggering a bug and another for identifying code representations associated with bugs. Subsequently, we cluster the test cases according to their code representations and select the highest-scoring test case from each cluster as the high-quality test case. This reduction in redundant test cases saves time. Comprehensive evaluation results reveal that code representations are better at distinguishing test capabilities, and that C-CORE significantly enhances testing efficiency. Across four datasets, C-CORE increases the average percentage of faults detected (APFD) value by 0.16 to 0.31 and reduces test time by over 50% in 46% of cases. Compared with the best results from approaches using predicted code coverage, C-CORE improves the APFD value by 1.1% to 12.3% and achieves an overall time saving of 159.1%.
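The APFD metric used to evaluate C-CORE can be made concrete with a small calculation. The sketch below uses the standard definition APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the 1-based position of the first test that reveals fault i. The test names, fault labels and function name are hypothetical, and the sketch assumes every fault is revealed by at least one test in the ordering.

```python
def apfd(order, faults_by_test, num_faults: int) -> float:
    """Average percentage of faults detected for one prioritized test order."""
    n = len(order)
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in faults_by_test.get(test, ()):
            # Keep only the earliest position at which each fault is revealed.
            first_position.setdefault(fault, position)
    if len(first_position) != num_faults:
        raise ValueError("this sketch assumes every fault is revealed by some test")
    return 1 - sum(first_position.values()) / (n * num_faults) + 1 / (2 * n)


# Hypothetical data: four test cases, two faults (A and B).
faults_by_test = {"t2": {"A"}, "t3": {"A", "B"}}

good_order = ["t3", "t2", "t1", "t4"]  # bug-revealing tests first
poor_order = ["t1", "t4", "t2", "t3"]  # bug-revealing tests last

good_score = apfd(good_order, faults_by_test, num_faults=2)
poor_score = apfd(poor_order, faults_by_test, num_faults=2)
```

For this made-up suite the first ordering scores 0.875 and the second 0.25, so an APFD gain of 0.16 to 0.31 like the one reported corresponds to bug-revealing test cases being moved substantially earlier in the schedule.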
Sparse representation is an effective data classification algorithm that depends on known training samples to categorise the test sample. It has been widely used in various image classification tasks. Sparseness in sparse representation means that only a few instances selected from all training samples can effectively convey the essential class-specific information of the test sample, which is very important for classification. For deformable images such as human faces, pixels at the same location in different images of the same subject usually have different intensities. Therefore, extracting features from and correctly classifying such deformable objects is very hard. Moreover, lighting, pose and occlusion cause further difficulty. Considering these problems and challenges, a novel image representation and classification algorithm is proposed. First, the authors' algorithm generates virtual samples by a non-linear variation method. This method can effectively extract the low-frequency information of the space-domain features of the original image, which is very useful for representing deformable objects. Combining the original and virtual samples helps improve the classification performance and robustness of the algorithm. The authors' algorithm then calculates the expression coefficients of the original and virtual samples separately using the sparse representation principle and obtains the final score through a specially designed, efficient score fusion scheme. The weighting coefficients in the score fusion scheme are set entirely automatically. Finally, the algorithm classifies the samples based on the final scores. The experimental results show that the proposed method achieves better classification than conventional sparse representation algorithms.
Thermal conductivity is one of the most significant criteria for three-dimensional carbon fiber-reinforced SiC matrix composites (3D C/SiC). Representative volume element (RVE) models at the microscale, void/matrix scale and mesoscale are proposed in this work to simulate the thermal conductivity behavior of 3D C/SiC composites. An entirely new process is introduced to weave the preform with a three-dimensional orthogonal architecture. A 3D steady-state analysis step is created to assess the thermal conductivity behavior of the composites by applying periodic temperature boundary conditions. Three RVE models, with cuboid, hexagonal and random fiber distributions, are developed to compare the influence of the fiber packing pattern on thermal conductivity at the microscale. In addition, the effect of void morphology on the thermal conductivity of the matrix is analyzed using the void/matrix models. The predictions at the mesoscale correspond closely to the experimental values. The effects of porosity and fiber volume fraction on thermal conductivity are also considered. The multi-scale models presented in this paper can be used to predict the thermal conductivity behavior of other composites with complex structures.
Recent research advances in implicit neural representation have shown that a wide range of video data distributions can be achieved by sharing model weights in Neural Representation for Videos (NeRV). While explicit methods exist for accurately embedding ownership or copyright information in video data, the nascent NeRV framework has yet to address this issue comprehensively. In response, this paper introduces MarkINeRV, a scheme designed to embed watermarking information into video frames using an invertible neural network watermarking approach to protect the copyright of NeRV. It models the embedding and extraction of watermarks as a pair of inverse processes of a reversible network and employs the same network for both, with the information flowing in opposite directions. Additionally, a video frame quality enhancement module is incorporated to mitigate watermarking information losses in the rendering process and the possibility of malicious attacks during transmission, ensuring the accurate extraction of watermarking information through the invertible network's inverse process. This paper evaluates the accuracy, robustness, and invisibility of MarkINeRV on multiple video datasets. The results demonstrate its efficacy in extracting watermarking information for copyright protection of NeRV. MarkINeRV represents a pioneering investigation into copyright issues surrounding NeRV.
Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of the models still need to be improved. Additionally, due to differences in programming languages, most methods lack cross-language detection generality. To address these issues, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality and practicality.
To improve the model's capability to express features during few-shot learning, a multi-scale features prototypical network (MS-PN) algorithm is proposed. A metric learning algorithm is employed to extract image features and project them into a feature space, so that the similarity between samples is evaluated from their relative distances within the metric space. To extract sufficient feature information from limited sample data and mitigate the impact of constrained data volume, a multi-scale feature extraction network is presented to capture data features at various scales during image feature extraction. Additionally, the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment. The loss function integrates contrastive loss and label smoothing to bring similar data points closer and separate dissimilar data points within the metric space. Experimental evaluations are conducted on the small-sample datasets mini-ImageNet and CUB200-2011. The proposed method achieves higher classification accuracy. Specifically, in the 5-way 1-shot experiment, classification accuracy reaches 50.13% and 66.79%, respectively, on these two datasets. Moreover, in the 5-way 5-shot experiment, accuracies of 66.79% and 85.91% are observed, respectively.
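The prototype-based classification described for MS-PN can be sketched in a few lines. This is a minimal illustration of a weighted prototype and nearest-prototype classification, not the paper's algorithm: the abstract does not say how the weights are computed, so the weights, the 2-D embeddings, the class labels and the function names below are all assumed for illustration.

```python
import math


def weighted_prototype(embeddings, weights):
    """Weighted mean of support embeddings; a low weight reduces an outlier's pull."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-D embeddings: class "a" contains one outlier at (10, 10).
support_a = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
support_b = [[5.0, 5.0], [7.0, 5.0]]

prototypes = {
    "a": weighted_prototype(support_a, [1.0, 1.0, 0.0]),  # outlier weighted out
    "b": weighted_prototype(support_b, [1.0, 1.0]),
}
predicted = classify([2.0, 1.0], prototypes)
```

With a plain mean, the prototype of class "a" would be dragged to roughly (4, 3.33) by the outlier; the assumed weights keep it at (1, 0), near the bulk of the class.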
Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise the quality of the tea leaves. High-quality teas are often handpicked, so intelligent picking machines need more delicate operations. Compared with traditional image processing techniques, deep learning models have stronger feature extraction capabilities and better generalization, and are more suitable for practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet) using a dataset from a tea garden in Hangzhou. We further analyzed the characteristics of the tea shoot dataset, in which small and medium-sized targets account for 89.9%. Our algorithm is compared with several mainstream object segmentation algorithms, and the results demonstrate that our model achieves an accuracy of 82% in recognizing tea shoots, outperforming the other models. Through ablation experiments, we found that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrate that the proposed multi-scale and point selection strategy optimizes feature extraction for overlapping small targets. The results indicate that the proposed Mask2FusionNet model can perform shoot segmentation in unstructured environments, distinguishing individual tea shoots and completely extracting the shoot edge contours with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
Funding: Supported by the National Key Research and Development Program of China under grant 2016YFC0802904, the National Natural Science Foundation of China under grant 61671470, and the Postdoctoral Science Foundation Funded Project of China under grant 2017M623423.
Abstract: To achieve fast and accurate detection of armored targets on the ground battlefield, a detection method based on a multi-scale representation network (MS-RN) and a shape-fixed Guided Anchor (SF-GA) scheme is proposed. First, considering the large scale variation and camouflage of armored targets, a new MS-RN that integrates contextual information from the battlefield environment is designed. The MS-RN extracts deep features from templates of different scales and strengthens the detection of small targets; armored targets of different sizes are detected on different representation features. Second, to meet the accuracy and real-time requirements, an improved shape-fixed Guided Anchor is applied to feature maps of different scales to propose regions of interest (ROIs). Unlike sliding or random anchors, the SF-GA can filter out 80% of the regions while still improving recall. A dedicated detection dataset for armored targets, named the Armored Target Dataset (ARTD), is constructed, on which comparative experiments with state-of-the-art detection methods are conducted. Experimental results show that the proposed method achieves outstanding detection accuracy and efficiency, especially when small armored targets are involved.
Abstract: This paper addresses multi-scale representation in urban GIS and presents a model that dynamically generalizes buildings on the basis of the Delaunay triangulation model. Considering the constraints of positional accuracy, statistical area balance, and orthogonal characteristics in building cluster generalization, the paper gives a progressive algorithm for building cluster aggregation, including conflict detection (where), object displacement (who), and the geometrical combination operation (how). The algorithm has been implemented in an interactive generalization system, and some experimental illustrations are provided.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 40471090) and the Science Innovation Group of Beijing.
Abstract: This paper reviews the development of research on multiple representations in Geographic Information Systems (GIS), including data structure, formalization and storage, and intelligent zoom. It also summarizes the problems of interconnectivity, consistency maintenance, dynamic query, and coexisting updates, and reviews research on multi-scale databases and related studies. Finally, research directions and foci are proposed for the future design and implementation of multi-scale GIS.
Funding: This study was supported by the National Natural Science Foundation of China (U22B2075, 52274056, 51974356).
Abstract: The large number of nanopores and complex fracture structures in shale reservoirs results in multi-scale flow of oil. As shale oil reservoirs are developed, the permeability of multi-scale media changes because of stress sensitivity, which plays a crucial role in controlling pressure propagation and oil flow. This paper proposes a multi-scale coupled flow mathematical model of matrix nanopores, induced fractures, and hydraulic fractures. The model considers the micro-scale effects of shale oil flow in fractal nanopores, the fractal induced fracture network, and the stress sensitivity of multi-scale media. We solved the model iteratively using the Pedrosa transform, a semi-analytic segmented Bessel function, and the Laplace transform. The results of this model agree well with the numerical solution and field production data, confirming the high accuracy of the model. In addition, the influence of stress sensitivity on permeability, pressure, and production is analyzed. Permeability and production decrease significantly when induced fractures are weakly supported, and closed induced fractures can inhibit interporosity flow in the stimulated reservoir volume (SRV). Sensitivity analysis shows that hydraulic fractures benefit early production, while induced fractures in the SRV benefit middle production. The model can characterize the multi-scale flow of shale oil, providing theoretical guidance for rapid productivity evaluation.
Abstract: Multi-scale systems remain a classical scientific problem in fluid dynamics, biology, and other fields. In the present study, a scheme of multi-scale physics-informed neural networks (msPINNs) is proposed to solve boundary layer flow at high Reynolds numbers without any data. The flow is divided into several regions with different scales based on Prandtl's boundary layer theory, and each region is solved with governing equations at its own scale. The method of matched asymptotic expansions is used to make the flow field continuous. Flow over a semi-infinite flat plate at a high Reynolds number is treated as a multi-scale problem because the boundary layer scale is much smaller than the outer flow scale. The results are compared with reference numerical solutions, showing that the msPINNs can solve the multi-scale problem of the boundary layer in high Reynolds number flows. This scheme can be extended to more multi-scale problems in the future.
Abstract: Deep learning has been a catalyst for a transformative revolution in machine learning and computer vision in the past decade. Within these research domains, methods grounded in deep learning have exhibited exceptional performance across a spectrum of tasks. The success of deep learning methods can be attributed to their capability to derive potent representations from data, integral for a myriad of downstream applications. These representations encapsulate the intrinsic structure, features, or latent variables characterising the underlying statistics of visual data. Despite these achievements, the challenge persists in effectively conducting representation learning of visual data with deep models, particularly when confronted with vast and noisy datasets. This special issue is a dedicated platform for researchers worldwide to disseminate their latest, high-quality articles, aiming to enhance readers' comprehension of the principles, limitations, and diverse applications of representation learning in computer vision.
Funding: The National Natural Science Foundation of China (62062062), hosted by Gulila Altenbek.
Abstract: Because of the structural dependencies among concurrent events in the knowledge graph and the substantial sequential correlation information carried by temporally adjacent events, we propose an Independent Recurrent Temporal Graph Convolution Networks (IndRT-GCNets) framework to efficiently and accurately capture event attribute information. The framework models knowledge graph sequences to learn the evolutionary representations of entities and relations within each period. First, using the temporal graph convolution module in the evolutionary representation unit, the framework captures the structural dependencies within the knowledge graph in each period. Meanwhile, to achieve better event representation and establish effective correlations, an independent recurrent neural network is employed for auto-regressive modeling. Furthermore, static attributes of entities in the entity-relation events are constrained and merged using a static graph constraint to obtain optimal entity representations. Finally, the evolution of entity and relation representations is used to predict events at the next step. On multiple real-world datasets, including Freebase13 (FB13), Freebase 15k (FB15K), WordNet11 (WN11), WordNet18 (WN18), FB15K-237, WN18RR, YAGO3-10, and Nell-995, results across multiple evaluation metrics show that the proposed IndRT-GCNets framework outperforms most existing models on knowledge reasoning tasks, which validates its effectiveness and robustness.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 62062003) and the Natural Science Foundation of Ningxia (Grant No. 2023AAC03293).
Abstract: Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot. However, features of different sizes and directions are not sufficiently captured when extracting features from lung X-ray images. This paper proposes MSD-Net, a pneumonia classification model based on multi-scale directional feature enhancement. The main innovations are as follows. First, the Multi-scale Residual Feature Extraction Module (MRFEM) uses dilated convolutions with different dilation rates to enlarge the receptive field and extract multi-scale features effectively. Second, the Multi-scale Directional Feature Perception Module (MDFPM) uses a three-branch structure of convolutions with different sizes to transmit directional features layer by layer, and focuses on the target region to enhance the feature information. Third, the Axial Compression Former Module (ACFM) performs global calculations to enhance the perception of global features in different directions. To verify the effectiveness of MSD-Net, comparative experiments and ablation experiments are carried out. On the COVID-19 RADIOGRAPHY DATABASE, the Accuracy, Recall, Precision, F1 Score, and Specificity of MSD-Net are 97.76%, 95.57%, 95.52%, 95.52%, and 98.51%, respectively. On the chest X-ray dataset, the Accuracy, Recall, Precision, F1 Score, and Specificity of MSD-Net are 97.78%, 95.22%, 96.49%, 95.58%, and 98.11%, respectively. The model effectively improves the accuracy of lung image recognition and provides an important clinical reference for computer-aided diagnosis of pneumonia.
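The role of dilation rates in multi-scale feature extraction can be illustrated with a minimal one-dimensional sketch. It is not the MRFEM itself: the function names `dilated_conv1d` and `multi_scale_features`, the toy signal, and the averaging-free kernel are all assumptions made for illustration. The only point it shows is that a kernel of size k with dilation d covers d(k-1)+1 input samples, so running the same kernel at several dilation rates yields responses at several scales.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1-D convolution with gaps of `dilation` between taps."""
    # Effective span of the kernel on the input: d * (k - 1) + 1 samples.
    span = dilation * (len(kernel) - 1) + 1
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def multi_scale_features(signal, kernel, dilations):
    """Hypothetical helper: one response per dilation rate, keyed by the rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


features = multi_scale_features([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], [1, 2, 3])
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover 3, 5, and 7 input samples respectively, which is why the output gets shorter as the rate grows when no padding is used.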
Funding: Supported by the National Natural Science Foundation of China (62072334).
Abstract: The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods either do not mine hand and face information in their visual backbones or rely on expensive and time-consuming external extractors to obtain it. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video, capture sequential features, and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve these problems. MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information about the hands and face at multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments on three widely used sign language datasets: RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Funding: The Scientific Research Fund of Hunan Provincial Education Department (23A0423).
Abstract: Remote sensing imagery, because it is captured from high altitude, presents inherent challenges: multiple scales, limited target areas, and intricate backgrounds. These traits often lead to higher miss and false detection rates when object recognition algorithms are applied to remote sensing imagery. They also contribute to inaccurate target localization and hinder precise target categorization. This paper addresses these challenges by proposing the YOLO-MFD model (YOLO-MFD: Remote Sensing Image Object Detection with Multi-scale Fusion Dynamic Head). Before presenting our method, we examine the prevalent issues in remote sensing imagery analysis, emphasizing the difficulty existing object recognition algorithms have in capturing critical image features amid varying scales and complex backgrounds. To resolve these issues, we introduce a novel approach. First, we propose a lightweight multi-scale module called CEF. This module significantly improves the model's ability to capture important image features by merging multi-scale feature information, and it effectively addresses the missed detections and false alarms that are common in remote sensing imagery. Second, an additional small-target detection head is added, and a residual link is established with the higher-level feature extraction module in the backbone. This allows the model to incorporate shallower information, significantly improving the accuracy of target localization in remotely sensed images. Finally, a dynamic head attention mechanism is introduced, which gives the model greater flexibility and accuracy in recognizing shapes and targets of different sizes. Consequently, the precision of object detection is significantly improved. The experimental results show that the YOLO-MFD model improves on the original YOLOv8 model by 6.3%, 3.5%, and 2.5% in Precision, mAP@0.5, and mAP@0.5:0.95, respectively. These results illustrate the clear advantages of the method.
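The mAP@0.5 and mAP@0.5:0.95 figures quoted above rest on intersection over union (IoU) between a predicted box and a ground-truth box: at the 0.5 threshold a detection is usually counted as a match when its IoU is at least 0.5. The sketch below shows only that overlap computation, under the assumption that boxes are axis-aligned tuples `(x_min, y_min, x_max, y_max)`; the function name `iou` is illustrative and not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap rectangle, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    # Degenerate boxes with zero area yield zero overlap rather than a division error.
    return intersection / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so they would not count as a match at the 0.5 threshold.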
Funding: Supported by Western Research Interdisciplinary Initiative R6259A03.
Abstract: Rock fracture mechanisms can be inferred from moment tensors (MT) inverted from microseismic events. However, MT can only be inverted for events whose waveforms are acquired across a network of sensors. This is limiting for underground mines, where the microseismic stations often lack azimuthal coverage. Thus, there is a need for a method to invert fracture mechanisms using waveforms acquired by a sparse microseismic network. Here, we present a novel, multi-scale framework to classify whether a rock crack contracts or dilates based on a single waveform. The framework consists of a deep learning model that is initially trained on 2,400,000+ manually labelled field-scale seismic and microseismic waveforms acquired across 692 stations. Transfer learning is then applied to fine-tune the model on 300,000+ MT-labelled lab-scale acoustic emission waveforms from 39 individual experiments instrumented with different sensor layouts, loading, and rock types. The optimal model achieves an F-score of over 86% on unseen waveforms at both the lab and field scales. This model outperforms existing empirical methods in classifying rock fracture mechanisms monitored by a sparse microseismic network. This facilitates rapid assessment of, and early warning against, various rock engineering hazards such as induced earthquakes and rock bursts.
Funding: Supported by the Applied Research Center of Artificial Intelligence, Wuhan College (Grant Number X2020113) and the Wuhan College Research Project (Grant Number KYZ202009).
Abstract: User representation learning is crucial for capturing different user preferences, but it is also critically challenging because user intentions are latent and dispersed across complex and varied patterns of user-generated data, and thus cannot be measured directly. Text-based models can learn user representations by mining latent semantics, which helps enhance the semantic function of user representations. However, these techniques only extract common features from historical records and cannot represent changes in user intentions. Sequential features, by contrast, can express a user's interests and intentions as they change over time, but sequential recommendation results based on item-level user representations lack interpretability in terms of preference factors. To address these issues, we propose a novel model with Dual-Layer User Representation, named DLUR, in which the user's intention is learned from two different layer representations. Specifically, the latent semantic layer adds an interactive layer based on Transformer to extract keywords and key sentences from the text, which serve as a basis for interpretation. The sequence layer uses the Transformer model to encode the user's preference intention and clarify changes in that intention. This dual-layer user model is therefore more comprehensive than a single text model or sequence model and can effectively improve recommendation performance. Extensive experiments on five benchmark datasets demonstrate DLUR's performance against state-of-the-art recommendation models. In addition, DLUR's ability to explain recommendation results is demonstrated through some specific cases.
Funding: The Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014), the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024), the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012), the Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021), and the Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Abstract: Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in the field of small object detection on unmanned aerial vehicles (UAVs). The task is challenging because of variations in UAV flight altitude, differences in object scales, and factors such as flight speed and motion blur. To enhance the detection of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted P2, to improve adaptability to small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of intricate feature details related to small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the use of multi-scale feature information. On the Visdrone2023 and UAVDT datasets, MSC-YOLO outperforms the baseline method YOLOv7 by 3.0% in terms of mean average precision (mAP). The MSC-YOLO algorithm proposed in this paper demonstrates satisfactory performance in detecting small targets in UAV aerial photography, providing strong support for practical applications.
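The idea of replacing conventional downsampling with a space-to-depth step can be sketched generically as follows. This is not the CSPDC module, whose internals the abstract does not describe: the function name `space_to_depth`, the nested-list image layout (height x width x channels), and the block size of 2 are assumptions for illustration. The sketch shows why no pixel values are discarded: each 2x2 spatial block is moved into the channel dimension instead of being pooled or skipped.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W x C nested-list image into (H/block) x (W/block) x (C*block*block)."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by the block size")

    output = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            # Stack the channels of every pixel in the block, in row-major order.
            channels = []
            for di in range(block):
                for dj in range(block):
                    channels.extend(image[i + di][j + dj])
            row.append(channels)
        output.append(row)
    return output


# A 2 x 2 single-channel image becomes a 1 x 1 image with 4 channels.
packed = space_to_depth([[[1], [2]], [[3], [4]]])
```

Spatial resolution halves in each dimension while the channel count quadruples, so a following convolution can still see the fine detail that strided convolution or pooling would have dropped.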
Abstract: Edge devices, because of their limited computational and storage resources, often require compilers for program optimization. Ensuring the security and reliability of these compilers is therefore of paramount importance in the emerging field of edge AI. One widely used testing method for this purpose is fuzz testing, which detects bugs by feeding random test cases into the target program. However, this process consumes significant time and resources. To improve the efficiency of compiler fuzz testing, it is common practice to use test case prioritization techniques. Some researchers use machine learning to predict the code coverage of test cases, aiming to maximize the test capability for the target compiler by increasing the overall predicted coverage of the test cases. Nevertheless, these methods can only forecast the code coverage of the compiler at a specific optimization level, potentially missing many optimization-related bugs. In this paper, we introduce C-CORE (short for Clustering by Code Representation), the first framework to prioritize test cases according to their code representations, which are derived directly from the source code. This approach avoids being limited to specific compiler states and extends to a broader range of compiler bugs. Specifically, we first train a scaled pre-trained programming language model to capture as many common features as possible from the test cases generated by a fuzzer. Using this pre-trained model, we then train two downstream models: one for predicting the likelihood of triggering a bug and another for identifying code representations associated with bugs. Subsequently, we cluster the test cases according to their code representations and select the highest-scoring test case from each cluster as the high-quality test case. This reduction in redundant test cases leads to time savings. Comprehensive evaluation results reveal that code representations are better at distinguishing test capabilities, and C-CORE significantly enhances testing efficiency. Across four datasets, C-CORE increases the average percentage of faults detected (APFD) value by 0.16 to 0.31 and reduces test time by over 50% in 46% of cases. Compared with the best results from approaches using predicted code coverage, C-CORE improves the APFD value by 1.1% to 12.3% and achieves an overall time-saving of 159.1%.
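The APFD metric reported above has a standard closed form in the test prioritization literature: APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the 1-based position of the first test case that reveals fault i. The sketch below computes it under the usual assumption that every fault is revealed by at least one test in the ordering; the function name `apfd` and the toy data are illustrative and do not come from the C-CORE evaluation.

```python
def apfd(order, detects, num_faults):
    """Average percentage of faults detected for one test ordering.

    order:      list of test ids in execution order
    detects:    dict mapping test id -> set of fault ids that the test reveals
    num_faults: total number of distinct faults (all assumed detectable)
    """
    n = len(order)
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            # Keep only the earliest position at which each fault is revealed.
            first_position.setdefault(fault, position)

    if len(first_position) != num_faults:
        raise ValueError("every fault must be revealed by some test in the ordering")

    return 1 - sum(first_position.values()) / (n * num_faults) + 1 / (2 * n)


toy_detects = {"t1": {0}, "t3": {0, 1}}
early = apfd(["t3", "t1", "t2", "t4"], toy_detects, num_faults=2)
late = apfd(["t2", "t4", "t1", "t3"], toy_detects, num_faults=2)
```

Running the fault-revealing test first gives a much higher score than running it last, which is the behaviour a prioritization technique is trying to achieve.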
Abstract: Sparse representation is an effective data classification algorithm that relies on known training samples to categorise a test sample. It has been widely used in various image classification tasks. Sparseness in sparse representation means that only a few instances selected from all training samples can effectively convey the essential class-specific information of the test sample, which is very important for classification. For deformable images such as human faces, pixels at the same location in different images of the same subject usually have different intensities. Therefore, extracting features from and correctly classifying such deformable objects is very hard. Moreover, lighting, pose, and occlusion cause further difficulty. Considering the problems and challenges listed above, a novel image representation and classification algorithm is proposed. First, the authors' algorithm generates virtual samples by a non-linear variation method. This method can effectively extract the low-frequency information of space-domain features of the original image, which is very useful for representing deformable objects. Combining the original and virtual samples improves the classification performance and robustness of the algorithm. Next, the authors' algorithm calculates the expression coefficients of the original and virtual samples separately using the sparse representation principle and obtains the final score by a specially designed, efficient score fusion scheme. The weighting coefficients in the score fusion scheme are set entirely automatically. Finally, the algorithm classifies the samples based on the final scores. The experimental results show that the proposed method classifies better than conventional sparse representation algorithms.
Funding: Supported by the Science Center for Gas Turbine Project of China (Grant No. P2022-B-IV-014-001), the Frontier Leading Technology Basic Research Special Project of Jiangsu Province of China (Grant No. BK20212007), and the BIT Research and Innovation Promoting Project of China (Grant No. 2022YCXZ019).
Abstract: Thermal conductivity is one of the most significant criteria for three-dimensional carbon fiber-reinforced SiC matrix composites (3D C/SiC). Representative volume element (RVE) models at the microscale, void/matrix scale, and mesoscale proposed in this work are used to simulate the thermal conductivity behavior of 3D C/SiC composites. An entirely new process is introduced to weave the preform with a three-dimensional orthogonal architecture. A 3D steady-state analysis step is created to assess the thermal conductivity behavior of the composites by applying periodic temperature boundary conditions. Three RVE models, with cuboid, hexagonal, and random fiber distributions, are developed to compare the influence of the fiber packing pattern on thermal conductivity at the microscale. In addition, the effect of void morphology on the thermal conductivity of the matrix is analyzed with the void/matrix models. The predictions at the mesoscale correspond closely to the experimental values. The effect of porosity and fiber volume fraction on thermal conductivity is also taken into consideration. The multi-scale models described in this paper can be used to predict the thermal conductivity behavior of other composites with complex structures.
Funding: Supported by the National Natural Science Foundation of China, with Fund Numbers 62272478 and 62102451; the National Defense Science and Technology Independent Research Project (Intelligent Information Hiding Technology and Its Applications in a Certain Field); and the Science and Technology Innovation Team Innovative Research Project "Research on Key Technologies for Intelligent Information Hiding" with Fund Number ZZKY20222102.
Abstract: Recent advances in implicit neural representation have shown that a wide range of video data distributions can be achieved by sharing model weights in Neural Representation for Videos (NeRV). While explicit methods exist for accurately embedding ownership or copyright information in video data, the nascent NeRV framework has yet to address this issue comprehensively. In response, this paper introduces MarkINeRV, a scheme that embeds watermarking information into video frames using an invertible neural network watermarking approach to protect the copyright of NeRV. The scheme models the embedding and extraction of watermarks as a pair of inverse processes of a reversible network and employs the same network for both, with the information flowing in opposite directions. Additionally, a video frame quality enhancement module is incorporated to mitigate the loss of watermarking information in the rendering process and the possibility of malicious attacks during transmission, ensuring accurate extraction of the watermarking information through the invertible network's inverse process. This paper evaluates the accuracy, robustness, and invisibility of MarkINeRV on multiple video datasets. The results demonstrate its efficacy in extracting watermarking information for copyright protection of NeRV. MarkINeRV represents a pioneering investigation into copyright issues surrounding NeRV.
Funding: Funded by the Major Science and Technology Projects in Henan Province, China, Grant No. 221100210600.
Abstract: Prior studies have demonstrated that deep learning-based approaches can enhance source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, because of limitations in code representation and neural network design, the validity and practicality of such models still need to be improved. Additionally, because of differences among programming languages, most methods lack cross-language detection generality. To address these issues, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
Funding: The Scientific Research Foundation of Liaoning Provincial Department of Education (No. LJKZ0139) and the Program for Liaoning Excellent Talents in University (No. LR15045).
Abstract: To improve the model's capability to express features during few-shot learning, a multi-scale features prototypical network (MS-PN) algorithm is proposed. A metric learning algorithm is employed to extract image features and project them into a feature space, so that the similarity between samples is evaluated from their relative distances within the metric space. To extract sufficient feature information from limited sample data and mitigate the impact of constrained data volume, a multi-scale feature extraction network is presented to capture data features at various scales during image feature extraction. Additionally, the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment. The loss function integrates contrastive loss and label smoothing to bring similar data points closer and separate dissimilar data points within the metric space. Experimental evaluations are conducted on the small-sample datasets mini-ImageNet and CUB200-2011, where the proposed method achieves higher classification accuracy. Specifically, in the 5-way 1-shot experiment, classification accuracy reaches 50.13% and 66.79%, respectively, on these two datasets. Moreover, in the 5-way 5-shot experiment, accuracies of 66.79% and 85.91% are observed, respectively.
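The prototype step described above can be sketched in a few lines. This is a minimal illustration, not the MS-PN implementation: the abstract does not say how the per-sample weights are computed, so the weights here are simply given as inputs, and the names `weighted_prototype` and `classify`, the two-dimensional embeddings, and the Euclidean metric are assumptions. The sketch shows a class prototype computed as a weighted mean of support embeddings and a query assigned to the nearest prototype.

```python
import math


def weighted_prototype(embeddings, weights):
    """Weighted mean of support embeddings; a low weight damps an outlier's pull."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    return [
        sum(w * e[d] for e, w in zip(embeddings, weights)) / total
        for d in range(dim)
    ]


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


prototypes = {
    # The third support point is an outlier and receives zero weight (assumed weighting).
    "a": weighted_prototype([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]], [1.0, 1.0, 0.0]),
    "b": weighted_prototype([[5.0, 5.0]], [1.0]),
}
label = classify([1.5, 0.5], prototypes)
```

With uniform weights the outlier would drag the prototype of class "a" to (4, 3.33), much closer to class "b"; down-weighting it keeps the prototype at (1, 0).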
Funding: This research was supported by the National Natural Science Foundation of China (No. 62276086), the National Key R&D Program of China (No. 2022YFD2000100), and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LTGN23D010002.
Abstract: Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise the quality of the tea leaves. High-quality teas are often handpicked, and intelligent picking machines need more delicate operation to match them. Compared with traditional image processing techniques, deep learning models have stronger feature extraction capabilities and better generalization, and are more suitable for practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet) using a dataset from a tea garden in Hangzhou. We further analyzed the characteristics of the tea shoot dataset, in which small to medium-sized targets account for 89.9%. Our algorithm is compared with several mainstream object segmentation algorithms, and the results demonstrate that our model achieves an accuracy of 82% in recognizing tea shoots, outperforming the other models. Through ablation experiments, we found that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrate that the proposed multi-scale and point selection strategy optimizes feature extraction for overlapping small targets. The results indicate that the proposed Mask2FusionNet model can perform shoot segmentation in unstructured environments, distinguishing individual tea shoots and completely extracting the shoot edge contours with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.