Funding: Project supported by the National Natural Science Foundation of China (Grant No. 62075241).
Abstract: Single-pixel imaging (SPI) can transform 2D or 3D image data into 1D light signals, which offers promising prospects for image compression and transmission. However, during data communication these light signals can easily draw the attention of eavesdroppers on public channels. Here, we introduce an efficient encryption method for SPI data transmission that uses the 3D Arnold transformation to directly scramble the 1D single-pixel light signals and the elliptic curve encryption algorithm for key transmission. The scheme directly employs Hadamard patterns to illuminate the scene and then uses the 3D Arnold transformation to permute the 1D light signal from single-pixel detection. The transformation parameters serve as the secret key, while the security of key exchange is guaranteed by an elliptic-curve-based key exchange mechanism. Computer simulations and optical experiments demonstrate that, compared with existing encryption schemes, the proposed technique not only enhances the security of encryption but also eliminates the need for complicated pattern scrambling rules. Additionally, this approach solves the problem of secure key transmission, thus ensuring both the security of the information and the quality of the decrypted images.
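The permutation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the 1D signal has length n³ so it can be viewed as an n×n×n cube, and it uses one unimodular integer matrix (`ARNOLD_3D`, determinant 1) chosen here purely for illustration. The function names and the use of the round count as the "key" are likewise hypothetical; the abstract does not give the actual parameterization, and the elliptic-curve key exchange is not shown.

```python
import numpy as np

# Assumed 3D Arnold (cat map) matrix; det = 1, so the map is invertible mod n.
ARNOLD_3D = np.array([[1, 1, 1],
                      [1, 2, 2],
                      [1, 2, 3]])

def arnold_permutation(n, rounds):
    """Return perm such that source flat index k is sent to perm[k]."""
    # Column k holds the (x, y, z) coordinates of flat index k (C order).
    coords = np.indices((n, n, n)).reshape(3, -1)
    for _ in range(rounds):
        coords = (ARNOLD_3D @ coords) % n
    return np.ravel_multi_index(tuple(coords), (n, n, n))

def scramble(signal, n, rounds):
    """Permute a 1D signal of length n**3 with the 3D Arnold map."""
    signal = np.asarray(signal)
    if signal.size != n ** 3:
        raise ValueError("signal length must equal n**3")
    perm = arnold_permutation(n, rounds)
    out = np.empty_like(signal)
    out[perm] = signal          # move sample k to position perm[k]
    return out

def unscramble(scrambled, n, rounds):
    """Invert scramble() given the same (n, rounds) parameters."""
    perm = arnold_permutation(n, rounds)
    return np.asarray(scrambled)[perm]
```

A receiver holding the same `(n, rounds)` pair recovers the measurements with `unscramble(scramble(s, n, rounds), n, rounds)`; without those parameters the sample order is lost, although the set of sample values is unchanged, since this is a pure permutation.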
Funding: Supported in part by the Major Project for New Generation of AI (2018AAA0100400), the National Natural Science Foundation of China (61836014, U21B2042, 62072457, 62006231), and the InnoHK Program.
Abstract: Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate pixel-wise depth maps with off-the-shelf depth estimators and then use them as an additional input to augment the RGB images. Depth-based methods either convert the estimated depth maps to pseudo-LiDAR and then apply LiDAR-based object detectors, or focus on image and depth fusion learning. However, they show limited performance and efficiency because of depth inaccuracy and complex convolution-based fusion. In contrast to these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) network uses normalizing flows to build priors in depth maps and thereby obtain more accurate depth information. We then develop a novel Swin-Transformer-based backbone with a fusion module that processes RGB image patches and depth map patches in two separate branches and fuses them using cross-attention to exchange information between the branches. Furthermore, using the pixel-wise relative depth values in the depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture a more accurate sequence ordering of the input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. Experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of the proposed method and its superior performance over previous counterparts.
Abstract: In recent years, speech-driven 3D facial animation has been widely studied. Previous work can generate coherent 3D facial animations from speech data. However, owing to the scarcity of audio-visual data, the generated 3D facial animations lack realism and vividness, and the accuracy of lip movements is insufficient. To improve the accuracy and vividness of lip movements, this work proposes a new end-to-end neural network model, HBF Talk. It uses the HuBERT (Hidden-Unit BERT) pre-trained model for feature extraction and encoding of speech data. A Flash module is introduced to further encode the extracted speech feature representations, yielding richer contextual representations of the speech features. Finally, a biased cross-modal Transformer decoder is used for decoding. Quantitative and qualitative experiments and comparisons with existing baseline models show that the proposed HBF Talk model performs better, improving the accuracy and vividness of speech-driven lip movements.
Funding: Project supported by the National Natural Science Foundation of China (No. 60472100), the Natural Science Foundation of Zhejiang Province (Nos. RC01057, Y105577, 601017), the Ningbo Science and Technology Project (Nos. 2003A61001, 2004A610001, 2004A630002), and the Zhejiang Science and Technology Project (No. 2004C31105), China.
Abstract: In this work, a new method for handling unconnected pixels in motion compensated temporal filtering (MCTF) is presented, designed to improve the performance of 3D lifted wavelet coding. Furthermore, multiple description scalable coding (MDSC) is investigated, and novel MDSC schemes based on 3D wavelet coding are proposed, using the lifting implementation of temporal filtering. The proposed MDSC schemes avoid the mismatch problem in multiple description video coding and offer high scalability and robust video transmission. Experimental results show that the proposed schemes are feasible and effective.
Abstract: This paper presents an optimized 3-D Discrete Wavelet Transform (3-D DWT) architecture. The 1-D DWT employed in the design of the 3-D DWT architecture uses a reduced lifting scheme approach. The architecture is further optimized by applying a block enabling technique and by scaling and rounding the filter coefficients. The proposed architecture uses the biorthogonal (9/7) wavelet filter. The architecture is modeled using Verilog HDL, simulated using ModelSim, synthesized using Xilinx ISE, and finally implemented on a Virtex-5 FPGA. The proposed 3-D DWT architecture has a slice register utilization of 5%, an operating frequency of 396 MHz, and a power consumption of 0.45 W.
Funding: Supported by the National Science and Technology Major Project (No. 2011ZX05023-005-008).
Abstract: In this paper, we build upon the estimating primaries by sparse inversion (EPSI) method. We use the 3D curvelet transform, recast the EPSI method as a sparse inversion with biconvex optimization and L1-norm regularization, and use alternating optimization to directly estimate the primary reflection coefficients and the source wavelet. The 3D curvelet transform serves as a sparseness constraint when inverting for the primary reflection coefficients, which avoids the prediction-subtraction process of the surface-related multiples elimination (SRME) method. The proposed method not only reduces the damage to the effective waves but also improves the elimination of multiples. It is also a wave-equation-based method for eliminating surface multiple reflections, and it effectively removes surface multiples under complex submarine conditions.
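Abstract: LiDAR point-cloud 3D object detection has low accuracy for small objects such as pedestrians and bicycles, which are prone to missed and false detections. A multi-scale Transformer method for LiDAR point-cloud 3D object detection, MSPT-RCNN (multi-scale point transformer-RCNN), is proposed to improve detection accuracy. The method has two stages: a first stage (RPN) and a second stage (RCNN). The RPN stage extracts point-cloud features with a multi-scale Transformer network that contains a multi-scale neighborhood embedding module and a skip-connection offset-attention module; it captures multi-scale neighborhood geometric information and global semantic information at different levels, and generates high-quality initial 3D bounding boxes. The RCNN stage introduces multi-scale neighborhood geometric information of the points inside each bounding box to refine the box position, size, orientation, and confidence. Experimental results show that MSPT-RCNN achieves high detection accuracy, with larger gains for distant and small objects. By effectively learning multi-scale geometric information in point-cloud data and extracting effective semantic information at different levels, MSPT-RCNN improves 3D object detection accuracy.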
Abstract: An improved version of Goh's 3D wavelet transform (WT) coding scheme is presented in this paper. Compared with other existing 3D WT coding methods and 2D WT based coding methods, the new scheme offers a simple code structure, low computation cost, and good performance in PSNR, compression ratio, and visual quality of reconstructions. The new 3D WT coding scheme is suitable for very low bit rate video coding.
Abstract: A new motion compensated 3D wavelet transform (MC 3DWT) video coding scheme is presented in this paper. The new coding scheme performs well in average PSNR, compression ratio, and visual quality of reconstructions compared with existing 3D wavelet transform (3DWT) coding methods and the motion compensated 2D wavelet transform (MC WT) coding method. The new MC 3DWT coding scheme is suitable for very low bit rate video coding.
Abstract: This paper presents an algorithm for coding video signals based on the 3-D wavelet transform. When the frame order t of a video signal is replaced by order z, the video signal can be viewed as a block in 3-D space. After splitting the block into smaller sub-blocks, and imitating the 2-D wavelet transform used for images, we can transform the sub-blocks with a 3-D wavelet. Most of the video signal energy lies in the decomposed low-frequency sub-bands, and these sub-bands affect the visual quality of the video signal most. By quantizing different sub-bands with different precision and then entropy encoding each sub-band, we can eliminate inter- and intra-frame redundancy of the video signal and compress the data. Our simulation experiments show that this algorithm achieves very good results.
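The idea of treating a video sub-block as a 3-D volume and concentrating its energy in the low-frequency sub-band can be sketched as follows. This is only an illustration under stated assumptions: it uses a single-level orthonormal Haar wavelet (the abstract does not name the wavelet), assumes every block dimension is even, and omits the quantization and entropy coding stages. The function names are hypothetical.

```python
import numpy as np

def haar_1d(x, axis):
    """One-level orthonormal Haar transform along one axis (even length)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)    # averages: low-frequency half
    high = (even - odd) / np.sqrt(2.0)   # differences: high-frequency half
    return np.moveaxis(np.concatenate([low, high]), 0, axis)

def haar_3d(block):
    """Apply the 1D transform along time, rows, and columns in turn."""
    out = np.asarray(block, dtype=float)
    for axis in range(3):
        out = haar_1d(out, axis)
    return out

def low_band_energy_fraction(block):
    """Fraction of total energy held by the low-low-low sub-band."""
    coeffs = haar_3d(block)
    t, h, w = (s // 2 for s in coeffs.shape)
    total = np.sum(coeffs ** 2)
    return np.sum(coeffs[:t, :h, :w] ** 2) / total if total > 0 else 0.0
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the input block, so `low_band_energy_fraction` directly measures how much of the signal a coder could keep by spending most of its bits on the low-frequency sub-band. For a block with little motion and smooth content the fraction is close to 1.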
Funding: National Natural Science Foundation of China (No. 51303131).
Abstract: Textile-reinforced composites, owing to their excellent high-strength-to-low-mass ratio, provide promising alternatives to conventional structural materials in many high-tech sectors. 3D braided composites are advanced composites reinforced with 3D braided fabrics; their complex structure makes product quality evaluation very difficult. In this investigation, a defect recognition platform for evaluating 3D braided composites was constructed based on the dual-tree complex wavelet packet transform (DT-CWPT) and backpropagation (BP) neural networks. Defects in 3D braided composite materials were probed and detected by an ultrasonic sensing system. The DT-CWPT method was used to analyze the ultrasonic scanning pulse signals, and the feature vectors extracted from these signals were fed into the BP neural networks as samples. The defect type was identified from the characteristic ultrasonic wave spectra, and the position of defects in the test samples could be determined at the same time. This method has great potential for evaluating the quality of 3D braided composites.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFE0206900, the National Natural Science Foundation of China under Grant No. 61871440, and the CAAI-Huawei MindSpore Open Fund.
Abstract: Tumour segmentation in medical images (especially 3D tumour segmentation) is highly challenging because tumours can resemble adjacent tissues, multiple tumours can occur, and tumour shapes and sizes vary. Popular deep learning-based segmentation algorithms generally rely on the convolutional neural network (CNN) and the Transformer. The former cannot extract global image features effectively, while the latter lacks inductive bias and involves complicated computation for 3D volume data. Existing hybrid CNN-Transformer networks provide only limited performance improvement, or even poorer segmentation performance than a pure CNN. To address these issues, a short-term and long-term memory self-attention network is proposed. Firstly, a distinctive self-attention block uses the Transformer to explore the correlation among the region features extracted by the CNN at different levels. Then, the memory structure filters and combines this information to exclude similar regions and detect multiple tumours. Finally, the multi-layer reconstruction blocks predict the tumour boundaries. Experimental results demonstrate that our method outperforms other methods in terms of subjective visual and quantitative evaluation. Compared with the most competitive method, the proposed method achieves a Dice of 82.4% vs. 76.6% and a 95% Hausdorff distance (HD95) of 10.66 vs. 11.54 mm on KiTS19, as well as a Dice of 80.2% vs. 78.4% and an HD95 of 9.632 vs. 12.17 mm on LiTS.
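For readers unfamiliar with the Dice score quoted above, the following is a minimal sketch of the standard definition, Dice = 2|P ∩ T| / (|P| + |T|), for binary masks. It is not the evaluation code used in the paper; the function name and the convention of returning 1.0 when both masks are empty are assumptions made here. HD95 is not shown.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

The same function works unchanged on 3D volumes, since it only counts voxels; a reported Dice of 82.4% corresponds to a return value of 0.824.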
Abstract: In the construction and maintenance of particle accelerators, all accelerator elements should be installed in the same coordinate system; only then can the devices in the real world be consistent with the design drawings. However, because of short-term movements of the reinforced concrete cover plates or long-term building deformations, the control points on the engineering structure are displaced, and the fit between the subnetwork and the global control network may become unreliable. Therefore, it is necessary to evaluate the deformations of the 3D alignment control network. Unlike existing investigations, this paper characterizes the deformations of the control network by identifying all congruence models between the points measured in different epochs. The congruence model with the most control points is taken as the primary (fundamental) model, and the remaining models are treated as additional ones. Furthermore, the discrepancies between the primary S-transformation parameters and the additional S-transformation parameters reflect the relative movements of the additional congruence models. Both the iterative GCT method and the iterative combinatorial theory are proposed to detect multiple congruence models in the control network. In practical alignment work, it is essential to identify the competing models in the monitoring network, because they indicate that, even when the fit between the subnetwork and the global control network is good, there may still be deformations that would otherwise be overlooked. Numerical experiments show that the suggested approaches can describe the deformation of the 3D alignment control network comprehensively.
Funding: Supported in part by the National Key Research and Development Program of China (No. SQ2022YFB3900055), in part by the National Natural Science Foundation of China (No. 62101039), in part by the Shandong Excellent Young Scientists Fund Program (Overseas), and in part by the China Postdoctoral Science Foundation (No. 2022M720443).
Abstract: Synthetic aperture radar (SAR) three-dimensional (3D) imaging technology can reconstruct the complete structure of observed targets and has been a hot topic. Compared with tomographic SAR, array interferometric SAR, and circular SAR, curve SAR can use less data to achieve 3D positioning of targets. Most existing algorithms for estimating the Doppler frequency modulation (FM) rate are based on sub-aperture partitioning, resulting in low computational efficiency. To address this, this article establishes a target height estimation model that reflects the relationship between the height and the residual Doppler FM rate for spaceborne curve SAR. Then, a fast SAR 3D localization processing flow based on the fractional Fourier transform (FrFT) is proposed. Experimental verification demonstrates that this method can estimate the Doppler FM rate of the target column by column, and that the 3D position error for non-overlapping targets is kept within 1 m. For overlapping points with an intensity ratio greater than 1.5, the root mean square error (RMSE) of the estimation results is around 5 m. If the separation between overlapping points is greater than 35 m, the RMSE decreases to approximately 2 m.