This paper addresses prescribed performance tracking control for unknown time-delay nonlinear systems subject to output constraints. In contrast with related works, only the most fundamental requirements, i.e., boundedness and the local Lipschitz condition, are assumed for the allowable time delays. Moreover, we focus on the case where the reference is unknown beforehand, which renders the standard prescribed performance control designs under output constraints infeasible. To overcome these challenges, a novel robust prescribed performance control approach is put forward. A reverse tuning function is constructed that automatically generates a performance envelope for the tracking error. In addition, a unified performance analysis framework based on proof by contradiction and the barrier function is established to reveal the inherent robustness of the control system against the time delays. The system output is shown to track the reference with a preassigned settling time and good accuracy, without constraint violations. A comparative simulation on a two-stage chemical reactor illustrates these theoretical findings.
Background: Artificial intelligence (AI) technology, represented by deep learning, has made remarkable achievements in digital pathology, enhancing the accuracy and reliability of diagnosis and prognosis evaluation. The spatial distribution of CD3+ and CD8+ T cells within the tumor microenvironment has been shown to have a significant impact on the prognosis of colorectal cancer (CRC). This study aimed to investigate the prognostic ability of CD3_CT (CD3+ T-cell density in the core of the tumor [CT]) in patients with CRC using AI technology. Methods: A total of 492 patients from two medical centers were enrolled, with 358 assigned to the training cohort and 134 to the validation cohort. A fully automated deep-learning workflow was devised for tissue segmentation and T-cell quantification in whole-slide images (WSIs). After tissue segmentation and subsequent cell segmentation, a comprehensive analysis was conducted. Results: The evaluation of various positive T-cell densities revealed comparable discriminatory ability between CD3_CT and CD3-CD8 (the combination of CD3+ and CD8+ T-cell density within the CT and invasive margin) in predicting mortality (C-index in the training cohort: 0.65 vs. 0.64; validation cohort: 0.69 vs. 0.69). CD3_CT was confirmed as an independent prognostic factor, with high CD3_CT density associated with increased overall survival (OS) in the training cohort (hazard ratio [HR] = 0.22, 95% confidence interval [CI]: 0.12–0.38, P < 0.001) and the validation cohort (HR = 0.21, 95% CI: 0.05–0.92, P = 0.037). Conclusions: We quantified the spatial distribution of CD3+ and CD8+ T cells within tissue regions in WSIs using AI technology. CD3_CT was confirmed as a stage-independent predictor of OS in CRC patients. Moreover, CD3_CT shows promise in simplifying the CD3-CD8 system and facilitating its practical application in clinical settings.
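The C-index values reported in the abstract above measure how well a risk score ranks patients by survival time. The following is a minimal sketch of Harrell's concordance index, not the study's actual analysis code; the function name, the toy data, and the convention that a higher score means higher risk are all assumptions made for illustration (for a protective marker such as T-cell density, the negated density could serve as the risk score).

```python
def concordance_index(times, events, risks):
    """Harrell's C-index. events[i] is 1 if death was observed, 0 if censored."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly earlier than subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death had the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event flags, and risk scores.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.64 to 0.69 range quoted above.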
Cavitation in axial piston pumps threatens the reliability and safety of the overall hydraulic system. The vibration signal reflects the cavitation conditions in axial piston pumps and has been combined with machine learning to detect pump cavitation. However, the vibration signal usually contains noise in real working conditions, which raises concerns about accurate recognition of cavitation in noisy environments. This paper presents an intelligent method to recognise cavitation in axial piston pumps in noisy environments. First, we train a convolutional neural network (CNN) using spectrogram images transformed from raw vibration data under different cavitation conditions. Second, we employ gradient-weighted class activation mapping (Grad-CAM) to visualise class-discriminative regions in the spectrogram image. Finally, we propose a novel image processing method based on the Grad-CAM heatmap to automatically remove entrained noise and enhance class features in the spectrogram image. The experimental results show that the proposed method greatly improves the diagnostic performance of the CNN model in noisy environments. The classification accuracy of cavitation conditions increases from 0.50 to 0.89 and from 0.80 to 0.92 at signal-to-noise ratios of 4 and 6 dB, respectively.
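The signal-to-noise ratios quoted above (4 and 6 dB) can be made concrete with a short sketch of how a test signal might be corrupted at a target SNR. This is an illustration only: the abstract does not say how noise was generated, so the white Gaussian noise model, the function names, and the 50-cycle sine test signal are assumptions.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return the signal plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    raw_power = sum(n * n for n in raw) / len(raw)
    # Rescale this noise realisation so its measured power matches the target.
    scale = math.sqrt(noise_power / raw_power)
    return [x + scale * n for x, n in zip(signal, raw)]


def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a vibration record: 50 cycles of a sine wave.
clean = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(1000)]
noisy = add_noise_at_snr(clean, snr_db=4)
```

At 4 dB the noise power is roughly 40% of the signal power, which helps explain why the uncorrected classifier accuracy in the abstract drops to 0.50.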
The Internet of Things (IoT) plays an essential role in the current and future generations of information, network, and communication development and applications. This research focuses on vocal tract visualization and modeling, which are critical issues in realizing inner vocal tract animation. Such animation is applied in many fields, such as speech training, speech therapy, speech analysis, and other speech production-related applications. This work constructs a geometric model from observation of magnetic resonance imaging (MRI) data, providing a new method to annotate and construct 3D vocal tract organs. The proposed method has two advantages over previous methods. First, it has a uniform construction protocol for all speech organs. Second, it can build corresponding feature points between different speech organs. Fewer than three control parameters are needed to describe every speech organ accurately, with an accumulated contribution rate of more than 88%. After reconfiguration, the model error is less than 1.0 mm. This is the first 3D vocal tract model built from Chinese MRI data. It will promote the theoretical research and development of the intelligent Internet of Things for speech generation-related issues.
Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performance of these models drops sharply when the scale of the parallel training corpus is limited. Because a pre-trained language model has a strong ability for monolingual representation, the key challenge for machine translation is to construct an in-depth relationship between the source and target languages by injecting lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven MultiTask (LKMT) approach to inject part-of-speech and syntactic knowledge into pre-trained models, thus enhancing machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit a large-scale monolingual corpus to update all parameters of the pre-trained language models, thus ensuring that the updated language model contains potential lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained language model-enhanced machine translation model. Experiments on the benchmark dataset show that our proposed LKMT approach improves the Urdu-English translation accuracy by 1.97 points and the English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of our LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation.
To address the challenge of motion artifacts in Positron Emission Tomography (PET) lung scans, our study introduces the Triple Equivariant Motion Transformer (TEMT), an unsupervised, deep-learning-based framework for efficient respiratory motion correction in PET imaging. Unlike traditional techniques, which segment PET data into bins throughout a respiratory cycle and often face issues such as inefficiency and overemphasis on certain artifacts, TEMT employs Convolutional Neural Networks (CNNs) for effective feature extraction and motion decomposition. TEMT transforms motion sequences into Lie group domains to highlight fundamental motion patterns and employs competitive weighting for precise target deformation field generation. Our empirical evaluations confirm TEMT's superior performance in handling diverse PET lung datasets compared to existing image registration networks. Experimental results demonstrate that TEMT achieved Dice indices of 91.40%, 85.41%, 79.78%, and 72.16% on simulated geometric phantom data, lung voxel phantom data, cardiopulmonary voxel phantom data, and clinical data, respectively. To facilitate further research and practical application, the TEMT framework, along with its implementation details and part of the simulation data, is made publicly accessible at https://github.com/yehaowei/temt.
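The Dice indices reported above quantify the overlap between two segmentation masks. The sketch below shows the standard definition on flat binary masks; it is not taken from the TEMT repository, and the function name, the toy masks, and the convention for two empty masks are assumptions.

```python
def dice_index(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened masks, e.g. a warped label map and its reference.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_index(pred, target)
```

Here two of the three foreground voxels in each mask coincide, so the score is 2·2/(3+3) = 2/3; a reported Dice of 91.40% corresponds to a much tighter overlap.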
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Notably, we treat the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach improves accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points over the current state-of-the-art (SOTA) models, respectively.
Autonomous navigation for intelligent mobile robots has gained significant attention, with a focus on enabling robots to generate reliable policies based on the maintenance of spatial memory. In this paper, we propose a learning-based visual navigation pipeline that uses topological maps as memory configurations. We introduce an online topology construction approach that fuses odometry pose estimation and perceptual similarity estimation. This tackles the issues of topological node redundancy and incorrect edge connections, which stem from the distribution gap between the spatial and perceptual domains. Furthermore, we propose a differentiable graph extraction structure, the topology multi-factor transformer (TMFT). This structure uses graph neural networks to integrate global memory and incorporates a multi-factor attention mechanism to emphasize elements closely related to relevant target cues for policy generation. Results from photorealistic simulations on image-goal navigation tasks highlight the superior navigation performance of our proposed pipeline compared to existing memory structures. Comprehensive validation through behavior visualization, interpretability tests, and real-world deployment further underscores the adaptability and efficacy of our method.
Subarachnoid haemorrhage (SAH), mostly caused by the rupture of an intracranial aneurysm, is a common disease with a high fatality rate. SAH lesions are generally diffusely distributed, showing a variety of scales with irregular edges. These complex lesion characteristics make SAH segmentation a challenging task. To cope with these difficulties, a U-shaped deformable transformer (UDT) is proposed for SAH segmentation. First, a multi-scale deformable attention (MSDA) module is exploited to model the diffuseness and scale-variant characteristics of SAH lesions; the MSDA module fuses features at different scales and adjusts the attention field of each element dynamically to generate discriminative multi-scale features. Second, a cross deformable attention-based skip connection (CDASC) module is designed to model the irregular edge characteristic of SAH lesions; the CDASC module uses the spatial details from encoder features to refine the spatial information of decoder features. Third, the MSDA and CDASC modules are embedded into the backbone Res-UNet to construct the proposed UDT. Extensive experiments are conducted on the self-built SAH-CT dataset and two public medical datasets (GlaS and MoNuSeg). Experimental results show that the presented UDT achieves state-of-the-art performance.
Objective: This study aimed to establish a method to predict the overall survival (OS) of patients with stage Ⅰ-Ⅲ colorectal cancer (CRC) by coupling radiomics analysis of CT images with the measurement of tumor ecosystem diversification. Methods: We retrospectively identified 161 consecutive patients with stage Ⅰ-Ⅲ CRC who had undergone radical resection as a training cohort. A total of 248 patients were recruited for temporal independent validation as external validation cohort 1, with 103 patients from an external institute as external validation cohort 2. CT image features describing tumor spatial heterogeneity, leveraging the measurement of diversification of the tumor ecosystem, were extracted to build a marker termed the EcoRad signature. Multivariate Cox regression was used to assess the EcoRad signature, and a prediction model was constructed to demonstrate its incremental value over the traditional staging system for OS prediction. Results: The EcoRad signature was significantly associated with OS in the training cohort (hazard ratio [HR] = 6.670; 95% confidence interval [CI]: 3.433-12.956; P < 0.001), external validation cohort 1 (HR = 2.866; 95% CI: 1.646-4.990; P < 0.001), and external validation cohort 2 (HR = 3.342; 95% CI: 1.289-8.663; P = 0.002). Incorporating the EcoRad signature into the prediction model yielded a higher prediction ability (P < 0.001) with respect to the C-index (0.813, 95% CI: 0.804-0.822 in the training cohort; 0.758, 95% CI: 0.751-0.765 in external validation cohort 1; and 0.746, 95% CI: 0.722-0.770 in external validation cohort 2) compared with the reference model that incorporated only the tumor, node, metastasis (TNM) system, as well as better calibration, improved reclassification, and superior clinical usefulness. Conclusions: This study establishes a method to measure the spatial heterogeneity of CRC by coupling radiomics analysis with measurement of the diversification of the tumor ecosystem, and suggests that this approach could effectively predict OS and could be used as a supplement for risk stratification among stage Ⅰ-Ⅲ CRC patients.
The continuous observation of the magnetic field by the Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) produces numerous image sequences in time and space. These sequences provide data support for predicting the evolution of the photospheric magnetic field. Based on the spatiotemporal long short-term memory (LSTM) network, we use preprocessed data of the photospheric magnetic field in active regions to build a prediction model for magnetic field evolution. Because of its elaborate learning and memory mechanism, the trained model can characterize the inherent relationships contained in spatiotemporal features. The testing results of the prediction model indicate that (1) the prediction pattern learned by the model can be applied to predict the evolution over the next 6 hours of new magnetic fields that were not used in training, and the predicted results are roughly consistent with the observed magnetic field evolution in terms of large-scale structure and movement speed; (2) the performance of the model is related to the prediction time: the shorter the prediction time, the higher the accuracy of the predicted results; (3) the performance of the model is stable not only for active regions in the north and south but also for data in positive and negative regions. Detailed experimental results and discussions on magnetic flux emergence and magnetic neutral lines show that the proposed model can effectively predict the large-scale and short-term evolution of the photospheric magnetic field in active regions. Moreover, our study may provide a reference for the spatiotemporal prediction of other solar activities.
To address the low accuracy of transverse velocity field measurements for small targets in high-resolution solar images, we propose a novel velocity field measurement method for high-resolution solar images based on PWCNet. This method transforms the transverse velocity field measurement into an optical flow field prediction problem. We evaluated the performance of the proposed method using the Hα and TiO data sets obtained from New Vacuum Solar Telescope observations. The experimental results show that our method effectively predicts the optical flow of small targets in images compared with several typical machine-learning and deep-learning methods. On the Hα data set, the proposed method improves the image structure similarity from 0.9182 to 0.9587 and reduces the mean of residuals from 24.9931 to 15.2818; on the TiO data set, it improves the image structure similarity from 0.9289 to 0.9628 and reduces the mean of residuals from 25.9908 to 17.0194. The optical flow predicted using the proposed method can provide accurate data on the atmospheric motion information in solar images. The code implementing the proposed method is available at https://github.com/lygmsy123/transverse-velocity-field-measurement.
With the continuous development of science and technology, digital signal processing is increasingly widely used in various fields. The analog-to-digital converter (ADC) is one of the key components for converting analog signals to digital signals. As a common type of ADC, the 12-bit successive approximation analog-to-digital converter (SAR ADC) has attracted extensive attention for its performance and applications. This paper conducts in-depth research and analysis of the 12-bit SAR ADC to meet the growing demands of digital signal processing, and designs a 12-bit SAR ADC with a sampling rate of 5 MS/s. The overall circuit adopts a fully differential structure, with key modules including the DAC capacitor array, comparator, and control logic. For the DAC, a fully differential capacitor array structure is proposed to reduce the DAC layout area. The comparator is a digital dynamic comparator, which improves the ADC conversion speed. The chip is designed in the SMIC 180 nm CMOS process. The simulation results show that at a sampling rate of 5 MS/s, the effective number of bits of the SAR ADC is 11.92 bits, the SNR is 74.62 dB, and the SFDR is 89.24 dB.
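The successive approximation principle behind the converter described above is a binary search over output codes, one bit per comparison. The sketch below models an ideal, single-ended 12-bit conversion; it is not the paper's fully differential circuit, and the function name, the 1.0 V reference, and the absence of noise or capacitor mismatch are assumptions made to keep the example short.

```python
def sar_convert(vin, vref=1.0, bits=12):
    """Ideal single-ended SAR conversion: binary search from MSB to LSB."""
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)                 # tentatively set this bit
        dac_voltage = vref * trial / (1 << bits)  # ideal DAC output for the trial code
        if vin >= dac_voltage:                    # comparator decision
            code = trial                          # keep the bit, otherwise leave it cleared
    return code


# Hypothetical inputs on a 0..1 V range.
mid_scale = sar_convert(0.5)
near_full_scale = sar_convert(0.9999)
```

Each conversion takes exactly `bits` comparator decisions, which is why comparator speed bounds the achievable sampling rate in this architecture.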
Commonsense question answering (CQA) requires understanding and reasoning over the QA context and related commonsense knowledge, such as a structured Knowledge Graph (KG). Existing studies combine language models and graph neural networks to model inference. However, traditional knowledge graphs are mostly concept-based, ignoring the direct path evidence necessary for accurate reasoning. In this paper, we propose MRGNN (Meta-path Reasoning Graph Neural Network), a novel model that comprehensively captures sequential semantic information from concepts and paths. In MRGNN, meta-paths are introduced as direct inference evidence, and an original graph neural network is adopted to aggregate features from both concepts and paths simultaneously. We conduct extensive experiments on the CommonsenseQA and OpenBookQA datasets, showing the effectiveness of MRGNN. We also conduct further ablation experiments and explain the reasoning behavior through a case study.
To the Editor: Blindness and vision loss (BVL) constitute a growing concern affecting an ever-expanding global population.[1] Globally, approximately 40% of the population relies on burning solid fuels for both cooking and heating. Household air pollution (HAP) predominantly arises from the incomplete combustion of solid fuels used for cooking. The adverse effects of HAP on human eye health have become a salient concern for environmental, ophthalmic, and public health.[2] A comprehensive understanding of the burden and temporal trends of BVL due to HAP is the essential basis for health program planning.
The generative adversarial network (GAN) was first proposed in 2014. This kind of network model is a machine learning system that can learn to measure a given distribution of data, and one of its most important applications is style transfer. Style transfer is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image. CYCLE-GAN is a classic GAN model with a wide range of style transfer scenarios. Owing to its unsupervised learning characteristics, the mapping between an input image and an output image is easy to learn. However, it is difficult for CYCLE-GAN to converge and generate high-quality images. To solve this problem, spectral normalization is introduced into each convolutional kernel of the discriminator. With spectral normalization, every convolutional kernel satisfies the Lipschitz stability constraint and the value of the convolutional kernel is limited to [0,1], which promotes the training process of the proposed model. In addition, we use a pretrained model (VGG16) to control the loss of image content in the position of l1 regularization. To avoid overfitting, both an l1 regularization term and an l2 regularization term are used in the objective loss function. In terms of Fréchet Inception Distance (FID) score evaluation, our proposed model achieves outstanding performance and preserves more discriminative features. Experimental results show that the proposed model converges faster and achieves better FID scores than the state of the art.
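Spectral normalization, as used in the discriminator above, divides a weight matrix by its largest singular value so that the layer becomes approximately 1-Lipschitz. The sketch below estimates that singular value by power iteration on a small 2-D matrix; it is an illustration rather than the paper's implementation, and the function names, the iteration count, and the toy 2×2 kernel are assumptions (a real convolutional kernel is commonly reshaped to a 2-D matrix first, and an all-zero matrix is not handled here).

```python
import math


def spectral_norm(weight, n_iter=50):
    """Estimate the largest singular value of a 2-D matrix by power iteration."""
    rows, cols = len(weight), len(weight[0])
    u = [1.0 / math.sqrt(rows)] * rows
    sigma = 0.0
    for _ in range(n_iter):
        # v = W^T u, normalised to unit length
        v = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        # u = W v; its length is the current singular-value estimate
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        u = [x / sigma for x in u]
    return sigma


def spectrally_normalize(weight, n_iter=50):
    """Return weight / sigma_max(weight), so the result has spectral norm close to 1."""
    sigma = spectral_norm(weight, n_iter)
    return [[w / sigma for w in row] for row in weight]


# Hypothetical 2x2 "kernel" whose singular values are 3 and 1.
kernel = [[3.0, 0.0], [0.0, 1.0]]
normalized = spectrally_normalize(kernel)
```

After normalization the largest singular value is 1 and the smaller one becomes 1/3, so the layer can no longer amplify its input, which is the stabilising effect the abstract attributes to the technique.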
This paper addresses the open vehicle routing problem with time windows (OVRPTW), where each vehicle does not need to return to the depot after completing its delivery task. The optimization objective is to minimize the total distance. This problem arises widely in real-life logistics distribution. We propose a hybrid column generation algorithm (HCGA) for the OVRPTW, embedding both an exact algorithm and a metaheuristic. In HCGA, a label setting algorithm and an intelligent algorithm are designed to select columns from small and large subproblems, respectively. Moreover, a branching strategy is devised to generate the final feasible solution for the OVRPTW. The computational results show that the proposed algorithm is faster and can obtain an approximately optimal solution for problems with 100 customers in a reasonable time.
Word embedding has been widely used in word sense disambiguation (WSD) and many other tasks in recent years because it represents the semantics of words well. However, existing word embedding methods mostly represent each word as a single vector, without considering the homonymy and polysemy of the word; thus, their performance is limited. To address this problem, an effective topical word embedding (TWE)-based WSD method, named TWE-WSD, is proposed, which integrates Latent Dirichlet Allocation (LDA) and word embedding. Instead of generating a single word vector (WV) for each word, TWE-WSD generates a topical WV for each word under each topic. Effective integration strategies are designed to obtain high-quality contextual vectors. Extensive experiments on the SemEval-2013 and SemEval-2015 English all-words tasks showed that TWE-WSD outperforms other state-of-the-art WSD methods, especially on nouns.
In China, Tibetan is usually divided into three major dialects: the Am-do, Khams, and Lhasa dialects. The Am-do dialect evolved from ancient Tibetan and is a local variant of modern Tibetan. Although this dialect has its own specific historical and social conditions and development, and there have been different degrees of communication with other ethnic groups, all the abovementioned dialects developed from the same language: Tibetan. This paper exploits the particularity of Tibetan suffixes in pronunciation and proposes a lexicon for the Am-do dialect, which addresses the problems existing in previous research. Audio data of the Am-do dialect are expanded by data augmentation combining noise and reverberation, and the morphological characteristics of the Tibetan language are further considered. According to the particularity of Tibetan grammar, grammatical features are used to optimize grammatical relationships and are combined with a language model, and the Am-do dialect is scored and rescored. Experimental results show that, compared with the baseline, our proposed new lexicon and data augmentation technology yield a relative increase of approximately 3% in character error rates (CERs) and a relative increase of 3%-19% in the recognition rate of acoustic models and language models.
Research on the features of speech and image signals is carried out from two perspectives: the time domain and the frequency domain. Speech and image signals are non-stationary, so the Fourier transform (FT) is not suited to the non-stationary characteristics of the signal. Once short-term stable speech is obtained by windowing and framing, the subsequent processing of the signal is completed by the Discrete Fourier Transform (DFT). The Fast Discrete Fourier Transform is a commonly used analysis method for speech and image signal processing in the frequency domain, but it has the problem of adjusting the window size for a desired resolution. The Fractional Fourier Transform, in contrast, offers both time-domain and frequency-domain processing capabilities. This paper performs global speech encryption by combining speech with an image through the Fractional Fourier Transform. A watermark image processed by the fractional transform is embedded in the speech signal, and the embedded watermark has the effect of rotation and superposition, which improves the security of the speech. The results show that the proposed speech encryption method based on the Fractional Fourier Transform has a higher security level. The technology is easy to extend to practical applications.
Funding for the prescribed performance tracking control study: supported in part by the National Natural Science Foundation of China (62103093), the National Key Research and Development Program of China (2022YFB3305905), the Xingliao Talent Program of Liaoning Province of China (XLYC2203130), the Fundamental Research Funds for the Central Universities of China (N2108003), the Natural Science Foundation of Liaoning Province (2023-MS-087), the BNU Talent Seed Fund, UIC Start-Up Fund (R72021115), the Guangdong Key Laboratory of AI and MM Data Processing (2020KSYS007), the Guangdong Provincial Key Laboratory IRADS for Data Science (2022B1212010006), and the Guangdong Higher Education Upgrading Plan 2021–2025 of "Rushing to the Top, Making Up Shortcomings and Strengthening Special Features" with UIC Research, China (R0400001-22, R0400025-21).
Funding: Supported by grants from the National Key R&D Program of China (No. 2021YFF1201003), the National Science Fund for Distinguished Young Scholars (No. 81925023), the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101420006), the Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (No. 2022B1212010011), the High-level Hospital Construction Project (No. DFJHBF202105), and the National Science Foundation for Young Scientists of China (No. 82001986).
Abstract: Background: Artificial intelligence (AI) technology represented by deep learning has made remarkable achievements in digital pathology, enhancing the accuracy and reliability of diagnosis and prognosis evaluation. The spatial distribution of CD3⁺ and CD8⁺ T cells within the tumor microenvironment has been demonstrated to have a significant impact on the prognosis of colorectal cancer (CRC). This study aimed to investigate the prognostic ability of CD3_CT (CD3⁺ T-cell density in the core of the tumor [CT]) in patients with CRC by using AI technology. Methods: The study enrolled 492 patients from two distinct medical centers, with 358 patients assigned to the training cohort and 134 patients allocated to the validation cohort. To facilitate tissue segmentation and T-cell quantification in whole-slide images (WSIs), a fully automated workflow based on deep learning was devised. Upon the completion of tissue segmentation and subsequent cell segmentation, a comprehensive analysis was conducted. Results: The evaluation of various positive T-cell densities revealed comparable discriminatory ability between CD3_CT and CD3-CD8 (the combination of CD3⁺ and CD8⁺ T-cell density within the CT and invasive margin) in predicting mortality (C-index in training cohort: 0.65 vs. 0.64; validation cohort: 0.69 vs. 0.69). CD3_CT was confirmed as an independent prognostic factor, with high CD3_CT density associated with increased overall survival (OS) in the training cohort (hazard ratio [HR] = 0.22, 95% confidence interval [CI]: 0.12–0.38, P < 0.001) and the validation cohort (HR = 0.21, 95% CI: 0.05–0.92, P = 0.037). Conclusions: We quantified the spatial distribution of CD3⁺ and CD8⁺ T cells within tissue regions in WSIs using AI technology. CD3_CT was confirmed as a stage-independent predictor for OS in CRC patients. Moreover, CD3_CT shows promise in simplifying the CD3-CD8 system and facilitating its practical application in clinical settings.
Funding: National Key R&D Program of China, Grant/Award Number: 2018YFB1702503; Open Foundation of the State Key Laboratory of Fluid Power and Mechatronic Systems, Grant/Award Number: GZKF-202108; Open Foundation of the Guangdong Provincial Key Laboratory of Electronic Information Products Reliability Technology; China National Postdoctoral Program for Innovative Talents, Grant/Award Number: BX20200210; China Postdoctoral Science Foundation, Grant/Award Number: 2019M660086.
Abstract: Cavitation in axial piston pumps threatens the reliability and safety of the overall hydraulic system. The vibration signal can reflect the cavitation conditions in axial piston pumps, and it has been combined with machine learning to detect pump cavitation. However, the vibration signal usually contains noise in real working conditions, which raises concerns about accurate recognition of cavitation in noisy environments. This paper presents an intelligent method to recognise cavitation in axial piston pumps in noisy environments. First, we train a convolutional neural network (CNN) using spectrogram images transformed from raw vibration data under different cavitation conditions. Second, we employ gradient-weighted class activation mapping (Grad-CAM) to visualise class-discriminative regions in the spectrogram image. Finally, we propose a novel image processing method based on the Grad-CAM heatmap to automatically remove entrained noise and enhance class features in the spectrogram image. The experimental results show that the proposed method greatly improves the diagnostic performance of the CNN model in noisy environments. The classification accuracy of cavitation conditions increases from 0.50 to 0.89 and from 0.80 to 0.92 at signal-to-noise ratios of 4 and 6 dB, respectively.
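The signal-to-noise ratios quoted above (4 and 6 dB) can be made concrete with a small sketch. The abstract does not say how noise was introduced, so the code below assumes additive white Gaussian noise; the function names `add_noise_at_snr` and `measured_snr_db` and the sine test signal are hypothetical and serve only to illustrate the definition SNR = 10·log10(P_signal / P_noise).

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return the signal plus white Gaussian noise at the requested SNR (dB).

    Assumption: additive white Gaussian noise; the paper may use another model.
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    if signal_power == 0.0:
        raise ValueError("signal has zero power, SNR is undefined")
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical stand-in for a vibration signal: a 50 Hz sine sampled at 1 kHz.
    clean = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(20000)]
    noisy = add_noise_at_snr(clean, snr_db=4.0, seed=1)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

With a long enough signal the measured value should land close to the requested one; the exact figure depends on the random seed.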
Funding: This work was supported by the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 2022YFQ0073).
Abstract: The Internet of Things (IoT) plays an essential role in the current and future generations of information, network, and communication development and applications. This research focuses on vocal tract visualization and modeling, which are critical issues in realizing inner vocal tract animation. Such animation is applied in many fields, such as speech training, speech therapy, speech analysis, and other speech production-related applications. This work constructed a geometric model from observation of magnetic resonance imaging (MRI) data, providing a new method to annotate and construct 3D vocal tract organs. The proposed method has two advantages compared with previous methods. First, it has a uniform construction protocol for all speech organs. Second, it can build corresponding feature points between different speech organs. Fewer than three control parameters are enough to describe every speech organ accurately, with an accumulated contribution rate of more than 88%. With this reconfiguration, the model error is less than 1.0 mm. This is the first 3D vocal tract model built from Chinese MRI data. It will promote the theoretical research and development of the intelligent Internet of Things for speech generation-related issues.
Funding: Supported by the National Natural Science Foundation of China under Grants 61732005 and 61972186, and by Yunnan Provincial Major Science and Technology Special Plan Projects (Nos. 202103AA080015, 202203AA080004).
Abstract: Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performance of these models drops sharply when the scale of the parallel training corpus is limited. Because pre-trained language models have a strong ability for monolingual representation, the key challenge for machine translation is to construct an in-depth relationship between the source and target languages by injecting lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven MultiTask (LKMT) approach to inject part-of-speech and syntactic knowledge into pre-trained models, thus enhancing machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit a large-scale monolingual corpus to update all parameters of the pre-trained language models, thus ensuring that the updated language model contains potential lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained language model-enhanced machine translation model. Experiments on the benchmark dataset show that our proposed LKMT approach improves the Urdu-English translation accuracy by 1.97 points and the English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of our LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation.
Funding: The National Natural Science Foundation of China (No. 82160347), the Yunnan Provincial Science and Technology Department (No. 202102AE090031), and the Yunnan Key Laboratory of Smart City in Cyberspace Security (No. 202105AG070010).
Abstract: To address the challenge of motion artifacts in Positron Emission Tomography (PET) lung scans, our study introduces the Triple Equivariant Motion Transformer (TEMT), an innovative, unsupervised, deep-learning-based framework for efficient respiratory motion correction in PET imaging. Unlike traditional techniques, which segment PET data into bins throughout a respiratory cycle and often face issues such as inefficiency and overemphasis on certain artifacts, TEMT employs Convolutional Neural Networks (CNNs) for effective feature extraction and motion decomposition. TEMT’s unique approach involves transforming motion sequences into Lie group domains to highlight fundamental motion patterns, coupled with competitive weighting for precise target deformation field generation. Our empirical evaluations confirm TEMT’s superior performance in handling diverse PET lung datasets compared to existing image registration networks. Experimental results demonstrate that TEMT achieved Dice indices of 91.40%, 85.41%, 79.78%, and 72.16% on simulated geometric phantom data, lung voxel phantom data, cardiopulmonary voxel phantom data, and clinical data, respectively. To facilitate further research and practical application, the TEMT framework, along with its implementation details and part of the simulation data, is made publicly accessible at https://github.com/yehaowei/temt.
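For readers unfamiliar with the Dice index reported above, a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), is given below. It operates on flattened binary masks and is not taken from the TEMT repository; the function name `dice_index` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_index(mask_a, mask_b):
    """Dice index of two flattened binary masks (sequences of 0/1 values).

    Assumption: two empty masks are treated as a perfect match (1.0).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    # Count voxels labelled foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0
    return 2.0 * intersection / (size_a + size_b)


if __name__ == "__main__":
    reference = [1, 1, 0, 0]
    predicted = [1, 0, 1, 0]
    print(dice_index(reference, predicted))  # 2 * 1 / (2 + 2) = 0.5
```

A value of 1.0 means the two masks overlap completely and 0.0 means they share no foreground element, so the percentages in the abstract correspond to this ratio multiplied by 100.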
Funding: Supported by Yunnan Provincial Major Science and Technology Special Plan Projects (Grant Nos. 202202AD080003, 202202AE090008, 202202AD080004, 202302AD080003), the National Natural Science Foundation of China (Grant Nos. U21B2027, 62266027, 62266028, 62266025), and the Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Program (Grant No. 202305AC160063).
Abstract: Chinese named entity recognition (CNER) has received widespread attention as an important task in Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach that treats the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach leads to improved accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art (SOTA) models, respectively.
Funding: Supported in part by the National Natural Science Foundation of China (62225309, 62073222, U21A20480, 62361166632).
Abstract: Autonomous navigation for intelligent mobile robots has gained significant attention, with a focus on enabling robots to generate reliable policies based on the maintenance of spatial memory. In this paper, we propose a learning-based visual navigation pipeline that uses topological maps as memory configurations. We introduce a unique online topology construction approach that fuses odometry pose estimation and perceptual similarity estimation. This tackles the issues of topological node redundancy and incorrect edge connections, which stem from the distribution gap between the spatial and perceptual domains. Furthermore, we propose a differentiable graph extraction structure, the topology multi-factor transformer (TMFT). This structure utilizes graph neural networks to integrate global memory and incorporates a multi-factor attention mechanism to underscore elements closely related to relevant target cues for policy generation. Results from photorealistic simulations on image-goal navigation tasks highlight the superior navigation performance of our proposed pipeline compared to existing memory structures. Comprehensive validation through behavior visualization, interpretability tests, and real-world deployment further underscores the adaptability and efficacy of our method.
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 62377026, 62201222; Knowledge Innovation Program of Wuhan-Shuguang Project, Grant/Award Number: 2023010201020382; National Key Research and Development Programme of China, Grant/Award Number: 2022YFD1700204; Fundamental Research Funds for the Central Universities, Grant/Award Numbers: CCNU22QN014, CCNU22JC007, CCNU22XJ034.
Abstract: Subarachnoid haemorrhage (SAH), mostly caused by the rupture of an intracranial aneurysm, is a common disease with a high fatality rate. SAH lesions are generally diffusely distributed, showing a variety of scales with irregular edges. The complex characteristics of the lesions make SAH segmentation a challenging task. To cope with these difficulties, a u-shaped deformable transformer (UDT) is proposed for SAH segmentation. First, a multi-scale deformable attention (MSDA) module is exploited to model the diffuseness and scale-variant characteristics of SAH lesions; the MSDA module can fuse features at different scales and dynamically adjust the attention field of each element to generate discriminative multi-scale features. Second, the cross deformable attention-based skip connection (CDASC) module is designed to model the irregular edge characteristic of SAH lesions; the CDASC module can utilise the spatial details from encoder features to refine the spatial information of decoder features. Third, the MSDA and CDASC modules are embedded into the backbone Res-UNet to construct the proposed UDT. Extensive experiments are conducted on the self-built SAH-CT dataset and two public medical datasets (GlaS and MoNuSeg). Experimental results show that the presented UDT achieves state-of-the-art performance.
Funding: Supported by the National Key R&D Program of China (No. 2021YFF1201003), the Key R&D Program of Guangdong Province, China (No. 2021B0101420006), the National Science Fund for Distinguished Young Scholars (No. 81925023 and 82071892), the National Natural Science Foundation of China (No. 81771912 and 82071892), and the National Natural Science Foundation for Young Scientists of China (No. 81701782 and 81901910).
Abstract: Objective: This study aimed to establish a method to predict the overall survival (OS) of patients with stage Ⅰ-Ⅲ colorectal cancer (CRC) by coupling radiomics analysis of CT images with the measurement of tumor ecosystem diversification. Methods: We retrospectively identified 161 consecutive patients with stage Ⅰ-Ⅲ CRC who had undergone radical resection as a training cohort. A total of 248 patients were recruited for temporal independent validation as external validation cohort 1, with 103 patients from an external institute as external validation cohort 2. CT image features describing tumor spatial heterogeneity, based on the measurement of diversification of the tumor ecosystem, were extracted to build a marker termed the EcoRad signature. Multivariate Cox regression was used to assess the EcoRad signature, and a prediction model was constructed to demonstrate its incremental value to the traditional staging system for OS prediction. Results: The EcoRad signature was significantly associated with OS in the training cohort [hazard ratio (HR) = 6.670; 95% confidence interval (95% CI): 3.433-12.956; P<0.001], external validation cohort 1 (HR = 2.866; 95% CI: 1.646-4.990; P<0.001), and external validation cohort 2 (HR = 3.342; 95% CI: 1.289-8.663; P=0.002). Incorporating the EcoRad signature into the prediction model yielded a higher prediction ability (P<0.001) with respect to the C-index (0.813, 95% CI: 0.804-0.822 in the training cohort; 0.758, 95% CI: 0.751-0.765 in external validation cohort 1; and 0.746, 95% CI: 0.722-0.770 in external validation cohort 2) compared with the reference model that incorporated only the tumor, node, metastasis (TNM) system, as well as better calibration, improved reclassification, and superior clinical usefulness. Conclusions: This study establishes a method to measure the spatial heterogeneity of CRC by coupling radiomics analysis with measurement of diversification of the tumor ecosystem, and suggests that this approach could effectively predict OS and could be used as a supplement for risk stratification among stage Ⅰ-Ⅲ CRC patients.
Funding: This study is supported by the National Natural Science Foundation of China (Grant Nos. 12073077, 11873027, U2031140, 11773072 and 12063002).
Abstract: The continuous observation of the magnetic field by the Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) produces numerous image sequences in time and space. These sequences provide data support for predicting the evolution of the photospheric magnetic field. Based on the spatiotemporal long short-term memory (LSTM) network, we use preprocessed data of the photospheric magnetic field in active regions to build a prediction model for magnetic field evolution. Because of its elaborate learning and memory mechanism, the trained model can characterize the inherent relationships contained in spatiotemporal features. The testing results of the prediction model indicate that (1) the prediction pattern learned by the model can be applied to predict the evolution over the next 6 hours of new magnetic fields that were not used in training, and the predicted results are roughly consistent with the observed magnetic field evolution in terms of large-scale structure and movement speed; (2) the performance of the model is related to the prediction time: the shorter the prediction time, the higher the accuracy of the predicted results; (3) the performance of the model is stable not only for active regions in the north and south but also for data in positive and negative regions. Detailed experimental results and discussions on magnetic flux emergence and magnetic neutral lines finally show that the proposed model can effectively predict the large-scale and short-term evolution of the photospheric magnetic field in active regions. Moreover, our study may provide a reference for the spatiotemporal prediction of other solar activities.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12063002, 12163004, and 12073077.
Abstract: To address the low accuracy of transverse velocity field measurements for small targets in high-resolution solar images, we propose a novel velocity field measurement method for high-resolution solar images based on PWCNet. This method transforms the transverse velocity field measurement into an optical flow field prediction problem. We evaluated the performance of the proposed method using the Hα and TiO data sets obtained from New Vacuum Solar Telescope observations. The experimental results show that our method effectively predicts the optical flow of small targets in images compared with several typical machine-learning and deep-learning methods. On the Hα data set, the proposed method improves the image structure similarity from 0.9182 to 0.9587 and reduces the mean of residuals from 24.9931 to 15.2818; on the TiO data set, the proposed method improves the image structure similarity from 0.9289 to 0.9628 and reduces the mean of residuals from 25.9908 to 17.0194. The optical flow predicted using the proposed method can provide accurate data for the atmospheric motion information of solar images. The code implementing the proposed method is available at https://github.com/lygmsy123/transverse-velocity-field-measurement.
Abstract: With the continuous development of science and technology, digital signal processing is more and more widely used in various fields. The analog-to-digital converter (ADC) is one of the key components for converting analog signals to digital signals. As a common type of ADC, the 12-bit successive approximation analog-to-digital converter (SAR ADC) has attracted extensive attention for its performance and applications. This paper conducts in-depth research and analysis of the 12-bit SAR ADC to meet the growing demands of digital signal processing, and designs a 12-bit SAR ADC with a sampling rate of 5 MS/s. The overall circuit adopts a fully differential structure, with key modules including the DAC capacitor array, the comparator, and the control logic. A fully differential capacitor DAC array structure is proposed to reduce the layout area of the DAC. The comparator is a digital dynamic comparator, which improves the ADC conversion speed. The chip is designed in the SMIC 180 nm CMOS process. The simulation results show that at a sampling rate of 5 MS/s, the effective number of bits of the SAR ADC is 11.92 bits, the SNR is 74.62 dB, and the SFDR is 89.24 dB.
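The successive-approximation principle named in the abstract can be illustrated with a short behavioural sketch. It models an ideal, single-ended, noise-free converter, which is a simplifying assumption: the paper's design is fully differential and uses a capacitor DAC array and a dynamic comparator, none of which is modelled here. The function name `sar_convert` and the parameters `vin` and `vref` are hypothetical.

```python
def sar_convert(vin, vref, bits=12):
    """Ideal successive-approximation conversion of vin in [0, vref).

    Assumption: ideal single-ended DAC and comparator, no noise or mismatch.
    Each cycle tests one bit, from the most significant to the least.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)                # tentatively set the current bit
        dac_voltage = vref * trial / (1 << bits)  # ideal DAC output for the trial code
        if vin >= dac_voltage:                   # comparator decision
            code = trial                         # keep the bit, otherwise it stays cleared
    return code


if __name__ == "__main__":
    # Half of full scale resolves to the mid-scale code of a 12-bit converter.
    print(sar_convert(0.5, 1.0))  # 2048
```

The loop runs once per bit, so a 12-bit conversion takes 12 comparator decisions, which is why the comparator speed matters for the overall conversion rate.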
Funding: Supported by the Key Research and Development Program of Hubei Province (2020BAB017), the Scientific Research Center Program of National Language Commission (ZDI135-135), and the Fundamental Research Funds for the Central Universities (KJ02502022-0155, CCNU22XJ037).
Abstract: Commonsense question answering (CQA) requires understanding and reasoning over the QA context and related commonsense knowledge, such as a structured Knowledge Graph (KG). Existing studies combine language models and graph neural networks to model inference. However, traditional knowledge graphs are mostly concept-based, ignoring the direct path evidence necessary for accurate reasoning. In this paper, we propose MRGNN (Meta-path Reasoning Graph Neural Network), a novel model that comprehensively captures sequential semantic information from concepts and paths. In MRGNN, meta-paths are introduced as direct inference evidence, and an original graph neural network is adopted to aggregate features from both concepts and paths simultaneously. We conduct extensive experiments on the CommonsenseQA and OpenBookQA datasets, showing the effectiveness of MRGNN. We also conduct further ablation experiments and explain the reasoning behavior through a case study.
Funding: GDPH Supporting Fund for Talent Program (No. KY0120220263); National Natural Science Foundation of China (Nos. 82271125, 82171075); Project of Special Research on Cardiovascular Diseases (No. 2020XXG007); National Medical Simulation Education Research Project (No. 2021MNYB01); Special Fund Project of Technology Achievement Transformation in Life and Health Innovation of the Greater Bay Area (No. GBALH202308); Science and Technology Program of Guangzhou (Nos. 20220610092 and 202103000045); Outstanding Young Talent Trainee Program of Guangdong Provincial People’s Hospital (No. KJ012019087); GDPH Scientific Research Funds for Leading Medical Talents and Distinguished Young Scholars in Guangdong Province (No. KJ012019457); Talent Introduction Fund of Guangdong Provincial People’s Hospital (No. Y012018145); launch fund of Guangdong Provincial People’s Hospital for NSFC (Nos. 8217040546, 8217040449, and 8227040339); Four "Batches" Innovation Project of Invigorating Medical through Science and Technology of Shanxi Province (No. 2022XM24).
Abstract: To the Editor: Blindness and vision loss (BVL) constitute a growing concern affecting an ever-expanding global population.[1] Globally, approximately 40% of the population relies on burning solid fuels for both cooking and heating. Household air pollution (HAP) predominantly arises from the incomplete combustion of solid fuels used for cooking. The adverse effects of HAP on human eye health have now emerged as a salient concern for environmental, ophthalmic, and public health.[2] A comprehensive understanding of the burden and temporal trends of BVL due to HAP is an essential basis for health program planning.
Funding: This work is supported by the National Natural Science Foundation of China (No. 61702226), the 111 Project (B12018), the Natural Science Foundation of Jiangsu Province (No. BK20170200), and the Fundamental Research Funds for the Central Universities (No. JUSRP11854).
Abstract: The generative adversarial network (GAN) was first proposed in 2014. Networks of this kind are machine learning systems that can learn a given distribution of data, and one of their most important applications is style transfer. Style transfer is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image. CYCLE-GAN is a classic GAN model with a wide range of uses in style transfer. Owing to its unsupervised learning characteristics, the mapping between an input image and an output image is easy to learn. However, it is difficult for CYCLE-GAN to converge and generate high-quality images. To solve this problem, spectral normalization is introduced into each convolutional kernel of the discriminator. With spectral normalization, every convolutional kernel satisfies a Lipschitz stability constraint and its values are limited to [0, 1], which facilitates the training of the proposed model. In addition, we use a pretrained model (VGG16) to control the loss of image content in the position of the l1 regularization. To avoid overfitting, both an l1 regularization term and an l2 regularization term are used in the objective loss function. In terms of the Fréchet Inception Distance (FID) score, our proposed model achieves outstanding performance and preserves more discriminative features. Experimental results show that the proposed model converges faster and achieves better FID scores than the state of the art.
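As background for the spectral normalization mentioned above, the sketch below shows the commonly used formulation, in which a weight matrix is divided by an estimate of its largest singular value obtained by power iteration. This is a generic illustration and not the authors' implementation: it works on a plain 2-D matrix (a convolutional kernel would first be reshaped to 2-D), and the function names `spectral_norm` and `spectrally_normalize` and the iteration count are assumptions.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def spectral_norm(weight, iters=50):
    """Estimate the largest singular value of a 2-D matrix by power iteration."""
    rows, cols = len(weight), len(weight[0])
    v = _normalize([1.0] * cols)
    for _ in range(iters):
        # u <- W v / ||W v||, then v <- W^T u / ||W^T u||
        u = _normalize([sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)])
        v = _normalize([sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)])
    wv = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return sum(u[i] * wv[i] for i in range(rows))  # sigma ~ u^T W v


def spectrally_normalize(weight, iters=50):
    """Return weight / sigma(weight), so the result has spectral norm close to 1."""
    sigma = spectral_norm(weight, iters)
    return [[w / sigma for w in row] for row in weight]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))                        # close to 3
    print(spectral_norm(spectrally_normalize(W)))  # close to 1
```

In this formulation the normalized layer is 1-Lipschitz with respect to the Euclidean norm, which is the stabilizing property usually cited for discriminator training.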
Funding: Supported by the National Natural Science Foundation of China (61963022, 51665025, 61873328).
Abstract: This paper addresses the open vehicle routing problem with time windows (OVRPTW), where each vehicle does not need to return to the depot after completing its delivery task. The optimization objective is to minimize the total distance. This problem arises widely in real-life logistics distribution. We propose a hybrid column generation algorithm (HCGA) for the OVRPTW that embeds both an exact algorithm and a metaheuristic. In HCGA, a label setting algorithm and an intelligent algorithm are designed to select columns from small and large subproblems, respectively. Moreover, a branch strategy is devised to generate the final feasible solution for the OVRPTW. The computational results show that the proposed algorithm is faster and can obtain an approximately optimal solution to problems with 100 customers in a reasonable time.
Funding: National Natural Science Foundation of China, Grant/Award Number: 61562054; The Fund of China Scholarship Council, Grant/Award Number: 201908530036; Talents Introduction Project of Guangxi University for Nationalities, Grant/Award Number: 2014MDQD020.
Abstract: Word embedding has been widely used in word sense disambiguation (WSD) and many other tasks in recent years because it can represent the semantics of words well. However, existing word embedding methods mostly represent each word as a single vector, without considering the homonymy and polysemy of the word; thus, their performance is limited. To address this problem, an effective topical word embedding (TWE)-based WSD method, named TWE-WSD, is proposed, which integrates Latent Dirichlet Allocation (LDA) and word embedding. Instead of generating a single word vector (WV) for each word, TWE-WSD generates a topical WV for each word under each topic. Effective integration strategies are designed to obtain high-quality contextual vectors. Extensive experiments on the SemEval-2013 and SemEval-2015 English all-words tasks showed that TWE-WSD outperforms other state-of-the-art WSD methods, especially on nouns.
Funding: This work was supported by the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 22QYCX0082).
Abstract: In China, Tibetan is usually divided into three major dialects: the Am-do, Khams and Lhasa dialects. The Am-do dialect evolved from ancient Tibetan and is a local variant of modern Tibetan. Although this dialect has its own historical and social conditions and development, and has had varying degrees of contact with other ethnic groups, all of the above dialects developed from the same language: Tibetan. This paper exploits the particular pronunciation of Tibetan suffixes and proposes a lexicon for the Am-do language that addresses problems in previous research. Audio data of the Am-do dialect are expanded by data augmentation that combines noise and reverberation, and the morphological and other characteristics of the Tibetan language are further considered. Based on the particularities of Tibetan grammar, grammatical features are used to optimize grammatical relationships and are combined with a language model, and the Am-do dialect is scored and rescored. Experimental results show that, compared with the baseline, our proposed new lexicon and data augmentation technology yield a relative increase of approximately 3% in character error rates (CERs) and a relative increase of 3%-19% in the recognition rate of acoustic models and language models.
Funding: This work is supported by the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 22QYCX0082), received by Jian-Guo Wei, and by the Science and Technology Plan of Qinghai Province, China (Grant No. 2019-ZJ-7012), received by Xiu Juan Ma.
Abstract: Research on the features of speech and image signals is carried out from two perspectives: the time domain and the frequency domain. Speech and image signals are non-stationary, so the Fourier Transform (FT) is not applied directly to them. Once short-term stationary speech has been obtained by windowing and framing, the subsequent processing of the signal is completed by the Discrete Fourier Transform (DFT). The Fast Discrete Fourier Transform is a commonly used method for analysing speech and image signals in the frequency domain, but the window size must be adjusted to obtain a desired resolution. The Fractional Fourier Transform, in contrast, offers both time-domain and frequency-domain processing capabilities. This paper performs global speech encryption by combining speech with an image processed by the Fractional Fourier Transform. A watermark image processed by the fractional transform is embedded in the speech signal, and the embedded watermark has the effect of rotation and superposition, which improves the security of the speech. The results show that the proposed speech encryption method based on the Fractional Fourier Transform achieves a higher security level. The technology is easy to extend to practical applications.