Abstract: In this paper, we study activity (activation) functions for a multi-layered neural network (MLNN) and propose a suitable activity function for data enlargement processing. We studied the training performance of the Sigmoid, ReLU, Leaky ReLU, and L & exp. activity functions on training patterns that map a few inputs to multiple outputs. Our MLNN model has L hidden layers and is trained by backpropagation (BP) on data variations with two or three inputs and four or six outputs. We focused on multiple teacher training signals to evaluate training performance in MLNNs and to select the best activity function for data enlargement, which could be applicable to image and signal processing (synaptic divergence) together with the proposed methods with convolution networks. Of the four activity functions tested, the L & exp. activity function suits data enlargement neural network (DENN) training, since it gave the highest percentage of training ability compared with Sigmoid, ReLU, and Leaky ReLU during simulation and training of data in the network. Finally, we recommend the L & exp. function for MLNNs; because of its training performance with multiple teacher training patterns on originally generated data, it may be applicable to signal processing for data and information enlargement and can be tried with convolutional neural networks (CNNs) for image processing.
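For orientation, the three standard functions compared in the abstract above can be sketched in a few lines of Python. This is a minimal illustration using the textbook definitions; the leaky slope of 0.01 is an assumed default, and the L & exp. function is deliberately not shown because the abstract does not define it.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky ReLU; `slope` is an assumed value, not taken from the paper."""
    return x if x > 0 else slope * x


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, sigmoid(value), relu(value), leaky_relu(value))
```

Sigmoid saturates for large positive or negative inputs, whereas ReLU and Leaky ReLU stay linear on the positive side, which is the usual reason the three behave differently under backpropagation.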
Funding: Supported by the National Natural Science Foundation of China (62371098), the Natural Science Foundation of Sichuan Province (2023NSFSC1422), the National Key Research and Development Program of China (2021YFB2900404), and Central Universities of Southwest Minzu University (ZYN2022032).
Abstract: In recent years, deep learning-based signal recognition technology has gained attention and emerged as an important approach for safeguarding the electromagnetic environment. However, training deep learning-based classifiers on large signal datasets with redundant samples requires significant memory and incurs high costs. This paper proposes a support data-based core-set selection method (SD) for signal recognition, which aims to screen a representative subset that approximates the large signal dataset. Specifically, this subset can be identified from the label information available during the early stages of model training, since some training samples are frequently labeled as support data. This support data is crucial for model training and can be found using a border sample selector. Simulation results demonstrate that the SD method minimizes the impact on model recognition performance while reducing the dataset size, and that it outperforms five other state-of-the-art core-set selection methods when the fraction of training samples kept is less than or equal to 0.3 on the RML2016.04C dataset or 0.5 on the RML22 dataset. The SD method is particularly helpful for signal recognition tasks with limited memory and computing resources.
Funding: Supported by the National Natural Science Foundation of China (No. 60772087, No. 50803016, No. 60975004, No. 60902023), the Foundation for the Author of National Excellent Doctoral Dissertation of P.R. China (No. 200341), the National 863 High-Tech R&D Program (No. 2007AA01Z228), and the open research fund of the Key Laboratory of Information Coding and Transmission, Southwest Jiaotong University.
Abstract: Compared with channel estimation methods based on explicit training sequences, methods using superimposed training sequences save bandwidth, but bandwidth is still wasted when a Cyclic Prefix (CP) is added. In previous work by McLernon, the Mean Square Error (MSE) performance of Data-Dependent Superimposed Training (DDST) without CP for a Single-Input Single-Output (SISO) system was analyzed under the assumption that the data-dependent sequence matrix was circulant and free of interference. In fact, for a system without CP, the data-dependent sequence matrix is no longer circulant and is subject to interference. This paper derives the exact expression of the MSE for the system without CP and extends it to Multiple-Input Multiple-Output (MIMO) systems without CP.
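Abstract: Tri-training can effectively exploit unlabeled examples to improve generalization. During Tri-training iterations, however, unlabeled examples are often mislabeled, which introduces noise into the training set and makes performance unstable. To address this shortcoming, this paper proposes a new algorithm, ADE-Tri-training (Tri-training with Adaptive Data Editing). It not only uses the Remove Only editing operation to identify and remove the examples that may have been mislabeled in each iteration but, more importantly, adopts an adaptive strategy to determine the appropriate moments to trigger or suppress Remove Only. The paper proves that, under PAC theory, a series of sufficient conditions used in the adaptive strategy simultaneously ensure that the size of the new training set grows iteratively and that the classification error rate of the new hypothesis decreases more with each iteration. Experimental results on UCI datasets show that ADE-Tri-training has better classification generalization performance and robustness.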
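The labeling rule of standard Tri-training, on which ADE-Tri-training builds, can be sketched as follows: an unlabeled example is handed to one classifier with a pseudo-label only when the other two classifiers agree on it. This is a minimal sketch of that agreement step alone. The function and variable names are hypothetical, and the Remove Only editing and its adaptive trigger are not shown because the abstract gives no details of them.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Sequence[float]
Classifier = Callable[[Example], Hashable]


def pseudo_label(
    target: int,
    classifiers: Sequence[Classifier],
    unlabeled: Sequence[Example],
) -> List[Tuple[Example, Hashable]]:
    """Collect examples for classifier `target` that the other two agree on."""
    if len(classifiers) != 3:
        raise ValueError("Tri-training uses exactly three classifiers")
    if target not in (0, 1, 2):
        raise ValueError("target must be 0, 1, or 2")
    h_j, h_k = [h for idx, h in enumerate(classifiers) if idx != target]
    picked = []
    for x in unlabeled:
        y_j, y_k = h_j(x), h_k(x)
        if y_j == y_k:  # agreement of the two peers yields a pseudo-label
            picked.append((x, y_j))
    return picked


if __name__ == "__main__":
    # Three toy threshold classifiers on one feature (illustrative only).
    h0 = lambda x: int(x[0] > 0.5)
    h1 = lambda x: int(x[0] > 0.3)
    h2 = lambda x: int(x[0] > 0.7)
    pool = [(0.1,), (0.4,), (0.9,)]
    print(pseudo_label(0, [h0, h1, h2], pool))
```

Because the two peers can agree on a wrong label, the returned set may contain noise, which is the problem the data editing step in the abstract is meant to reduce.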
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60702033 and 60772076, the National High-Tech Research and Development Plan of China (863) under Grant No. 2007AA01Z171, the Science Fund for Distinguished Young Scholars of Heilongjiang Province of China under Grant No. JC200611, the Natural Science Foundation of Heilongjiang Province of China under Grant No. ZJG0705, and the Foundation of Harbin Institute of Technology of China under Grant No. HIT.2003.53.
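Abstract: Weakly supervised relation extraction uses existing relation entity pairs to obtain training data automatically from a text collection, which effectively addresses the shortage of training data. Weakly supervised training data, however, suffer from noise, insufficient features, and class imbalance, all of which degrade relation extraction performance. To address these problems, this paper proposes NF-Tri-training (Tri-training with Noise Filtering), a weakly supervised relation extraction algorithm. It uses undersampling to handle sample imbalance, iteratively learns new samples from unlabeled data based on Tri-training to improve the generalization ability of the classifier, and applies data editing techniques to identify and remove mislabeled samples from the initial training data and from those produced in each iteration. Experimental results on a dataset collected from Hudong Baike (互动百科) show that the NF-Tri-training algorithm effectively improves the performance of the relation classifier.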
Abstract: The existing Big Data on transport flows and railway operations can be mined with advanced statistical analysis and machine learning methods to describe and predict train speed, punctuality, track capacity, and energy consumption well. Accurate modelling of the real spatial and temporal distribution of line and network transport, traffic, and performance stimulates faster construction and implementation of robust and resilient timetables, as well as the development of efficient decision support tools for real-time rescheduling of train schedules. In combination with advanced train control and safety systems, even (semi-)automatic piloting of trains on main and regional railway lines will become feasible in the near future.
Funding: Supported by the China National Science and Technology Major Project (2016ZX05015001-001, 2016ZX05033-003-002).
Abstract: Based on an analysis of the high-order compatibility optimization method proposed by predecessors, a new training image optimization method based on data event repetition probability is proposed. The basic idea is to extract the data events contained in the conditioning data and to calculate the number of repetitions of the extracted data events and their repetition probability in the training image, which yields two statistical indicators: the unmatched ratio and the repeated probability variance of data events. The two indicators characterize the diversity and stability of the sedimentary model in the training image and evaluate how well it matches the spatial structure of the geological volume contained in the data of the well block to be modeled. The unmatched ratio reflects the completeness of the geological model in the training image and is the first-choice index. The repeated probability variance reflects the stationarity of the geological model of each training image and is an auxiliary index. The two indexes are then integrated to optimize the choice of training image. Multiple sets of theoretical model tests show that the training image with a small variance and a low unmatched ratio is the optimal training image. The method is used to optimize the training image of a turbidite channel in the Plutonio oilfield in Angola. The geological model established by this method is in good agreement with the seismic attributes and better reproduces the morphological characteristics of the channels and the distribution pattern of sands.
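The two indicators described above can be illustrated with a small sketch. The abstract does not give the formulas, so the definitions here are assumptions: the unmatched ratio is taken as the share of data events that never occur in the training image, and the repetition probability of a matched event is taken as its repetition count divided by the number of scanned positions. All names are hypothetical.

```python
from statistics import pvariance
from typing import Dict, Sequence, Tuple


def training_image_indicators(
    repeat_counts: Sequence[int], n_positions: int
) -> Tuple[float, float]:
    """Return (unmatched_ratio, repetition_probability_variance).

    Assumed definitions, not taken from the paper:
    - unmatched_ratio: fraction of data events with zero repetitions
    - variance: population variance of count / n_positions over matched events
    """
    if not repeat_counts or n_positions <= 0:
        raise ValueError("need at least one data event and a positive scan size")
    unmatched = sum(1 for count in repeat_counts if count == 0)
    unmatched_ratio = unmatched / len(repeat_counts)
    probabilities = [count / n_positions for count in repeat_counts if count > 0]
    variance = pvariance(probabilities) if probabilities else 0.0
    return unmatched_ratio, variance


def pick_training_image(candidates: Dict[str, Tuple[Sequence[int], int]]) -> str:
    """Rank by unmatched ratio first, then by variance as the auxiliary index."""
    return min(candidates, key=lambda name: training_image_indicators(*candidates[name]))


if __name__ == "__main__":
    images = {
        "TI_A": ([0, 2, 4, 6], 100),
        "TI_B": ([1, 3, 3, 5], 100),
    }
    for name, (counts, size) in images.items():
        print(name, training_image_indicators(counts, size))
    print("selected:", pick_training_image(images))
```

Sorting on the pair (unmatched ratio, variance) mirrors the stated priority, with the unmatched ratio as the first-choice index and the variance used only to break ties. A weighted combination would be an equally plausible reading of "integrate the two indexes".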
Abstract: Typical supervised classification techniques require training instances similar to the values that need to be classified. This research proposes a methodology that can utilize training instances found in a different format. The benefit of this approach is that it allows the use of traditional classification techniques without the need to hand-tag training instances, provided the information exists in other data sources. The proposed approach is presented through a practical classification application. The evaluation results show that the approach is viable and that the segmentation of classifiers can greatly improve accuracy.
Abstract: Data-driven methods are widely considered for fault diagnosis in complex systems. In practice, however, the between-class imbalance caused by limited faulty samples may degrade their classification performance. To address this issue, synthetic minority methods for enhancing data have proven effective in many applications. Generative adversarial networks (GANs), which are capable of automatic feature extraction, can also be adopted to augment the faulty samples. However, the monitoring data of a complex system may include not only continuous signals but also discrete/categorical signals. Since current GAN methods still face challenges in handling such heterogeneous monitoring data, a Mixed Dual Discriminator GAN (denoted M-D2GAN) is proposed in this work. To make the expanded fault samples more consistent with the real situation and to improve the accuracy and robustness of the fault diagnosis model, different types of variables are generated in different ways, including floating-point, integer, categorical, and hierarchical variables. To handle the class imbalance problem effectively, the GAN model is modified by adding a normal-class discriminator. A practical case study on the braking system of a high-speed train is carried out to verify the effectiveness of the proposed framework. Compared with the classic GAN, the proposed framework achieves better results with respect to the F-measure and G-mean metrics.
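The two evaluation metrics named above have standard binary-classification definitions, sketched here from confusion-matrix counts with the faulty class treated as positive. This is an illustrative sketch only: the abstract does not say whether the study uses binary or multi-class variants, and the function name and the convention of returning 0.0 on an empty denominator are assumptions.

```python
import math
from typing import Tuple


def f_measure_and_g_mean(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Binary F-measure (F1) and G-mean from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity of the faulty class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall == 0.0:
        f_measure = 0.0
    else:
        f_measure = 2.0 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


if __name__ == "__main__":
    # Imbalanced toy case: 10 faulty samples among 100 (made-up counts).
    print(f_measure_and_g_mean(tp=8, fp=4, fn=2, tn=86))
```

Both metrics stay low when the minority (faulty) class is poorly recognized, even if overall accuracy is high, which is why they are commonly preferred for imbalanced fault data.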
Funding: Supported by the National Key Research and Development Program of China (2022YFB4300501) and the National Natural Science Foundation of China (62027809, U2268206, T2222015).
Abstract: The application of Global Navigation Satellite Systems (GNSSs) in intelligent railway systems is developing rapidly all over the world. With GNSS-based train positioning and moving state perception, the autonomy and flexibility of a novel train control system can be greatly enhanced over existing solutions that rely on track-side facilities. Considering the safety-critical nature of railway signaling applications, the GNSS stand-alone mode may not be sufficient to satisfy practical requirements. In this paper, the key technologies for applying GNSS in novel train-centric railway signaling systems are investigated, including multi-sensor data fusion, Virtual Balise (VB) capturing and messaging, train integrity monitoring, and system performance evaluation. The details of these key technologies are introduced according to the practical characteristics of the novel train control system under the moving block mode. Field demonstration results of a novel train control system using the presented technologies under practical railway operation conditions are presented to illustrate the achievable performance of autonomous train state perception using the BeiDou Navigation Satellite System (BDS) and related solutions. The results reveal the great potential of these key technologies in next-generation train control systems and other GNSS-based railway implementations.