Abstract: Data centers are being distributed worldwide by cloud service providers (CSPs) to save energy costs through efficient workload allocation strategies. Many CSPs are challenged by the significant rise in user demands because of the extensive energy consumed during workload processing. Numerous studies have examined operating-cost mitigation techniques for geo-distributed data centers (DCs). However, operating-cost savings during workload processing that also exploit string-matching techniques in geo-distributed DCs remain unexplored. In this research, we propose a novel string-matching-based geographical load balancing (SMGLB) technique to mitigate the operating cost of geo-distributed DCs. The primary goal of this study is to use a string-matching algorithm (Boyer-Moore) to compare the contents of incoming workloads with those of documents already processed in a data center. On a successful match, the global load balancer refrains from sending the user's request to a data center for processing and instead returns the results of the previously processed workload to the user, saving energy. If no match is found, the global load balancer allocates the incoming workload to a specific DC for processing, considering variable energy prices, the number of active servers, on-site green energy, and the incoming workload traces. Numerical evaluations show that SMGLB reduces the operating expenses of geo-distributed data centers more than existing workload distribution techniques.
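The cache-lookup step described above can be sketched in a few lines. The Boyer-Moore variant below uses only the bad-character rule for brevity, and the routing function, document list, and return labels are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of the SMGLB lookup step: Boyer-Moore (bad-character rule
# only) checks whether an incoming workload's content already appears in a
# previously processed document; on a hit, the cached result is served.

def boyer_moore_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # Bad-character table: last index at which each character occurs in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    m, n = len(pattern), len(text)
    s = 0  # current shift of the pattern over the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare right to left
        if j < 0:
            return s  # full match found at shift s
        # Shift so the mismatched text character aligns with its last
        # occurrence in the pattern (or past it, if absent).
        s += max(1, j - last.get(text[s + j], -1))
    return -1

def route_workload(incoming: str, processed_docs: list) -> str:
    """Serve a cached result on a match; otherwise dispatch to a data center."""
    for doc in processed_docs:
        if boyer_moore_search(doc, incoming) != -1:
            return "serve_cached_result"
    return "dispatch_to_datacenter"
```

In the full scheme, the dispatch branch would additionally weigh energy prices, active servers, and on-site green energy per DC.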
Abstract: This paper describes how data records can be matched across large datasets using a technique called the Identity Correlation Approach (ICA). The ICA technique is then compared with a string matching exercise. Both the string matching exercise and the ICA technique were employed for a big data project carried out by the CSO. The project was called the SESADP (Structure of Earnings Survey Administrative Data Project) and involved linking the Irish Census dataset 2011 to a large Public Sector Dataset. The ICA technique provides a mathematical tool to link the datasets, and the matching rate for an exact match can be calculated before the matching process begins. Based on the number of variables and the size of the population, the matching rate is calculated in the ICA approach from the MRUI (Matching Rate for Unique Identifier) formula, and false positives are eliminated. No string matching is used in the ICA, so names are not required on the dataset, making the data more secure and ensuring confidentiality. The SESADP Project was highly successful using the ICA technique. A comparison of the results of the string matching exercise for the SESADP and the ICA is discussed here.
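The MRUI formula itself is not reproduced in the abstract, so the toy calculation below is only a stand-in for the idea that an expected exact-match rate can be computed in advance from the number of variables and the population size. It assumes independent, uniformly distributed matching variables, which real census variables are not:

```python
# Illustrative stand-in (NOT the paper's MRUI formula): probability that a
# random record's combination of variable values is unique in a population,
# under independence and uniformity assumptions.

def expected_unique_rate(category_counts: list, population: int) -> float:
    """category_counts: number of distinct values per matching variable."""
    cells = 1
    for c in category_counts:
        cells *= c  # total number of distinct value combinations
    # A record is unique if none of the other (population - 1) records
    # falls into the same combination cell.
    return (1.0 - 1.0 / cells) ** (population - 1)
```

With more variables (more cells), the expected unique-match rate rises toward 1, which is the intuition behind computing a matching rate before linkage begins.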
Funding: Supported by the National Natural Science Foundation of China (No. U21B2062) and the Natural Science Foundation of Hubei Province (No. 2023AFB307).
Abstract: Identification of reservoir types in deep carbonates has always been a great challenge because of the complex logging responses caused by the heterogeneous scale and distribution of storage spaces. Traditional cross-plot analysis and empirical formula methods for identifying reservoir types from geophysical logging data have high uncertainty and low efficiency, and cannot accurately capture the nonlinear relationship between reservoir types and logging data. Recently, kernel Fisher discriminant analysis (KFD), a kernel-based machine learning technique, has attracted attention in many fields because of its strong nonlinear processing ability. However, the overall performance of a KFD model may be limited, as a single kernel function cannot simultaneously extrapolate and interpolate well, especially for highly complex data. To address this issue, in this study a mixed kernel Fisher discriminant analysis (MKFD) model was established and applied to identify reservoir types of the deep Sinian carbonates in the central Sichuan Basin, China. The MKFD model was trained and tested with 453 datasets from 7 coring wells, using GR, CAL, DEN, AC, CNL, and RT logs as input variables. Particle swarm optimization (PSO) was adopted for hyper-parameter optimization of the MKFD model. To evaluate model performance, prediction results of MKFD were compared with those of basic-kernel KFD, RF, and SVM models. Subsequently, the built MKFD model was applied in a blind well test, and a variable importance analysis was conducted. The comparison and blind test results demonstrated that MKFD outperformed traditional KFD, RF, and SVM in identifying reservoir types, providing higher accuracy and stronger generalization. The MKFD can therefore be a reliable method for identifying reservoir types of deep carbonates.
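A mixed kernel of the kind described is typically a convex combination of a "local" kernel (RBF, good at interpolation) and a "global" kernel (polynomial, good at extrapolation). The sketch below shows only that combination step; the mixing weight and hyper-parameters would be the quantities PSO tunes, and the specific kernels and defaults here are assumptions:

```python
import numpy as np

# Hedged sketch of a mixed kernel: lam * RBF + (1 - lam) * polynomial.
# The paper's exact kernel choices and PSO-tuned values are not given here.

def rbf_kernel(X, Y, gamma=1.0):
    """Local (interpolating) kernel: exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def poly_kernel(X, Y, degree=2, c=1.0):
    """Global (extrapolating) kernel: (x . y + c)^degree."""
    return (X @ Y.T + c) ** degree

def mixed_kernel(X, Y, lam=0.5, gamma=1.0, degree=2):
    """Convex combination of the two kernels; lam in [0, 1]."""
    return lam * rbf_kernel(X, Y, gamma) + (1 - lam) * poly_kernel(X, Y, degree)
```

The resulting Gram matrix stays symmetric and positive semi-definite for lam in [0, 1], so it can be dropped into any kernel Fisher discriminant solver unchanged.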
Funding: Supported by the National Natural Science Foundation of China (Nos. 42171407, 42077242), the Natural Science Foundation of Jilin Province (No. 20210101098JC), the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, MNR (No. KF-2020-05-024), and the National Key R&D Program of China (No. 2021YFD1500100).
Abstract: Highly accurate vegetation-type distribution information is of great significance for forestry resource monitoring and management. To improve the classification accuracy of forest types, Sentinel-1 and Sentinel-2 data of the Changbai Mountain protection and development zone were selected and combined with a DEM to construct a multi-feature random forest classification model that fuses intensity, texture, spectral, vegetation-index, and topographic information, using the random forest Gini index (GI) for feature optimization. The overall classification accuracy was 94.60% and the Kappa coefficient was 0.933. Comparing the classification results before and after feature optimization shows that feature optimization has a considerable impact on classification accuracy. Comparing random forest, maximum likelihood, and CART decision tree classifiers under the same conditions shows that random forest performs best and can be applied to forestry work such as forest resource survey and monitoring.
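Gini-based feature optimization ranks features by how much they reduce class impurity. As a simplified, self-contained stand-in for the random forest's Gini importance (a full forest averages such gains over many trees and splits), the sketch below scores each feature by the best Gini decrease of a single axis-aligned split, on synthetic data:

```python
import numpy as np

# Simplified Gini screening (NOT the full random-forest importance): score
# each feature by the impurity decrease of its best single threshold split.

def gini(y):
    """Gini impurity of a binary label vector."""
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - (p ** 2).sum()

def best_split_gain(x, y):
    """Max Gini decrease over a few quantile thresholds of one feature."""
    base = gini(y)
    gains = []
    for t in np.quantile(x, [0.25, 0.5, 0.75]):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        w = len(left) / len(y)
        gains.append(base - (w * gini(left) + (1 - w) * gini(right)))
    return max(gains) if gains else 0.0

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))        # 5 synthetic stand-in features
y = (X[:, 0] > 0).astype(int)        # label driven by feature 0 only
scores = [best_split_gain(X[:, j], y) for j in range(5)]
best = int(np.argmax(scores))        # informative feature ranks first
```

Keeping only the top-scoring features before retraining is the "feature optimization" step whose effect the abstract reports.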
Funding: Supported by a grant from the Hubei Key Laboratory of Diabetes and Angiopathy Program of Hubei University of Science and Technology (2020XZ10) and a Project of the Education Commission of Hubei Province (B2022192).
Abstract: Background: Erzhu Erchen decoction (EZECD), which is based on Erchen decoction with the addition of Atractylodes lancea and Atractylodes macrocephala, is widely used for the treatment of damp-heat internalized disease (whose clinical manifestations include thirst without the desire to drink much, diarrhea, yellow urine, red tongue, etc.). Nevertheless, the mechanism of EZECD in damp-heat internalized type 2 diabetes (T2D) remains unknown. We employed data mining, pharmacology databases, and experimental verification to study how EZECD treats damp-heat internalized T2D. Methods: The main compounds and targets of EZECD and of damp-heat internalized T2D were obtained from pharmacology databases. The overlapping targets of EZECD and damp-heat internalized T2D were then subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, and a compound-disease target-pathway network was constructed to identify the hub compound. Moreover, the hub genes and core related pathways were mined with weighted gene co-expression network analysis (WGCNA) based on the Gene Expression Omnibus database, and the binding of the hub compound to the hub genes was validated with AutoDock 1.5.7. Furthermore, violin plots and gene set enrichment analysis (GSEA) were used to explore the role of the hub genes in damp-heat internalized T2D. Finally, the interactions of the hub compound and genes were explored using the Comparative Toxicogenomics Database and quantitative polymerase chain reaction (qPCR). Results: The herb-compound-gene-disease network indicated that the hub compound of EZECD for damp-heat internalized T2D is quercetin. Consistently, the hub genes were CASP8, CCL2, and AHR according to WGCNA. Molecular docking showed that quercetin can bind the hub gene products. Further, GSEA and GO analysis indicated that CASP8 and CCL2 are negatively involved in the insulin secretion response to TNF or lipopolysaccharide, while AHR and CCL2 positively regulate lipid and atherosclerosis as well as the NOD-like receptor and TNF signaling pathways. Ultimately, qPCR and western blotting showed that quercetin down-regulates the mRNA and protein expression of CASP8, CCL2, and AHR, consistent with the Comparative Toxicogenomics Database. Conclusion: These results demonstrate that quercetin can inhibit the expression of CASP8, CCL2, and AHR in damp-heat internalized T2D, improving insulin secretion and inhibiting lipid and atherosclerosis as well as the NOD-like receptor and TNF signaling pathways, suggesting that EZECD may be effective for treating damp-heat internalized T2D.
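The first computational step of such a network pharmacology workflow, intersecting compound targets with disease genes and scoring the overlap, can be sketched directly. The gene lists and genome size below are illustrative stand-ins, not the paper's data; the hypergeometric tail is a common enrichment score, not necessarily the one the authors used:

```python
from math import comb

# Hedged sketch of the target-overlap step: intersect compound targets with
# disease genes and score the overlap with a hypergeometric tail probability.

def overlap_enrichment(compound_targets, disease_genes, genome_size=20000):
    """Return the shared targets and P(overlap >= observed) by chance."""
    hits = sorted(set(compound_targets) & set(disease_genes))
    K = len(set(disease_genes))      # disease genes in the background
    n = len(set(compound_targets))   # genes drawn by the compound
    k = len(hits)                    # observed overlap
    # Hypergeometric upper tail: chance of k or more shared genes.
    p = sum(comb(K, i) * comb(genome_size - K, n - i)
            for i in range(k, min(K, n) + 1)) / comb(genome_size, n)
    return hits, p

# Illustrative inputs echoing the abstract's hub genes.
hits, p = overlap_enrichment(["CASP8", "CCL2", "AHR", "TP53"],
                             ["CASP8", "CCL2", "AHR", "INS", "TNF"])
```

A tiny p-value indicates the overlap is unlikely by chance, which is what motivates carrying those genes forward to WGCNA and docking.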
Funding: Supported by the National Key Research and Development Program of China (No. 2018YFB1500803), the National Natural Science Foundation of China (Nos. 61773118 and 61703100), and the Fundamental Research Funds for the Central Universities.
Abstract: Boosted by a strong solar power market, the electricity grid is exposed to risk from an increasing share of fluctuating solar power. To increase the stability of the grid, an accurate solar power forecast is needed to evaluate such fluctuations. For forecasting, solar irradiance is the key factor in solar power generation, and it is affected by atmospheric conditions, including surface meteorological variables and column-integrated variables. These variables involve multiple numerical time series and images. However, few studies have focused on methods for processing multiple data types in an inter-hour direct normal irradiance (DNI) forecast. In this study, a framework for predicting DNI over a 10-min horizon was developed, which included the nondimensionalization of multiple data types and time series, development of a forecast model, and transformation of the outputs. Several atmospheric variables were considered in the forecast framework, including the historical DNI, wind speed and direction, and relative humidity time series, as well as ground-based cloud images. Experiments were conducted to evaluate the performance of the framework. The results demonstrate that the proposed method performs well, with a normalized mean bias error of 0.41% and a normalized root mean square error (nRMSE) of 20.53%, and outperforms the persistence model with a 34% improvement in nRMSE.
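The two reported metrics can be written out explicitly. Normalization conventions vary across the solar-forecasting literature; the version below normalizes by the mean of the observations, which is an assumption rather than the paper's stated choice:

```python
import numpy as np

# The abstract's evaluation metrics, sketched with mean-of-observations
# normalization (the paper's exact normalizer is not stated here).

def nmbe(pred, obs):
    """Normalized mean bias error, in percent."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return (pred - obs).mean() / obs.mean() * 100.0

def nrmse(pred, obs):
    """Normalized root mean square error, in percent."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sqrt(((pred - obs) ** 2).mean()) / obs.mean() * 100.0
```

The "34% improvement over the persistence model" then means 1 - nRMSE_model / nRMSE_persistence ≈ 0.34.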
Funding: Supported by the National Key Research and Development Program of China (2018YFA0900100), the Natural Science Foundation of Tianjin, China (19JCJQJC63300), and Tianjin University.
Abstract: DNA molecules are green materials with great potential for high-density, long-term data storage. However, the current data-writing process of DNA data storage via DNA synthesis suffers from high costs and hazardous by-products, limiting its practical applications. Here, we developed a DNA movable-type storage system that utilizes DNA fragments pre-produced by cell factories for data writing. In this system, these pre-generated DNA fragments, referred to herein as "DNA movable types," are used repeatedly as basic writing units. Data writing is achieved by the rapid assembly of these DNA movable types, thereby avoiding the costly and environmentally hazardous process of de novo DNA synthesis. With this system, we successfully encoded 24 bytes of digital information in DNA and read it back accurately by high-throughput sequencing and decoding, demonstrating the feasibility of the system. Through the repeated use and biological assembly of DNA movable-type fragments, this system exhibits excellent potential for reducing writing costs, opening a novel route toward an economical and sustainable digital data-storage technology.
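At the base level, digital data maps to DNA at 2 bits per nucleotide. The paper's movable-type scheme works at the level of pre-assembled fragments rather than individual bases, so the sketch below shows only the underlying byte-to-sequence intuition, with a mapping chosen for illustration:

```python
# Minimal 2-bits-per-base encoding sketch (NOT the movable-type scheme
# itself): 00->A, 01->C, 10->G, 11->T, so one byte becomes four bases.

BASE = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, most significant bit pair first."""
    return "".join(BASE[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))

def dna_to_bytes(seq: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    out = []
    for i in range(0, len(seq), 4):
        b = 0
        for ch in seq[i:i + 4]:
            b = (b << 2) | BASE.index(ch)
        out.append(b)
    return bytes(out)
```

Under this mapping, the 24-byte demonstration payload corresponds to just 96 bases of sequence, before any error-correction or assembly overhead.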
Funding: Supported by China's National Key R&D Program (No. 2019YFC1709801).
Abstract: Background: To systematically summarize and categorize the Chinese herbal medicines in the domestic traditional Chinese medicine (TCM) literature on type 2 diabetes mellitus (T2DM), in this paper we mine TCM data for relationships and provide a reference for future practitioners and researchers. Methods: Taking randomized controlled trials on the treatment of T2DM in TCM as the research theme, we searched for full-text literature published between 1990 and 2020 in three major clinical databases: CNKI, Wan Fang, and VIP. We then conducted frequency statistics, cluster analysis, association rule extraction, and principal component analysis based on a corpus of medical academic terms extracted from 1116 research articles. Results: The most frequently used herb was Astragali Radix, and the most commonly used two-herb combination in T2DM treatment consisted of Coptidis Rhizoma and Moutan Cortex. Moutan Cortex, Alismatis Rhizoma, and Dioscoreae Rhizoma were the most frequently used three-herb combination. We found a "lung," "liver," and "kidney" pattern and confirmed the value of classical meridian tropism theory and pattern identification. The treatment mainly tonifies deficiency and clears heat, while also promoting water drainage, resolving dampness, and activating blood circulation to remove stasis. Conclusion: This study provides an in-depth perspective on TCM medication rules for T2DM and offers practitioners and researchers valuable information about the current status and frontier trends of TCM research on T2DM in terms of diagnosis and treatment.
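The frequency and combination statistics behind such results reduce to counting herb co-occurrences across prescriptions, the support-counting core of association rule mining. The prescriptions below are made-up stand-ins for the 1116-article corpus:

```python
from collections import Counter
from itertools import combinations

# Pair-support counting for herb combinations (the first step of association
# rule mining). The prescription data here are illustrative, not the corpus.

def pair_counts(prescriptions):
    """Count how often each unordered herb pair co-occurs in a prescription."""
    c = Counter()
    for herbs in prescriptions:
        c.update(combinations(sorted(set(herbs)), 2))
    return c

rx = [
    ["Coptidis Rhizoma", "Moutan Cortex", "Astragali Radix"],
    ["Coptidis Rhizoma", "Moutan Cortex", "Alismatis Rhizoma"],
    ["Coptidis Rhizoma", "Moutan Cortex"],
]
top_pair, n = pair_counts(rx).most_common(1)[0]
```

Extending `combinations(..., 2)` to size 3 yields the three-herb combinations the abstract reports; rules and confidences are then derived from these support counts.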
Abstract: In the era of big data, streaming data are emerging in large volumes. Concept drift, the most typical and difficult problem in streaming-data mining, has received increasingly wide attention. Ensemble learning is a common approach to handling concept drift in streaming data; however, after a drift occurs, the learning model often cannot respond promptly to the change in the data distribution and cannot effectively handle different types of drift, degrading the model's generalization performance. To address this problem, a two-stage adaptive ensemble learning method for different types of concept drift (TAEL) is proposed. The method first determines the drift type by detecting the drift span and then, according to the drift type, applies a two-stage "filter-expand" sample-handling mechanism that dynamically selects an appropriate sample-handling strategy. Specifically, in the filtering stage, different non-key-sample filters are created for different drift types to extract the key samples from the historical sample blocks, bringing the historical data distribution closer to the latest distribution and improving the effectiveness of the base learners. In the expansion stage, a block-priority sampling method is proposed: an appropriate extraction scale is set for each drift type, a sampling priority is assigned according to the proportion within the current block of the class each historical key sample belongs to, a sampling probability is derived from that priority, and a subset of key samples is drawn from the historical key-sample blocks according to this probability to expand the current block. This alleviates class imbalance after expansion and resolves the under-fitting of the current base learner while enhancing its stability. Experimental results show that the proposed method responds promptly to different types of concept drift, accelerates the convergence of the online ensemble model after drift occurs, and improves the overall generalization performance of the model.
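The expansion stage described above can be sketched as weighted sampling: historical key samples receive a priority inversely related to their class's share of the current block, so under-represented classes are replenished first. The weighting rule and data below are illustrative assumptions, not TAEL's exact formulas:

```python
import random

# Toy version of TAEL's expansion stage: draw historical key samples with
# priority inversely related to their class's share of the current block.
# The concrete priority formula here is an illustrative assumption.

def expand_block(current_labels, history, k, seed=0):
    """history: list of (sample, label) pairs; returns k sampled items."""
    total = len(current_labels)
    share = {c: current_labels.count(c) / total for c in set(current_labels)}
    # Rarer in the current block -> larger sampling weight.
    weights = [1.0 - share.get(label, 0.0) for _, label in history]
    rnd = random.Random(seed)  # seeded for reproducibility of the sketch
    return rnd.choices(history, weights=weights, k=k)

current = [0] * 9 + [1]                         # class 1 under-represented
history = [("h%d" % i, i % 2) for i in range(10)]  # historical key samples
picked = expand_block(current, history, k=5)
```

Appending `picked` to the current block rebalances the classes before the base learner is retrained, which is what the abstract credits for avoiding under-fitting after a drift.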