Journal Articles
369,412 articles found
1. A method for cleaning wind power anomaly data by combining image processing with community detection algorithms
Authors: Qiaoling Yang, Kai Chen, Jianzhang Man, Jiaheng Duan, Zuoqi Jin. Global Energy Interconnection, EI CSCD, 2024, Issue 3, pp. 293-312 (20 pages)
Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of wind farm data. Consequently, a method for cleaning wind power anomaly data by combining image processing with community detection algorithms (CWPAD-IPCDA) is proposed. To precisely identify and initially clean anomalous data, wind power curve (WPC) images are converted into graph structures, which employ the Louvain community recognition algorithm and graph-theoretic methods for community detection and segmentation. Furthermore, the mathematical morphology operation (MMO) determines the main part of the initially cleaned wind power curve images and maps them back to the normal wind power points to complete the final cleaning. The CWPAD-IPCDA method was applied to clean datasets from 25 wind turbines (WTs) in two wind farms in northwest China to validate its feasibility. A comparison was conducted using the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved isolation forest algorithm, and an image-based (IB) algorithm. The experimental results demonstrate that the CWPAD-IPCDA method surpasses the other three algorithms, achieving an approximately 7.23% higher average data cleaning rate. The mean value of the sum of squared errors (SSE) of the dataset after cleaning is approximately 6.887 lower than that of the other algorithms. Moreover, the mean overall accuracy, as measured by the F1-score, exceeds that of the other methods by approximately 10.49%; this indicates that the CWPAD-IPCDA method is more conducive to improving the accuracy and reliability of wind power curve modeling and wind farm power forecasting.
Keywords: Wind turbine power curve; Abnormal data cleaning; Community detection; Louvain algorithm; Mathematical morphology operation
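To make the pipeline concrete, here is a minimal Python sketch of the CWPAD-IPCDA idea under stated assumptions: the WPC scatter is rasterized to a binary image, occupied pixels become graph nodes with 8-connectivity edges, Louvain community detection keeps the largest community as the curve body, and a morphological opening trims residual spurs. The grid size, connectivity, and 3x3 structuring element are illustrative guesses, not values from the paper.

```python
import numpy as np
import networkx as nx
from scipy import ndimage

def clean_wpc(points, bins=128):
    # Rasterize the (wind speed, power) scatter into a binary image.
    hist, xe, ye = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    img = hist > 0

    # Build a graph over occupied pixels (8-connectivity) and run Louvain.
    G = nx.Graph()
    occ = np.argwhere(img)
    idx = {tuple(p): i for i, p in enumerate(occ)}
    for (r, c), i in idx.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                j = idx.get((r + dr, c + dc))
                if j is not None and j != i:
                    G.add_edge(i, j)
    comms = nx.community.louvain_communities(G, seed=0)
    main = max(comms, key=len)  # keep the densest community as the WPC body

    mask = np.zeros_like(img)
    for i in main:
        mask[tuple(occ[i])] = True

    # Mathematical morphology opening removes residual spurs/outliers.
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))

    # Map retained pixels back to the original data points.
    xi = np.clip(np.digitize(points[:, 0], xe) - 1, 0, bins - 1)
    yi = np.clip(np.digitize(points[:, 1], ye) - 1, 0, bins - 1)
    return points[mask[xi, yi]]
```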
2. A novel method for clustering cellular data to improve classification
Authors: Diek W. Wheeler, Giorgio A. Ascoli. Neural Regeneration Research, SCIE CAS, 2025, Issue 9, pp. 2697-2705 (9 pages)
Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Keywords: cellular data clustering; dendrogram; data classification; Levene's one-tailed statistical test; unsupervised hierarchical clustering
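As an illustration of the stopping principle (cells must differ more between than within clusters), here is a hedged sketch: a candidate two-way split is accepted only when a Levene-style dispersion test finds between-cluster distances significantly larger than within-cluster distances. The Ward linkage, the significance level, and the distance-based formulation are assumptions; the published protocol's exact test details differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, cdist
from scipy.stats import levene

def split_if_separable(X, alpha=0.05):
    # Propose a two-cluster cut of this subset via hierarchical clustering.
    labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
    a, b = X[labels == 1], X[labels == 2]
    if len(a) < 3 or len(b) < 3:
        return None  # too small to subdivide meaningfully
    within = np.concatenate([pdist(a), pdist(b)])  # intra-cluster distances
    between = cdist(a, b).ravel()                  # inter-cluster distances
    stat, p = levene(between, within)
    # Accept the split only if between-cluster separation clearly exceeds
    # within-cluster spread (one-sided check on top of Levene's p-value).
    if p < alpha and between.mean() > within.mean():
        return labels
    return None  # stop subdividing: clusters are not statistically distinct
```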
3. A New Encryption Mechanism Supporting the Update of Encrypted Data for Secure and Efficient Collaboration in the Cloud Environment
Authors: Chanhyeong Cho, Byeori Kim, Haehyun Cho, Taek-Young Youn. Computer Modeling in Engineering & Sciences, SCIE EI, 2025, Issue 1, pp. 813-834 (22 pages)
With the rise of remote collaboration, the demand for advanced storage and collaboration tools has rapidly increased. However, traditional collaboration tools primarily rely on access control, leaving data stored on cloud servers vulnerable due to insufficient encryption. This paper introduces a novel mechanism that encrypts data in 'bundle' units, designed to meet the dual requirements of efficiency and security for frequently updated collaborative data. Each bundle includes updated information, allowing only the updated portions to be re-encrypted when changes occur. The encryption method proposed in this paper addresses the inefficiencies of traditional encryption modes, such as Cipher Block Chaining (CBC) and Counter (CTR), which require decrypting and re-encrypting the entire dataset whenever updates occur. The proposed method leverages update-specific information embedded within data bundles and metadata that maps the relationship between these bundles and the plaintext data. By utilizing this information, the method accurately identifies the modified portions and applies algorithms to selectively re-encrypt only those sections. This approach significantly enhances the efficiency of data updates while maintaining high performance, particularly in large-scale data environments. To validate this approach, we conducted experiments measuring execution time as both the size of the modified data and the total dataset size varied. Results show that the proposed method significantly outperforms CBC and CTR modes in execution speed, with greater performance gains as data size increases. Additionally, our security evaluation confirms that this method provides robust protection against both passive and active attacks.
Keywords: Cloud collaboration; mode of operation; data update efficiency
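A minimal illustration of bundle-wise encryption follows; it is not the paper's exact scheme. Plaintext is split into fixed-size bundles, each encrypted independently with AES-GCM, so an update re-encrypts only the touched bundle instead of the whole object as CBC/CTR-style whole-file encryption would. The bundle size is an illustrative choice.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

BUNDLE = 4096  # illustrative bundle size in bytes

def encrypt_bundles(key, data):
    aead = AESGCM(key)
    out = []
    for i in range(0, len(data), BUNDLE):
        nonce = os.urandom(12)  # fresh nonce per bundle
        out.append((nonce, aead.encrypt(nonce, data[i:i + BUNDLE], None)))
    return out

def update_bundle(key, bundles, index, new_plaintext):
    # Only the modified bundle is re-encrypted; all others stay untouched.
    aead = AESGCM(key)
    nonce = os.urandom(12)
    bundles[index] = (nonce, aead.encrypt(nonce, new_plaintext, None))

key = AESGCM.generate_key(bit_length=256)
bundles = encrypt_bundles(key, b"collaborative document contents " * 500)
update_bundle(key, bundles, 1, b"edited section " * 10)
```

The design point mirrors the abstract's argument: under chained modes an edit forces re-encryption of the whole dataset, whereas independent bundles localize the work to the modified portion.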
4. Synthetic data as an investigative tool in hypertension and renal diseases research
Authors: Aleena Jamal, Som Singh, Fawad Qureshi. World Journal of Methodology, 2025, Issue 1, pp. 9-13 (5 pages)
There is a growing body of clinical research on the utility of synthetic data derivatives, an emerging research tool in medicine. In nephrology, clinicians can use machine learning and artificial intelligence as powerful aids in their clinical decision-making while also preserving patient privacy. This is especially important given the epidemiology of chronic kidney disease, renal oncology, and hypertension worldwide. However, there remains a need to create a framework for guidance regarding how to better utilize synthetic data as a practical application in this research.
Keywords: Synthetic data; Artificial intelligence; Nephrology; Blood pressure; Research; Editorial
5. Data cleaning method for the process of acid production with flue gas based on improved random forest (Cited by 3)
Authors: Xiaoli Li, Minghua Liu, Kang Wang, Zhiqiang Liu, Guihai Li. Chinese Journal of Chemical Engineering, SCIE EI CAS CSCD, 2023, Issue 7, pp. 72-84 (13 pages)
Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data are an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment of acid production with flue gas is complex, and the process involves much equipment. The data obtained by the detection equipment are seriously polluted and prone to abnormal phenomena such as data loss and outliers. Therefore, to solve the problem of abnormal data in the process of acid production with flue gas, a data cleaning method based on an improved random forest is proposed. Firstly, an outlier recognition model based on isolation forest is designed to identify and eliminate outliers in the dataset. Secondly, an improved random forest regression model is established: a genetic algorithm is used to optimize the hyperparameters of the random forest regression model, the optimal parameter combination is found in the search space, and the trend of the data is predicted. Finally, the improved random forest data cleaning method is used to compensate for the missing data after eliminating the abnormal data, completing the cleaning. Results show that the proposed method can accurately eliminate and compensate for the abnormal data in the process of acid production with flue gas, improving the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for the subsequent temperature control. The conversion rate of SO2 can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
Keywords: Acid production; Data cleaning; Isolation forest; Random forest; Data compensation
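A condensed sketch of the two-stage cleaning described above: an isolation forest flags outlier rows, then a random forest regressor predicts replacement values for the removed entries. As a loudly-labeled substitution, randomized search stands in for the paper's genetic-algorithm hyperparameter tuning; the parameter grid is illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

def clean_process_data(X, y):
    # Stage 1: isolation forest marks outlier rows (-1) for removal.
    keep = IsolationForest(random_state=0).fit_predict(np.c_[X, y]) == 1

    # Stage 2: a tuned random forest, fit on clean rows, imputes the rest.
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=0),
        {"n_estimators": [100, 200, 400], "max_depth": [None, 5, 10]},
        n_iter=5, cv=3, random_state=0,
    ).fit(X[keep], y[keep])

    y_clean = y.astype(float).copy()
    if (~keep).any():
        y_clean[~keep] = search.best_estimator_.predict(X[~keep])
    return y_clean
```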
6. A Review of Data Cleaning Methods for Web Information System (Cited by 1)
Authors: Jinlin Wang, Xing Wang, Yuchen Yang, Hongli Zhang, Binxing Fang. Computers, Materials & Continua, SCIE EI, 2020, Issue 3, pp. 1053-1075 (23 pages)
Web information systems (WIS) are frequently used and indispensable in daily social life. WIS provide information services in many scenarios, such as electronic commerce, communities, and edutainment. Data cleaning plays an essential role in various WIS scenarios to improve the quality of data services. In this paper, we present a review of the state-of-the-art methods for data cleaning in WIS. According to the characteristics of data cleaning, we extract the critical elements of WIS, such as interactive objects, application scenarios, and core technology, to classify the existing works. Then, after elaborating and analyzing each category, we summarize the descriptions and challenges of data cleaning methods with sub-elements such as data & user interaction, data quality rules, models, crowdsourcing, and privacy preservation. Finally, we analyze various types of problems and provide suggestions for future research on data cleaning in WIS from the technology and interaction perspectives.
Keywords: data cleaning; web information system; data quality rule; crowdsourcing; privacy preservation
7. Data Cleaning Based on Stacked Denoising Autoencoders and Multi-Sensor Collaborations (Cited by 1)
Authors: Xiangmao Chang, Yuan Qiu, Shangting Su, Deliang Yang. Computers, Materials & Continua, SCIE EI, 2020, Issue 5, pp. 691-703 (13 pages)
Wireless sensor networks are increasingly used in sensitive event monitoring. However, various abnormal data generated by sensors greatly decrease the accuracy of event detection. Although many methods have been proposed to deal with abnormal data, they generally detect and/or repair all abnormal data without further differentiation. Actually, besides the abnormal data caused by events, sensor nodes are also prone to generating abnormal data due to factors such as sensor hardware drawbacks and random effects of external sources. Dealing with all abnormal data without differentiation will result in false detection or missed detection of events. In this paper, we propose a data cleaning approach based on Stacked Denoising Autoencoders (SDAE) and multi-sensor collaborations. We detect all abnormal data by SDAE, then differentiate the abnormal data by multi-sensor collaborations. The abnormal data caused by events are left unchanged, while the abnormal data caused by other factors are repaired. Simulations based on real data show the efficiency of the proposed approach.
Keywords: data cleaning; wireless sensor networks; stacked denoising autoencoders; multi-sensor collaborations
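A compact PyTorch sketch of this two-step idea, under stated assumptions: reconstruction error from a small SDAE flags anomalies per sensor, and a simple cross-sensor majority vote stands in for the paper's multi-sensor collaboration (deviations most sensors share are kept as events; isolated deviations are repaired). The network sizes, noise scale, and both thresholds are illustrative.

```python
import torch
import torch.nn as nn

class SDAE(nn.Module):
    """Two-layer (stacked) denoising autoencoder over sensor readings."""
    def __init__(self, n_sensors, h1=32, h2=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_sensors, h1), nn.ReLU(),
                                 nn.Linear(h1, h2))
        self.dec = nn.Sequential(nn.Linear(h2, h1), nn.ReLU(),
                                 nn.Linear(h1, n_sensors))
    def forward(self, x):
        return self.dec(self.enc(x))

def detect_and_clean(readings, epochs=200, thr=3.0):
    x = torch.tensor(readings, dtype=torch.float32)  # rows: time, cols: sensors
    model = SDAE(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        noisy = x + 0.1 * torch.randn_like(x)        # denoising objective
        loss = nn.functional.mse_loss(model(noisy), x)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        recon = model(x)
        err = (recon - x).abs()
    flags = err > thr * err.std(dim=0)               # per-sensor anomaly flags
    event = flags.float().mean(dim=1) > 0.5          # most sensors agree: event
    cleaned = x.clone()
    fix = flags & ~event.unsqueeze(1)                # isolated faults only
    cleaned[fix] = recon[fix]                        # repair via reconstruction
    return cleaned.numpy(), flags.numpy(), event.numpy()
```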
8. Application of fuzzy equivalence theory in data cleaning
Authors: 李华旸, 刘玉葆, 李又奎. Journal of Southeast University (English Edition), EI CAS, 2004, Issue 4, pp. 454-457 (4 pages)
This paper presented a rule merging and simplifying method and an improved analysis deviation algorithm. Fuzzy equivalence theory avoids the rigid, either-this-or-that way of traditional equivalence theory. During a data cleaning task, some rules stand in inclusion relations with each other, and the equivalence degree of an included rule is smaller than that of the including rule, so a rule merging and simplifying method is introduced to reduce the total computing time. This kind of relation also affects the deviation of the fuzzy equivalence degree, so an improved analysis deviation algorithm that omits the influence of the included rules' equivalence degrees is presented. Normally, duplicate records are logged to a file, and users have to check and verify them one by one, which is time-consuming; the proposed algorithm saves users' labor during duplicate record checking. Finally, an experiment demonstrating the feasibility of the method is presented.
Keywords: data recording; data warehouses; database systems; merging; semantics
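An illustrative take on fuzzy-equivalence duplicate detection (the paper's rule merging and deviation analysis are not reproduced): field similarities are aggregated into an equivalence degree, and a lambda-cut groups record pairs whose degree exceeds the threshold, sparing users one-by-one manual checks. The weights and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher

def equivalence_degree(rec_a, rec_b, weights):
    # Weighted fuzzy similarity of corresponding string fields in [0, 1].
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in zip(rec_a, rec_b)]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)

def duplicate_pairs(records, weights, lam=0.85):
    # Lambda-cut: pairs with equivalence degree >= lam count as duplicates.
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if equivalence_degree(records[i], records[j], weights) >= lam:
                pairs.append((i, j))
    return pairs

dups = duplicate_pairs(
    [("Li Hua", "Southeast Univ."), ("Li Hua", "South-East University")],
    weights=[0.6, 0.4],
)
```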
9. An Improvement of Data Cleaning Method for Grain Big Data Processing Using Task Merging (Cited by 1)
Authors: Feiyu Lian, Maixia Fu, Xingang Ju. Journal of Computer and Communications, 2020, Issue 3, pp. 1-19 (19 pages)
Data quality exerts an important influence over the application of grain big data, so data cleaning is a necessary and important task. In the MapReduce framework, parallel techniques are often used to execute data cleaning with high scalability, but due to the lack of effective design, there is much redundant computation in the data cleaning process, which results in lower performance. In this research, we found that some tasks are often carried out multiple times on the same input files, or require the same operation results, in the process of data cleaning. For this problem, we propose a new optimization technique based on task merging. By merging simple or redundant computations on the same input files, the number of loop computations in MapReduce can be greatly reduced. The experiments show that, by this means, the overall system runtime is significantly reduced, which proves that the data cleaning process is optimized. In this paper, we optimize several modules of data cleaning, such as entity identification, inconsistent data restoration, and missing value filling. Experimental results show that the proposed method can increase the efficiency of grain big data cleaning.
Keywords: Grain big data; data cleaning; task merging; Hadoop; MapReduce
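A toy illustration of the task-merging idea: rather than three separate passes over the same input (entity identification, inconsistent-data restoration, missing-value filling), one merged pass applies all three steps per record, cutting repeated I/O over the same file. The field names and repair rules are hypothetical, not from the paper.

```python
def merged_clean_pass(records, canonical_names, defaults):
    """One pass that merges three cleaning tasks instead of three scans."""
    for rec in records:
        # 1) Entity identification: normalize the entity to a canonical name.
        key = rec["entity"].strip().lower()
        rec["entity"] = canonical_names.get(key, rec["entity"])
        # 2) Inconsistent data restoration: repair impossible values.
        if rec.get("moisture", 0) < 0:
            rec["moisture"] = abs(rec["moisture"])
        # 3) Missing value filling: supply defaults for absent fields.
        for k, v in defaults.items():
            rec.setdefault(k, v)
        yield rec

cleaned = list(merged_clean_pass(
    [{"entity": " Wheat Depot A ", "moisture": -12.5}],
    canonical_names={"wheat depot a": "Wheat Depot A"},
    defaults={"grade": "unknown"},
))
```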
10. A comparative study of Chinese and UK scientific data repository platforms based on re3data (Cited by 1)
Authors: 袁烨, 陈媛媛. Digital Library Forum, CSSCI, 2024, Issue 2, pp. 13-23 (11 pages)
Using re3data as the data source, this study takes 406 scientific data repositories in China and the United Kingdom as its research objects and compares their development across five dimensions and eleven indicators: distribution characteristics, responsibility types, repository licensing, technical standards, and quality standards. Based on the comparison, suggestions are offered for the sustainable development of data repositories in China: broadly connecting heterogeneous institutions at home and abroad, promoting exchange and cooperation across disciplines, effectively expanding repository licensing permissions and types, optimizing the application of technical standards, and improving the flexibility of metadata use.
Keywords: scientific data; data repository platform; re3data; China; United Kingdom
11. Cleaning of Multi-Source Uncertain Time Series Data Based on PageRank
Authors: 高嘉伟, 孙纪舟. Journal of Donghua University (English Edition), CAS, 2023, Issue 6, pp. 695-700 (6 pages)
There are errors in multi-source uncertain time series data. Truth discovery methods for time series data are effective in finding more accurate values, but some have limitations in their usability. To tackle this challenge, we propose a new and convenient truth discovery method for time series data. A more accurate sample is closer to the truth and, consequently, to other accurate samples. Because the mutual-confirmation relationship between sensors is very similar to the mutual-citation relationship between web pages, we evaluate sensor reliability based on PageRank and then estimate the truth from the sensor reliabilities. Therefore, this method does not rely on smoothness assumptions or prior knowledge of the data. Finally, we validate the effectiveness and efficiency of the proposed method on real-world and synthetic datasets, respectively.
Keywords: big data; data cleaning; time series; truth discovery; PageRank
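A small sketch of the PageRank-style truth discovery described above: sensors that mutually confirm each other (similar readings) are linked strongly, the PageRank scores become sensor reliabilities, and the truth estimate is the reliability-weighted mean at each time step. The inverse-mean-absolute-difference edge weight is an assumption standing in for whatever confirmation measure the paper uses.

```python
import numpy as np
import networkx as nx

def discover_truth(readings):
    """readings: array of shape (n_sensors, n_timesteps)."""
    n = readings.shape[0]
    G = nx.DiGraph()
    for i in range(n):
        for j in range(n):
            if i != j:
                # Mutual confirmation: closer series -> stronger edge.
                w = 1.0 / (1.0 + np.mean(np.abs(readings[i] - readings[j])))
                G.add_edge(i, j, weight=w)
    rel = nx.pagerank(G, weight="weight")          # sensor reliabilities
    w = np.array([rel[i] for i in range(n)])
    return (w[:, None] * readings).sum(axis=0) / w.sum()

truth = discover_truth(np.array([[20.1, 20.3], [20.0, 20.4], [35.0, 2.0]]))
```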
12. Mining Software Repository for Cleaning Bugs Using Data Mining Technique (Cited by 1)
Authors: Nasir Mahmood, Yaser Hafeez, Khalid Iqbal, Shariq Hussain, Muhammad Aqib, Muhammad Jamal, Oh-Young Song. Computers, Materials & Continua, SCIE EI, 2021, Issue 10, pp. 873-893 (21 pages)
Despite advances in technology and effort, software repository maintenance requires reusing data to reduce effort and complexity. However, increasing ambiguity, irrelevance, and bugs while extracting similar data during software development generate a large amount of low-quality data in repositories. Thus, there is a need for a repository mining technique for relevant and bug-free data prediction. This paper proposes a fault prediction approach using a data-mining technique to find good predictors for high-quality software. To predict errors in mining data, the Apriori algorithm was used to discover association rules, fixing confidence at more than 40% and support at a minimum of 30%. A pruning strategy was adopted based on evaluation measures. Next, rules were extracted from three projects in different domains; the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values. To evaluate the proposed approach, we conducted an experimental study comparing the proposed rules with existing ones on four different industrial projects. The evaluation showed that the results of our proposal are promising. Practitioners and developers can utilize these rules for defect prediction during early software development.
Keywords: Fault prediction; association rule; data mining; frequent pattern mining
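A sketch of the rule-mining step with the thresholds quoted above (support at least 30%, confidence above 40%). It assumes the mlxtend library's Apriori implementation, and the one-hot "module metric" columns are hypothetical stand-ins for whatever features the paper extracts from repositories.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per module; boolean indicators are hypothetical example features.
df = pd.DataFrame({
    "high_complexity": [1, 1, 0, 1, 0, 1],
    "recent_churn":    [1, 0, 0, 1, 1, 1],
    "buggy":           [1, 1, 0, 1, 0, 1],
}).astype(bool)

frequent = apriori(df, min_support=0.30, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.40)

# Prune in the spirit of the paper's strategy: keep rules predicting defects.
fault_rules = rules[rules["consequents"].apply(lambda s: "buggy" in s)]
print(fault_rules[["antecedents", "support", "confidence"]])
```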
13. A Rule Management System for Knowledge Based Data Cleaning
Authors: Louardi Bradji, Mahmoud Boufaida. Intelligent Information Management, 2011, Issue 6, pp. 230-239 (10 pages)
In this paper, we propose a rule management system for data cleaning that is based on knowledge. This system combines features of both rule-based systems and rule-based data cleaning frameworks. The important advantages of our system are threefold. First, it proposes a strong and unified rule form, based on first-order structure, that permits the representation and management of all types of rules and their quality via certain characteristics. Second, it increases the quality of rules, which conditions the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule- and knowledge-based systems. As several research works have shown that data cleaning is driven by domain knowledge rather than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data, to better determine the main components of the proposed system. To illustrate our system, we also present a first experiment with a case study in the health sector, where we demonstrate how the system is useful for improving data quality. The autonomy, extensibility, and platform-independence of the proposed rule management system facilitate its incorporation into any system concerned with data quality management.
Keywords: rule; data quality; data cleaning; knowledge; rule management system; rule-based system; structure
14. Data Secure Storage Mechanism for IIoT Based on Blockchain (Cited by 2)
Authors: Jin Wang, Guoshu Huang, R. Simon Sherratt, Ding Huang, Jia Ni. Computers, Materials & Continua, SCIE EI, 2024, Issue 3, pp. 4029-4048 (20 pages)
With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose certain challenges to its rapid development. Blockchain technology offers immutability, decentralization, and autonomy, which can greatly mitigate the inherent defects of the IIoT. In a traditional blockchain, data is stored in a Merkle tree. As data continues to grow, the scale of the proofs used to validate it grows as well, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new Vector Commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC improves the efficiency of traditional VC in the commitment and opening processes. Finally, this paper uses PVC to build a blockchain-based IIoT data secure storage mechanism and carries out a comparative experimental analysis. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
Keywords: blockchain; IIoT; data storage; cryptographic commitment
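The PVC construction itself is beyond a short sketch, but the baseline it improves on is easy to demonstrate: a Merkle inclusion proof carries one sibling hash per tree level, so proof size grows logarithmically with the number of leaves, in contrast to the constant-size openings that vector commitments aim for. This is a hedged illustration of the motivating inefficiency, not the paper's mechanism.

```python
import hashlib

def h(b):
    return hashlib.sha256(b).digest()

def merkle_proof(leaves, index):
    """Return the sibling-hash path proving leaves[index] is in the tree."""
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # duplicate last node on odd levels
        proof.append(level[index ^ 1])     # one sibling hash per level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof  # length ~ log2(n), vs O(1) for a vector-commitment opening

print(len(merkle_proof([bytes([i % 256]) for i in range(1024)], 5)))  # -> 10
```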
15. Hadoop-based secure storage solution for big data in cloud computing environment (Cited by 1)
Authors: Shaopeng Guan, Conghui Zhang, Yilin Wang, Wenqing Liu. Digital Communications and Networks, SCIE CSCD, 2024, Issue 1, pp. 227-236 (10 pages)
In order to address the problems of a single encryption algorithm, such as low encryption efficiency and unreliable metadata, for static data storage on big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, in order to disperse the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms and use the ZooKeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data and adopt a homomorphic encryption algorithm to encrypt data that needs to be computed on. To accelerate encryption, we adopt a dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the problem of a single point of failure for metadata, performs well in terms of metadata reliability, and can realize server fault tolerance. The improved encryption algorithm, integrated with the dual-channel storage mode, improves encryption storage efficiency by 27.6% on average.
Keywords: big data security; data encryption; Hadoop; parallel encrypted storage; ZooKeeper
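A rough sketch of the dual-thread encryption mode mentioned above, with AES-GCM explicitly standing in for the paper's improved ECC and homomorphic algorithms (which are not public APIs): the plaintext is split in two and the halves are encrypted concurrently. This only illustrates the threading pattern, not the scheme's cryptography.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_part(key, part):
    nonce = os.urandom(12)
    return nonce, AESGCM(key).encrypt(nonce, part, None)

def dual_thread_encrypt(key, data):
    # Split the payload and encrypt both halves on two worker threads.
    mid = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(encrypt_part, key, chunk)
                   for chunk in (data[:mid], data[mid:])]
        return [f.result() for f in futures]

key = AESGCM.generate_key(bit_length=128)
ciphertexts = dual_thread_encrypt(key, os.urandom(1 << 20))
```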
16. Defect Detection Model Using Time Series Data Augmentation and Transformation (Cited by 1)
Authors: Gyu-Il Kim, Hyun Yoo, Han-Jin Cho, Kyungyong Chung. Computers, Materials & Continua, SCIE EI, 2024, Issue 2, pp. 1713-1730 (18 pages)
Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time series data into images for analysis have been studied. This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The data augmentation method chosen is the addition of noise: Gaussian noise, with the noise level set to 0.002, is added to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time series data into images. This enables the identification of patterns in time series data and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. This allows anomalies to be detected and, by applying an anomaly map to the original image, the areas where anomalies occur to be captured. The performance evaluation shows that both F1-score and accuracy are high when time series data are converted to images. Additionally, when processed as images rather than as time series data, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research in the field of anomaly detection using time series data; moreover, it helps address problems such as analyzing complex patterns in data in a lightweight manner.
Keywords: defect detection; time series; deep learning; data augmentation; data transformation
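A sketch of the preprocessing described above: Gaussian noise at the quoted 0.002 level augments the series, and a Markov Transition Field turns each series into an image for a downstream (PatchCore-style) detector. It assumes the pyts library for the MTF transform; the image size and example series are illustrative.

```python
import numpy as np
from pyts.image import MarkovTransitionField

def augment_and_transform(X, noise_level=0.002, image_size=64, seed=0):
    """X: array of shape (n_series, n_timestamps)."""
    rng = np.random.default_rng(seed)
    noisy = X + rng.normal(0.0, noise_level, size=X.shape)  # augmentation
    X_all = np.vstack([X, noisy])                           # original + augmented
    mtf = MarkovTransitionField(image_size=image_size)
    # Result shape: (2 * n_series, image_size, image_size) images.
    return mtf.fit_transform(X_all)

series = np.sin(np.linspace(0, 8 * np.pi, 256))[None, :]
images = augment_and_transform(series)
```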
17. Spatiotemporal deformation characteristics of Outang landslide and identification of triggering factors using data mining (Cited by 2)
Authors: Beibei Yang, Zhongqiang Liu, Suzanne Lacasse, Xin Liang. Journal of Rock Mechanics and Geotechnical Engineering, SCIE CSCD, 2024, Issue 10, pp. 4088-4104 (17 pages)
Since the impoundment of the Three Gorges Reservoir (TGR) in 2003, numerous slopes have experienced noticeable movement or destabilization owing to reservoir level changes and seasonal rainfall. One case is the Outang landslide, a large-scale and active landslide on the south bank of the Yangtze River. The latest monitoring data and site investigations available are analyzed to establish the spatial and temporal deformation characteristics of the landslide. Data mining technology, including two-step clustering and the Apriori algorithm, is then used to identify the dominant triggers of landslide movement. In the data mining process, the two-step clustering method clusters the candidate triggers and the displacement rate into several groups, and the Apriori algorithm generates correlation criteria for the cause and effect. The analysis considers multiple locations on the landslide and incorporates two time scales: long-term deformation on a monthly basis and short-term deformation on a daily basis. This analysis shows that the deformations of the Outang landslide are driven by both rainfall and reservoir water, while the deformation varies spatiotemporally, mainly due to differences in local responses to hydrological factors. The data mining results reveal different dominant triggering factors depending on the monitoring frequency: the monthly and bi-monthly cumulative rainfall control the monthly deformation, and the 10-day cumulative rainfall and the 5-day cumulative drop in reservoir water level dominate the daily deformation of the landslide. It is concluded that the spatiotemporal deformation pattern and the data mining rules associated with precipitation and reservoir water level have the potential to be broadly implemented for improving landslide prevention and control in dam reservoirs and other landslide-prone areas.
Keywords: landslide; deformation characteristics; triggering factor; data mining; Three Gorges Reservoir
18. Assimilation of GOES-R Geostationary Lightning Mapper Flash Extent Density Data in GSI 3DVar, EnKF, and Hybrid En3DVar for the Analysis and Short-Term Forecast of a Supercell Storm Case (Cited by 1)
Authors: Rong Kong, Ming Xue, Edward R. Mansell, Chengsi Liu, Alexandre O. Fierro. Advances in Atmospheric Sciences, SCIE CAS CSCD, 2024, Issue 2, pp. 263-277 (15 pages)
Capabilities to assimilate Geostationary Operational Environmental Satellite R-series (GOES-R) Geostationary Lightning Mapper (GLM) flash extent density (FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter (GSI-EnKF) framework were previously developed and tested with a mesoscale convective system (MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational (EnVar) hybrid data assimilation (DA) framework. The results of assimilating the GLM FED data using 3DVar and pure En3DVar (PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation, rather than to compare the performance of FED DA among different DA schemes; only the results of 3DVar and PEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than those from EnKF, mainly because of the different localization strategies used in EnKF/DfEnKF and PEn3DVar, as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting a correct implementation.
Keywords: GOES-R; lightning; data assimilation; EnKF; EnVar
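To make the ensemble update at the heart of such systems concrete, here is a textbook stochastic-EnKF analysis step in NumPy. It is a generic sketch only: the operational GSI-EnKF adds covariance localization, inflation, and the FED observation operator built on integrated graupel mass, none of which are reproduced here.

```python
import numpy as np

def enkf_update(Xf, y, H, R, rng):
    """Xf: (n_state, n_ens) forecast ensemble; y: (n_obs,) observation;
    H: (n_obs, n_state) linear observation operator; R: (n_obs, n_obs)."""
    n_ens = Xf.shape[1]
    A = Xf - Xf.mean(axis=1, keepdims=True)          # ensemble perturbations
    HX = H @ Xf
    HA = HX - HX.mean(axis=1, keepdims=True)
    Pyy = HA @ HA.T / (n_ens - 1) + R                # innovation covariance
    Pxy = A @ HA.T / (n_ens - 1)                     # cross covariance
    K = Pxy @ np.linalg.inv(Pyy)                     # Kalman gain
    # Perturbed observations keep the analysis spread statistically correct.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return Xf + K @ (Y - HX)

rng = np.random.default_rng(0)
Xa = enkf_update(rng.normal(size=(4, 20)), np.array([1.0]),
                 np.array([[1.0, 0, 0, 0]]), np.eye(1) * 0.1, rng)
```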
19. An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine (Cited by 2)
Authors: Bo Zhu, Xiaona Jing, Lan Qiu, Runbo Li. Computers, Materials & Continua, SCIE EI, 2024, Issue 6, pp. 3977-3999 (23 pages)
When building a classification model, the scenario where the samples of one class significantly outnumber those of the other class is called data imbalance. Data imbalance causes the trained classification model to favor the majority class (usually defined as the negative class), which may harm the accuracy of the minority class (usually defined as the positive class) and lead to poor overall performance of the model. A method called MSHR-FCSSVM for imbalanced data classification is proposed in this article, based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). The MSHR measures the separability of each negative sample through its silhouette value, calculated using the Mahalanobis distance between samples; based on this, so-called pseudo-negative samples are screened out and used to generate new positive samples through linear interpolation (over-sampling step) before being deleted (under-sampling step). This approach replaces pseudo-negative samples with newly generated positive samples one by one to clear up the inter-class overlap on the borderline, without changing the overall scale of the dataset. The FCSSVM is an improved version of the traditional CS-SVM. It simultaneously considers the influences of both the imbalance in sample numbers and the class distribution on classification, finely tuning the class cost weights with the rime-ice (RIME) optimization algorithm, using cross-validation accuracy as the fitness function, to accurately adjust the classification borderline. To verify the effectiveness of the proposed method, a series of experiments are carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that the MSHR-FCSSVM method performs better than the comparison methods in most cases, and that both the MSHR and the FCSSVM play significant roles.
Keywords: imbalanced data classification; silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM
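A simplified sketch of the resampling-plus-cost-sensitive-SVM pipeline, with two labeled substitutions: the Mahalanobis-silhouette screening is approximated by flagging majority samples whose nearest neighbour belongs to the minority class, and cross-validated weight search stands in for the RIME optimizer. Each flagged pseudo-negative is replaced by one interpolated synthetic positive, keeping the dataset size unchanged as the abstract describes.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def mshr(X, y, seed=0):
    rng = np.random.default_rng(seed)
    _, idx = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    neighbour = y[idx[:, 1]]                  # label of nearest other point
    pseudo_neg = (y == 0) & (neighbour == 1)  # borderline majority samples
    if not pseudo_neg.any():
        return X, y
    minority = X[y == 1]
    # Over-sampling: one interpolated positive per deleted pseudo-negative.
    new_pos = np.array([
        x + rng.uniform() * (minority[rng.integers(len(minority))] - x)
        for x in X[pseudo_neg]
    ])
    # Under-sampling: drop the pseudo-negative originals.
    X_out = np.vstack([X[~pseudo_neg], new_pos])
    y_out = np.concatenate([y[~pseudo_neg], np.ones(len(new_pos), dtype=int)])
    return X_out, y_out

def fit_fcssvm(X, y):
    # Cross-validated tuning of class cost weights (RIME stand-in).
    grid = {"class_weight": [{0: 1, 1: w} for w in (1, 2, 5, 10, 20)]}
    return GridSearchCV(SVC(kernel="rbf"), grid, cv=5,
                        scoring="accuracy").fit(X, y)
```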
20. EDSUCh: A robust ensemble data summarization method for effective medical diagnosis (Cited by 1)
Authors: Mohiuddin Ahmed, A.N.M. Bazlur Rashid. Digital Communications and Networks, SCIE CSCD, 2024, Issue 1, pp. 182-189 (8 pages)
Identifying rare patterns for medical diagnosis is a challenging task due to the heterogeneity and volume of the data. Data summarization can create a concise version of the original data that can be used for effective diagnosis. In this paper, we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data that ensures the inclusion of rare patterns. To the best of our knowledge, no such technique has been available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis. The performance of popular anomaly detection algorithms increases significantly, in terms of accuracy and computational complexity, when the summaries are used. Therefore, medical diagnosis becomes more effective, and our experimental results reflect that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
Keywords: data summarization; ensemble; medical diagnosis; sampling
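A sketch of a cluster-then-sample summary in the spirit described above: records are clustered, every cluster contributes to the summary, and small clusters (the likely rare patterns) are kept in full so anomalies survive summarization. The cluster count, sampling fraction, and rare-cluster cutoff are illustrative parameters, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(X, n_clusters=20, frac=0.05, rare_size=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if len(members) <= rare_size:
            keep.extend(members)  # rare pattern: keep the whole cluster
        else:
            k = max(1, int(frac * len(members)))  # sample the bulk clusters
            keep.extend(rng.choice(members, size=k, replace=False))
    return X[np.sort(keep)]

summary = summarize(np.random.default_rng(1).normal(size=(5000, 8)))
```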