Journal literature search results: 939 articles found
Data cleaning method for the process of acid production with flue gas based on improved random forest (cited by: 1)
1
Authors: Xiaoli Li, Minghua Liu, Kang Wang, Zhiqiang Liu, Guihai Li. 《Chinese Journal of Chemical Engineering》 (SCIE, EI, CAS, CSCD), 2023, Issue 7, pp. 72-84 (13 pages)
Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data is an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment of acid production with flue gas is complex and involves a large amount of equipment. The data obtained by the detection equipment is seriously polluted and prone to abnormal phenomena such as data loss and outliers. Therefore, to solve the problem of abnormal data in the process of acid production with flue gas, a data cleaning method based on improved random forest is proposed. Firstly, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Secondly, an improved random forest regression model is established, in which a genetic algorithm is used to optimize the hyperparameters of the random forest regression model. Then the optimal parameter combination is found in the search space and the trend of the data is predicted. Finally, the improved random forest data cleaning method is used to compensate for the missing data after eliminating abnormal data, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for the abnormal data in the process of acid production with flue gas. The method improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for the subsequent temperature control. The conversion rate of SO₂ can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
Keywords: acid production; data cleaning; isolation forest; random forest; data compensation
Data Cleaning Based on Stacked Denoising Autoencoders and Multi-Sensor Collaborations (cited by: 1)
2
Authors: Xiangmao Chang, Yuan Qiu, Shangting Su, Deliang Yang. 《Computers, Materials & Continua》 (SCIE, EI), 2020, Issue 5, pp. 691-703 (13 pages)
Wireless sensor networks are increasingly used in sensitive event monitoring. However, various abnormal data generated by sensors greatly decrease the accuracy of event detection. Although many methods have been proposed to deal with abnormal data, they generally detect and/or repair all abnormal data without further differentiation. In fact, besides the abnormal data caused by events, it is well known that sensor nodes are prone to generating abnormal data due to factors such as sensor hardware drawbacks and random effects of external sources. Dealing with all abnormal data without differentiation results in false detection or missed detection of events. In this paper, we propose a data cleaning approach based on Stacked Denoising Autoencoders (SDAE) and multi-sensor collaborations. We detect all abnormal data with SDAE, then differentiate the abnormal data through multi-sensor collaborations. The abnormal data caused by events are left unchanged, while the abnormal data caused by other factors are repaired. Simulations based on real data show the efficiency of the proposed approach.
Keywords: data cleaning; wireless sensor networks; stacked denoising autoencoders; multi-sensor collaborations
A Review of Data Cleaning Methods for Web Information System
3
Authors: Jinlin Wang, Xing Wang, Yuchen Yang, Hongli Zhang, Binxing Fang. 《Computers, Materials & Continua》 (SCIE, EI), 2020, Issue 3, pp. 1053-1075 (23 pages)
The web information system (WIS) is frequently used and indispensable in daily social life. WIS provides information services in many scenarios, such as electronic commerce, communities, and edutainment. Data cleaning plays an essential role in various WIS scenarios to improve the quality of data service. In this paper, we present a review of the state-of-the-art methods for data cleaning in WIS. According to the characteristics of data cleaning, we extract the critical elements of WIS, such as interactive objects, application scenarios, and core technology, to classify the existing works. Then, after elaborating and analyzing each category, we summarize the descriptions and challenges of data cleaning methods with sub-elements such as data and user interaction, data quality rule, model, crowdsourcing, and privacy preservation. Finally, we analyze various types of problems and provide suggestions for future research on data cleaning in WIS from the technology and interaction perspectives.
Keywords: data cleaning; web information system; data quality rule; crowdsourcing; privacy preservation
An Improvement of Data Cleaning Method for Grain Big Data Processing Using Task Merging (cited by: 1)
4
Authors: Feiyu Lian, Maixia Fu, Xingang Ju. 《Journal of Computer and Communications》, 2020, Issue 3, pp. 1-19 (19 pages)
Data quality has an important influence on the application of grain big data, so data cleaning is a necessary and important task. In the MapReduce framework, parallel techniques are often used to execute data cleaning in a highly scalable mode, but due to the lack of effective design, there is a large amount of computing redundancy in the data cleaning process, which results in lower performance. In this research, we found that some tasks are often carried out multiple times on the same input files, or require the same operation results, during data cleaning. To address this problem, we propose a new optimization technique based on task merging. By merging simple or redundant computations on the same input files, the number of loop computations in MapReduce can be reduced greatly. The experiment shows that the overall system runtime is significantly reduced by this means, which proves that the data cleaning process is optimized. In this paper, we optimize several modules of data cleaning, such as entity identification, inconsistent data restoration, and missing value filling. Experimental results show that the proposed method can increase the efficiency of grain big data cleaning.
Keywords: grain big data; data cleaning; task merging; Hadoop; MapReduce
A Rule Management System for Knowledge Based Data Cleaning
5
Authors: Louardi BRADJI, Mahmoud BOUFAIDA. 《Intelligent Information Management》, 2011, Issue 6, pp. 230-239 (10 pages)
In this paper, we propose a rule management system for data cleaning that is based on knowledge. This system combines features of both rule based systems and rule based data cleaning frameworks. The important advantages of our system are threefold. First, it aims at proposing a strong and unified rule form based on first order structure that permits the representation and management of all the types of rules and their quality via some characteristics. Second, it increases the quality of rules, which conditions the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule and knowledge based systems. As several research works have shown that data cleaning is driven by domain knowledge rather than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data in order to better determine the main components of the proposed system. To illustrate our system, we also present a first experiment with a case study in the health sector, where we demonstrate how the system is useful for improving data quality. The autonomy, extensibility, and platform independence of the proposed rule management system facilitate its incorporation into any system that is concerned with data quality management.
Keywords: rule; data quality; data cleaning; knowledge; rule management system; rule based system structure
Cleaning of Multi-Source Uncertain Time Series Data Based on PageRank
6
Authors: 高嘉伟, 孙纪舟. 《Journal of Donghua University(English Edition)》 (CAS), 2023, Issue 6, pp. 695-700 (6 pages)
There are errors in multi-source uncertain time series data. Truth discovery methods for time series data are effective in finding more accurate values, but some have limitations in their usability. To tackle this challenge, we propose a new and convenient truth discovery method to handle time series data. A more accurate sample is closer to the truth and, consequently, to other accurate samples. Because the mutual-confirmation relationship between sensors is very similar to the mutual-citation relationship between web pages, we evaluate sensor reliability based on PageRank and then estimate the truth from sensor reliability. Therefore, this method does not rely on smoothness assumptions or prior knowledge of the data. Finally, we validate the effectiveness and efficiency of the proposed method on real-world and synthetic data sets, respectively.
Keywords: big data; data cleaning; time series; truth discovery; PageRank
IoT data cleaning techniques: A survey
7
Authors: Xiaoou Ding, Hongzhi Wang, Genglong Li, Haoxuan Li, Yingze Li, Yida Liu. 《Intelligent and Converged Networks》 (EI), 2022, Issue 4, pp. 325-339 (15 pages)
Data cleaning is considered an effective approach to improving data quality, helping practitioners and researchers devote themselves to downstream analysis and decision-making without worrying about data trustworthiness. This paper provides a systematic summary of the two main stages of data cleaning for Internet of Things (IoT) data with time series characteristics: error data detection and data repairing. For error data detection, it gives a categorized overview of quantitative methods for detecting single-point errors, continuous errors, and multidimensional time series data errors, and of qualitative methods for detecting rule-violating errors. It also provides a detailed description of error data repairing techniques, covering statistics-based repairing, rule-based repairing, and human-involved repairing. We review the strengths and limitations of current data cleaning techniques in IoT data applications and conclude with an outlook on the future of IoT data cleaning.
Keywords: Internet of Things (IoT); data quality; data cleaning; error detection; data repairing
Data Cleaning About Student Information Based on Massive Open Online Course System
8
Authors: Shengjun Yin, Yaling Yi, Hongzhi Wang. 《国际计算机前沿大会会议论文集》, 2020, Issue 1, pp. 33-43 (11 pages)
Massive Open Online Courses (MOOCs) have recently become a major way of online learning for millions of people around the world, and they generate a large amount of data in the process. However, because of errors introduced during collection, by the system, and so on, these data contain various inconsistencies and missing values. To support accurate analysis, this paper studies data cleaning technology for an online open course system, including missing-value filling for time series and rule-based input error correction. The data cleaning algorithm designed in this paper is divided into six parts: pre-processing, missing data processing, format and content error processing, logical error processing, irrelevant data processing, and correlation analysis. For the missing data processing part, the paper designs and implements a missing-value-filling algorithm based on time series. To handle the large number of descriptive variables in the format and content error processing module, it proposes the One-Hot+J3+PCA approach, based on one-hot encoding and separability criteria. The online course data cleaning algorithm is analyzed in detail in terms of design, implementation, and testing. After extensive testing, each module functions normally, and the cleaning performance of the algorithm meets expectations.
Keywords: MOOC; data cleaning; time series; intermittent missing; dimension reduction
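Optimization of a GA-based RBF neural network model for predicting liquid holdup in gas-liquid two-phase flow
9
Authors: 廖锐全, 李龙威, 王伟, 马斌, 潘元. 《长江大学学报(自然科学版)》, 2024, Issue 2, pp. 91-100 (10 pages)
To improve the accuracy of liquid holdup prediction for gas-liquid two-phase flow, and to address the difficulty of determining the network topology and the slow convergence of traditional radial basis function (RBF) neural networks in this task, a liquid holdup prediction model based on an RBF neural network optimized by a genetic algorithm (GA) is proposed. The collected experimental data are processed with hierarchical clustering and grey relational analysis (GRA) to select the optimal model features, and the structural parameters of the RBF neural network are determined with the genetic algorithm. The model is trained on laboratory experimental data and compared with the back-propagation (BP) neural network, the GA-BP neural network, and the RBF neural network commonly used for liquid holdup prediction in order to evaluate its accuracy and feasibility. The results show that the GA-RBF neural network model has a mean squared error of 0.0017, a root mean squared error of 0.0416, a mean absolute error of 0.0281, and a goodness of fit of 0.9483. Compared with the other neural network models, the proposed model shows higher computational accuracy and stronger generalization ability.
Keywords: liquid holdup; gas-liquid two-phase flow; RBF neural network; genetic algorithm; data cleaning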
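Data update control algorithm for smart shopping malls based on MEC edge cloud
10
Authors: 陈占伟, 胡晓. 《计算机仿真》, 2024, Issue 2, pp. 477-481 (5 pages)
To address the large data volume in smart shopping malls, a data update control algorithm based on the MEC edge cloud is proposed. In the MEC edge cloud environment, data are integrated and classified and then pushed directly to the user side. The gravitational search algorithm and the cuckoo search algorithm are combined into a CS-GSA algorithm, which is used to clean the data in the MEC edge cloud smart mall database. Run-length encoding and Huffman coding are introduced into the differential data update method: run-length encoding compresses consecutive repeated data to reduce the data size, and Huffman coding then compresses the result again to reduce the communication overhead during transmission, thereby realizing data updates for the MEC edge cloud smart mall. Experimental results show that the proposed method has shorter encoding and decoding time, shorter data update time, and lower communication overhead, indicating that it can improve the quality of data updates.
Keywords: edge cloud; smart shopping mall; data update control; data cleaning; differential data update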
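The run-length encoding step mentioned in entry 10 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rle_encode` and `rle_decode` and the sample readings are hypothetical, and the Huffman coding stage that the abstract applies afterwards is not shown.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal consecutive values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sample: a sequence with long runs of repeated readings.
readings = [7, 7, 7, 7, 3, 3, 9, 9, 9]
encoded = rle_encode(readings)
print(encoded)  # [(7, 4), (3, 2), (9, 3)]
print(rle_decode(encoded) == readings)  # True
```

Run-length encoding only shrinks data that contains long runs; on data with few repeats the list of pairs can be larger than the input, which is one reason a second entropy-coding stage is commonly paired with it.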
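Real-time monitoring method for elevator energy consumption based on K-means clustering and BP neural network
11
Authors: 彭诚. 《通化师范学院学报》, 2024, Issue 4, pp. 50-56 (7 pages)
Existing methods for monitoring elevator energy consumption suffer from low accuracy, long run times, and unsatisfactory results. This paper proposes a real-time elevator energy consumption monitoring method that combines the K-means clustering algorithm with a BP neural network. Feature values of the main factors affecting real-time monitoring of building energy consumption are extracted from the cleaned energy consumption data, and similarity is calculated with the similarity coefficient method to obtain similarity coefficients. Wavelet decomposition is applied to similar elevator energy consumption data to obtain high- and low-frequency sequences; the low-frequency part is processed with the LSSVM-GSA detection method and the high-frequency part with mean-square weighting, and the two results are reconstructed to obtain the final real-time monitoring result. Simulation results show that the proposed method yields monitoring results with high accuracy, low time cost, and high stability.
Keywords: elevator energy consumption; K-means clustering algorithm; BP neural network; data cleaning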
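Short-term wind power prediction based on wavelet transform and IAGA-BP neural network (cited by: 1)
12
Authors: 孙国良, 伊力哈木·亚尔买买提, 张宽, 吐松江·卡日, 李振恩, 邸强. 《电测与仪表》 (PKU Core), 2024, Issue 5, pp. 126-134, 145 (10 pages)
To improve wind power prediction accuracy and reduce the adverse impact of wind power output fluctuations on grid integration, a short-term wind power prediction method based on a WT-IAGA-BP neural network is proposed. First, historical wind farm data are cleaned using wind speed binning, the 3σ criterion, and Lagrange interpolation. Second, based on the wavelet reconstruction error, the db4 wavelet is selected to extract signals with different frequency characteristics from wind speed, wind direction, and historical wind power, and an improved adaptive genetic algorithm (IAGA) is introduced to optimize the initial weights and thresholds of the BP neural network for each sequence, using the sigmoid function to adapt the crossover and mutation probabilities according to fitness values. WT-IAGA-BP models are then built for each sequence to produce a combined short-term wind power prediction. Simulation analysis and comparison with the ELM, IAGA-BP, WT-ELM, and WT-LSSVM methods verify that the proposed method achieves higher prediction accuracy and better prediction performance.
Keywords: wind power prediction; data cleaning; wavelet transform; improved adaptive genetic algorithm; neural network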
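Research on a lossless mining algorithm for hybrid big data under collaborative filtering (cited by: 1)
13
Authors: 卢思安, 刘江平. 《计算机仿真》, 2024, Issue 4, pp. 485-488 (4 pages)
Big data is characterized by large scale, diversity, and value. Because of the high similarity among massive data, the data mining process is easily disturbed by redundancy, leading to problems such as data loss and corruption. To solve these problems, a lossless mining method for hybrid big data based on the collaborative filtering algorithm is proposed. The hybrid big data are integrated and preprocessed to remove redundancy, and identical data from different sources are fused losslessly. The time decay function of the collaborative filtering algorithm is used to compute the similarity between the items being mined. Lossless mining of hybrid big data is then achieved under the constraint of the feature correlation of the hybrid big data. Experimental results show that, with the proposed method, when the hybrid data volume reaches 25000 MB, data mining takes only about 45 ms, the mining accuracy exceeds 95%, and the mining results are consistent with the target.
Keywords: collaborative filtering algorithm; hybrid big data; lossless mining; data cleaning; data integration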
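Research on energy consumption optimization technology for multi-purpose stations based on a fusion model
14
Authors: 高岩. 《石油石化节能与计量》 (CAS), 2024, Issue 4, pp. 36-40 (5 pages)
The multi-purpose station is a major energy consumer in the oilfield gathering and transportation system. To reduce energy consumption in production operation, on-site SCADA (supervisory control and data acquisition) data were collected and organized, and the data were checked for completeness, duplicate values, and outliers. Box plots were used to identify outliers beyond the operating range, and missing values were reconstructed with cubic spline curves. Correlation analysis was then performed on the factors influencing energy consumption, and these factors were fed into a support vector machine model to establish the nonlinear relationship between comprehensive energy consumption per ton of liquid and its influencing factors. Finally, particle swarm optimization was used to continuously optimize the station's energy consumption. The results show that replacing outliers with the median and reconstructing missing values with cubic spline curves cleans the data well, and data completeness improves substantially after cleaning. Heating furnace gas consumption and the volume of liquid processed have a large influence on comprehensive energy consumption per ton of liquid, indicating that thermal consumption accounts for a large share of energy consumption. After optimization, the station's gas consumption, electricity consumption, and comprehensive energy consumption per unit of liquid all decreased, with expected annual savings in operating costs of 266,000 yuan. The results can serve as a practical reference for formulating operating plans for multi-purpose stations.
Keywords: data cleaning; data validation; support vector machine; particle swarm optimization; comprehensive energy consumption per unit of liquid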
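The box-plot outlier check with median replacement described in entry 14 might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's code: the function name `replace_outliers_with_median`, the conventional 1.5 × IQR whisker factor, the choice to take the median over the non-outlier values, and the sample readings are all hypothetical. The cubic spline reconstruction of missing values is not shown.

```python
import statistics


def replace_outliers_with_median(values, whisker=1.5):
    """Replace values outside the box-plot whiskers with the median of the rest.

    A value is treated as an outlier when it lies more than `whisker` times
    the interquartile range (IQR) below Q1 or above Q3.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    inliers = [v for v in values if low <= v <= high]
    median = statistics.median(inliers)
    return [v if low <= v <= high else median for v in values]


# Hypothetical gas-consumption readings with one obviously abnormal value.
readings = [10, 11, 12, 11, 10, 95, 12, 11]
print(replace_outliers_with_median(readings))
```

Taking the median over the inliers keeps the replacement value from being pulled toward the outliers themselves; using the median of all values is an equally common variant.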
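Wind speed cleaning method for wind turbines considering spatio-temporal correlation
15
Authors: 李莉, 梁袁, 林娜, 阎洁, 孟航, 刘永前. 《太阳能学报》 (EI, CAS, CSCD, PKU Core), 2024, Issue 6, pp. 461-469 (9 pages)
To obtain complete and reliable wind speed data, a nacelle wind speed cleaning method for wind turbines that considers spatio-temporal correlation is proposed. A graph convolutional network (GCN) extracts the spatial correlation of wind speed and a bidirectional long short-term memory network (Bi-LSTM) extracts the temporal correlation; a GCN-LSTM model is built to reconstruct the wind speed series of each turbine, enabling abnormal wind speed data to be identified and cleaned. The spatio-temporal characteristics of wind speed and their influence on the model's cleaning accuracy are analyzed, and two important modeling parameters, the optimal time scale and the number of turbine nodes, are determined. The GCN-LSTM model is validated on four wind farms in China with different terrain. The results show that considering spatio-temporal correlation effectively improves wind speed cleaning accuracy, that the higher the spatio-temporal correlation of wind speed, the smaller the cleaning error, and that the model is robust across wind farms with different terrain.
Keywords: wind farm; wind turbine; graph neural network; long short-term memory network; spatio-temporal correlation of wind speed; data cleaning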
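A streaming cleaning system for periodic industrial time series data
16
Authors: 王耀, 赵炯, 周奇才, 熊肖磊, 陈传林, 张恒. 《同济大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2024, Issue 3, pp. 462-471 (10 pages)
To efficiently clean industrial data that are time-ordered and periodic, a streaming cleaning system is first designed from distributed components. The system uses Mosquitto as the aggregation hub for collected data, Flume as the connecting component, and Kafka as the buffering component, feeding into a data cleaning component, which gives the system high throughput and a large buffer. Then, based on the speed constraint model, a periodic data cleaning algorithm is designed. Taking into account the temporal ordering, periodicity, and physical meaning of industrial data, it adds periodicity detection and a data slicing mechanism to the original speed constraint algorithm, which resolves the distortion that the original algorithm introduces on periodic data and improves usability. Finally, a shield tunneling data set is used as a sample to verify the effectiveness of the system and algorithm and the applicability of the improved algorithm.
Keywords: data cleaning; industrial big data; time series data; speed constraint; periodicity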
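The basic idea behind the speed constraint model in entry 16 is that the rate of change between consecutive points must stay within a physically plausible range. The sketch below is a deliberately simplified illustration of that idea, not the algorithm from the paper: it omits the periodicity detection and data slicing the authors add, repairs a violating point by clamping it to the nearest allowed value, and the function name, bounds, and sample data are hypothetical. Timestamps are assumed to be strictly increasing.

```python
def repair_with_speed_constraint(times, values, s_min, s_max):
    """Clamp each point so its rate of change from the previous repaired
    point stays within [s_min, s_max] (value units per time unit)."""
    if not values:
        return []
    repaired = [values[0]]  # the first point is trusted as the starting reference
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        low = repaired[-1] + s_min * dt
        high = repaired[-1] + s_max * dt
        repaired.append(min(max(values[i], low), high))
    return repaired


# Hypothetical sensor series: the third reading jumps implausibly fast.
times = [0, 1, 2, 3, 4]
values = [10.0, 10.5, 25.0, 11.5, 12.0]
print(repair_with_speed_constraint(times, values, s_min=-1.0, s_max=1.0))
# [10.0, 10.5, 11.5, 11.5, 12.0]
```

Because each point is checked against the previous repaired value, a single spike does not cause the valid points that follow it to be flagged as violations.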
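Retrieval-augmented fine-grained image classification method for noisy labels
17
Authors: 暴恒, 邓理睿, 张良, 陈训逊. 《北京航空航天大学学报》 (EI, CAS, CSCD, PKU Core), 2024, Issue 7, pp. 2284-2292 (9 pages)
In Internet audio and video content analysis, quickly building fine-grained image classification methods with low annotation cost is of great importance. Because categories share similar appearance features and are subject to interference such as illumination, viewing angle, and background occlusion, fine-grained image classification faces challenges including a large number of categories, small inter-class differences, high annotation cost, and a low label signal-to-noise ratio. To improve fine-grained classification of massive image collections with noisy labels, a retrieval-augmented fine-grained image classification method is proposed. Building on iterative cleaning of noisy labels, it uses a retrieval paradigm to obtain more expressive features from simple category annotations, which improves the classifier's recognition ability, and it achieves good results on a data set containing 1,500 fine-grained food categories and more than 500,000 images.
Keywords: fine-grained image classification; network security; image retrieval; data cleaning; noisy labels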
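Exploring preprocessing methods for making hospital anesthesia information system data research-ready
18
Authors: 向茹梅, 魏星, 戴维, 张丽君, 徐玮, 田杰, 张宏伟, 孙佳昕, 石丘玲. 《中国医院统计》, 2024, Issue 3, pp. 219-229 (11 pages)
Objective: Accurate, standardized data are the basis for reliable research results. Taking lung surgery as an example, this paper analyzes the data characteristics of the anesthesia information system and applies preprocessing steps including cleaning, transformation, integration, and reduction to build a data set suitable for research analysis. Methods: Data were collected from the anesthesia information system for patients who underwent lung surgery at a cancer hospital in Sichuan Province between April 2021 and November 2022. The characteristics of the source data were analyzed, and a preprocessing workflow and macro code were developed in Python and SAS. Text data were converted into numerical data amenable to data mining using Python's split operation and SAS macros and functions; missing values were filled, abnormal and inconsistent data were corrected, and redundant data were removed through data cleaning and dimensionality reduction; data integration was implemented with NOUNIQUEKEY, SQL, and LAG statements to enlarge the data volume. Results: Two Excel tables were exported from the anesthesia information system and the hospital information system, comprising 1835 anesthesia records and 46612 medical order records. Analysis of the source data found that the anesthesia information system was characterized by non-standard medical terminology, varied semantic expression, multiple units of measure for the same drug, and some drugs carrying the suffix "备用" (standby). Based on these characteristics and the semi-structured data format, three macros were written to clean and check all drug names, standardize medical terminology, and unify units; 12, 24, and 12 drugs were ultimately extracted for the pre-anesthesia, intraoperative, and analgesia pump stages, respectively. Missing data were supplemented a second time, noise was smoothed, and inconsistent data were cleaned; 48 (2.62%) anesthesia records for non-lung surgery were removed, along with 10 fields irrelevant to the mining task. After data integration, 1748 (97.82%) anesthesia records were matched with medical order data. Through this preprocessing workflow, the final structured data set contained 1748 patients and 99 variables. Conclusion: By analyzing the source data, a tailored preprocessing workflow for anesthesia data was developed, yielding standardized and accurate anesthesia medication data. It provides a methodological reference for making anesthesia information research-ready at other institutions and a reliable data basis for studies that need high-quality anesthesia medication data.
Keywords: anesthesia information system; preprocessing; data cleaning; data structuring; SAS software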
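Research on a data cleaning method for transformer return voltage based on IKNN and LOF
19
Authors: 陈啸轩, 邹阳, 翁祖辰, 林锦茄, 林昕亮, 张云霄. 《电子测量与仪器学报》 (CSCD, PKU Core), 2024, Issue 2, pp. 92-100 (9 pages)
Extracting characteristic parameters from the return voltage polarization spectrum is a widely used method for assessing the condition of transformer oil-paper insulation. However, the polarization spectrum is susceptible to operating-condition interference, human error, and other factors that produce abnormal feature data and seriously reduce assessment accuracy. To address this, this paper proposes a return voltage data cleaning method based on the local outlier factor (LOF) and improved K-nearest neighbors (IKNN). First, the maximum return voltage Urmax, the initial slope Sr, and the dominant time constant tcdom of the return voltage polarization spectrum are selected as aging characteristic parameters, and the LOF algorithm is used to identify and remove abnormal feature data in non-standard polarization spectra. Second, the fuzzy C-means (FCM) clustering algorithm is used to reduce the interference of noise points with the KNN algorithm, and a weighted Euclidean distance is used to highlight the correlation among the features, from which an IKNN-based data imputation model is constructed to fill in the missing feature data. Finally, multiple sets of measured data are used to verify the effectiveness of the proposed cleaning method. The results show that the condition assessment accuracy after data cleaning is about 50% higher than with the original data, which effectively improves the quality of transformer return voltage data and lays a solid foundation for accurately perceiving transformer operating conditions.
Keywords: oil-paper insulation; feature data cleaning; local outlier factor algorithm; return voltage polarization spectrum; improved K-nearest neighbor algorithm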
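Intelligent cleaning of hydrological data for the main canal of the Middle Route of the South-to-North Water Diversion Project
20
Authors: 陈晓楠, 顾起豪, 张召, 靳燕国, 顾沁扬. 《南水北调与水利科技(中英文)》 (CAS, CSCD, PKU Core), 2024, Issue 3, pp. 436-444 (9 pages)
Real-time hydrological data such as water level and flow in the main canal of the Middle Route of the South-to-North Water Diversion Project are affected by external disturbances, measurement system errors, and other factors. The resulting ill-conditioned data distort the calculations of the dispatching model and can even cause them to fail. To address spatial logic errors in upstream and downstream flow data and jumps in water level time series, a water balance model based on particle swarm optimization and an exponentially weighted moving average model are established, respectively, to clean the ill-conditioned data horizontally in space and vertically in time. Taking the canal reach between the Chuanhuang check gate and the Zhanghe check gate as a typical study section, the model automatically identifies points where the flow is inverted and uniformly corrects the flow data for the 12 check gates and 26 diversion points in the reach, making the upstream and downstream data logically consistent. In addition, the Yanhe check gate within the study reach is selected as a representative: under basically stable operation over 48 h, the series of water levels in front of the gate recorded every 2 h is analyzed, and jump data are automatically identified and reasonably corrected. The results show that the models can automatically identify ill-conditioned hydrological data and clean them intelligently, and that the processed data meet the needs of water conveyance dispatching analysis and decision-making well, so the models are worth wider application.
Keywords: Middle Route of the South-to-North Water Diversion Project; data cleaning; water conveyance dispatching; particle swarm optimization algorithm; exponentially weighted moving average model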
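The exponentially weighted moving average (EWMA) approach to water-level jumps in entry 20 can be sketched as follows. This is a minimal illustration, not the model from the paper: the smoothing factor `alpha`, the fixed jump `threshold`, the rule of replacing a jump with the current smoothed estimate, and the sample water levels are all assumptions made for the example.

```python
def ewma_clean(values, alpha=0.3, threshold=0.5):
    """Replace readings that deviate from the running EWMA by more than
    `threshold` with the EWMA estimate, then update the average."""
    if not values:
        return []
    cleaned = []
    ewma = values[0]  # initialise the smoothed estimate with the first reading
    for value in values:
        if abs(value - ewma) > threshold:
            value = ewma  # treat as a jump and fall back to the smoothed estimate
        cleaned.append(value)
        ewma = alpha * value + (1 - alpha) * ewma
    return cleaned


# Hypothetical water levels (m) in front of a gate, with one jump at index 3.
levels = [100.0, 100.1, 100.0, 103.0, 100.1]
print(ewma_clean(levels))
```

A fixed threshold suits the near-steady operating state described in the abstract; for series with real trends, a threshold tied to recent variability would be a more cautious choice.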