3,348 articles found
Industrial Spatial Agglomeration Using Distance-based Approach in Beijing, China (Cited by: 7)
1
Authors: LI Jiaming, ZHANG Wenzhong, YU Jianhui, CHEN Hongxia. 《Chinese Geographical Science》 (SCIE, CSCD), 2015, No. 6, pp. 698-712 (15 pages)
To study how industrial location differs across industries, this article tests spatial agglomeration across industries and firm sizes at the city level. Our research is based on a unique plant-level data set of Beijing and employs a distance-based approach, which treats space as continuous. Unlike previous studies, we set two sets of references, for service and manufacturing industries respectively, to adapt to investigation in the intra-urban area. Comparing eight types of industries and different firm sizes, we find that: 1) producer service, high-tech industries and labor-intensive manufacturing industries are more likely to cluster, whereas personal service and capital-intensive industries tend to be randomly dispersed in Beijing; 2) the spillover of the co-location of firms is more important to knowledge-intensive industries and has a more significant impact on their allocation than business-oriented services in the intra-urban area; 3) the spatial agglomeration of service industries is driven by larger establishments, whereas manufacturing industries are mixed.
Keywords: distance-based approach; spatial agglomeration; intra-urban area; Beijing
Comparison of Analyses of Genetic Structure among Chinese Indigenous Chicken Breeds using Distance-based and Model-based Methods
2
Authors: LI Hui-fang, CHEN Kuan-wei, HAN Wei, ZHANG Xue-yu, GAO Yu-shi, CHEN Guo-hong, ZHU Yun-fen, WANG Qiang. 《畜牧兽医学报》 (CAS, CSCD, PKU Core), 2009, No. S1, pp. 8-12 (5 pages)
Nei's improved genetic distance (DA) and gene flow (Nm) were measured using sixteen microsatellite markers. Dendrograms based on DA genetic distance using the neighbor-joining (NJ) method and the STRUCTURE program were constructed to analyze the genetic structure and relationships among 10 Chinese indigenous chicken breeds. The results showed that dendrograms of DA genetic distance using the NJ method divided the 10 chicken breeds into two main clusters; one consisted of breeds of low body weight (CHA, TTB, XIA, GUS and BAI), the other contained heavier breeds (LAN, DAG, YOU, XIS and LUY). In the lighter breeds, TIB and CHA clustered together, as did XIA and GUS. In the heavier breeds, XIS and LUY were clustered together in one branch, but LAN, DAG and YOU clustered in independent branches. The results were consistent with Nm estimates among the 10 indigenous chicken breeds. The STRUCTURE program properly inferred the presence of genetic structure despite not pre-defining the origin of individuals. The genetic clustering inferred by STRUCTURE was basically the same as that from the DA distance clustering method. An advantage of the STRUCTURE program was its ability to identify migrants and admixed individuals in the 10 chicken populations; this could not be achieved with the DA distance clustering method.
Keywords: microsatellite; Chinese chicken breeds; distance-based clustering method; model-based clustering method
A Novel Method for Prediction of Protein Domain Using Distance-Based Maximal Entropy
3
Authors: Shu-xue Zou, Yan-xin Huang, Yan Wang, Chun-guang Zhou. 《Journal of Bionic Engineering》 (SCIE, EI, CSCD), 2008, No. 3, pp. 215-223 (9 pages)
Detecting the boundaries of protein domains is an important and challenging task in both experimental and computational structural biology. In this paper, a promising method for detecting the domain structure of a protein from sequence information alone is presented. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence. They are then combined into a single predictor using a support vector machine. More importantly, domain detection is first treated as an imbalanced data learning problem. A novel undersampling method based on distance-based maximal entropy in the feature space of the Support Vector Machine (SVM) is proposed. The overall precision is about 80%. Simulation results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in machine learning systems on general imbalanced datasets.
Keywords: protein domain boundary; SVM; imbalanced data learning; distance-based maximal entropy
Bi-Level Programming for the Optimal Nonlinear Distance-Based Transit Fare Structure Incorporating Principal-Agent Game
4
Authors: Xin Sun, Shuyan Chen, Yongfeng Ma. 《Journal of Harbin Institute of Technology(New Series)》 (CAS), 2022, No. 5, pp. 69-77 (9 pages)
The urban transit fare structure and level can largely affect passengers' travel behavior and route choices. The commonly used transit fare policies in present transit networks lead to unbalanced transit assignment and improper distribution of transit resources. In order to distribute transit passenger flow evenly and efficiently, this paper introduces a new distance-based fare pattern using Euclidean distance. A bi-level programming model is developed for determining the optimal distance-based fare pattern, with the path-based stochastic transit assignment (STA) problem with elastic demand proposed at the lower level. The upper level addresses a principal-agent game between transport authorities and transit enterprises pursuing maximization of social welfare and financial interest, respectively. A genetic algorithm (GA) is implemented to solve the bi-level model, which is verified by a numerical example illustrating that the proposed nonlinear distance-based fare pattern delivers better financial performance and distribution effect than other fare structures.
Keywords: bi-level programming model; principal-agent game; nonlinear distance-based fare; path-based stochastic transit assignment
DISTANCE-BASED UPDATE STRATEGY IN LDCQ
5
Authors: Dong Yi, Edward Chan, Huang Zailu. 《Journal of Electronics(China)》, 2004, No. 4, pp. 337-341 (5 pages)
A new update strategy, the distance-based update strategy, is presented for Location Dependent Continuous Query (LDCQ) under an error limitation. Moving objects at different distances from the querying boundary have different probabilities of crossing it, and therefore have different influences on the query result. We set different deviation limits for different moving objects according to these distances. A great number of unnecessary updates are eliminated and the load on the system is relieved.
Keywords: Location Dependent Continuous Query (LDCQ); distance-based update strategy
Early identification of scientific breakthroughs through outlier analysis based on research entities
6
Authors: Yang Zhao, Mengting Zhang, Xiaoli Chen, Zhixiong Zhang. 《Journal of Data and Information Science》 (CSCD), 2024, No. 4, pp. 90-109 (20 pages)
Purpose: To address the “anomalies” that occur when scientific breakthroughs emerge, this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers, aiming to achieve early identification of scientific breakthroughs in papers. Design/methodology/approach: This study utilizes semantic technology to extract research entities from the titles and abstracts of papers to represent each paper's research content. Outlier detection methods are then employed to measure and analyze the anomalies in breakthrough papers during their early stages. The development and evolution process are traced using literature time tags. Finally, a case study is conducted using the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine. Findings: Through manual analysis of all identified outlier papers, the effectiveness of the proposed method for early identification of potential scientific breakthroughs is verified. Research limitations: The study's applicability has only been empirically tested in the biomedical field. More data from various fields are needed to validate the robustness and generalizability of the method. Practical implications: This study provides a valuable supplement to current methods for early identification of scientific breakthroughs, effectively supporting technological intelligence decision-making and services. Originality/value: The study introduces a novel approach to early identification of scientific breakthroughs by leveraging outlier analysis of research entities, offering a more sensitive, precise, and fine-grained alternative to traditional citation-based evaluations, which enhances the ability to identify nascent breakthrough innovations.
Keywords: scientific breakthroughs; outlier analysis; research entities
A Study on Outlier Detection and Feature Engineering Strategies in Machine Learning for Heart Disease Prediction
7
Authors: Varada Rajkumar Kukkala, Surapaneni Phani Praveen, Naga Satya Koti Mani Kumar Tirumanadham, Parvathaneni Naga Srinivasu. 《Computer Systems Science & Engineering》, 2024, No. 5, pp. 1085-1112 (28 pages)
This paper investigates the application of machine learning to develop a response model for cardiovascular problems, using AdaBoost together with two outlier detection methodologies: Z-Score combined with Grey Wolf Optimization (GWO), and Interquartile Range (IQR) coupled with Ant Colony Optimization (ACO). On the reported performance indices, Z-Score and GWO with AdaBoost is more accurate than IQR and ACO with AdaBoost (89.0% vs. 86.0%) and more discriminative (Area Under the Curve (AUC) score of 93.0% vs. 91.0%). The Z-Score and GWO method also outperformed the other in precision, scoring 89.0%, and its recall was satisfactory at 90.0%. The paper thus reveals specific benefits and drawbacks of different outlier detection and feature selection techniques, which can be important to consider when improving cardiovascular diagnostics. Collectively, these findings can enhance the knowledge of heart disease prediction and patient treatment using enhanced and innovative machine learning (ML) techniques. This work lays the groundwork for more precise diagnosis models by highlighting the benefits of combining multiple optimization methodologies. Future studies should focus on maximizing patient outcomes and model efficacy through research on these combinations.
Keywords: grey wolf optimization; ant colony optimization; Z-score; interquartile range (IQR); AdaBoost; outlier
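The two basic outlier rules named in the abstract above can be sketched as follows. This is only an illustration of the plain Z-score and IQR rules; it does not reproduce the paper's GWO or ACO optimization or its AdaBoost classifier, and the function names, the `threshold` default, and the `k = 1.5` fence multiplier are assumptions chosen for the example.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds `threshold` (assumed default)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing can be flagged
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumed default)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

On small samples the Z-score rule needs a lower threshold than 3.0, because a single extreme value inflates the standard deviation it is measured against; the IQR rule is less sensitive to that effect.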
Changepoint Detection with Outliers Based on RWPCA
8
Authors: Xin Zhang, Sanzhi Shi, Yuting Guo. 《Journal of Applied Mathematics and Physics》, 2024, No. 7, pp. 2634-2651 (18 pages)
Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method, RWPCA-RFPOP, which is based on the robust WPCA projection direction and the robust RFPOP method. The method is doubly robust and is suitable for detecting mean changepoints in multivariate normal data that contain outliers and have high correlations between variables. Simulation results demonstrate that the method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, the method is successfully applied to an ACGH dataset.
Keywords: RWPCA-RFPOP; Double Robust; Outlier Detection; Biweight Loss
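Outlier Detection in Wireless Sensor Networks Based on Fast SVDD (Cited by: 8)
9
Authors: 谢迎新, 陈祥光, 余向明, 岳彬, 郭静. 《仪器仪表学报》 (EI, CAS, CSCD, PKU Core), 2011, No. 1, pp. 46-51 (6 pages)
Outliers are a common type of data fault in data-collection applications based on wireless sensor networks and seriously degrade data quality. This paper proposes an outlier detection method for wireless sensor networks based on fast SVDD. The basic idea is to first use the fast SVDD algorithm to obtain the minimal spherical boundary enclosing the normal samples, and then use that boundary to classify unknown samples. The method accelerates SVDD training with a training-set reduction strategy and an SMO algorithm based on second-order approximation. Simulation experiments on synthetic and real data show that the method runs fast and has low memory overhead while maintaining classification accuracy, which makes it suitable for resource-constrained wireless sensor networks.
Keywords: wireless sensor networks; outlier detection; SVDD; training-set reduction; SMO algorithm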
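The η-One-Class Problem, η-Outliers, and Their LP Learning Algorithms (Cited by: 1)
10
Authors: 陶卿, 齐红威, 吴高巍, 章显. 《计算机学报》 (EI, CSCD, PKU Core), 2004, No. 8, pp. 1102-1108 (7 pages)
The one-class and outlier problems are studied using SVM methods. Treating the one-class problem as a function estimation problem, the authors define, for the first time, the generalization error of the η-one-class and η-outlier problems, and then define linear separability and the margin, obtaining maximal-margin, soft-margin, and ν-soft-margin algorithms for solving the one-class problem. These learning algorithms are grounded in statistical learning theory and reduce to solving linear programming problems. The algorithms are implemented along lines similar to boosting. Experimental results show that the proposed algorithms are of practical significance.
Keywords: one-class problem; outlier; maximal margin; statistical learning theory; support vector machine; linear programming problem; boosting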
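Outlier-DivideConquer: An Outlier Divide-and-Conquer Sampling Algorithm for Approximate Aggregate Queries (Cited by: 1)
11
Authors: 胡文瑜, 孙志挥, 张柏礼. 《南京大学学报(自然科学版)》 (CAS, CSCD, PKU Core), 2011, No. 5, pp. 524-531 (8 pages)
Sampling is a general and effective approximation technique, and using sampling for approximate aggregate query processing is a common approach in implementing decision support systems and data mining. Returning approximate query results correctly and efficiently while minimizing approximate query error is both the key and the goal of approximate query processing. Building on an in-depth study of sampling methods for approximate aggregate queries, this paper proposes Outlier-DivideConquer, an outlier divide-and-conquer sampling algorithm that has a guaranteed error bound and needs only a single scan of the dataset. When the aggregate attribute has a high-variance distribution, the algorithm overcomes the limitations of uniform random sampling, significantly reduces approximate query error, and executes more efficiently than comparable algorithms. Finally, an experimental comparison with the traditional uniform sampling algorithm verifies the effectiveness and correctness of Outlier-DivideConquer.
Keywords: data mining; decision support; approximate aggregate query; uniform sampling; outlier divide-and-conquer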
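Application of a Univariate ARIMA Model Combined with Outlier Analysis to the Stock Market
12
Author: 曹韫建. 《中国管理科学》 (CSSCI), 1998, No. 1, pp. 10-15 (6 pages)
After analyzing and comparing widely used forecasting methods for economic time series, this paper adopts a univariate ARIMA model combined with outlier analysis for economic phenomena such as stock prices, which are subject to many external factors and are hard to analyze through multivariate modeling.
Keywords: univariate; ARIMA model; application; stock market; outlier analysis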
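Ordering Properties of the Smallest Order Statistics from Multiple-Outlier Models (in English)
13
Authors: 程美芳, 方龙祥, 杨芳. 《应用概率统计》 (CSCD, PKU Core), 2017, No. 3, pp. 317-330 (14 pages)
In this paper, we study stochastic comparisons of the smallest order statistics from two multiple-outlier models in which the numbers of independent and identically distributed random variables differ. Let $X_{1:n}(p,q)$ and $X_{1:n^*}(p^*,q^*)$ denote the smallest order statistics from $X_1,\dots,X_p,X_{p+1},\dots,X_n$ and $X_1,\dots,X_{p^*},X_{p^*+1},\dots,X_{n^*}$, respectively, where $q=n-p$ and $q^*=n^*-p^*$. Under certain majorization-order conditions on the parameters $(p,q)$ and $(p^*,q^*)$, we compare $X_{1:n}(p,q)$ and $X_{1:n^*}(p^*,q^*)$ in terms of the usual stochastic order, the hazard rate order, and the likelihood ratio order.
Keywords: multiple-outlier model; usual stochastic order; hazard rate order; likelihood ratio order; smallest order statistic; proportional hazard rate model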
Probabilistic Automatic Outlier Detection for Surface Air Quality Measurements from the China National Environmental Monitoring Network (Cited by: 12)
14
Authors: Huangjian WU, Xiao TANG, Zifa WANG, Lin WU, Miaomiao LU, Lianfang WEI, Jiang ZHU. 《Advances in Atmospheric Sciences》 (SCIE, CAS, CSCD), 2018, No. 12, pp. 1522-1532 (11 pages)
Although quality assurance and quality control procedures are routinely applied in most air quality networks, outliers can still occur due to instrument malfunctions, the influence of harsh environments and the limitations of measuring methods. Such outliers pose challenges for data-powered applications such as data assimilation, statistical analysis of pollution characteristics and ensemble forecasting. Here, a fully automatic outlier detection method was developed based on the probability of residuals, which are the discrepancies between the observed and the estimated concentration values. The estimation can be conducted using filtering (or regressions when appropriate) to discriminate four types of outliers characterized by temporal and spatial inconsistency, instrument-induced low variances, periodic calibration exceptions, and less PM$_{10}$ than PM$_{2.5}$ in concentration observations, respectively. This probabilistic method was applied to detect all four types of outliers in hourly surface measurements of six pollutants (PM$_{2.5}$, PM$_{10}$, SO$_2$, NO$_2$, CO and O$_3$) from 1436 stations of the China National Environmental Monitoring Network during 2014-16. Among the measurements, 0.65%-5.68% are marked as outliers, with PM$_{10}$ and CO more prone to outliers. Our method successfully identifies a trend of decreasing outliers from 2014 to 2016, which corresponds to known improvements in the quality assurance and quality control procedures of the China National Environmental Monitoring Network. The outliers can have a significant impact on the annual mean concentrations of PM$_{2.5}$, with differences exceeding 10 μg m$^{-3}$ at 66 sites.
Keywords: probabilistic automatic outlier detection; air quality observation; low-pass filter; spatial regression; bivariate normal distribution
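One of the four outlier types listed in the abstract above, a PM10 reading lower than the simultaneous PM2.5 reading, lends itself to a short illustration. The sketch below is a deterministic simplification, not the paper's probabilistic test; the function name `flag_pm_inversions`, the `(pm25, pm10)` record layout, and the `tolerance` parameter are all assumptions made for the example.

```python
def flag_pm_inversions(records, tolerance=0.0):
    """Return indices of (pm25, pm10) pairs where PM10 is below PM2.5.

    PM2.5 is a subset of PM10, so PM10 < PM2.5 is physically inconsistent.
    `tolerance` (hypothetical) allows for a small measurement error margin.
    Records with a missing value (None) are skipped, not flagged.
    """
    flagged = []
    for index, (pm25, pm10) in enumerate(records):
        if pm25 is None or pm10 is None:
            continue
        if pm10 + tolerance < pm25:
            flagged.append(index)
    return flagged
```

A hard threshold like this ignores measurement uncertainty; the abstract's keywords suggest the authors handle that with a bivariate normal distribution instead.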
Density-based trajectory outlier detection algorithm (Cited by: 10)
15
Authors: Zhipeng Liu, Dechang Pi, Jinfeng Jiang. 《Journal of Systems Engineering and Electronics》 (SCIE, EI, CSCD), 2013, No. 2, pp. 335-340 (6 pages)
With the development of the global positioning system (GPS), wireless technology and location-aware services, it is possible to collect a large quantity of trajectory data. In the field of data mining for moving objects, anomaly detection is a hot topic. Based on the development of anomalous trajectory detection for moving objects, this paper introduces the classical trajectory outlier detection (TRAOD) algorithm, and then proposes a density-based trajectory outlier detection (DBTOD) algorithm, which compensates for the TRAOD algorithm's inability to detect anomalous defects when the trajectory is local and dense. The results of applying the proposed algorithm to the Elk1993 and Deer1995 datasets are also presented and show the effectiveness of the algorithm.
Keywords: density-based algorithm; trajectory outlier detection (TRAOD); partition-and-detect framework; Hausdorff distance
GA-iForest: An Efficient Isolated Forest Framework Based on Genetic Algorithm for Numerical Data Outlier Detection (Cited by: 4)
16
Authors: LI Kexin, LI Jing, LIU Shuji, LI Zhao, BO Jue, LIU Biqi. 《Transactions of Nanjing University of Aeronautics and Astronautics》 (EI, CSCD), 2019, No. 6, pp. 1026-1038 (13 pages)
With the development of the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is related to the quality of data. The isolated forest algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolated forest algorithm keeps generating isolation trees, the differences among the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. This method optimizes the isolated forest by selecting better isolation trees according to their detection accuracy and the differences among them, thereby discarding duplicate, similar and poorly performing isolation trees and improving the accuracy and stability of outlier detection. In the experiment, an Ubuntu system and the Spark platform are used to build the experiment environment. The outlier datasets provided by ODDS are used for testing. The performance of the proposed method is evaluated on indicators such as accuracy, recall rate, ROC curves, AUC and execution time. Experimental results show that the proposed method can not only improve the accuracy and stability of outlier detection, but also reduce the number of isolation trees by 20%-40% compared with the original iForest method.
Keywords: outlier detection; isolation tree; isolated forest; genetic algorithm; feature selection
Outliers Mining in Time Series Data Sets (Cited by: 3)
17
Authors: Zheng Binxiang, Du Xiuhua, Xi Yugeng (Institute of Automation, Shanghai Jiaotong University, Shanghai 200030, P.R. China). 《Journal of Systems Engineering and Electronics》 (SCIE, EI, CSCD), 2002, No. 1, pp. 93-97 (5 pages)
In this paper, we present a cluster-based algorithm for time series outlier mining. We use the discrete Fourier transformation (DFT) to transform time series from the time domain to the frequency domain. Each time series can thus be mapped to a point in k-dimensional space. A cluster-based algorithm is then developed to mine the outliers from these points. The algorithm first partitions the input points into disjoint clusters and then prunes the clusters that are judged unable to contain outliers. Our algorithm has been run on the electrical load time series of a steel enterprise and proved to be effective.
Keywords: data mining; time series; outlier mining
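The mapping step described in the abstract above, from a time series to a point in k-dimensional space via the DFT, might look like the sketch below. It covers only the feature mapping, not the paper's clustering or pruning; using the magnitudes of the first k coefficients, normalizing by the series length, and the name `dft_features` are assumptions made for illustration.

```python
import cmath
import math


def dft_features(series, k):
    """Map a time series to a k-dimensional point.

    Each coordinate is the magnitude of one of the first k DFT
    coefficients, divided by the series length (an assumed normalization).
    """
    n = len(series)
    features = []
    for freq in range(k):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * freq * t / n)
            for t, value in enumerate(series)
        )
        features.append(abs(coefficient) / n)
    return features
```

Keeping only the low-frequency coefficients is a common way to shorten the representation while retaining the overall shape of the series; any distance-based or cluster-based outlier method can then operate on the resulting points.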
Rough Outlier Detection Based Security Risk Analysis Methodology (Cited by: 13)
18
Authors: Li Qianmu, Li Jia. 《China Communications》 (SCIE, CSCD), 2012, No. 7, pp. 14-21 (8 pages)
Security is a nonfunctional information system attribute that plays a crucial role in a wide range of sensor network application domains. Security risk can be quantified as the combination of the probability that a sensor network system may fail and the evaluation of the severity of the damage caused by the failure. In this paper, we devise a methodology of Rough Outlier Detection (ROD) for the detection of security-based risk factors, which originate from violations of attack requirements (namely, attack risks). The methodology elaborates a dimension reduction method to analyze the attack risk probability from a high-dimensional and nonlinear data set, and combines it with rough redundancy reduction and the kernel-function distance measurement obtained using the ROD. In this way, it is possible to determine the risky scenarios, and the analysis feedback can be used to improve the sensor network system design. We illustrate the methodology on the DARPA case study using a step-by-step approach and then show that the method is effective in lowering the false alarm rate.
Keywords: rough outlier; risk analysis; dimensionality reduction
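A Survey of Outlier Detection Algorithms (Cited by: 3)
19
Authors: 陈华, 李继波. 《大众科技》, 2005, No. 9, pp. 96-97 (2 pages)
This article introduces the classification and underlying ideas of the main outlier detection algorithms in data mining, and gives a concise critique of these algorithms.
Keywords: outlier; definition; classification; algorithm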
Outlier screening for ironmaking data on blast furnaces (Cited by: 6)
20
Authors: Jun Zhao, Shao-fei Chen, Xiao-jie Liu, Xin Li, Hong-yang Li, Qing Lyu. 《International Journal of Minerals,Metallurgy and Materials》 (SCIE, EI, CAS, CSCD), 2021, No. 6, pp. 1001-1010 (10 pages)
Blast furnace data processing is prone to problems such as outliers. To overcome these problems and identify an improved method for processing blast furnace data, we conducted an in-depth study of blast furnace data. Based on data samples from selected iron and steel companies, data types were classified according to their characteristics; appropriate methods were then selected to process them in order to remedy the missing values and outliers of the original blast furnace data. Linear interpolation was used to fill in the divided continuation data, the K-nearest neighbor (KNN) algorithm was used to fill in correlation data with the internal law, and periodic statistical data were filled by the average. The error rate in the filling was low, and the fitting degree was over 85%. For the screening of outliers, corresponding indicator parameters were added according to the continuity, relevance, and periodicity of different data, and a variety of algorithms were used for processing. Analysis of the screening results shows that a large amount of effective information in the data was retained and ineffective outliers were eliminated. Standardized processing of blast furnace big data, as the basis of applied research on such data, can serve as an important means to improve data quality and retain data value.
Keywords: blast furnace; data missing; outliers; data processing; data mining
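The linear interpolation step mentioned in the abstract above, used there to fill gaps in continuous data, can be sketched as follows. This is a generic gap filler, not the authors' implementation; the name `fill_linear`, the use of `None` for missing readings, and the choice to leave leading and trailing gaps unfilled are assumptions made for the example.

```python
def fill_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbors.

    Gaps at the start or end of the sequence have only one known
    neighbor, so they are left as None (an assumed policy).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled
```

The function assumes evenly spaced samples; for irregular timestamps the step would need to be weighted by the actual time difference between readings.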