Journal Articles
16 articles found
1. A study on fast post-processing massive data of casting numerical simulation on personal computers (cited 1)
Authors: Chen Tao, Liao Dunming, Pang Shenyong, Zhou Jianxin. China Foundry (SCIE, CAS), 2013, No. 5, pp. 321-324.
Abstract: As castings become more complicated and the demands on simulation precision grow, the numerical data produced by casting numerical simulation become increasingly massive. On an ordinary personal computer, these massive data may exceed the available memory, causing rendering to fail. Based on the out-of-core technique, this paper proposes a method that effectively utilizes external storage and dramatically reduces memory usage, solving the problem of insufficient memory for massive-data rendering on ordinary personal computers. A new post-processor built on this method is developed. It can illustrate the filling and solidification processes of a casting as well as thermal stress, and it provides fast interaction with the simulation results. Theoretical analysis and several practical examples show that the memory usage and loading time of the post-processor depend not on the size of the relevant files but on the proportion of cells lying on the surface. Meanwhile, rendering and fetching values at the mouse position are fast enough to satisfy the demands of real-time interaction.
Keywords: casting numerical simulation; massive data; fast post-processing
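A rough single-file illustration of the out-of-core idea described above: the result file stays on disk and only the surface-cell values needed for rendering are pulled into memory. The flat file layout, the surface-cell index and all names are assumptions made for the sketch, not details from the paper.

```python
import numpy as np

def load_surface_values(result_file, surface_idx, n_cells, dtype=np.float32):
    """Out-of-core access: map the whole result file into virtual memory and
    materialize only the surface-cell values actually needed for rendering."""
    # memmap keeps the bulk of the data on disk; RAM usage scales with the
    # number of surface cells, not with the total file size.
    all_values = np.memmap(result_file, dtype=dtype, mode="r", shape=(n_cells,))
    return np.array(all_values[surface_idx])   # copy only the surface subset

if __name__ == "__main__":
    # toy demonstration with a synthetic per-cell "temperature field" file
    n_cells = 1_000_000
    rng = np.random.default_rng(0)
    rng.random(n_cells, dtype=np.float32).tofile("field.bin")
    surface_idx = np.sort(rng.choice(n_cells, size=50_000, replace=False))
    surf = load_surface_values("field.bin", surface_idx, n_cells)
    print(surf.shape, surf.dtype)
```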
2. Optimal decorrelated score subsampling for generalized linear models with massive data (cited 1)
Authors: Junzhuo Gao, Lei Wang, Heng Lian. Science China Mathematics (SCIE, CSCD), 2024, No. 2, pp. 405-430.
Abstract: In this paper, we consider unified optimal subsampling estimation and inference on the low-dimensional parameter of main interest in the presence of a nuisance parameter for low/high-dimensional generalized linear models (GLMs) with massive data. We first present a general subsampling decorrelated score function to reduce the influence of the less accurate nuisance parameter estimate with its slow convergence rate. The consistency and asymptotic normality of the resulting subsample estimator from a general decorrelated score subsampling algorithm are established, and two optimal subsampling probabilities are derived under the A- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposed optimal subsampling probabilities provably improve the asymptotic efficiency of the subsampling schemes in low-dimensional GLMs and perform better than the uniform subsampling scheme in high-dimensional GLMs. A two-step algorithm is further proposed for implementation, and the asymptotic properties of the corresponding estimators are also given. Simulations show satisfactory performance of the proposed estimators, and two applications to census income and Fashion-MNIST datasets also demonstrate its practical applicability.
Keywords: A-optimality; decorrelated score subsampling; high-dimensional inference; L-optimality; massive data
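The decorrelated score construction is not reproduced here; the sketch below only illustrates the generic two-step optimal-subsampling pattern (uniform pilot subsample, non-uniform sampling probabilities from the pilot fit, inverse-probability-weighted refit) on a plain logistic GLM. The L-optimality-style score |y - p| * ||x|| and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(beta, X, y, w):
    eta = X @ beta
    # weighted negative log-likelihood of a logistic GLM (logaddexp avoids overflow)
    return np.sum(w * (np.logaddexp(0.0, eta) - y * eta))

def fit_logistic(X, y, w=None):
    if w is None:
        w = np.ones(len(y))
    res = minimize(neg_loglik, np.zeros(X.shape[1]), args=(X, y, w), method="BFGS")
    return res.x

def two_step_subsample(X, y, r0=500, r=2000, rng=None):
    """Two-step subsampling: uniform pilot, then probabilities proportional to
    |y - p(x)| * ||x|| (an L-optimality-style score), then a weighted refit."""
    rng = np.random.default_rng(rng)
    n = len(y)
    # step 1: pilot estimate from a uniform subsample
    pilot_idx = rng.choice(n, size=r0, replace=False)
    beta0 = fit_logistic(X[pilot_idx], y[pilot_idx])
    # step 2: non-uniform probabilities based on the pilot fit
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    prob = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=prob)
    # inverse-probability weights correct the sampling bias
    return fit_logistic(X[idx], y[idx], w=1.0 / (r * prob[idx]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 100_000, 5
    X = rng.normal(size=(n, d))
    beta_true = np.array([1.0, -0.5, 0.5, 0.0, 0.25])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    print(np.round(two_step_subsample(X, y, rng=2), 3))
```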
3. Distributed Penalized Modal Regression for Massive Data
Authors: JIN Jun, LIU Shuangzhe, MA Tiefeng. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2023, No. 2, pp. 798-821.
Abstract: Researchers are frequently confronted with challenges in massive data computing owing to the limitations of computer primary memory. Modal regression (MR) is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. The authors extend MR to massive data analysis and propose a computationally and statistically efficient divide-and-conquer MR method (DC-MR). The major novelty of this method consists of splitting an entire dataset into several blocks, implementing the MR method on the data in each block, and deriving final results by combining these regression results via a weighted average, which provides approximate estimates of the regression results on the entire dataset. The proposed method significantly reduces the required amount of primary memory, and the resulting estimator is theoretically as efficient as traditional MR on the entire dataset. The authors also investigate a multiple-hypothesis-testing variable selection approach to select significant parametric components and prove that the approach possesses the oracle property. In addition, the authors propose a practical modified modal expectation-maximization (MEM) algorithm for the proposed procedures. Numerical studies on simulated and real datasets are conducted to assess and showcase the practical and effective performance of the proposed methods.
Keywords: asymptotic distribution; divide and conquer; massive data; modal regression; multiple hypothesis testing
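A rough sketch of the divide-and-conquer idea on a modal linear regression fitted by a modal-EM style iteration (Gaussian kernel weights plus weighted least squares). The penalization, the paper's combination weights and its bandwidth choice are not reproduced; the sample-size-weighted average below is an assumption.

```python
import numpy as np

def modal_regression_mem(X, y, h=1.0, n_iter=50):
    """Modal linear regression via an MEM-style iteration: the E-step computes
    Gaussian kernel weights around the current fit, the M-step is weighted LS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start
    for _ in range(n_iter):
        resid = y - X @ beta
        w = np.exp(-0.5 * (resid / h) ** 2)       # kernel weights
        Xw = w[:, None] * X
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

def dc_modal_regression(X, y, n_blocks=10, h=1.0):
    """Divide and conquer: fit modal regression on each block, then combine
    the block estimates by a sample-size-weighted average."""
    idx_blocks = np.array_split(np.arange(len(y)), n_blocks)
    estimates = np.vstack([modal_regression_mem(X[b], y[b], h=h) for b in idx_blocks])
    weights = np.array([len(b) for b in idx_blocks], dtype=float)
    return np.average(estimates, axis=0, weights=weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)  # heavy-tailed noise
    print(np.round(dc_modal_regression(X, y, n_blocks=20, h=1.0), 3))
```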
4. Research on data load balancing technology of massive storage systems for wearable devices (cited 1)
Authors: Shujun Liang, Jing Cheng, Jianwei Zhang. Digital Communications and Networks (SCIE, CSCD), 2022, No. 2, pp. 143-149.
Abstract: Because memory in current wearable devices is limited while the amount of information keeps growing, the processing capacity of the servers in the storage system cannot keep up with the speed of information growth, resulting in poor load balancing, long load balancing time and data processing delays. Therefore, a data load balancing technology is applied to the massive storage systems of wearable devices in this paper. We first analyze the object-oriented load balancing method and formally describe the dynamic load balancing problem, treating load balancing as a mapping problem. Tasks are then assigned to each data node according to its actual processing capacity, and data are allocated to the corresponding storage nodes so that a comprehensive weight can be computed for each data storage node. According to the load information of each data storage node collected by the scheduler in the storage system, the load weight of the current data storage node is calculated and distributed, realizing data load balancing for the massive storage system of wearable devices. The experimental results show that the average load balancing time of this method is 1.75 h, much lower than that of traditional methods, and that the proposed technology offers short load balancing time, high load balance, strong data processing capability, short processing time and clear practical applicability.
Keywords: wearable device; massive data; data storage system; load balancing; weight
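A toy stand-in for the comprehensive-weight idea: each storage node's load is summarised by a weighted combination of resource utilisations, and new writes go to the least-loaded node. The metrics and coefficients are purely illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class NodeLoad:
    name: str
    cpu: float        # utilisation in [0, 1]
    mem: float        # utilisation in [0, 1]
    io_queue: float   # normalised I/O queue length in [0, 1]

def composite_weight(node, a=0.4, b=0.3, c=0.3):
    """Comprehensive load weight as a weighted sum of resource utilisations
    (coefficients a, b, c are illustrative placeholders)."""
    return a * node.cpu + b * node.mem + c * node.io_queue

def pick_storage_node(nodes):
    """Dispatch the next write to the node with the smallest load weight."""
    return min(nodes, key=composite_weight)

if __name__ == "__main__":
    nodes = [NodeLoad("node-1", 0.80, 0.55, 0.30),
             NodeLoad("node-2", 0.35, 0.40, 0.20),
             NodeLoad("node-3", 0.60, 0.70, 0.90)]
    print(pick_storage_node(nodes).name)   # -> node-2
```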
5. Parallelized User Clicks Recognition from Massive HTTP Data Based on Dependency Graph Model (cited 1)
Authors: FANG Cheng, LIU Jun, LEI Zhenming. China Communications (SCIE, CSCD), 2014, No. 12, pp. 13-25.
Abstract: With increasingly complex website structures and continuously advancing web technologies, accurate user click recognition from massive HTTP data, which is critical for web usage mining, becomes more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm with real massive data: a 228.7 GB dataset collected from a mobile core network, covering more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
Keywords: cloud computing; massive data; graph model; web usage mining
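The paper's heuristic parallel algorithm and cloud deployment are not reproduced; the sketch below is a much-simplified, single-machine stand-in that links requests through the Referer header (the dependency-graph idea) and uses a short time window to separate automatically fetched resources from user clicks. The field names and the 2-second window are assumptions.

```python
from collections import namedtuple

Request = namedtuple("Request", "ts url referrer content_type")

def detect_user_clicks(requests, window=2.0):
    """Simplified dependency heuristic: an HTML request is treated as a user
    click unless it was issued shortly after the request its Referer points to
    (in which case it looks like an embedded/automatic fetch)."""
    last_seen = {}   # url -> timestamp of the most recent fetch of that url
    clicks = []
    for r in sorted(requests, key=lambda r: r.ts):
        parent_ts = last_seen.get(r.referrer)
        is_embedded = parent_ts is not None and (r.ts - parent_ts) <= window
        if r.content_type == "text/html" and not is_embedded:
            clicks.append(r)
        last_seen[r.url] = r.ts
    return clicks

if __name__ == "__main__":
    reqs = [
        Request(0.0, "http://a.com/index", None, "text/html"),                 # click
        Request(0.3, "http://a.com/style.css", "http://a.com/index", "text/css"),
        Request(0.5, "http://a.com/frame", "http://a.com/index", "text/html"), # auto
        Request(9.0, "http://a.com/news", "http://a.com/index", "text/html"),  # click
    ]
    for c in detect_user_clicks(reqs):
        print(c.url)
```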
6. Semiparametric Likelihood-based Inference for Censored Data with Auxiliary Information from External Massive Data Sources
Authors: Yue-xin FANG, Yong ZHOU. Acta Mathematicae Applicatae Sinica (SCIE, CSCD), 2020, No. 3, pp. 642-656.
Abstract: Published auxiliary information can be helpful in conducting statistical inference in a new study. In this paper, we synthesize auxiliary information with semiparametric likelihood-based inference for censored data when the total sample size is available. We express the auxiliary information as constraints on the regression coefficients and the covariate distribution, and then use the empirical likelihood method for general estimating equations to improve the efficiency of the parameters of interest in the specified model. The consistency and asymptotic normality of the resulting regression parameter estimators are established. Numerical simulations and an application under different assumed conditions show that the proposed method yields a substantial gain in efficiency for the parameters of interest.
Keywords: auxiliary information; massive data; censored data; empirical likelihood; estimating equations
7. Massive Data Covert Transmission Scheme Based on Shamir Threshold
Authors: ZHANG Tao, WANG Yadi, RONG Xing. Wuhan University Journal of Natural Sciences (CAS), 2010, No. 3, pp. 227-231.
Abstract: A massive data covert transmission scheme based on the Shamir threshold is proposed in this paper. The method applies the Shamir threshold scheme to divide the data, uses information hiding technology to cover the shadows, and realizes covert transmission of massive data by transmitting stego-covers. Analysis shows that, compared with the natural division method, this scheme not only improves transmission time-efficiency but also enhances security.
Keywords: information hiding and transmission; Shamir threshold scheme; massive data; time-efficiency; security
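A minimal implementation of the Shamir (t, n) threshold step named in the abstract: one data block is split into shadows over a prime field and recovered from any t of them. The information-hiding step that covers the shadows is not sketched, and the prime and block size are assumptions.

```python
import random

PRIME = 2**521 - 1  # a Mersenne prime, large enough for data blocks up to 64 bytes

def split_secret(secret, n_shares, threshold, prime=PRIME):
    """Shamir (t, n) threshold: hide the secret as the constant term of a random
    polynomial of degree t-1 and hand out n points on that polynomial."""
    coeffs = [secret] + [random.randrange(prime) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, prime) for i, c in enumerate(coeffs)) % prime
    return [(x, f(x)) for x in range(1, n_shares + 1)]

def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % prime
                den = (den * (xj - xm)) % prime
        secret = (secret + yj * num * pow(den, -1, prime)) % prime
    return secret

if __name__ == "__main__":
    block = int.from_bytes(b"massive data blk", "big")   # one 16-byte data block
    shares = split_secret(block, n_shares=5, threshold=3)
    print(recover_secret(shares[:3]) == block)           # any 3 shadows suffice
```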
8. Linear expectile regression under massive data
Authors: Shanshan Song, Yuanyuan Lin, Yong Zhou. Fundamental Research (CAS), 2021, No. 5, pp. 574-585.
Abstract: In this paper, we study large-scale inference for a linear expectile regression model. To mitigate the computational challenges of classical asymmetric least squares (ALS) estimation under massive data, we propose a communication-efficient divide-and-conquer algorithm that combines the information from sub-machines through confidence distributions. The resulting pooled estimator has a closed-form expression, and its consistency and asymptotic normality are established under mild conditions. Moreover, we derive the Bahadur representation of the ALS estimator, which serves as an important tool to study the relationship between the number of sub-machines K and the sample size. Numerical studies including both synthetic and real data examples are presented to illustrate the finite-sample performance of our method and support the theoretical results.
Keywords: divide and conquer algorithm; expectile regression; (asymptotic) confidence distribution; massive data
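A rough sketch of divide-and-conquer asymmetric least squares: each sub-machine fits the expectile regression by iteratively reweighted least squares and the block estimates are pooled. The plain sample-size-weighted average below is a simplification for illustration, not the confidence-distribution combination used in the paper.

```python
import numpy as np

def expectile_regression(X, y, tau=0.5, n_iter=100, tol=1e-8):
    """Asymmetric least squares (ALS): iteratively reweighted least squares with
    weight tau for positive residuals and (1 - tau) for negative ones."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.where(y - X @ beta > 0, tau, 1 - tau)
        Xw = w[:, None] * X
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def dc_expectile(X, y, tau=0.9, n_machines=10):
    """Divide and conquer: fit ALS on each sub-machine's block and pool the
    estimates by a sample-size-weighted average (a simplification)."""
    blocks = np.array_split(np.arange(len(y)), n_machines)
    fits = np.vstack([expectile_regression(X[b], y[b], tau) for b in blocks])
    sizes = np.array([len(b) for b in blocks], dtype=float)
    return np.average(fits, axis=0, weights=sizes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50_000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    print(np.round(dc_expectile(X, y, tau=0.9, n_machines=25), 3))
```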
9. CBA: multi source fusion model for fast and intelligent target intention identification
Authors: WAN Shichang, LI Qingshan, WANG Xuhua, LU Nanhua. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, No. 2, pp. 406-416.
Abstract: How to mine valuable information from massive multi-source heterogeneous data and identify the intention of aerial targets is a major research focus at present. Aiming at the long-term dependence in air target intention recognition, this paper deeply explores the potential attribute features in the spatio-temporal sequence data of the target. First, we build an intelligent dynamic intention recognition framework, including a series of specific processes such as data sources, data preprocessing, target space-time features, the convolutional neural network-bidirectional gated recurrent unit-attention (CBA) model and intention recognition. Then, we analyze and reason about the designed CBA model in detail. Finally, comparisons with experiments on other recognition models show that the proposed method effectively improves the accuracy of air target intention recognition and is of significance for commanders' operational command and situation prediction.
Keywords: intention; massive data; deep network; artificial intelligence
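A skeletal PyTorch version of a CNN + bidirectional GRU + attention classifier of the kind the CBA name suggests. Layer sizes, the attention form and the input shape are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CBA(nn.Module):
    """Sketch of a CNN + BiGRU + attention classifier for target-intention
    recognition from spatio-temporal feature sequences (sizes are illustrative)."""
    def __init__(self, n_features, n_classes, conv_channels=64, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.bigru = nn.GRU(conv_channels, hidden, batch_first=True,
                            bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)      # attention scores per time step
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)   # convolve over time
        out, _ = self.bigru(h)                   # (batch, time, 2*hidden)
        scores = torch.softmax(self.att(out), dim=1)        # attention weights
        context = (scores * out).sum(dim=1)      # weighted sum over time steps
        return self.fc(context)

if __name__ == "__main__":
    model = CBA(n_features=12, n_classes=6)
    dummy = torch.randn(8, 20, 12)               # 8 tracks, 20 time steps
    print(model(dummy).shape)                     # -> torch.Size([8, 6])
```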
10. Design and development of real-time query platform for big data based on hadoop (cited 1)
Authors: Liu Xiaoli, Xu Pandeng, Liu Mingliang, Zhu Guobin. High Technology Letters (EI, CAS), 2015, No. 2, pp. 231-238.
Abstract: This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database (HBase). The platform consists of four layers, namely an ETL (extraction, transformation, loading) tier, a data processing tier, a data storage tier and a data display tier, achieving long-term storage, real-time analysis and querying of massive data. Finally, a real data cluster made up of 39 nodes, including 2 master nodes and 37 data nodes, is simulated, and function tests of the data importing and real-time query modules are performed, along with performance tests of HDFS I/O, the MapReduce cluster, and batch-loading and real-time querying of massive data. The test results indicate that the platform achieves high performance in terms of response time and linear scalability.
Keywords: big data; massive data storage; real-time query; Hadoop; distributed computing
11. Adaptive Distributed Inference for Multi-source Massive Heterogeneous Data
Authors: Xin YANG, Qi Jing YAN, Mi Xia WU. Acta Mathematica Sinica, English Series (SCIE), 2024, No. 11, pp. 2751-2770.
Abstract: In this paper, we consider distributed inference for heterogeneous linear models with massive datasets. Noting that heterogeneity may exist not only in the expectations of the subpopulations but also in their variances, we propose the heteroscedasticity-adaptive distributed aggregation (HADA) estimator, which is shown to be communication-efficient and asymptotically optimal under either homoscedasticity or heteroscedasticity. Furthermore, a distributed test for parameter heterogeneity across subpopulations is constructed based on the HADA estimator. The finite-sample performance of the proposed methods is evaluated using simulation studies and the NYC flight data.
Keywords: distributed estimation; heterogeneity; Levene's test; massive heterogeneous data
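Not the paper's HADA estimator, but a simplified illustration of variance-adaptive aggregation: each machine returns its least-squares estimate together with an estimated covariance, and the estimates are pooled with inverse-covariance (precision) weights so that noisier machines count less.

```python
import numpy as np

def local_fit(X, y):
    """Per-machine OLS fit plus an estimate of its covariance under a
    machine-specific error variance (heteroscedasticity across machines)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * XtX_inv            # covariance of the local estimator

def aggregate(local_results):
    """Precision-weighted aggregation: weight each machine's estimate by the
    inverse of its estimated covariance."""
    precisions = [np.linalg.inv(cov) for _, cov in local_results]
    total = sum(precisions, np.zeros_like(precisions[0]))
    combined = sum(P @ b for (b, _), P in zip(local_results, precisions))
    return np.linalg.solve(total, combined)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta_true = np.array([1.0, -2.0, 0.5])
    results = []
    for k in range(20):                        # 20 machines, unequal noise levels
        X = np.column_stack([np.ones(5000), rng.normal(size=(5000, 2))])
        y = X @ beta_true + rng.normal(scale=0.5 + 0.2 * k, size=5000)
        results.append(local_fit(X, y))
    print(np.round(aggregate(results), 3))
```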
12. Distributed wide field electromagnetic method based on high-order 2^(n) sequence pseudo random signal (cited 4)
Authors: Yang YANG, Ji-shan HE, Fan LING, Yu-zhen ZHU. Transactions of Nonferrous Metals Society of China (SCIE, EI, CAS, CSCD), 2022, No. 5, pp. 1609-1622.
Abstract: To make three-dimensional electromagnetic exploration achievable, the distributed wide field electromagnetic method (WFEM) based on a high-order 2^n sequence pseudo-random signal is proposed and realized. In this method, only one set of high-order pseudo-random waveforms, which contains all target frequencies, is needed. Based on the high-order sequence pseudo-random signal construction algorithm, the waveform can be customized according to different exploration tasks, and the receivers are independent of each other and dynamically adjust their acquisition parameters according to different requirements. A field test in the deep iron ore of Qihe-Yucheng showed that the distributed WFEM based on the high-order pseudo-random signal realizes high-efficiency acquisition of massive electromagnetic data in quite a short time. Compared with traditional controlled-source electromagnetic methods, the distributed WFEM is much more efficient and can be applied to large-scale, high-resolution exploration for deep resources and minerals.
Keywords: distributed wide field electromagnetic method (WFEM); high-order pseudo-random signal; multifrequency; massive data
13. Optimization algorithm for rapid 3D gravity inversion (cited 2)
Authors: Jing Lei, Yao Chang-Li, Yang Ya-Bin, Xu Meng-Long, Zhang Guang-Zhi, Ji Ruo-Ye. Applied Geophysics (SCIE, CSCD), 2019, No. 4, pp. 507-518, 561.
Abstract: The practical application of 3D inversion of gravity data requires a lot of computation time and storage space. To solve this problem, we present an integrated optimization algorithm with the following components: (1) targeting high accuracy in the space domain and fast computation in the wavenumber domain, we design a fast, high-precision 3D forward algorithm; and (2) taking advantage of the symmetry of the inversion matrix, the main calculation in gravity conjugate gradient inversion is decomposed into two forward calculations, thus optimizing the computational efficiency of 3D gravity inversion. We verify the calculation accuracy and efficiency of the optimization algorithm by testing models of various grid sizes through numerical simulation experiments.
Keywords: gravity; 3D inversion; optimization algorithm; high efficiency; massive data
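The wavenumber-domain forward modelling itself is not reproduced; the sketch below only illustrates the matrix-free structure in which each conjugate-gradient step of the (damped) normal equations reduces to two applications of the forward kernel, so the full matrix never needs to be stored. The dense stand-in kernel G, the damping term and all sizes are assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def make_normal_operator(forward, adjoint, n_cells, damping=1e-3):
    """Matrix-free damped normal-equations operator (G^T G + damping * I): each
    application costs two kernel applications and no stored matrix."""
    def matvec(m):
        return adjoint(forward(m)) + damping * m
    return LinearOperator((n_cells, n_cells), matvec=matvec, dtype=float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_data, n_cells = 400, 900
    G = np.exp(-rng.random((n_data, n_cells)))   # stand-in for a gravity sensitivity kernel
    forward = lambda m: G @ m                    # predicted gravity data from a density model
    adjoint = lambda d: G.T @ d
    m_true = np.zeros(n_cells)
    m_true[400:430] = 1.0
    d_obs = forward(m_true) + 1e-3 * rng.normal(size=n_data)
    A = make_normal_operator(forward, adjoint, n_cells)
    m_rec, info = cg(A, adjoint(d_obs), maxiter=300)
    print(info, round(float(np.linalg.norm(forward(m_rec) - d_obs)), 4))
```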
14. A Parallel Platform for Web Text Mining
Authors: Ping Lu, Zhenjiang Dong, Shengmei Luo, Lixia Liu, Shanshan Guan, Shengyu Liu, Qingcai Chen. ZTE Communications, 2013, No. 3, pp. 56-61.
Abstract: With user-generated content, anyone can be a content creator. This phenomenon has enormously increased the amount of information circulated online, and it is becoming harder to efficiently obtain required information. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and the Message Passing Interface. We propose a parallel web text mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and process this information using natural language processing and data-mining techniques.
Keywords: natural language processing; text mining; massive data; parallel; web knowledge service
15. Research and Simulation of Mass Random Data Association Rules Based on Fuzzy Cluster Analysis
Authors: Huaisheng Wu, Qin Li and Xiuming Li. Proceedings of the International Conference on Computer Frontiers (国际计算机前沿大会会议论文集), 2021, No. 1, pp. 80-89.
Abstract: Because traditional methods struggle to uncover the internal relationships and association rules of data when dealing with massive data, a fuzzy clustering method is proposed to analyze massive data. First, the sample matrix is normalized from the sample data. Second, a fuzzy equivalence matrix is constructed from the normalized matrix using the fuzzy clustering method, and this equivalence matrix serves as the basis for dynamic clustering. Finally, a series of classifications is carried out on the massive data at successive cut-set levels and a dynamic cluster diagram is generated. The experimental results show that the fuzzy clustering method can effectively identify the association rules of data sets through multiple iterations over massive data, and that the clustering process has a short running time and good robustness. It can therefore be widely applied to the identification and classification of association rules in massive data such as sound, images and natural resource data.
Keywords: fuzzy clustering; massive random data; management rules; cut-set levels
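A small end-to-end illustration of the described procedure, assuming the classical max-min formulation: normalise the samples, build a fuzzy similarity matrix, turn it into a fuzzy equivalence matrix by repeated max-min composition (transitive closure), and read off clusters at several cut-set levels.

```python
import numpy as np

def fuzzy_similarity(X):
    """Fuzzy similarity matrix from normalised samples via the overlap
    (min-sum / max-sum) method."""
    n = len(X)
    R = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = (np.minimum(X[i], X[j]).sum()
                                 / (np.maximum(X[i], X[j]).sum() + 1e-12))
    return R

def transitive_closure(R):
    """Iterate the max-min composition R∘R until it stabilises, yielding a
    fuzzy equivalence matrix."""
    while True:
        R2 = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
        if np.allclose(R2, R):
            return R2
        R = R2

def lambda_cut_clusters(R_eq, lam):
    """Dynamic clustering: threshold the equivalence matrix at level lambda
    and read off the resulting groups."""
    labels, current = [-1] * len(R_eq), 0
    for i in range(len(R_eq)):
        if labels[i] == -1:
            for m in np.where(R_eq[i] >= lam)[0]:
                labels[m] = current
            current += 1
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.uniform(0.0, 0.3, (4, 3)), rng.uniform(0.7, 1.0, (4, 3))])
    X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)     # normalise to [0, 1]
    R_eq = transitive_closure(fuzzy_similarity(X))
    for lam in (0.5, 0.8, 0.95):                            # successive cut-set levels
        print(lam, lambda_cut_clusters(R_eq, lam))
```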
16. A MapReduced-Based and Cell-Based Outlier Detection Algorithm
Authors: ZHU Sunjing, LI Jing, HUANG Jilin, LUO Simin, PENG Weiping. Wuhan University Journal of Natural Sciences (CAS), 2014, No. 3, pp. 199-205.
Abstract: Outlier detection is a very important type of data mining that is used extensively in many application areas. The traditional cell-based outlier detection algorithm not only takes a large amount of time to process massive data but also uses lots of machine resources, which results in an imbalanced machine load. This paper presents a MapReduce-based, cell-based outlier detection algorithm, combined with a single-layer perceptron, which parallelizes outlier detection. Experiments show that the improved algorithm effectively improves both the efficiency and the accuracy of outlier detection.
Keywords: outlier; MapReduce; data mining; cell; massive data
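An in-memory, single-process stand-in for the cell-based counting idea (the MapReduce parallelisation and the single-layer perceptron component are not reproduced): points are bucketed into grid cells, counts are aggregated per cell, and points whose cell neighbourhood is sparse are flagged. The cell size and threshold are illustrative.

```python
from collections import Counter
from itertools import product
import numpy as np

def cell_based_outliers(points, cell_size, min_neighbors):
    """Simplified cell-based detection: bucket points into grid cells (the 'map'
    step), aggregate counts per cell (the 'reduce' step), then flag points whose
    own cell plus its immediate neighbourhood holds fewer than min_neighbors points."""
    cells = [tuple((p // cell_size).astype(int)) for p in points]
    counts = Counter(cells)                                  # per-cell point counts
    offsets = list(product((-1, 0, 1), repeat=points.shape[1]))
    outliers = []
    for idx, c in enumerate(cells):
        neighborhood = sum(counts.get(tuple(np.add(c, o)), 0) for o in offsets)
        if neighborhood < min_neighbors:
            outliers.append(idx)
    return outliers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
    stray = np.array([[8.0, 8.0], [-9.0, 5.0]])              # obvious outliers
    pts = np.vstack([cluster, stray])
    print(cell_based_outliers(pts, cell_size=1.0, min_neighbors=3))
```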