Journal Articles
4,135 articles found
1. Big Data Analytics: Deep Content-Based Prediction with Sampling Perspective
Authors: Waleed Albattah, Saleh Albahli. 《Computer Systems Science & Engineering》 SCIE EI, 2023, No. 4, pp. 531-544 (14 pages)
The world of information technology is more than ever being flooded with huge amounts of data, nearly 2.5 quintillion bytes every day. This large stream of data is called big data, and the amount is increasing each day. This research uses a technique called sampling, which selects a representative subset of the data points, manipulates and analyzes this subset to identify patterns and trends in the larger dataset being examined, and finally creates models. Sampling uses a small proportion of the original data for analysis and model training, so that it is relatively faster while maintaining data integrity and achieving accurate results. Two deep neural networks, AlexNet and DenseNet, were used in this research to test two sampling techniques, namely sampling with replacement and reservoir sampling. The dataset used for this research was divided into three classes: acceptable, flagged as easy, and flagged as hard. The base models were trained with the whole dataset, whereas the other models were trained on 50% of the original dataset. There were four combinations of model and sampling technique. The F-measure for the AlexNet base model was 0.807, while that for the DenseNet base model was 0.808. Combination 1, the AlexNet model with sampling with replacement, achieved an average F-measure of 0.8852. Combination 3, the AlexNet model with reservoir sampling, had an average F-measure of 0.8545. Combination 2, the DenseNet model with sampling with replacement, achieved an average F-measure of 0.8017. Finally, combination 4, the DenseNet model with reservoir sampling, had an average F-measure of 0.8111. Overall, we conclude that both models trained on a sampled dataset gave equal or better results compared to the base models, which used the whole dataset.
Keywords: sampling; big data; deep learning; AlexNet; DenseNet
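The two sampling schemes compared in this abstract can be sketched as follows (a minimal illustration, not the paper's implementation; function names are ours):

```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Algorithm R: a uniform k-item sample from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)          # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

def sample_with_replacement(data, k, rng=random.Random(0)):
    """k independent draws; the same record may be picked more than once."""
    return [data[rng.randrange(len(data))] for _ in range(k)]

# Train on 50% of the data, as in the paper's sampled models.
subset = reservoir_sample(range(1000), 500)
```

Reservoir sampling needs only one pass and constant memory, which is why it suits streams; sampling with replacement may repeat records, which is acceptable for bootstrap-style training.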
2. Effective data sampling strategies and boundary condition constraints of physics-informed neural networks for identifying material properties in solid mechanics
Authors: W. Wu, M. Daneker, M. A. Jolley, K. T. Turner, L. Lu. 《Applied Mathematics and Mechanics (English Edition)》 SCIE EI CSCD, 2023, No. 7, pp. 1039-1068 (30 pages)
Material identification is critical for understanding the relationship between mechanical properties and the associated mechanical functions. However, material identification is a challenging task, especially when the characteristics of the material are highly nonlinear in nature, as is common in biological tissue. In this work, we identify unknown material properties in continuum solid mechanics via physics-informed neural networks (PINNs). To improve the accuracy and efficiency of PINNs, we develop efficient strategies to nonuniformly sample observational data. We also investigate different approaches to enforce Dirichlet-type boundary conditions (BCs) as soft or hard constraints. Finally, we apply the proposed methods to a diverse set of time-dependent and time-independent solid mechanics examples that span linear elastic and hyperelastic material space. The estimated material parameters achieve relative errors of less than 1%. As such, this work is relevant to diverse applications, including optimizing structural integrity and developing novel materials.
Keywords: solid mechanics; material identification; physics-informed neural network (PINN); data sampling; boundary condition (BC) constraint
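One common way to sample observational data nonuniformly, in the spirit of this abstract, is to weight candidate points by the current model residual (a generic sketch; the paper's actual strategies may differ, and the residual function here is a toy):

```python
import random

def residual_weighted_sample(points, residual, k, rng=random.Random(0)):
    """Draw k training points with probability proportional to |residual|,
    concentrating samples where the current model fits worst."""
    weights = [abs(residual(p)) + 1e-12 for p in points]  # epsilon keeps weights positive
    return rng.choices(points, weights=weights, k=k)

# Toy residual: the model is assumed much worse near x = 1.0
points = [i / 10 for i in range(11)]
batch = residual_weighted_sample(points, lambda x: x ** 4, k=100)
```

Resampling this way between training rounds is a cheap adaptive strategy: regions already fit well contribute few points, so the same budget buys more accuracy where it matters.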
3. EDSUCh: A robust ensemble data summarization method for effective medical diagnosis
Authors: Mohiuddin Ahmed, A.N.M. Bazlur Rashid. 《Digital Communications and Networks》 SCIE CSCD, 2024, No. 1, pp. 182-189 (8 pages)
Identifying rare patterns for medical diagnosis is a challenging task due to heterogeneity and the volume of data. Data summarization can create a concise version of the original data that can be used for effective diagnosis. In this paper, we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data to ensure the inclusion of rare patterns. To the best of our knowledge, no such technique has been available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis. The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used. Therefore, medical diagnosis becomes more effective, and our experimental results reflect that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
Keywords: data summarization; ensemble; medical diagnosis; sampling
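A cluster-then-sample summary of the kind described here can be sketched as below (our illustration, not EDSUCh itself): grouping first guarantees that rare groups survive into the summary, which is exactly why rare patterns are preserved.

```python
import random
from collections import defaultdict

def summarize(records, key, per_group, rng=random.Random(0)):
    """Group records, then keep at most per_group records from each group,
    so rare patterns are retained while bulky groups shrink."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    summary = []
    for members in groups.values():
        summary.extend(members if len(members) <= per_group
                       else rng.sample(members, per_group))
    return summary

# 1,000 routine records and a single rare (anomalous) one
records = [("normal", i) for i in range(1000)] + [("anomaly", 0)]
summary = summarize(records, key=lambda r: r[0], per_group=10)
```

Uniform sampling of the same data would keep the anomaly with probability about 1%, while the grouped summary keeps it always.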
4. RAD-seq data reveals robust phylogeny and morphological evolutionary history of Rhododendron
Authors: Yuanting Shen, Gang Yao, Yunfei Li, Xiaoling Tian, Shiming Li, Nian Wang, Chengjun Zhang, Fei Wang, Yongpeng Ma. 《Horticultural Plant Journal》 SCIE CAS CSCD, 2024, No. 3, pp. 866-878 (13 pages)
Rhododendron is famous for its high ornamental value. However, the genus is taxonomically difficult and the relationships within Rhododendron remain unresolved. In addition, the origin of key morphological characters with high horticultural value needs to be explored. Both problems largely hinder the utilization of germplasm resources. Most studies have attempted to disentangle the phylogeny of Rhododendron, but only used a few genomic markers and lacked large-scale sampling, resulting in low clade support and contradictory phylogenetic signals. Here, we used restriction-site associated DNA sequencing (RAD-seq) data and morphological traits for 144 species of Rhododendron, representing all subgenera and most sections and subsections of this species-rich genus, to decipher its intricate evolutionary history and reconstruct ancestral states. Our results revealed high resolution at the subgenus and section levels of Rhododendron based on RAD-seq data. Both the optimal phylogenetic tree and the split tree recovered five lineages within Rhododendron. Subg. Therorhodion (clade I) formed the basal lineage. Subg. Tsutsusi and Azaleastrum formed clade II and had sister relationships. Clade III included all scaly rhododendron species. Subg. Pentanthera (clade IV) formed a sister group to Subg. Hymenanthes (clade V). The results of ancestral state reconstruction showed that the Rhododendron ancestor was a deciduous woody plant with terminal inflorescence, ten stamens, leaf blades without scales, and a broadly funnelform corolla of pink or purple color. This study demonstrates the power of RAD-seq data to resolve the evolutionary history of Rhododendron with high clade support, provides an example of resolving discordant signals in phylogenetic trees, and demonstrates the feasibility of applying RAD-seq with large amounts of missing data to decipher intricate evolutionary relationships. Additionally, the reconstructed ancestral states of six important characters provide insights into the innovation of key characters in Rhododendron.
Keywords: Rhododendron; RAD-seq; missing data; quartet sampling (QS); ancestral state reconstruction
5. Data partitioning based on sampling for power load streams
Authors: 王永利, 徐宏炳, 董逸生, 钱江波, 刘学军. 《Journal of Southeast University (English Edition)》 EI CAS, 2005, No. 3, pp. 293-298 (6 pages)
A novel data stream partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams in the power industry. The first step of this method is to sample the data in parallel, implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of data values is introduced to describe the distribution characteristics of the data values adaptively. The second step is to partition the flux of the data streams evenly, implemented with two alternative equal-depth histogram generating algorithms that fit different cases: one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector. Experimental results on actual data prove that the method is efficient, practical, and suitable for time-varying data stream processing.
Keywords: data streams; continuous queries; parallel processing; sampling; data partitioning
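The equal-depth histogram at the heart of the partitioning step can be sketched from a sample as follows (a minimal version; the paper's incremental and periodic variants add maintenance logic on top):

```python
def equal_depth_boundaries(sample, k):
    """Approximate partition vector: k-1 quantile boundaries that split the
    sampled values into k buckets of roughly equal tuple counts."""
    s = sorted(sample)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def bucket_of(value, boundaries):
    """Index of the bucket a new stream value falls into."""
    return sum(value >= b for b in boundaries)

boundaries = equal_depth_boundaries(list(range(100)), 4)
```

Equal-depth (rather than equal-width) boundaries balance the tuple load across parallel processing nodes even when the value distribution is skewed.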
6. Super point detection based on sampling and data streaming algorithms
Authors: 程光, 强士卿. 《Journal of Southeast University (English Edition)》 EI CAS, 2009, No. 2, pp. 224-227 (4 pages)
In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with many flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both an IP table and a flow bloom filter (BF) data structure to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows will be recorded into the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm, and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that most of the relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.
Keywords: super point; flow sampling; data streaming
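The IP-table/flow-BF bookkeeping described above can be sketched as follows. This is a heavily simplified assumption-laden sketch: a fixed sampling probability p stands in for SDSD's flow-dependent sampling rule, and all names are ours.

```python
import hashlib
import random

class BloomFilter:
    """Fixed-size bit array with k hash functions; membership tests may give
    false positives but never false negatives."""
    def __init__(self, m=1 << 16, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def detect_super_points(flows, p=0.5, threshold=3, rng=random.Random(0)):
    """Count distinct flows per sampled source; sources with at least
    threshold distinct flows are reported as super points."""
    ip_table = set()          # sources already being tracked
    flow_bf = BloomFilter()   # flows already counted
    counts = {}
    for src, dst in flows:
        if src in ip_table:
            if (src, dst) not in flow_bf:
                flow_bf.add((src, dst))
                counts[src] += 1
        elif rng.random() < p:          # sample a new source probabilistically
            ip_table.add(src)
            flow_bf.add((src, dst))
            counts[src] = 1
    return {ip for ip, c in counts.items() if c >= threshold}
```

The flow BF deduplicates repeated (source, destination) pairs in constant memory, which is what keeps the distinct-flow counts honest without storing every flow.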
7. Estimating aboveground biomass of Pinus densata-dominated forests using Landsat time series and permanent sample plot data (cited 8 times)
Authors: Jialong Zhang, Chi Lu, Hui Xu, Guangxing Wang. 《Journal of Forestry Research》 SCIE CAS CSCD, 2019, No. 5, pp. 1689-1706 (18 pages)
Southwest China is one of the three major forest regions in China and plays an important role in carbon sequestration. Accurate estimates of changes in aboveground biomass are critical for understanding forest carbon cycling and promoting climate change mitigation. Southwest China is characterized by complex topographic features and forest canopy structures, complicating methods for mapping aboveground biomass and its dynamics. The integration of continuous Landsat images and national forest inventory data provides an alternative approach to develop a long-term monitoring program of forest aboveground biomass dynamics. This study explores the development of a methodological framework using historical national forest inventory plot data and Landsat TM time-series images. The framework was formulated by comparing two parametric methods, Linear Regression for Multiple Independent Variables (MLR) and Partial Least Square Regression (PLSR), and two nonparametric methods, Random Forest (RF) and Gradient Boost Regression Tree (GBRT), based on state and change models of forest aboveground biomass. The methodological framework mapped Pinus densata aboveground biomass and its changes over time in Shangri-La, Yunnan, China. Landsat images and national forest inventory data were acquired for 1987, 1992, 1997, 2002, and 2007. The results show that: (1) correlation and homogeneity texture measures were able to characterize forest canopy structures, aboveground biomass, and its dynamics; (2) GBRT and RF predicted Pinus densata aboveground biomass and its changes better than PLSR and MLR; (3) GBRT was the most reliable approach for estimating aboveground biomass and its changes; and (4) the aboveground biomass change models showed a promising improvement in prediction accuracy. This study indicates that the combination of GBRT state and change models developed using temporal Landsat and national forest inventory data provides the potential for a long-term mapping and monitoring program of forest aboveground biomass and its changes in Southwest China.
Keywords: forest biomass change; Gradient Boost Regression Tree; Landsat; multi-temporal images; permanent sample plots; Pinus densata; Shangri-La; China
8. Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique (cited 9 times)
Authors: Guan Jihong (School of Computer, Wuhan University, Wuhan 430072, China), Zhou Shuigeng (State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China), Bian Fuling (College of Remote Sensing, Wuhan University), He Yanxiang (School of Computer, Wuhan University). 《Wuhan University Journal of Natural Sciences》 CAS, 2001, No. Z1, pp. 467-473 (7 pages)
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Keywords: spatial databases; data mining; clustering; sampling; DBSCAN algorithm
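The "sampling outside DBSCAN" idea can be sketched like this (a toy 2-D version under our own simplifications, not the paper's SDBSCAN): cluster a random sample, then attach every remaining point to its nearest sampled point.

```python
import random

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN for 2-D points; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:       # j is a core point: expand further
                seeds.extend(jn)
    return labels

def sampled_dbscan(points, eps, min_pts, rate=0.5, rng=random.Random(0)):
    """Cluster a random sample, then attach every remaining point
    to the label of its nearest sampled point."""
    idx = rng.sample(range(len(points)), max(min_pts, int(rate * len(points))))
    sub = dbscan([points[i] for i in idx], eps, min_pts)
    labels = [None] * len(points)
    for pos, i in enumerate(idx):
        labels[i] = sub[pos]
    for i, (px, py) in enumerate(points):
        if labels[i] is None:
            nearest = min(idx, key=lambda q: (points[q][0] - px) ** 2
                                             + (points[q][1] - py) ** 2)
            labels[i] = labels[nearest]
    return labels
```

Since DBSCAN's neighborhood queries dominate its cost, clustering only a sample and assigning the rest by nearest neighbor trades a little boundary accuracy for a large speedup on big databases.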
9. Characteristics analysis on high density spatial sampling seismic data (cited 11 times)
Authors: Cai Xiling, Liu Xuewei, Deng Chunyan, Lv Yingme. 《Applied Geophysics》 SCIE CSCD, 2006, No. 1, pp. 48-54 (7 pages)
China's continental deposition basins are characterized by complex geological structures and various reservoir lithologies. Therefore, high precision exploration methods are needed. High density spatial sampling is a new technology for increasing the accuracy of seismic exploration. We briefly discuss point source and receiver technology, analyze the high density spatial sampling in-situ method, introduce the symmetric sampling principles presented by Gijs J. O. Vermeer, and discuss high density spatial sampling technology from the point of view of wave field continuity. We emphasize the analysis of high density spatial sampling characteristics, including the advantages of high density first breaks for investigating near-surface structure and improving static correction precision, and the use of dense receiver spacing at short offsets to increase the effective coverage at shallow depth and the accuracy of reflection imaging. Coherent noise is not aliased, and the noise analysis precision and suppression increase as a result. High density spatial sampling enhances wave field continuity and the accuracy of various mathematical transforms, which benefits wave field separation. Finally, we point out that the difficult part of high density spatial sampling technology is the data processing. More research needs to be done on methods of analyzing and processing huge amounts of seismic data.
Keywords: high density spatial sampling; symmetric sampling; static correction; noise suppression; wave field separation; data processing
10. Over-sampling algorithm for imbalanced data classification (cited 9 times)
Authors: Xu Xiaolong, Chen Wen, Sun Yanfei. 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2019, No. 6, pp. 1182-1191 (10 pages)
For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters an over-generalization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples, and noise samples; the noise samples of the minority class are then removed so that more effective samples can be synthesized. In order to make full use of the information in core samples and borderline samples, different strategies are used to over-sample core samples and borderline samples. Experiments show that DSMOTE can achieve better results compared with SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
Keywords: imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
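The core SMOTE interpolation step that DSMOTE builds on can be sketched as follows (plain SMOTE only; the density-based grouping of DSMOTE is not reproduced here):

```python
import random

def smote(minority, n_new, k=3, rng=random.Random(0)):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        x = minority[rng.randrange(len(minority))]
        nbrs = sorted((p for p in minority if p != x),
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        z = nbrs[rng.randrange(len(nbrs))]
        gap = rng.random()               # position along the segment x -> z
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=20)
```

Because every synthetic point lies between two real minority points, the over-generalization problem the abstract mentions arises exactly when a noisy minority point drags interpolation into majority territory, which is why DSMOTE removes noise samples first.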
11. Brittleness index predictions from Lower Barnett Shale well-log data applying an optimized data matching algorithm at various sampling densities (cited 1 time)
Author: David A. Wood. 《Geoscience Frontiers》 SCIE CAS CSCD, 2021, No. 6, pp. 444-457 (14 pages)
The capability of accurately predicting mineralogical brittleness index (BI) from basic suites of well logs is desirable, as it provides a useful indicator of the fracability of tight formations. Measuring mineralogical components in rocks is expensive and time consuming. However, the basic well-log curves are not well correlated with BI, so correlation-based, machine-learning methods are not able to derive highly accurate BI predictions from such data. A correlation-free, optimized data-matching algorithm is configured to predict BI on a supervised basis from well-log and core data available from two published wells in the Lower Barnett Shale Formation (Texas). This transparent open box (TOB) algorithm matches data records by calculating the sum of squared errors between their variables and selecting the best matches as those with the minimum squared errors. It then applies optimizers to adjust the weights applied to individual variable errors to minimize the root mean square error (RMSE) between calculated and predicted BI. The prediction accuracy achieved by TOB using just five well logs (Gr, ρb, Ns, Rs, Dt) to predict BI depends on the density of the data records sampled. At a sampling density of about one sample per 0.5 ft, BI is predicted with RMSE ~0.056 and R^2 ~0.790. At a sampling density of about one sample per 0.1 ft, BI is predicted with RMSE ~0.008 and R^2 ~0.995. Adding a stratigraphic height index as an additional (sixth) input variable improves BI prediction accuracy to RMSE ~0.003 and R^2 ~0.999 for the two wells, with only 1 record in 10,000 yielding a BI prediction error of >±0.1. The model has the potential to be applied on an unsupervised basis to predict BI from basic well-log data in surrounding wells lacking mineralogical measurements but with similar lithofacies and burial histories. The method could also be extended to predict elastic rock properties and seismic attributes from wells and seismic data to improve the precision of brittleness index and fracability mapping spatially.
Keywords: well-log brittleness index estimates; data record sample densities; zoomed-in data interpolation; correlation-free prediction analysis; mineralogical and elastic influences
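The correlation-free matching at the heart of the TOB algorithm can be sketched as below (our simplified single-match version; the real algorithm optimizes the weights and blends top-ranked matches, and the example records are hypothetical normalized log values):

```python
def tob_predict(train_logs, train_bi, query_logs, weights):
    """Predict BI for a query record as the BI of the training record with the
    minimum weighted sum of squared errors across the log variables."""
    def wsse(record):
        return sum(w * (a - b) ** 2
                   for w, a, b in zip(weights, record, query_logs))
    best = min(range(len(train_logs)), key=lambda i: wsse(train_logs[i]))
    return train_bi[best]

# Hypothetical 2-variable records (e.g., two normalized log curves)
train_logs = [(0.1, 0.2), (0.8, 0.9), (0.5, 0.5)]
train_bi = [0.2, 0.9, 0.5]
```

Unlike regression, nothing here requires the log variables to correlate with BI individually; the weights only decide how much each variable's mismatch counts in the record-to-record comparison.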
12. Power Analysis and Sample Size Determination for Crossover Trials with Application to Bioequivalence Assessment of Topical Ophthalmic Drugs Using Serial Sampling Pharmacokinetic Data
Authors: YU Yong Pei, YAN Xiao Yan, YAO Chen, XIA Jie Lai. 《Biomedical and Environmental Sciences》 SCIE CAS CSCD, 2019, No. 8, pp. 614-623 (10 pages)
Objective: To develop methods for determining a suitable sample size for bioequivalence assessment of generic topical ophthalmic drugs using a crossover design with serial sampling schemes. Methods: The power functions of the Fieller-type confidence interval and the asymptotic confidence interval in crossover designs with serial-sampling data are derived. Simulation studies were conducted to evaluate the derived power functions. Results: Simulation studies show that the two power functions can provide precise power estimates when normality assumptions are satisfied, and yield conservative estimates of power when data are log-normally distributed. The intra-correlation showed a positive correlation with the power of the bioequivalence test. When the expected ratio of the AUCs was less than or equal to 1, the power of the Fieller-type confidence interval was larger than that of the asymptotic confidence interval. If the expected ratio of the AUCs was larger than 1, the asymptotic confidence interval had greater power. Sample size can be calculated through numerical iteration with the derived power functions. Conclusion: The Fieller-type power function and the asymptotic power function can be used to determine sample sizes of crossover trials for bioequivalence assessment of topical ophthalmic drugs.
Keywords: serial-sampling data; crossover design; topical ophthalmic drug; bioequivalence; sample size
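The "numerical iteration with the derived power functions" step can be sketched generically (any monotone power function will do; the toy curve below merely stands in for the paper's Fieller-type and asymptotic functions):

```python
def min_sample_size(power_fn, target=0.8, n_max=10_000):
    """Smallest n whose power reaches the target, by simple upward search."""
    for n in range(2, n_max + 1):
        if power_fn(n) >= target:
            return n
    return None

# Toy monotone power curve standing in for a derived power function
toy_power = lambda n: 1 - 0.9 ** n
n_needed = min_sample_size(toy_power, target=0.8)
```

Because power is monotone in n, a linear (or bisection) search over the power function is all the "iteration" that sample size determination requires.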
13. Novel Stability Criteria for Sampled-Data Systems With Variable Sampling Periods (cited 2 times)
Authors: Hanyong Shao, Jianrong Zhao, Dan Zhang. 《IEEE/CAA Journal of Automatica Sinica》 EI CSCD, 2020, No. 1, pp. 257-262 (6 pages)
This paper is concerned with a novel Lyapunov-like functional approach to the stability of sampled-data systems with variable sampling periods. The Lyapunov-like functional has four striking characteristics compared to usual ones. First, it is time-dependent. Second, it may be discontinuous. Third, not every term of it is required to be positive definite. Fourth, the Lyapunov functional includes not only the state and the sampled state but also the integral of the state. By using a recently reported inequality to estimate the derivative of this Lyapunov functional, a sampling-interval-dependent stability criterion with reduced conservatism is obtained. The stability criterion is further extended to sampled-data systems with polytopic uncertainties. Finally, three examples are given to illustrate the reduced conservatism of the stability criteria.
Keywords: Lyapunov functional; sampled-data systems; sampling-interval-dependent stability
14. Assessment of the State of Forests Based on Joint Statistical Processing of Sentinel-2B Remote Sensing Data and the Data from Network of Ground-Based ICP-Forests Sample Plots
Authors: Alexander S. Alekseev, Dmitry M. Chernikhovskii. 《Open Journal of Ecology》 2022, No. 8, pp. 513-528 (16 pages)
The research was carried out on the territory of the Karelian Isthmus of the Leningrad Region using Sentinel-2B images and data from a network of ground sample plots. The ground sample plots are located in the studied territory mainly in a regular manner, established and surveyed according to the ICP-Forests methodology with some additions. The total area of the sample plots is a small part of the entire study area. One objective of the study was to determine the possibility of using the k-NN (nearest neighbor) method to assess the state of forests throughout the whole studied territory by joint statistical processing of data from ground sample plots and Sentinel-2B imagery. The data from the ground sample plots were divided into two equal parts, one for applying the k-NN method and the second for checking the results of its application. The systematic error in determining the mean damage class of the tree stands on sample plots by the k-NN method turned out to be zero, and the random error is equal to one point. These results make it possible to determine the state of the forest in the entire study area. The second objective was to examine the possibility of using the short-wave vegetation index (SWVI) to assess the state of forests. As a result, a close, statistically reliable dependence between the average state score of plantations and the value of the SWVI index was established, which makes it possible to use this relationship to determine the state of forests throughout the studied territory. The joint use and statistical processing of remotely sensed data and ground-based sample plots by the two studied methods make it possible to assess the state of forests throughout the large studied area within the image. The results obtained can be used to monitor the state of forests in large areas and to design appropriate forestry protective measures.
Keywords: remote sensing; Sentinel-2B imagery; ICP-Forests sample plot; tree stand damage class; k-NN (nearest neighbor method); vegetation index SWVI; nonlinear regression; systematic error; random error
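The k-NN step used to carry plot-level damage scores across the imagery can be sketched as follows (a generic k-NN regressor over spectral features; the two-band feature space and scores below are illustrative, not the study's data):

```python
def knn_estimate(plot_features, plot_scores, pixel, k=3):
    """Estimate a damage score for a pixel as the mean score of the k
    ground plots nearest to it in spectral-feature space."""
    nearest = sorted(range(len(plot_features)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(plot_features[i], pixel)))[:k]
    return sum(plot_scores[i] for i in nearest) / k

# Hypothetical plots: two spectral bands, damage class 1 (healthy) or 3 (damaged)
plot_features = [(0.1, 0.1), (0.1, 0.2), (0.2, 0.1),
                 (0.8, 0.8), (0.8, 0.9), (0.9, 0.8)]
plot_scores = [1, 1, 1, 3, 3, 3]
```

Splitting the plots in half, as the study does, lets one half drive the estimates and the other half quantify the systematic and random errors of exactly this kind of predictor.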
15. Minimum Data Sampling Method in the Inverse Scattering Problem
Authors: Yu Wenhua (Res. Inst. of EM Field and Microwave Tech., Southwest Jiaotong University, Chengdu 610031, China), Peng Zhongqiu (Beijing Remote Sensing and Information Institute, Beijing 100011, China), Ren Lang (Res. Inst. of EM Field and Microwave Tech., Southwest Jiaotong University). 《Journal of Modern Transportation》 1994, No. 2, pp. 114-118 (5 pages)
The Fourier transform is the basis of the analysis. This paper presents a method that determines the profile of the inverted object in inverse scattering from minimum sampling data.
Keywords: inverse scattering; nonuniqueness; sampling data
16. Consensus of heterogeneous multi-agent systems based on sampled data with a small sampling delay
Authors: 王娜, 吴治海, 彭力. 《Chinese Physics B》 SCIE EI CAS CSCD, 2014, No. 10, pp. 617-625 (9 pages)
In this paper, consensus problems of heterogeneous multi-agent systems based on sampled data with a small sampling delay are considered. First, a consensus protocol based on sampled data with a small sampling delay for heterogeneous multi-agent systems is proposed. Then, algebraic graph theory, the matrix method, the stability theory of linear systems, and some other techniques are employed to derive the necessary and sufficient conditions guaranteeing that heterogeneous multi-agent systems asymptotically achieve the stationary consensus. Finally, simulations are performed to demonstrate the correctness of the theoretical results.
Keywords: heterogeneous multi-agent systems; consensus; sampled data; small sampling delay
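A discrete-time consensus update of the kind analyzed here can be sketched as follows (first-order identical agents only, ignoring the sampling delay and the heterogeneity treated in the paper):

```python
def consensus_step(states, neighbors, gain=0.5):
    """One sampled-data update: each agent moves toward the mean of its neighbors."""
    return [x + gain * (sum(states[j] for j in neighbors[i]) / len(neighbors[i]) - x)
            for i, x in enumerate(states)]

# Three agents on a complete graph, updated at every sampling instant
states = [0.0, 3.0, 6.0]
neighbors = [[1, 2], [0, 2], [0, 1]]
for _ in range(60):
    states = consensus_step(states, neighbors)
```

On a connected graph with a suitable gain, the disagreement contracts at every sampling instant and all agents converge to the average of the initial states; the paper's conditions characterize when this survives heterogeneity and a small sampling delay.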
17. Major Data of China's 1% National Population Sampling Survey, 1995
《China Population Today》 1996, No. 4, pp. 7-8 (2 pages)
Major Data of China's 1% National Population Sampling Survey, 1995.
Keywords: major data of China's 1% National Population Sampling Survey, 1995
18. Bayesian Computation for the Parameters of a Zero-Inflated Cosine Geometric Distribution with Application to COVID-19 Pandemic Data
Authors: Sunisa Junnumtuam, Sa-Aat Niwitpong, Suparat Niwitpong. 《Computer Modeling in Engineering & Sciences》 SCIE EI, 2023, No. 5, pp. 1229-1254 (26 pages)
A new three-parameter discrete distribution called the zero-inflated cosine geometric (ZICG) distribution is proposed for the first time herein. It can be used to analyze over-dispersed count data with excess zeros. The basic statistical properties of the new distribution, such as the moment generating function, mean, and variance, are presented. Furthermore, confidence intervals are constructed by using the Wald, Bayesian, and highest posterior density (HPD) methods to estimate the true confidence intervals for the parameters of the ZICG distribution. Their efficacies were investigated by using both simulation and real-world data comprising the number of daily COVID-19 positive cases at the Tokyo 2020 Olympic Games. The results show that the HPD interval performed better than the other methods in terms of coverage probability and average length in most of the cases studied.
Keywords: Bayesian analysis; confidence interval; Gibbs sampling; random-walk Metropolis; zero-inflated count data
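The HPD interval used here can be computed from posterior draws as follows (a generic sketch, independent of the ZICG model itself):

```python
def hpd_interval(draws, cred=0.95):
    """Highest posterior density interval: the shortest window of sorted
    posterior draws containing a cred fraction of them."""
    s = sorted(draws)
    n = len(s)
    m = max(1, int(cred * n))            # number of draws inside the interval
    start = min(range(n - m + 1), key=lambda i: s[i + m - 1] - s[i])
    return s[start], s[start + m - 1]

draws = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # one extreme draw in the tail
```

For skewed posteriors, which are typical for zero-inflation and rate parameters, the shortest-window property is why HPD intervals tend to beat equal-tailed intervals on average length.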
19. Data reliability of the emerging citizen science in the Greater Bay Area of China
Authors: Xilin Huang, Yihong Wang, Yang Liu, Lyu Bing Zhang. 《Avian Research》 SCIE CSCD, 2023, No. 3, pp. 354-360 (7 pages)
The potential of citizen science projects in research has been increasingly acknowledged, but the substantial engagement of these projects is restricted by the quality of citizen science data. Based on the largest emerging citizen science project in the country, the Birdreport Online Database (BOD), we examined the biases of birdwatching data from the Greater Bay Area of China. The results show that the sampling effort is disparate among land cover types due to contributors' preference for urban and suburban areas, indicating that environments suitable for species existence could be underrepresented in the BOD data. We tested the contributors' skill in species identification via a questionnaire targeting citizen birders in the Greater Bay Area. The questionnaire shows that most citizen birdwatchers could correctly identify the common species widely distributed in Southern China and the less common species with conspicuous morphological characteristics, while they failed to identify species from the Alaudidae, Caprimulgidae, Emberizidae, Phylloscopidae, Scolopacidae, and Scotocercidae. With a study example, we demonstrate that spatially clustered birdwatching visits can cause underestimation of species richness in insufficiently sampled areas, and that the result of species richness mapping is sensitive to the contributors' skill in identifying bird species. Our results address how avian research can be influenced by the reliability of citizen science data in a region of generally high accessibility, and highlight the necessity of pre-analysis scrutiny of data reliability with regard to research aims at all spatial and temporal scales. To improve data quality, we suggest equipping the data collection frame of BOD with a flexible filter for bird abundance, and questionnaires that collect information related to contributors' bird identification skill. Statistical modelling approaches are encouraged for correcting the bias of sampling effort.
Keywords: bird identification skill; citizen science data quality; sampling bias; species richness; the Greater Bay Area of China
20. Imbalanced electricity theft detection based on resampling and hybrid ensemble learning (cited 3 times)
Authors: 游文霞, 梁皓, 杨楠, 李清清, 吴永华, 李文武. 《电网技术》 (Power System Technology) EI CSCD, 2024, No. 2, pp. 730-739 (10 pages)
To address the bias in electricity theft detection caused by the class imbalance of power users, this paper proposes an imbalanced electricity theft detection model based on resampling and hybrid ensemble learning. First, the optimal number of sampling subsets is determined on the basis of the Easy-Ensemble hybrid ensemble learning framework. Then, an adaptive resampling strategy determines the resampling method of the detection model according to the imbalance degree of the users' electricity consumption dataset and the optimal number of sampling subsets, so that the electricity data become balanced. Finally, electricity theft detection is performed on the rebalanced samples using a hybrid ensemble scheme that first applies serial ensembling to reduce bias and then parallel ensembling to reduce variance. Comparative case analyses show that, through resampling and hybrid ensembling, the proposed model effectively resolves the bias problem of traditional ensemble algorithms in imbalanced theft detection, reduces the influence of the imbalance of electricity data on the ensemble results, and improves theft detection under user class imbalance; the model's accuracy, F1 score, and G-mean all perform excellently under various degrees of imbalance.
Keywords: electricity theft detection; imbalanced data; resampling; ensemble learning; Easy-Ensemble framework
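The balanced-subset construction at the base of the Easy-Ensemble framework mentioned above can be sketched as follows (subset construction only; the per-subset boosting stage is omitted, and the example records are hypothetical):

```python
import random

def easy_ensemble_subsets(majority, minority, n_subsets, rng=random.Random(0)):
    """Build n_subsets balanced training sets: every minority sample plus an
    equally sized random undersample of the majority class."""
    return [(rng.sample(majority, len(minority)), list(minority))
            for _ in range(n_subsets)]

normal_users = list(range(100))        # hypothetical honest-user records
theft_users = ["t1", "t2", "t3"]       # hypothetical theft records
subsets = easy_ensemble_subsets(normal_users, theft_users, n_subsets=5)
```

Each base learner sees a balanced set, so none of them is biased toward the majority class, while across subsets most of the majority data is still used, which is what undersampling alone throws away.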