Abstract: To analyze how different regions of the model inputs affect the model output, a novel concept, the contribution to the sample failure probability plot (CSFP), is proposed based on the contribution to the sample mean plot (CSM) and the contribution to the sample variance plot (CSV). The CSFP can be used to analyze the effect of regions of the model inputs on the failure probability. After the CSFP is defined, its properties and its differences from the CSV/CSM are discussed. The proposed CSFP shows which input affects the failure probability most, and also identifies which regions of that input contribute most to the failure probability. A solution for the CSFP is obtained by applying the Kriging model to optimized sample points. Because the Kriging surrogate model is efficient, the computational cost of solving the CSFP is greatly reduced. Several examples illustrate the validity of the proposed CSFP and the applicability and feasibility of the Kriging-surrogate-based solution for the CSFP.
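A CSFP curve can also be estimated directly from crude Monte Carlo samples before any surrogate model is introduced. The sketch below assumes a definition by analogy with the CSM: CSFP(q) for an input is the fraction of all observed failures contributed by samples whose value of that input lies at or below its empirical q-quantile. The toy limit state and the function name `csfp_curve` are hypothetical. The paper's Kriging-based solution is not reproduced here.

```python
import random

def csfp_curve(xs, failed, quantiles):
    """Empirical CSFP curve for one input (hypothetical definition, by analogy with CSM).

    CSFP(q) = share of all observed failures contributed by samples whose
    input value lies at or below the empirical q-quantile of that input.
    """
    n = len(xs)
    total_failures = sum(failed)
    if total_failures == 0:
        raise ValueError("no failures observed; CSFP is undefined")
    # Sort sample indices by the value of the input under study.
    order = sorted(range(n), key=lambda i: xs[i])
    curve = []
    for q in quantiles:
        k = int(round(q * n))  # number of samples in the lowest q-fraction
        contributed = sum(failed[i] for i in order[:k])
        curve.append(contributed / total_failures)
    return curve

# Toy limit state g(x1, x2) = 2.5 - x1 - 0.2 * x2 with standard normal inputs;
# a sample "fails" when g <= 0.
random.seed(0)
N = 20000
x1 = [random.gauss(0.0, 1.0) for _ in range(N)]
x2 = [random.gauss(0.0, 1.0) for _ in range(N)]
fail = [1 if 2.5 - a - 0.2 * b <= 0 else 0 for a, b in zip(x1, x2)]

qs = [0.0, 0.5, 0.9, 0.99, 1.0]
print("x1:", [round(v, 3) for v in csfp_curve(x1, fail, qs)])
print("x2:", [round(v, 3) for v in csfp_curve(x2, fail, qs)])
```

Under this definition, an input that has no influence on failure gives a curve close to the diagonal CSFP(q) = q. A curve that stays near zero until the upper quantiles, as x1's does here, indicates that failures come from that input's upper tail.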
Funding: This research was supported by the National Key R&D Program of China (No. 2021YFB2700800) and the GHfund B (No. 202302024490).
Abstract: Peer-to-peer (P2P) overlay networks provide message transmission for blockchain systems, so improving data transmission efficiency in P2P networks can greatly enhance blockchain performance. However, traditional blockchain P2P networks commonly suffer from a mismatch between upper-layer traffic requirements and the underlying physical network topology. This mismatch causes redundant data transmission and inefficient routing, which severely constrains the scalability of blockchain systems. To address these issues, we propose FPSblo, an efficient transmission method for blockchain networks. FPSblo is inspired by the Farthest Point Sampling (FPS) algorithm, a well-established technique widely used in point cloud processing. We treat blockchain nodes as analogous to points in a point cloud and select a representative set of nodes to prioritize message forwarding, so that messages reach the network edge quickly and are evenly distributed. We compare our model with Kadcast, a classic improved transmission model for blockchain P2P networks. The experimental results show that FPSblo reduces transmission redundancy by 34.8% and the overload rate by 37.6%. The experimental analysis indicates that FPSblo enhances the transmission capability of blockchain P2P networks.
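The FPS step that FPSblo borrows can be sketched as a greedy loop. At each step, the loop picks the point farthest from everything selected so far, which spreads the chosen points evenly. This minimal sketch assumes that each node has already been given hypothetical 2-D coordinates, for example from a latency embedding. The abstract does not describe how FPSblo maps real nodes to points or how it forwards messages to the selected set, so only the sampling step is shown.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling: return indices of k well-spread points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # dist[i] = distance from point i to its nearest already-selected point
    dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # The next pick is the point that is farthest from every selected point.
        nxt = max(range(len(points)), key=lambda i: dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return selected

# Hypothetical node coordinates (e.g. a 2-D latency embedding).
nodes = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.1, 0.0)]
print(farthest_point_sampling(nodes, 3))  # [0, 3, 1]
```

Each iteration costs O(n), so selecting k nodes costs O(nk).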
Abstract: By combining sampling methods for food statistics with years of practical sampling experience, this paper summarizes various sampling points and the corresponding sampling methods, with the aim of discovering food safety risks and improving the level of food safety.
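Abstract: A point cloud is a large set of points with an important geometric structure. Because of its large data volume, similar points inevitably appear in some regions, so feature extraction picks up redundant information, which causes computational redundancy and lowers training accuracy. To address these problems, a new neural network, PointPCA, is proposed. PointPCA consists of three modules: a) a sampling module, which introduces an average point sampling (APS) method that effectively avoids similar points and yields a new point set that approximately represents the original point cloud; b) a feature extraction module, which uses a grouping strategy to extract multi-scale spatial features from the new point set; and c) a concatenation module, which concatenates the feature vectors extracted at each scale into a single feature vector. Experiments show that PointPCA improves accuracy by 4.6% over PointNet and by 1.1% over PointNet++, and it also performs well in the mIoU evaluation.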
Funding: Support was provided by Research Joint Venture Agreement 17-JV-11242306045, “Old Growth Forest Dynamics and Structure,” between the USDA Forest Service and the University of New Hampshire. Additional support to MJD was provided by the USDA National Institute of Food and Agriculture McIntire-Stennis Project Accession Number 1020142, “Forest Structure, Volume, and Biomass in the Northeastern United States.” This work was also supported by the USDA National Institute of Food and Agriculture, McIntire-Stennis project OKL02834, and the Division of Agricultural Sciences and Natural Resources at Oklahoma State University.
Abstract: Background: A new variance estimator is derived and tested for big BAF (Basal Area Factor) sampling. Big BAF sampling is a forest inventory system that uses Bitterlich sampling (point sampling) with two BAF sizes: a small BAF for tree counts and a larger BAF for selecting the trees that are measured, usually for the DBHs and heights needed for volume estimation. Methods: The new estimator is derived with the Delta method from an existing formulation of the big BAF estimator as consisting of three sample means. The new formula is compared with existing big BAF estimators, including a popular estimator based on Bruce’s formula. Results: Several computer simulation studies were conducted comparing the new variance estimator with all known big BAF variance estimators currently in the forest inventory literature. In the simulations, the new estimator performed well and comparably to existing variance formulas. Conclusions: A possible advantage of the new estimator is that it does not require the assumption of negligible correlation between basal area counts on the small BAF factor and volume-basal area ratios based on the large BAF factor selection trees. All previous big BAF variance estimation formulas require this assumption. Although this correlation was negligible in the simulation stands used in this study, it could conceivably be significant in some forest types, such as those in which density substantially affects the DBH-height relationship, perhaps through competition. We derived a formula that can be used to estimate the covariance between estimates of mean basal area and the ratio of estimates of mean volume and mean basal area. We also derived mathematical expressions for the bias of the big BAF estimator. These expressions show that the bias approaches zero in large samples on the order of $1/n$, where $n$ is the number of sample points.
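The following is a simplified two-mean illustration; the paper's own estimator is built from three sample means and is not reproduced here. It shows where the correlation assumption enters. Write the volume estimator as the product $\hat{V} = \bar{B}\,\bar{R}$, where $\bar{B}$ is the mean basal area per unit area from the small-BAF counts and $\bar{R}$ is the mean volume-basal area ratio (VBAR) from the big-BAF measure trees. The first-order Delta method then gives

$$
\operatorname{Var}(\hat{V}) \approx \mu_R^{2}\operatorname{Var}(\bar{B}) + \mu_B^{2}\operatorname{Var}(\bar{R}) + 2\,\mu_B\,\mu_R\operatorname{Cov}(\bar{B},\bar{R})
$$

and, after dividing by $\mu_B^{2}\mu_R^{2}$, the relative form

$$
\frac{\operatorname{Var}(\hat{V})}{\mu_B^{2}\mu_R^{2}} \approx \frac{\operatorname{Var}(\bar{B})}{\mu_B^{2}} + \frac{\operatorname{Var}(\bar{R})}{\mu_R^{2}} + \frac{2\operatorname{Cov}(\bar{B},\bar{R})}{\mu_B\,\mu_R}
$$

Setting the covariance term to zero gives Bruce's formula, in which the squared percent standard errors of BA and VBAR simply add. An estimator that retains the covariance term does not need the negligible-correlation assumption.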
Funding: Research Joint Venture Agreement 17-JV-11242306045, “Old Growth Forest Dynamics and Structure,” between the USDA Forest Service and the University of New Hampshire. Additional support to MJD was provided by the USDA National Institute of Food and Agriculture McIntire-Stennis Project Accession Number 1020142, “Forest Structure, Volume, and Biomass in the Northeastern United States.” TBL: This work was supported by the USDA National Institute of Food and Agriculture, McIntire-Stennis project OKL02834, and the Division of Agricultural Sciences and Natural Resources at Oklahoma State University.
Abstract: Background: The double sampling method known as “big BAF sampling” has been advocated as a way to reduce sampling effort while still maintaining a reasonably precise estimate of volume. Bruce’s method is a well-known method for variance determination and is customarily used because the volume estimator takes the form of a product of random variables. However, the genesis of Bruce’s method is not known to most foresters who use it in practice. Methods: We establish that the Taylor series approximation known as the Delta method provides a plausible explanation for the origins of Bruce’s method. Simulations were conducted on two different tree populations to assess how closely the Delta method approximates the exact variance of a product. Additionally, two alternative estimators for the variance of individual tree volume-basal area ratios, which are part of the estimation process, were compared within the overall variance estimation procedure. Results: The simulation results demonstrate that Bruce’s method is a robust way to estimate the variance of inventories conducted with the big BAF method. The simulations also demonstrate that the variance of the mean volume-basal area ratios can be computed with equal accuracy using either the usual sample variance of the mean or the ratio variance estimators, which had not previously been shown for big BAF sampling. Conclusions: The Delta method offers a plausible explanation, both historical and mathematical, for the origins of Bruce’s method. In most settings, there is evidently no practical difference between applying the exact variance of a product and applying the Delta method; either can be used. A caution is articulated concerning the aggregation of tree-wise attributes into point-wise summaries in order to test the correlation between the two as a possible indicator of the need for further covariance augmentation.
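The comparison between the exact variance of a product and the Delta method can be illustrated numerically. For independent $X$ and $Y$, the exact variance of $XY$ (Goodman's formula) is $\mu_X^2\sigma_Y^2 + \mu_Y^2\sigma_X^2 + \sigma_X^2\sigma_Y^2$. The Delta method drops the last term, and in relative terms it reduces to Bruce's formula. The inventory numbers below are hypothetical and chosen only to show the size of the difference.

```python
import math

def product_variance(mean_x, var_x, mean_y, var_y):
    """Variance of X*Y for independent X and Y.

    Returns (exact, delta): Goodman's exact result and the first-order
    Delta-method approximation, which omits the var_x * var_y term.
    """
    delta = mean_x ** 2 * var_y + mean_y ** 2 * var_x
    exact = delta + var_x * var_y
    return exact, delta

def bruce_percent_se(pct_se_ba, pct_se_vbar):
    """Bruce's formula: combine the percent standard errors of BA and VBAR."""
    return math.sqrt(pct_se_ba ** 2 + pct_se_vbar ** 2)

# Hypothetical inventory: mean BA 120 with SE 6 (5 %), mean VBAR 25 with SE 1 (4 %).
exact, delta = product_variance(120, 6 ** 2, 25, 1 ** 2)
print(exact, delta)                                    # 36936 36900
print(round(bruce_percent_se(5, 4), 3))                # 6.403
print(round(100 * math.sqrt(delta) / (120 * 25), 3))   # 6.403, same as Bruce
```

With percent standard errors of a few percent, the omitted term is tiny (here 36 out of about 36,900), which is consistent with the finding that either variance can be used in most settings.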
Abstract: In the reliability analysis of complex structures, the response surface method (RSM) has been suggested as an efficient technique for approximating the actual but implicit limit state function. Fitting the implicit function requires a set of sample points. The accuracy of RSM has been noted to depend strongly on these sample points. However, the technique for selecting them has received little attention. In the present study, an improved response surface method (IRSM) is investigated. It is based on two sample point selection techniques, the direction cosines projected strategy (DCS) and the limit step length iteration strategy (LSS). The IRSM selects sampling points located close to the original failure surface and needs only one response surface, so it should be accurate and simple for practical structural problems. Applications to several typical examples demonstrate that the IRSM works successfully.
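For comparison, the classic baseline that sample point selection strategies improve on fits a quadratic response surface without cross terms from 2n+1 axial points: the center plus the center shifted by ±h_i along each axis. This minimal sketch implements that baseline, not the DCS or LSS strategies of the IRSM. The test function, center, and step sizes are hypothetical. With exactly 2n+1 points the coefficients follow from central differences, so no linear solve is needed.

```python
def fit_axial_quadratic(g, center, h):
    """Fit g(x) ~ a + sum(b_i*u_i) + sum(c_i*u_i**2), u = x - center, from 2n+1 axial points."""
    g0 = g(center)                      # a = g(center)
    b, c = [], []
    for i in range(len(center)):
        xp = list(center)
        xp[i] += h[i]
        xm = list(center)
        xm[i] -= h[i]
        gp, gm = g(xp), g(xm)
        b.append((gp - gm) / (2 * h[i]))                # linear coefficient
        c.append((gp + gm - 2 * g0) / (2 * h[i] ** 2))  # pure quadratic coefficient

    def surrogate(x):
        u = [xi - ci for xi, ci in zip(x, center)]
        return g0 + sum(bi * ui + qi * ui * ui for bi, qi, ui in zip(b, c, u))

    return surrogate

def g_true(x):
    # Hypothetical limit state function with no cross terms.
    return 5.0 - x[0] ** 2 - 2.0 * x[1]

surrogate = fit_axial_quadratic(g_true, center=[0.0, 0.0], h=[3.0, 3.0])
print(surrogate([1.5, -0.5]))  # 3.75, exact because g_true has no cross terms
```

In an iterative RSM, the center would then be moved toward the design point on the fitted surface and the fit repeated. The IRSM's contribution is to choose these points near the failure surface more effectively.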
Funding: Supported by the US Department of Energy Office of Science Climate Change Prediction Program through grant numbers DE-FG02-07ER64431 and DE-FG02-07ER64432, and by the US National Science Foundation under grant numbers DMS-0609575 and DMS-0913491.
Abstract: Centroidal Voronoi tessellations (CVTs) have become a useful tool in many applications, ranging from geometric modeling, image and data analysis, and numerical partial differential equations to problems in physics, astrophysics, chemistry, and biology. In this paper, we briefly review the CVT concept and a few of its generalizations and well-known properties. We then present an overview of recent advances in mathematical and computational studies and in practical applications of CVTs. Whenever possible, we point out outstanding issues that still need investigation.
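A CVT is a Voronoi tessellation whose generators coincide with the mass centroids of their own cells. Lloyd's algorithm is the standard fixed-point iteration for reaching one: assign each point of the region to its nearest generator, then move each generator to the centroid of its cell. This minimal sketch assumes a uniform density on the unit square, represented by Monte Carlo samples, so each "cell centroid" is the mean of the samples assigned to it. The function name is illustrative.

```python
import random

def lloyd_cvt(generators, samples, iterations=50):
    """Approximate a centroidal Voronoi tessellation with Lloyd's algorithm.

    The region and its (uniform) density are represented by 2-D Monte Carlo samples.
    """
    gens = [tuple(g) for g in generators]
    for _ in range(iterations):
        acc = [[0.0, 0.0, 0] for _ in gens]  # per-cell sum of x, sum of y, count
        for x, y in samples:
            # Assign the sample to the Voronoi cell of its nearest generator.
            k = min(range(len(gens)),
                    key=lambda j: (x - gens[j][0]) ** 2 + (y - gens[j][1]) ** 2)
            acc[k][0] += x
            acc[k][1] += y
            acc[k][2] += 1
        # Move each generator to the centroid of its cell (keep it if the cell is empty).
        gens = [(sx / cnt, sy / cnt) if cnt else g
                for (sx, sy, cnt), g in zip(acc, gens)]
    return gens

random.seed(0)
square = [(random.random(), random.random()) for _ in range(4000)]
start = [(random.random(), random.random()) for _ in range(4)]
print([(round(a, 2), round(b, 2)) for a, b in lloyd_cvt(start, square)])
```

Each Lloyd step can only lower (or keep) the CVT energy, the sum of squared distances from samples to their nearest generator. The iteration therefore settles at a CVT, though not necessarily the global energy minimizer.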
Abstract: Landscape pattern is a widely used concept for describing the characteristic features of a landscape. The overall spatial distribution trend of landscape elements is a point of interest in landscape ecological research, especially in complex secondary forest regions with intricate mosaics of land cover. Trend surface analysis, previously used in community and population ecology research, was introduced to reveal landscape pattern. A reasonable and reliable approach for applying trend surface analysis is described in detail, and a uniform grid point sampling method was developed as a key step of this approach. The approach was applied to the forested landscape of Guandishan as an example, and some basic rules governing the spatial distribution of landscape elements were revealed. These findings will benefit further study in the area and support sustainable forest management and landscape planning.
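Uniform grid point sampling makes the simplest trend surface easy to fit. On a complete rectangular grid, the centred x and y coordinates are orthogonal. The least-squares fit of the first-order surface z = a + bx + cy therefore splits into two simple regressions. This minimal sketch assumes hypothetical grid values, such as a land-cover proportion at each grid point. Higher-order trend surfaces, as used in practice, would need a general least-squares solve.

```python
def first_order_trend_surface(grid):
    """Fit z = a + b*x + c*y by least squares on a complete rows x cols grid (rows, cols >= 2)."""
    rows, cols = len(grid), len(grid[0])
    pts = [(j, i, grid[i][j]) for i in range(rows) for j in range(cols)]  # (x, y, z)
    n = len(pts)
    xm = sum(x for x, _, _ in pts) / n
    ym = sum(y for _, y, _ in pts) / n
    zm = sum(z for _, _, z in pts) / n
    # On a full grid, sum((x - xm) * (y - ym)) == 0, so the slopes decouple.
    sxx = sum((x - xm) ** 2 for x, _, _ in pts)
    syy = sum((y - ym) ** 2 for _, y, _ in pts)
    b = sum((x - xm) * (z - zm) for x, _, z in pts) / sxx
    c = sum((y - ym) * (z - zm) for _, y, z in pts) / syy
    a = zm - b * xm - c * ym
    return a, b, c

# Hypothetical values on a 3 x 4 sampling grid that follow an exact plane.
grid = [[2.0 + 0.5 * j - 1.0 * i for j in range(4)] for i in range(3)]
a, b, c = first_order_trend_surface(grid)
print(round(a, 6), round(b, 6), round(c, 6))  # 2.0 0.5 -1.0
```

The residuals z - (a + bx + cy) then represent local deviations from the regional trend, which is the part trend surface analysis separates from the overall spatial pattern.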
Abstract: To overcome the limitations of the traditional method for determining the protection region, and based on the actual conditions of a mine, a new method for determining the protection region was put forward, covering protection of both the working face layout and the development direction. The method combines on-site gas flow observation analysis with a gas content contrast method. The protection region was determined by gas flow observation analysis, gas content contrast, and computer numerical simulation, combined with engineering practice. In the gas content tests, the fixed sampling method "big hole drill reaming, small orifice drill rod connected with core tube" was employed. The results show that the determined protection region agrees with the actual site conditions and that the fixed sampling method ensures accurate measurement of gas content.
Abstract: DNA barcodes are short, unique DNA sequences. They play a crucial role in sample identification when many samples are processed simultaneously, which helps reduce experimental costs. However, the low quality of long-read sequencing makes barcodes difficult to identify accurately, which poses significant challenges for designing barcodes for large numbers of samples in a single sequencing run. Here, we present a comprehensive study of barcode generation and develop a tool, PRO, for selecting optimal barcode sets and for demultiplexing. We formulate barcode design as a combinatorial problem and prove that finding the largest barcode set in a given DNA sequence space, in which all sequences have the same length, is NP-complete. For practical applications, we developed PRO, which introduces the probability divergence between two DNA sequences to expand the capacity of barcode kits while ensuring demultiplexing accuracy. Specifically, the maximum size of the barcode kits designed by PRO is 2,292, with barcodes of the same length as the official ones used by Oxford Nanopore Technologies (ONT). We validated the performance of PRO on a simulated nanopore dataset with high error rates. The demultiplexing accuracy of PRO reached 98.29% for a barcode kit of size 2,922, 4.31% higher than that of Guppy, the official demultiplexing tool. When the barcode kit generated by PRO is the same size as the official kit provided by ONT, both tools show high and comparable demultiplexing accuracy.
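Demultiplexing assigns each read to the barcode it most plausibly carries. This sketch uses a simple baseline classifier based on Levenshtein (edit) distance. It is not PRO's probability-divergence method or Guppy's algorithm. A read is accepted only when its best barcode is close enough and clearly better than the runner-up, which is why barcode sets need large pairwise distances. The 8-nt barcodes, the barcode names, and the `max_dist`/`min_gap` thresholds are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions), O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def demultiplex(read_prefix, barcodes, max_dist=3, min_gap=2):
    """Return the closest barcode name, or None if too distant or ambiguous."""
    scored = sorted((edit_distance(read_prefix, seq), name) for name, seq in barcodes.items())
    best_d, best = scored[0]
    second_d = scored[1][0] if len(scored) > 1 else float("inf")
    if best_d <= max_dist and second_d - best_d >= min_gap:
        return best
    return None

# Hypothetical toy barcode kit.
barcodes = {"BC01": "ACGTACGT", "BC02": "TTGCAAGC", "BC03": "GATCCTAG"}
print(demultiplex("ACGTTCGT", barcodes))  # BC01 (one substitution away)
print(demultiplex("TTTTTTTT", barcodes))  # None (too far from every barcode)
```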
Funding: Project supported by the National Natural Science Foundation of China (No. 61972261), the Natural Science Foundation of Guangdong Province, China (No. 2023A1515011667), the Key Basic Research Foundation of Shenzhen, China (No. JCYJ20220818100205012), and the Basic Research Foundation of Shenzhen, China (No. JCYJ20210324093609026).
Abstract: The synthetic minority oversampling technique (SMOTE) is a popular algorithm for reducing the impact of class imbalance when building classifiers, and it has received several enhancements over the past 20 years. SMOTE and its variants synthesize minority-class sample points in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in overlapping areas between classes, which further complicates classifier training. To address this issue, this paper proposes overlapping minimization SMOTE (OM-SMOTE), a novel minority-class sample point generation algorithm that is generalization-oriented rather than imputation-oriented. The algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. It then applies a set of sophisticated minority-class sample point imputation rules to generate synthetic sample points that are as far as possible from the overlapping areas between classes. Extensive experiments on 32 imbalanced datasets validate the effectiveness of OM-SMOTE. The results show that OM-SMOTE outperforms 11 state-of-the-art SMOTE-based imputation algorithms in classifier training performance. This holds for naive Bayes, support vector machine, decision tree, and logistic regression classifiers. This demonstrates that OM-SMOTE is a viable approach for training high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is publicly available on GitHub at https://github.com/luxuan123123/OM-SMOTE/.
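The interpolation step that SMOTE and its variants share, and that can place synthetic points in class-overlap regions, can be sketched as follows. This is classic SMOTE, not OM-SMOTE. Each synthetic point lies on the segment between a minority point and one of its k nearest minority-class neighbours, and majority-class points are ignored entirely. The data and the function signature are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE: interpolate between a minority point and a random one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority-class points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x, excluding x itself (by index)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nn)))
    return synthetic

# Hypothetical 2-D minority-class points.
minority = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5)]
for point in smote(minority, n_new=3):
    print(tuple(round(v, 3) for v in point))
```

Nothing in this procedure checks where majority-class points lie, so a segment that crosses a region dense with the other class produces synthetic points inside it. OM-SMOTE's generation rules are designed to avoid exactly this.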
Funding: Supported by NSF of China (Grant Nos. 11601197, 11461029 and 61563018), Ministry of Education Humanity Social Science Research Project of China (Grant No. 15JYC910002), China Postdoctoral Science Foundation Funded Project (Grant Nos. 2016M600511 and 2017T100475), NSF of Jiangxi Province (Grant Nos. 20171ACB21030, 20161BAB201024 and 20161ACB20009), and the Key Science Fund Project of Jiangxi Provincial Education Department (Grant Nos. GJJ150439, KJLD13033 and KJLD14034).
Abstract: Under special conditions on the data set and the underlying distribution, the limit of the finite sample breakdown point of Tukey's halfspace median (1/3) has been obtained in the literature. In this paper, we establish the result under weaker assumptions imposed on the underlying distribution (weak smoothness) and on the data set (not necessarily in general position). As a by-product of our derivation, a refined representation of Tukey's sample depth regions for data sets not necessarily in general position is also obtained.
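To make the objects concrete, the sketch below computes the Tukey halfspace depth of a point x with respect to a planar sample: the smallest number of sample points in any closed half-plane whose boundary passes through x. The halfspace median is a point of maximal depth, and the sample depth regions are the sets of points whose depth is at least k. This brute-force O(n²) approach tests half-planes rotated slightly off each line through x and a sample point, so ties and collinear points (data not in general position) are handled. It is an illustrative sketch, not the authors' derivation; `halfspace_depth` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def halfspace_depth(x: Point, data: Sequence[Point]) -> int:
    """Tukey halfspace depth of x w.r.t. a planar sample: the minimum number of
    sample points in a closed half-plane whose boundary line passes through x."""
    coincident = 0                      # sample points equal to x
    vecs = []                           # offsets X_j - x of the other points
    for px, py in data:
        v = (px - x[0], py - x[1])
        if v[0] == 0 and v[1] == 0:
            coincident += 1             # x itself lies in every such half-plane
        else:
            vecs.append(v)
    if not vecs:
        return coincident
    best = len(vecs)
    # Counts change only when the boundary line sweeps over a sample point, and
    # a closed half-plane never does better exactly on such a line, so it is
    # enough to test lines rotated slightly off each line through x and a point.
    for d in vecs:
        left = right = same = opposite = 0
        for v in vecs:
            cross = d[0] * v[1] - d[1] * v[0]
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif d[0] * v[0] + d[1] * v[1] > 0:
                same += 1               # on the line, on the ray through d
            else:
                opposite += 1           # on the line, on the opposite ray
        # A small rotation pushes the two collinear rays to opposite sides;
        # the four sums cover both rotation senses and both half-planes.
        best = min(best, left + same, left + opposite,
                   right + same, right + opposite)
    return coincident + best
```

For the four corners of the unit square, the centre has depth 2 and each corner has depth 1; for the collinear sample (0, 0), (1, 0), (2, 0), the middle point has depth 2. With exactly representable coordinates the cross and dot products are exact, so the collinearity tests are reliable; general floating-point data may need a tolerance.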
Funding: National High-tech R&D Program of the Ministry of Science and Technology of the People's Republic of China, No. 2013AA122003; National Key Technologies R&D Program of the Ministry of Science and Technology of China, No. 2013BACO3B05
Abstract: Accurately estimating forest carbon stock and its spatial distribution at the regional scale is very important, because forests account for a large share of the carbon stock of terrestrial ecosystems. However, current regional estimates of forest carbon stock depend mainly on forest inventory data, a process that consumes a great deal of labor, money and time and hampers timely updating of forest carbon storage estimates. To address these problems, this paper proposes a forest vegetation carbon storage simulation method based on High Accuracy Surface Modeling (HASM). The method uses the output of the LPJ-GUESS model as the initial values of HASM and the inventory data as the sample points of HASM to simulate the distribution of forest carbon storage in China. The study adopts the seventh forest resources statistics of China as the data source for generating the sample points, and these data also serve to test the simulation accuracy. The HASM simulation shows that the total forest carbon storage of China is 9.2405 Pg, whereas the value calculated from the forest resources statistics is 7.8115 Pg. The forest resources statistics are compiled on the basis of a forest canopy closure criterion, and the HASM result is much closer to the actual forest carbon storage. The simulation results also indicate that the southwestern mountain region and the northeastern forests are important forest carbon reservoirs in China, accounting for 39.82% and 20.46% of the country's total forest vegetation carbon stock, respectively. Compared with earlier estimates (1975-1995), the carbon storage of these two regions has clearly increased. The results of this research show that the large-scale reforestation in China over recent decades has produced a significant carbon sink.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51175425) and the Aviation Foundation (Grant No. 2011ZA53015).
Abstract: To analyze the effect of regions of the model inputs on the model output, a novel concept, the contribution to the sample failure probability plot (CSFP), is proposed based on the contribution to the sample mean plot (CSM) and the contribution to the sample variance plot (CSV). The CSFP can be used to analyze how regions of the model inputs affect the failure probability. After the CSFP is defined, its properties and the differences between CSFP and CSV/CSM are discussed. The proposed CSFP not only shows which input affects the failure probability most, but also identifies which regions of an input contribute most to the failure probability. A solution for CSFP is obtained by building a Kriging surrogate model on optimized sample points. Owing to the efficiency of the Kriging surrogate model, the computational cost of solving CSFP is greatly reduced. Several examples illustrate the validity of the proposed CSFP and the applicability and feasibility of the Kriging-surrogate-based solution for CSFP.
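A minimal Monte Carlo sketch of the idea follows, under an assumed definition made by analogy with CSM: for input X_i, CSFP(q) is the fraction of failures (g ≤ 0) whose X_i lies in the lowest q-fraction of the sample. A curve on the diagonal CSFP(q) = q would mean the input does not shift where failures occur; large departures mark the regions of X_i that drive failure. The names `csfp_curve` and `limit_state`, the limit-state coefficients, and the sample size are all hypothetical. Here `g` is evaluated directly, whereas the abstract replaces it with a Kriging surrogate built on optimized sample points to reduce cost.

```python
import random
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple


def csfp_curve(g: Callable[[Sequence[float]], float],
               samples: List[List[float]], i: int,
               quantiles: Sequence[float]) -> List[Tuple[float, float]]:
    """Monte Carlo CSFP-style curve for input i (assumed definition): for each
    q in `quantiles` (0 <= q <= 1), the fraction of failures (g(x) <= 0) whose
    i-th input lies in the lowest q-fraction of the sample."""
    n = len(samples)
    ordered = sorted(samples, key=lambda x: x[i])      # sort by input i
    fails = [1 if g(x) <= 0 else 0 for x in ordered]   # failure indicator
    cum = [0] + list(accumulate(fails))                # cum[m]: failures in first m
    total = cum[-1]
    if total == 0:
        raise ValueError("no failures in the sample; use more samples")
    return [(q, cum[int(q * n)] / total) for q in quantiles]


def limit_state(x: Sequence[float]) -> float:
    """Hypothetical limit-state function: failure when the value is <= 0."""
    return 2.0 - x[0] - 0.2 * x[1]


# Usage: two independent standard-normal inputs; x0 dominates the failure event.
rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20000)]
qs = [0.25, 0.5, 0.75, 1.0]
curve_x0 = csfp_curve(limit_state, samples, 0, qs)
curve_x1 = csfp_curve(limit_state, samples, 1, qs)
```

In this toy case `curve_x0` stays near 0 until the upper quantiles, because failures require large values of x0, while `curve_x1` stays much closer to the diagonal; both reach 1 at q = 1 by construction.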