Purpose: Many science, technology and innovation (STI) resources carry several different labels. To automatically assign the relevant labels to a given instance, many approaches with good performance on benchmark datasets have been proposed for the multi-label classification task in the literature. Furthermore, several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones. Therefore, the main purpose of this paper is to comprehensively evaluate seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in the following aspects: statement, data quality, and purposes. Additionally, open-source tools designed for multi-label classification also differ intrinsically in their approaches to data processing and feature selection, which in turn affects the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of the conclusions through more rigorous control of variables and expanded parameter settings. Practical implications: The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching a level of practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
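The Macro F1 and Micro F1 scores reported above average differently over labels, which is why they diverge on unbalanced label distributions. A minimal sketch with scikit-learn, using hypothetical labels rather than any of the paper's datasets:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical multi-label ground truth and predictions: 4 documents, 3 labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Macro F1 weights every label equally; Micro F1 pools all label decisions,
# so frequent labels dominate it on unbalanced document-label distributions.
macro = f1_score(y_true, y_pred, average="macro")
micro = f1_score(y_true, y_pred, average="micro")
```

Reporting both, as the paper does, separates per-label quality from pooled overall quality.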
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.
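As a rough illustration of the SVM pipeline described above (preprocessing, flattening pixels into features, training, accuracy evaluation), here is a sketch on synthetic stand-in "images"; the bright-patch construction, image size, and parameters are all assumptions, not the article's dataset:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_images(n, size, fire):
    # Synthetic stand-ins: "fire" images get a bright patch in one quadrant.
    imgs = rng.normal(0.3, 0.05, (n, size, size))
    if fire:
        imgs[:, : size // 2, : size // 2] += 0.5
    return imgs

size = 16  # resizing resolution; the article studies how this affects accuracy
X = np.vstack([make_images(60, size, True), make_images(60, size, False)])
y = np.array([1] * 60 + [0] * 60)
X_flat = X.reshape(len(X), -1)  # flatten each image into a feature vector

X_tr, X_te, y_tr, y_te = train_test_split(X_flat, y, random_state=0, stratify=y)
acc = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr).score(X_te, y_te)
```

Raising or lowering `size` here is the crude analogue of the resolution experiments the article reports.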
This paper proposes a method to generate semi-experimental biomedical datasets based on full-wave simulation software. System noise such as antenna port coupling is fully considered in the proposed datasets, which makes them more realistic than synthetic datasets. In this paper, datasets containing different shapes are constructed based on the relative permittivities of human tissues. Then, a back-propagation scheme is used to obtain rough reconstructions, which are fed into a U-net convolutional neural network (CNN) to recover high-resolution images. Numerical results show that a network trained on the datasets generated by the proposed method obtains satisfying reconstruction results and is promising for application in real-time biomedical imaging.
Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information, a problem that persists despite user awareness. This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning (ML) models: Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), utilizing authentic datasets sourced from the Kaggle and Mendeley repositories. Extensive experimentation and analysis reveal that the CNN model achieves the best accuracy of 98%, while LSTM shows the lowest accuracy of 96%. These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics, offering a promising avenue for safeguarding sensitive information and online security.
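A sketch of the kind of supervised setup the study evaluates, using a small scikit-learn neural network on hypothetical URL features; the feature definitions and distributions below are invented for illustration, and the study's actual models are deep ANN/CNN/LSTM networks trained on the Kaggle and Mendeley data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
# Hypothetical URL features: length, digit count, has-@-symbol, subdomain depth.
legit = np.column_stack([rng.normal(30, 8, n), rng.poisson(2, n),
                         np.zeros(n), rng.poisson(1, n)])
phish = np.column_stack([rng.normal(70, 15, n), rng.poisson(9, n),
                         rng.integers(0, 2, n), rng.poisson(4, n)])
X = np.vstack([legit, phish])
y = np.array([0] * n + [1] * n)  # 0 = legitimate, 1 = phishing

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1, stratify=y)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                  random_state=1))
acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
```

Accuracy on held-out data, as reported in the study, is the basic comparison point between the three architectures.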
Recently, automotive intrusion detection systems (IDSs) have emerged as promising defense approaches to counter attacks on in-vehicle networks (IVNs). However, the effectiveness of IDSs relies heavily on the quality of the datasets used for training and evaluation. Despite the availability of several datasets for automotive IDSs, there has been a lack of comprehensive analysis focusing on assessing these datasets. This paper aims to address the need for dataset assessment in the context of automotive IDSs. It proposes qualitative and quantitative metrics that are independent of specific automotive IDSs, to evaluate the quality of datasets. These metrics take into consideration various aspects such as dataset description, collection environment, and attack complexity. This paper evaluates eight commonly used datasets for automotive IDSs using the proposed metrics. The evaluation reveals biases in the datasets, particularly in terms of limited contexts and lack of diversity. Additionally, it highlights that the attacks in the datasets were mostly injected without considering normal behaviors, which poses challenges for training and evaluating machine learning-based IDSs. This paper emphasizes the importance of addressing the identified limitations in existing datasets to improve the performance and adaptability of automotive IDSs. The proposed metrics can serve as valuable guidelines for researchers and practitioners in selecting and constructing high-quality datasets for automotive security applications. Finally, this paper presents the requirements for high-quality datasets, including the need for representativeness, diversity, and balance.
The CRA-Interim trial production of the global atmospheric reanalysis for 10 years from 2007 to 2016 was carried out by the China Meteorological Administration in 2017. The structural characteristics of the horizontal shear line over the Tibetan Plateau (TPHSL) based on the CRA-Interim datasets are examined by objectively identifying the shear line, and are compared with the analysis results of the European Centre for Medium-Range Weather Forecasts reanalysis data (ERA-Interim). The case occurred at 1800 UTC on 5 July 2016. The results show that both the ERA-Interim and CRA-Interim datasets can well reveal the circulation background and the dynamic and thermal structure characteristics of the TPHSL, and they share some similar features. The middle and high latitudes at 500 hPa are characterized by a circulation pattern of "two troughs and two ridges", and at 200 hPa, the TPHSL is located in the northeast quadrant of the South Asian High Pressure (SAHP). The TPHSL lies in the positive vorticity zone and passes through the positive vorticity center corresponding to the ascending motion. Near the TPHSL, the contours of pseudo-equivalent potential temperature (θse) tend to be dense, with a high-value center on the south side of the TPHSL. The TPHSL can extend to 460 hPa and inclines northward with height. There is a positive vorticity zone near the TPHSL that likewise inclines northward with height; the ascending motion near the TPHSL can extend to 300 hPa, and the atmospheric layer above the TPHSL is stable.
However, the intensities of the TPHSL's structural characteristics analyzed with the two datasets differ, with the CRA-Interim datasets showing relatively stronger geopotential height, vertical velocity, vorticity, and divergence fields. In addition, the vertical profiles of the dynamic and water vapor thermal physical quantities from the two datasets are also consistent in the eastern and western parts of the TPHSL. In summary, the reliable and usable CRA-Interim datasets show excellent properties in the analysis of the structural characteristics of a horizontal shear line over the Tibetan Plateau.
Based on C-LSAT2.0, using high- and low-frequency component reconstruction methods combined with observation constraint masking, a reconstructed C-LSAT2.0 with 756 ensemble members from the 1850s to 2018 has been developed. These ensemble versions have been merged with the ERSSTv5 ensemble dataset, and an upgraded version of the CMST-Interim dataset with 5°×5° resolution has been developed. The CMST-Interim dataset significantly improves the coverage of global surface temperature data. After reconstruction, the data coverage before 1950 increased from 78%-81% of the original CMST to 81%-89%. The total coverage after 1955 reached about 93%, including more than 98% in the Northern Hemisphere and 81%-89% in the Southern Hemisphere. The reconstruction ensemble experiments with different parameters provide a good basis for a more systematic uncertainty assessment of C-LSAT2.0 and CMST-Interim. In comparison with the original CMST, the global mean surface temperatures are estimated to be cooler in the second half of the 19th century and warmer during the 21st century, which shows that the global warming trend is further amplified. The global warming trends are updated from 0.085±0.004°C (10 yr)^(-1) and 0.128±0.006°C (10 yr)^(-1) to 0.089±0.004°C (10 yr)^(-1) and 0.137±0.007°C (10 yr)^(-1), respectively, since the start and since the second half of the 20th century.
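The warming trends quoted above are least-squares slopes expressed per decade. A minimal sketch of that computation on a synthetic anomaly series with a known built-in trend (the series is fabricated for illustration, not CMST-Interim data):

```python
import numpy as np

def decadal_trend(years, anomalies):
    """Least-squares linear trend of a temperature series, in degC per decade."""
    slope_per_year = np.polyfit(years, anomalies, 1)[0]
    return slope_per_year * 10.0

# Synthetic series with a built-in 0.137 degC/decade trend plus small noise.
rng = np.random.default_rng(0)
years = np.arange(1951, 2019)
anoms = 0.0137 * (years - years[0]) + rng.normal(0, 0.02, years.size)
trend = decadal_trend(years, anoms)
```

The fitted value recovers the built-in trend to within the noise, which is the sense in which the dataset trends above are estimated.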
Comparisons of the west Pacific subtropical high with the South Asia High are made using the NCEP/NCAR and ECMWF 500 hPa and 100 hPa monthly boreal geopotential height fields for the period 1961-2000. Discrepancies are found for the time prior to 1980. The west Pacific subtropical high in the NCEP/NCAR data is less intense than in ECMWF data before 1980. The range and strength of the west Pacific subtropical high variation described by the NCEP/NCAR data are larger than those depicted by ECMWF data. The same situation appears in the 100-hPa geopotential field. These discoveries suggest that the interdecadal variation of the two systems as shown by the NCEP/NCAR data may not be true. Besides, the South Asia High center in the NCEP/NCAR data is obviously stronger than in the ECMWF data during the periods 1969, 1979-1991 and 1992-1995. Furthermore, the range is larger from 1992 to 1995.
In this study, the upper ocean heat content (OHC) variations in the South China Sea (SCS) during 1993-2006 were investigated by examining ocean temperatures in seven datasets, including World Ocean Atlas 2009 (WOA09) (climatology), the Ishii datasets, the Ocean General Circulation Model for the Earth Simulator (OFES), the Simple Ocean Data Assimilation system (SODA), the Global Ocean Data Assimilation System (GODAS), the China Oceanic ReAnalysis system (CORA), and an ocean reanalysis dataset for the joining area of Asia and the Indian-Pacific Ocean (AIPO1.0). Among these datasets, two were independent of any numerical model, four relied on data assimilation, and one was generated without any data assimilation. The annual cycles revealed by the seven datasets were similar, but the interannual variations were different. Vertical structures of temperatures along the 18°N, 12.75°N, and 120°E sections were compared with data collected during open cruises in 1998 and 2005-2008. The results indicated that Ishii, OFES, CORA, and AIPO1.0 were more consistent with the observations. Through systematic comparisons, we found that each dataset had its own shortcomings and advantages in presenting the upper OHC in the SCS.
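The upper ocean heat content compared across the seven datasets above is essentially a depth integral of temperature scaled by seawater density and specific heat. A sketch with a hypothetical profile (the constants and profile values are illustrative, not taken from any of the datasets):

```python
import numpy as np

RHO = 1025.0  # seawater density, kg m^-3 (assumed constant)
CP = 3985.0   # specific heat of seawater, J kg^-1 K^-1 (assumed constant)

def upper_ohc(depths_m, temps_c):
    """Heat content of the upper ocean per unit area (J m^-2),
    integrating temperature over depth with the trapezoidal rule."""
    z = np.asarray(depths_m, float)
    t = np.asarray(temps_c, float)
    layers = 0.5 * (t[1:] + t[:-1]) * np.diff(z)  # degC * m per layer
    return RHO * CP * float(layers.sum())

# Hypothetical SCS-like profile: warm mixed layer cooling with depth.
depths = np.array([0.0, 50.0, 100.0, 200.0, 400.0])
temps = np.array([29.0, 28.0, 24.0, 16.0, 10.0])
ohc = upper_ohc(depths, temps)
```

Differences between datasets in the vertical temperature structure translate directly into OHC differences through this integral.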
This is a study comparing three selected tropical cyclone datasets separately compiled by the CMA Shanghai Typhoon Institute (CMA SHI), the Joint Typhoon Warning Center (JTWC), and the Japan Meteorological Agency (JMA). The annual frequencies, observation times, and destructive power index of the tropical cyclones over the western North Pacific are investigated as the characteristic quantities. The comparative study has resulted in the following findings: 1) Statistical gaps between the datasets narrow as the intensity of tropical cyclones increases. 2) In the context of interdecadal distribution, there is a relatively large gap between the datasets for the 1950s, a narrowed gap for the period from the mid-1970s to the 1980s, and a recurring widened gap for the mid and late 1990s. Additionally, an approach is proposed in the paper to correct the wind speed data in the TC Yearbook.
Version 4 (v4) of the Extended Reconstructed Sea Surface Temperature (ERSST) dataset is compared with its predecessor, the widely used version 3b (v3b). The essential upgrades applied to v4 lead to remarkable differences in the characteristics of the sea surface temperature (SST) anomaly (SSTa) in both the temporal and spatial domains. First, the largest discrepancy of the global mean SSTa values, around the 1940s, is due to ship-observation corrections made to reconcile observations from buckets and engine-intake thermometers. Second, differences in global and regional mean SSTa values between v4 and v3b exhibit a downward trend (around -0.032°C per decade) before the 1940s, an upward trend (around 0.014°C per decade) during the period 1950-2015, and an interdecadal oscillation with one peak around the 1980s and two troughs during the 1960s and 2000s, respectively. This does not derive from treatments of the polar or other data-void regions, since the difference of the SSTa does not share the common features. Third, the spatial pattern of the ENSO-related variability of v4 exhibits a wider but weaker cold tongue in the tropical Pacific Ocean compared with that of v3b, which could be attributed to differences in gap-filling assumptions, since the latter features satellite observations whereas the former features in situ ones. This intercomparison confirms that the structural uncertainty arising from underlying assumptions on the treatment of diverse SST observations, even within the same SST product family, is the main source of significant SST differences in the temporal domain. Why this uncertainty introduces artificial decadal oscillations remains unknown.
Anomaly-based approaches in network intrusion detection suffer in evaluation, comparison, and deployment owing to the scarcity of adequate publicly available network trace datasets. Moreover, publicly available datasets are either outdated or generated in a controlled environment. Given the ubiquity of cloud computing environments in commercial and government internet services, there is a need to assess the impacts of network attacks in cloud data centers. To the best of our knowledge, there is no publicly available dataset which captures the normal and anomalous network traces in the interactions between cloud users and cloud data centers. In this paper, we present an experimental platform designed to represent a practical interaction between cloud users and cloud services and to collect network traces resulting from this interaction in order to conduct anomaly detection. We use the Amazon Web Services (AWS) platform for conducting our experiments.
In the past few decades, meteorological datasets from remote sensing techniques have been used in agricultural and water resources management by various researchers and managers. Based on the literature, meteorological datasets are not more accurate than synoptic stations, but their various advantages, such as spatial coverage, time coverage, accessibility, and free use, have made these techniques attractive, and sometimes we can use them instead of synoptic stations. In this study, we used four meteorological datasets, namely the Climatic Research Unit gridded Time Series (CRU TS), the Global Precipitation Climatology Centre (GPCC), the Agricultural National Aeronautics and Space Administration Modern-Era Retrospective Analysis for Research and Applications (AgMERRA), and the Agricultural Climate Forecast System Reanalysis (AgCFSR), to estimate climate variables (precipitation, maximum temperature, and minimum temperature) and crop variables (reference evapotranspiration, irrigation requirement, and the biomass and yield of maize) in Qazvin Province of Iran during 1980-2009. First, data were gathered from the four meteorological datasets and the synoptic station in this province, and the climate variables were calculated. Then, after using the AquaCrop model to calculate the crop variables, we compared the results of the synoptic station and the meteorological datasets. All four meteorological datasets showed strong performance for estimating climate variables. AgMERRA and AgCFSR gave more accurate estimates of precipitation and maximum temperature; however, their normalized root mean square error was inferior to that of CRU TS for minimum temperature. Furthermore, they were all very efficient for estimating the biomass and yield of maize in this province. For reference evapotranspiration and irrigation requirement, CRU TS and GPCC were more efficient than AgMERRA and AgCFSR, but for the estimation of biomass and yield, all four meteorological datasets were reliable. To sum up, GPCC and AgCFSR were the two best datasets in this study. This study suggests the use of meteorological datasets in water resource and agricultural management to monitor past changes and estimate recent trends.
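The normalized RMSE used above to rank the datasets can be computed as the RMSE divided by the mean of the station observations. A sketch with invented monthly precipitation values; a common rule of thumb treats NRMSE below 10% as excellent agreement and below 20% as good, though thresholds vary by study:

```python
import numpy as np

def nrmse(obs, est):
    """Root-mean-square error normalized by the observed mean, in percent."""
    obs = np.asarray(obs, float)
    est = np.asarray(est, float)
    return 100.0 * np.sqrt(np.mean((est - obs) ** 2)) / obs.mean()

# Hypothetical monthly precipitation (mm): synoptic station vs. gridded product.
station = np.array([20.0, 35.0, 50.0, 10.0, 5.0, 30.0])
gridded = np.array([22.0, 30.0, 55.0, 12.0, 4.0, 28.0])
score = nrmse(station, gridded)
```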
In this study, through experimental research and an investigation of large datasets of durability parameters in ocean engineering, the values, ranges, and distribution types of the durability parameters employed for durability design in ocean engineering in northern China were confirmed. Based on a modified theoretical model of chloride diffusion and reliability theory, the service lives of concrete structures exposed to the splash, tidal, and underwater zones were calculated. Concrete mix proportions meeting the requirement of a service life of 100 or 120 years were designed, and a cover thickness requirement was proposed. In addition, the effects of different time-varying relationships of the boundary condition (Cs) and diffusion coefficient (Df) on the service life were compared; the results showed that the time-varying relationships used in this study (i.e., Cs continuously increases and then remains stable, while Df continuously decreases and then remains stable) are beneficial for the durability design of concrete structures in the marine environment.
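The baseline behind this kind of service-life design is commonly the error-function solution of Fick's second law with constant Cs and Df; the study's time-varying relationships refine that baseline. A sketch with invented parameter values (the surface concentration, diffusion coefficient, cover depth, and threshold below are illustrative, not the study's design values):

```python
from math import erf, sqrt

def chloride_at_depth(x_mm, t_years, cs, df_mm2_per_year):
    """C(x, t) = Cs * (1 - erf(x / (2 * sqrt(Df * t)))): the solution of
    Fick's second law with constant surface concentration and diffusivity."""
    return cs * (1.0 - erf(x_mm / (2.0 * sqrt(df_mm2_per_year * t_years))))

# Hypothetical design check: 4% surface chloride (by binder mass),
# Df = 5 mm^2/yr, 60 mm cover, 100-year target life, 0.6% critical content.
c_100yr = chloride_at_depth(60.0, 100.0, 4.0, 5.0)
cover_adequate = c_100yr < 0.6  # cover passes if chloride stays below threshold
```

With these assumed numbers the chloride at the reinforcement depth stays below the critical content at 100 years, so the cover would pass; larger Df or thinner cover flips the verdict.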
The differences in the climatology of extratropical transition (ET) of western North Pacific tropical cyclones (TCs) were investigated in this study using the TC best-track datasets of the China Meteorological Administration (CMA), the Japan Meteorological Agency (JMA), and the Joint Typhoon Warning Center (JTWC). The results show that the ET identification, ET completion time, and post-ET duration reported in the JTWC dataset differ greatly from those in the CMA and JMA datasets during 2004-2010. However, the key differences between the CMA and JMA datasets from 1951 to 2010 are the ET identification and the post-ET duration, because of the inconsistent objective ET criteria used by the centers. Further analysis indicates that the annual ET percentage of CMA was lower than that of JMA and exhibited an interannual decreasing trend, while that of JMA showed no trend. Western North Pacific ET events occurred mainly during the period June to November. The latitude of ET occurrence shifted northward from February to August, followed by a southward shift. Most ET events were observed between 35°N and 45°N. From a regional perspective, TCs tended to undergo ET in Japan and the ocean east of it. It is found that TCs which experienced the ET process at higher latitudes were generally more intense at the ET completion time. TCs completing the ET overland or offshore were weaker than those finishing the ET over the ocean. Most of the TCs weakened 24 h before the completion of ET. In contrast, 21% (27%) of the TCs showed an intensification process based on the CMA (JMA) dataset during the post-ET period.
The results presented in this study indicate that consistent ET determination criteria are needed to reduce the uncertainty involved in ET identification among the centers.
Prediction of tunneling-induced ground settlements is an essential task, particularly for tunneling in urban settings. Ground settlements should be limited within a tolerable threshold to avoid damage to aboveground structures. Machine learning (ML) methods are becoming popular in many fields, including tunneling and underground excavations, as a powerful learning and prediction technique. However, the datasets collected from a tunneling project are usually small from the perspective of applying ML methods. Can ML algorithms effectively predict tunneling-induced ground settlements when the available datasets are small? In this study, seven ML methods are utilized to predict tunneling-induced ground settlement using 14 contributing factors measured before or during tunnel excavation. These methods include multiple linear regression (MLR), decision tree (DT), random forest (RF), gradient boosting (GB), support vector regression (SVR), back-propagation neural network (BPNN), and permutation importance-based BPNN (PI-BPNN) models. All methods except BPNN and PI-BPNN are shallow-structure ML methods. The effectiveness of these seven ML approaches on small datasets is evaluated using model accuracy and stability. Model accuracy is measured by the coefficient of determination (R2) of the training and testing datasets, while the stability of a learning algorithm indicates robust predictive performance. Also, the quantile error (QE) criterion is introduced to assess model predictive performance, considering both underpredictions and overpredictions. Our study reveals that the RF algorithm outperforms all the other models, with the highest model prediction accuracy (0.9) and stability (3.02×10^(-27)). Deep-structure ML models do not perform well for small datasets, with relatively low model accuracy (0.59) and stability (5.76). The PI-BPNN architecture is proposed and designed for small datasets, showing better performance than the typical BPNN. Six important contributing factors of ground settlements are identified, including tunnel depth, the distance between the tunnel face and surface monitoring points (DTM), weighted average soil compressibility modulus (ACM), grouting pressure, penetration rate, and thrust force.
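A rough sketch of the accuracy-and-stability evaluation described above, using a random forest on a synthetic small dataset; the feature construction and noise level are assumptions, and stability is taken here as the variance of test R2 over repeated fits with different seeds:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 120  # "small" by ML standards, like a single tunneling project's records
X = rng.uniform(0, 1, (n, 5))  # stand-ins for a few contributing factors
# Hypothetical settlement proxy: nonlinear in two factors, plus noise.
y = 10 * X[:, 0] ** 2 + 5 * np.sin(3 * X[:, 1]) + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = [
    r2_score(y_te, RandomForestRegressor(n_estimators=200, random_state=s)
             .fit(X_tr, y_tr).predict(X_te))
    for s in range(5)
]
mean_r2 = float(np.mean(scores))   # accuracy
stability = float(np.var(scores))  # near-zero variance means a stable learner
```

Ensemble averaging over many trees is what gives RF both the high R2 and the near-zero seed-to-seed variance the study reports.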
Data-driven algorithms for predicting mechanical properties from small datasets are evaluated in a case study on gear steel hardenability. The limitations of current data-driven algorithms and empirical models are identified. Challenges in analysing small datasets are discussed, and a solution is proposed for handling small datasets with multiple variables. Gaussian methods in combination with novel predictive algorithms are utilized to overcome the challenges in analysing gear steel hardenability data and to gain insight into alloying-element interactions and structure homogeneity. The gained fundamental knowledge, integrated with machine learning, is shown to be superior to the empirical equations in predicting hardenability. Metallurgical property relationships between chemistry, sample size, and hardness are predicted via two optimized machine learning algorithms: neural networks (NNs) and extreme gradient boosting (XGBoost). A comparison is drawn between all algorithms, evaluating their performance on small datasets. The results reveal that XGBoost has the highest potential for predicting hardenability using small datasets with class imbalance and large inhomogeneity issues.
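The "Gaussian methods" mentioned above can be illustrated with a Gaussian process regressor on a small, Jominy-style hardenability curve; every number below is invented for illustration, and the kernel choice is an assumption, not the study's configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical hardenability data: hardness (HRC) falling with distance
# from the quenched end, only 12 samples, as in a small-dataset setting.
rng = np.random.default_rng(0)
dist = np.linspace(1.5, 40.0, 12).reshape(-1, 1)  # mm from quenched end
hardness = 55.0 * np.exp(-dist.ravel() / 25.0) + rng.normal(0, 0.5, 12)

gp = GaussianProcessRegressor(kernel=RBF(10.0) + WhiteKernel(0.25),
                              normalize_y=True).fit(dist, hardness)
pred, std = gp.predict(np.array([[20.0]]), return_std=True)
# std quantifies uncertainty, which is what makes GPs attractive on small data.
```

The predictive standard deviation is the practical payoff: on a dozen points, knowing where the model is unsure matters as much as the point prediction.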
Total Cloud Cover (TCC) over China determined from four climate datasets, including the International Satellite Cloud Climatology Project (ISCCP), the 40-year Re-Analysis Project of the European Centre for Medium-Range Weather Forecasts (ERA-40), Climate Research Unit Time Series 3.0 (CRU3), and ground station datasets, is used to show the spatial and temporal variation of TCC and the differences among the datasets. It is demonstrated that the four datasets show similar spatial patterns and seasonal variation. The maximum value is derived from ISCCP. The TCC value in North China derived from ERA-40 is 50% larger than that from the station dataset; however, the value is 50% less than that in South China. The annual TCC of the ISCCP, ERA-40, and ground station datasets shows a decreasing trend during 1984-2002; however, an increasing trend is derived from CRU3. The results of this study imply remarkable differences in TCC derived from surface and satellite observations as well as model simulations. The potential effects of these differences on cloud climatology and associated climatic issues should be carefully considered.
In recent years, long-term continuous sea-ice datasets have been developed, and they cover the periods before and after the satellite era. How these datasets differ from one another before the satellite era, and whether one is more reliable than the other, is important but unclear because the sea-ice record before 1979 is sparse and not continuous. In this letter, two sea-ice datasets are evaluated: one is the HadISST1 dataset from the Hadley Centre, and the other is the SIBT1850 (Gridded Monthly Sea Ice Extent and Concentration, from 1850 Onward) dataset from the National Snow and Ice Data Center (NSIDC). In view of its substantial importance for climate, the winter sea ice in the Barents and Kara seas (BKS) is of particular focus. A reconstructed BKS sea-ice extent (SIE) is developed using linear regression from the mean of observed surface air temperature at two adjacent islands, Novaya Zemlya and Franz Josef Land (the proxy). One validation illustrates that the proxy is substantially coherent with the BKS sea-ice anomaly in the observations and the CMIP5 (phase 5 of the Coupled Model Intercomparison Project) historical experiments. This result indicates that the proxy is reasonable, and therefore that the reconstructed BKS SIE is also reasonable. The evaluation results based on the proxy suggest that the sea-ice concentration prior to the satellite era in the NSIDC dataset is more realistic and reliable than that in the Hadley Centre dataset, and thus is more appropriate for use.
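The proxy reconstruction described above amounts to a linear regression of SIE on station temperature over the overlap period, then applying the fit to pre-satellite temperatures. A sketch on fabricated numbers (the slope, intercept, and noise are assumptions, not the letter's fitted values):

```python
import numpy as np

def fit_proxy(temp_anom, sie):
    """Least-squares fit SIE = a * T + b over the overlap (satellite) era."""
    a, b = np.polyfit(temp_anom, sie, 1)
    return a, b

# Fabricated overlap data: warmer winters go with less BKS winter ice.
rng = np.random.default_rng(0)
temp = rng.normal(0, 1.5, 40)                       # station SAT anomaly, degC
sie = 0.60 - 0.12 * temp + rng.normal(0, 0.03, 40)  # SIE, 10^6 km^2

a, b = fit_proxy(temp, sie)
reconstructed = a * (-2.0) + b  # SIE estimate for a cold pre-satellite winter
```

Comparing such reconstructions against each gridded dataset before 1979 is what allows the letter to judge which dataset is more realistic.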
Hydro-climatological study is difficult in most developing countries due to the paucity of monitoring stations. Gridded climatological data provide an opportunity to extrapolate climate to areas without monitoring stations, based on their ability to replicate the spatio-temporal distribution and variability of observed datasets. Simple correlation and error analyses are not enough to predict the variability and distribution of precipitation and temperature. In this study, the coefficient of correlation (R2), root mean square error (RMSE), mean bias error (MBE), and mean wet and dry spell lengths were used to evaluate the performance of three widely used daily gridded precipitation, maximum temperature, and minimum temperature datasets, from the Climatic Research Unit (CRU), the Princeton University Global Meteorological Forcing (PGF), and the Climate Forecast System Reanalysis (CFSR), available over the Niger Delta part of Nigeria. The Standardised Precipitation Index (SPI) was used to assess the confidence of using gridded precipitation products in water resource management. Results of the correlation, error, and spell-length analyses revealed that the CRU and PGF datasets performed much better than the CFSR datasets. SPI values also indicate a good association between the station and CRU precipitation products. In many years, the CFSR datasets overestimated or underestimated the SPI in comparison with the other data products. This indicates weak predictive accuracy, and hence the CFSR datasets are not reliable for water resource management in the study area. However, the CRU data products were found to perform much better in most of the statistical assessments conducted.
This makes the methods used in this study to be useful for the assessment of various gridded datasets in various hydrological and climatic applications.展开更多
Funding: the Natural Science Foundation of China (Grant Nos. 72074014 and 72004012).
Abstract: Purpose: Many science, technology and innovation (STI) resources carry several different labels. To assign such labels automatically to a given instance, many approaches with good performance on benchmark datasets have been proposed for the multi-label classification task in the literature, and several open-source tools implementing these approaches have been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of the benchmark ones. The main purpose of this paper is therefore to evaluate seven multi-label classification methods comprehensively on real-world datasets. Research limitations: The three real-world datasets differ in statement, data quality, and purpose. Additionally, open-source tools designed for multi-label classification differ intrinsically in their approaches to data processing and feature selection, which in turn affects the performance of a multi-label classification approach. In future work, we will enhance experimental precision and reinforce the validity of the conclusions through more rigorous control of variables and expanded parameter settings. Practical implications: The Macro F1 and Micro F1 scores observed on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing improvements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
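For readers unfamiliar with the two scores the abstract above reports, micro- and macro-averaged F1 can be computed from scratch as follows. The toy label sets are illustrative only; they are not drawn from the paper's patent or publication datasets.

```python
# Micro and macro F1 for multi-label classification, no external dependencies.

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall; 0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred, n_labels):
    """y_true / y_pred: lists of sets of label indices, one set per document."""
    per_label = [[0, 0, 0] for _ in range(n_labels)]  # [tp, fp, fn] per label
    for t, p in zip(y_true, y_pred):
        for lab in range(n_labels):
            if lab in p and lab in t:
                per_label[lab][0] += 1   # true positive
            elif lab in p:
                per_label[lab][1] += 1   # false positive
            elif lab in t:
                per_label[lab][2] += 1   # false negative
    # Micro: pool counts across labels; macro: average per-label F1.
    micro = f1(sum(r[0] for r in per_label),
               sum(r[1] for r in per_label),
               sum(r[2] for r in per_label))
    macro = sum(f1(*r) for r in per_label) / n_labels
    return micro, macro

y_true = [{0, 1}, {1}, {0, 2}]   # hypothetical gold label sets
y_pred = [{0}, {1, 2}, {0, 2}]   # hypothetical predictions
micro, macro = micro_macro_f1(y_true, y_pred, 3)
```

Micro F1 rewards getting frequent labels right, while macro F1 weights every label equally, which is why the two diverge on unbalanced document-label distributions.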
Abstract: This article delves into the analysis of the performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The article thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated and demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. These comprehensive studies result in a definitive overview of the difficulties faced and the sectors requiring further improvement and focus.
Funding: National Natural Science Foundation of China (No. 61971036); Fundamental Research Funds for the Central Universities (No. 2023CX01011); Beijing Nova Program (No. 20230484361).
Abstract: This paper proposes a method to generate semi-experimental biomedical datasets based on full-wave simulation software. System noise such as antenna port coupling is fully considered in the proposed datasets, which makes them more realistic than synthetic datasets. In this paper, datasets containing different shapes are constructed based on the relative permittivities of human tissues. A back-propagation scheme is then used to obtain rough reconstructions, which are fed into a U-Net convolutional neural network (CNN) to recover high-resolution images. Numerical results show that a network trained on datasets generated by the proposed method obtains satisfying reconstruction results and is promising for real-time biomedical imaging.
Abstract: Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information, a problem that persists despite user awareness. This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent machine learning (ML) models, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), using authentic datasets sourced from the Kaggle and Mendeley repositories. Extensive experimentation and analysis reveal that the CNN model achieves the best accuracy, 98%, while LSTM shows the lowest, 96%. These findings underscore the potential of ML techniques for enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics, offering a promising avenue for safeguarding sensitive information and online security.
Funding: supported in part by the 2021 Autonomous Driving Development Innovation Project of the Ministry of Science and ICT, 'Development of Technology for Security and Ultra-High-Speed Integrity of the Next-Generation Internal Network of Autonomous Vehicles' (No. 2021-0-01348), and in part by the National Research Foundation of Korea (NRF) grant funded by the Korean Government Ministry of Science and ICT (MSIT) under Grant NRF-2021R1A2C2014428.
Abstract: Recently, automotive intrusion detection systems (IDSs) have emerged as promising defense approaches to counter attacks on in-vehicle networks (IVNs). However, the effectiveness of IDSs relies heavily on the quality of the datasets used for training and evaluation. Despite the availability of several datasets for automotive IDSs, there has been a lack of comprehensive analysis focused on assessing them. This paper addresses the need for dataset assessment in the context of automotive IDSs. It proposes qualitative and quantitative metrics, independent of any specific automotive IDS, to evaluate dataset quality. These metrics take into consideration aspects such as dataset description, collection environment, and attack complexity. The paper evaluates eight commonly used datasets for automotive IDSs using the proposed metrics. The evaluation reveals biases in the datasets, particularly in terms of limited contexts and lack of diversity. It also highlights that the attacks in the datasets were mostly injected without considering normal behaviors, which poses challenges for training and evaluating machine-learning-based IDSs. The paper emphasizes the importance of addressing these limitations to improve the performance and adaptability of automotive IDSs. The proposed metrics can serve as valuable guidelines for researchers and practitioners in selecting and constructing high-quality datasets for automotive security applications. Finally, the paper presents the requirements for high-quality datasets, including representativeness, diversity, and balance.
Funding: National Science Foundation of China (42030611, 91937301); The Second Tibetan Plateau Scientific Expedition and Research (STEP) Program (2019QZKK0105).
Abstract: The CRA-Interim trial production of the global atmospheric reanalysis for the 10 years from 2007 to 2016 was carried out by the China Meteorological Administration in 2017. The structural characteristics of the horizontal shear line over the Tibetan Plateau (TPHSL) based on the CRA-Interim datasets are examined by objectively identifying the shear line, and are compared with the analysis results of the European Centre for Medium-Range Weather Forecasts reanalysis data (ERA-Interim). The case occurred at 18 UTC on July 5, 2016. The results show that both the ERA-Interim and CRA-Interim datasets can well reveal the circulation background and the dynamic and thermal structure characteristics of the TPHSL, and they show some similar features. The middle and high latitudes at 500 hPa are characterized by a circulation situation of "two troughs and two ridges", and at 200 hPa the TPHSL is located in the northeast quadrant of the South Asian High Pressure (SAHP). The TPHSL lies in the positive vorticity zone and passes through the positive vorticity center corresponding to the ascending motion. Near the TPHSL, the contours of pseudo-equivalent potential temperature (θse) tend to be dense, with a high-value center on the south side of the TPHSL. The TPHSL can extend to 460 hPa and tilts northward with height. There is a positive vorticity zone near the TPHSL which likewise tilts northward with height, the ascending motion near the TPHSL can extend to 300 hPa, and the atmospheric layer above the TPHSL is stable. However, the intensities of the TPHSL's structural characteristics analyzed with the two datasets differ, the CRA-Interim datasets showing relatively strong intensity in the geopotential height, vertical velocity, vorticity, and divergence fields. In addition, the vertical profiles of the dynamic and water vapor thermal physical quantities of the two datasets are also consistent in the east and west parts of the TPHSL. In summary, the reliable and usable CRA-Interim datasets show excellent properties in the analysis of the structural characteristics of a horizontal shear line over the Tibetan Plateau.
Abstract: Based on C-LSAT2.0, using high- and low-frequency component reconstruction methods combined with observation constraint masking, a reconstructed C-LSAT2.0 with 756 ensemble members from the 1850s to 2018 has been developed. These ensemble versions have been merged with the ERSSTv5 ensemble dataset, and an upgraded version of the CMST-Interim dataset with 5°×5° resolution has been developed. The CMST-Interim dataset significantly improves the coverage of global surface temperature data. After reconstruction, the data coverage before 1950 increased from 78%−81% of the original CMST to 81%−89%. The total coverage after 1955 reached about 93%, including more than 98% in the Northern Hemisphere and 81%−89% in the Southern Hemisphere. The reconstruction ensemble experiments with different parameters provide a good basis for more systematic uncertainty assessment of C-LSAT2.0 and CMST-Interim. In comparison with the original CMST, the global mean surface temperatures are estimated to be cooler in the second half of the 19th century and warmer during the 21st century, which shows that the global warming trend is further amplified. The global warming trends are updated from 0.085±0.004℃ (10 yr)^(–1) and 0.128±0.006℃ (10 yr)^(–1) to 0.089±0.004℃ (10 yr)^(–1) and 0.137±0.007℃ (10 yr)^(–1), respectively, since the start and the second half of the 20th century.
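The warming trends above are quoted as ℃ (10 yr)^(–1) with an uncertainty. As a minimal sketch (not the CMST-Interim procedure), such a decadal trend and its standard error can be estimated by ordinary least squares on an annual series; the data below are synthetic and noise-free.

```python
import math

def decadal_trend(years, temps):
    """OLS slope in degrees C per decade, with its standard error."""
    n = len(years)
    mx = sum(years) / n
    my = sum(temps) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, temps))
    slope = sxy / sxx                                   # degC per year
    resid = [y - (my + slope * (x - mx)) for x, y in zip(years, temps)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return slope * 10, se * 10                          # per decade

years = list(range(1950, 2019))
temps = [0.01 * (y - 1950) for y in years]  # exactly 0.1 degC per decade
trend, se = decadal_trend(years, temps)
```

With noisy real anomalies the standard error would be nonzero, which is what the ± terms in the abstract report.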
Funding: Key Laboratory on Natural Disasters for Jiangsu Province (KLME050210).
Abstract: Comparisons of the west Pacific subtropical high with the South Asia High are made using the NCEP/NCAR and ECMWF 500-hPa and 100-hPa monthly boreal geopotential height fields for the period 1961-2000. Discrepancies are found for the time prior to 1980. The west Pacific subtropical high in the NCEP/NCAR data is less intense than in the ECMWF data before 1980, while the range and strength of its variation described by the NCEP/NCAR data are larger than those depicted by the ECMWF data. The same situation appears in the 100-hPa geopotential field. These findings suggest that the interdecadal variation of the two systems as shown by the NCEP/NCAR data may not be real. Besides, the South Asia High center in the NCEP/NCAR data is obviously stronger than in the ECMWF data during the periods 1969, 1979-1991, and 1992-1995, and its range is larger from 1992 to 1995.
Funding: supported by the National Basic Research Program of China (Grant Nos. 2010CB950400 and 2013CB430301), the National Natural Science Foundation of China (Grant Nos. 41276025 and 41176023), and the R&D Special Fund for Public Welfare Industry (Meteorology) (Grant No. GYHY201106036). The OFES simulation was conducted on the Earth Simulator with the support of JAMSTEC. Also supported by the Data Sharing Infrastructure of Earth System Science-Data Sharing Service Center of the South China Sea and adjacent regions.
Abstract: In this study, the upper ocean heat content (OHC) variations in the South China Sea (SCS) during 1993-2006 were investigated by examining ocean temperatures in seven datasets: World Ocean Atlas 2009 (WOA09) (climatology), the Ishii datasets, the Ocean General Circulation Model for the Earth Simulator (OFES), the Simple Ocean Data Assimilation system (SODA), the Global Ocean Data Assimilation System (GODAS), the China Oceanic ReAnalysis system (CORA), and an ocean reanalysis dataset for the joining area of Asia and the Indian-Pacific Ocean (AIPO1.0). Among these datasets, two were independent of any numerical model, four relied on data assimilation, and one was generated without any data assimilation. The annual cycles revealed by the seven datasets were similar, but the interannual variations differed. Vertical structures of temperature along the 18°N, 12.75°N, and 120°E sections were compared with data collected during open cruises in 1998 and 2005-08. The results indicated that Ishii, OFES, CORA, and AIPO1.0 were more consistent with the observations. Through systematic comparisons, we found that each dataset had its own shortcomings and advantages in presenting the upper OHC in the SCS.
Funding: Monitoring and Detection of Upper-air Climate Change in China (GYHY200906014).
Abstract: This study compares three tropical cyclone datasets separately compiled by the CMA Shanghai Typhoon Institute (CMA SHI), the Joint Typhoon Warning Center (JTWC), and the Japan Meteorological Agency (JMA). The annual frequencies, observation times, and destructive power index of tropical cyclones over the western North Pacific are investigated as characteristic quantities. The comparative study results in the following findings: 1) Statistical gaps between the datasets narrow as the intensity of the tropical cyclones increases. 2) In terms of interdecadal distribution, there is a relatively large gap between the datasets for the 1950s, a narrowed gap from the mid-1970s to the 1980s, and a recurring widened gap for the mid and late 1990s. Additionally, an approach is proposed to correct the wind speed data in the TC Yearbook.
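The abstract does not define its destructive power index; one common formulation is a power-dissipation-style index that accumulates the cube of the maximum sustained wind over a storm's best-track records. The sketch below assumes that formulation, with hypothetical wind values, and may not match the paper's exact definition.

```python
def power_dissipation_index(vmax_records):
    """Sum of cubed best-track maximum sustained wind speeds (m/s)."""
    return sum(v ** 3 for v in vmax_records)

# Hypothetical 6-hourly best-track wind speeds (m/s) for one storm.
track = [18, 23, 33, 42, 38, 25]
pdi = power_dissipation_index(track)
```

Because the wind enters cubed, a dataset's treatment of the strongest records dominates this index, consistent with the finding that inter-dataset gaps narrow for intense cyclones.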
基金supported by the National Key Basic Research and Development Plan (No.2015CB953900)the Natural Science Foundation of China (Nos.41330960 and 41776032)
Abstract: Version 4 (v4) of the Extended Reconstructed Sea Surface Temperature (ERSST) dataset is compared with its predecessor, the widely used version 3b (v3b). The essential upgrades applied to v4 lead to remarkable differences in the characteristics of the sea surface temperature (SST) anomaly (SSTa) in both the temporal and spatial domains. First, the largest discrepancy of the global mean SSTa values, around the 1940s, is due to ship-observation corrections made to reconcile observations from buckets and engine-intake thermometers. Second, differences in global and regional mean SSTa values between v4 and v3b exhibit a downward trend (around −0.032℃ per decade) before the 1940s, an upward trend (around 0.014℃ per decade) during 1950-2015, and an interdecadal oscillation with one peak around the 1980s and two troughs during the 1960s and 2000s. This does not derive from treatments of the polar or other data-void regions, since the difference of the SSTa does not share the common features. Third, the spatial pattern of the ENSO-related variability of v4 exhibits a wider but weaker cold tongue in the tropical Pacific compared with that of v3b, which could be attributed to differences in gap-filling assumptions, since the latter features satellite observations whereas the former features in situ ones. This intercomparison confirms that the structural uncertainty arising from underlying assumptions on the treatment of diverse SST observations, even within the same SST product family, is the main source of significant SST differences in the temporal domain. Why this uncertainty introduces artificial decadal oscillations remains unknown.
Abstract: Anomaly-based approaches in network intrusion detection suffer in evaluation, comparison, and deployment because of the scarcity of adequate publicly available network trace datasets. Moreover, the publicly available datasets are either outdated or generated in a controlled environment. Given the ubiquity of cloud computing environments in commercial and government internet services, there is a need to assess the impacts of network attacks in cloud data centers. To the best of our knowledge, there is no publicly available dataset which captures the normal and anomalous network traces in the interactions between cloud users and cloud data centers. In this paper, we present an experimental platform designed to represent a practical interaction between cloud users and cloud services, and we collect the network traces resulting from this interaction to conduct anomaly detection. We use the Amazon Web Services (AWS) platform for our experiments.
Abstract: In the past few decades, meteorological datasets from remote sensing techniques have been used by various researchers and managers in agricultural and water resources management. Based on the literature, meteorological datasets are not more accurate than synoptic stations, but their various advantages, such as spatial coverage, temporal coverage, accessibility, and free use, have made these techniques superior, and sometimes we can use them instead of synoptic stations. In this study, we used four meteorological datasets, including Climatic Research Unit gridded Time Series (CRU TS), Global Precipitation Climatology Centre (GPCC), Agricultural National Aeronautics and Space Administration Modern-Era Retrospective Analysis for Research and Applications (AgMERRA), and Agricultural Climate Forecast System Reanalysis (AgCFSR), to estimate climate variables, i.e., precipitation, maximum temperature, and minimum temperature, and crop variables, i.e., reference evapotranspiration, irrigation requirement, biomass, and yield of maize, in Qazvin Province of Iran during 1980-2009. First, data were gathered from the four meteorological datasets and the synoptic station in this province, and the climate variables were calculated. Then, after using the AquaCrop model to calculate the crop variables, we compared the results of the synoptic station and the meteorological datasets. All four meteorological datasets showed strong performance in estimating climate variables. AgMERRA and AgCFSR gave more accurate estimates of precipitation and maximum temperature, although their normalized root mean square error was inferior to CRU for minimum temperature. Furthermore, they were all very efficient for estimating the biomass and yield of maize in this province. For reference evapotranspiration and irrigation requirement, CRU TS and GPCC were the most efficient, rather than AgMERRA and AgCFSR, but for the estimation of biomass and yield all four meteorological datasets were reliable. To sum up, GPCC and AgCFSR were the two best datasets in this study. This study suggests the use of meteorological datasets in water resource and agricultural management to monitor past changes and estimate recent trends.
Funding: financial support provided by the National Natural Science Foundation of China (51508272, 11832013, 51878350, and 51678304).
Abstract: In this study, through experimental research and an investigation of large datasets of durability parameters in ocean engineering, the values, ranges, and types of distribution of the durability parameters employed for durability design in ocean engineering in northern China were confirmed. Based on a modified theoretical model of chloride diffusion and reliability theory, the service lives of concrete structures exposed to the splash, tidal, and underwater zones were calculated. Concrete mix proportions meeting the requirement of a service life of 100 or 120 years were designed, and a cover-thickness requirement was proposed. In addition, the effects of different time-varying relationships of the boundary condition (Cs) and diffusion coefficient (Df) on the service life were compared; the results showed that the time-varying relationships used in this study (i.e., Cs continuously increased and then remained stable, and Df continuously decreased and then remained stable) were beneficial for the durability design of concrete structures in the marine environment.
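The modified diffusion model itself is not given in the abstract; as an illustrative baseline only, the classical error-function solution of Fick's second law can be used to estimate a service life as the time for the chloride content at the cover depth to reach a critical level. All parameter values below are hypothetical, not the paper's.

```python
import math

def chloride_at_depth(x_mm, t_years, cs, df_mm2_per_year):
    """C(x,t) = Cs * (1 - erf(x / (2*sqrt(Df*t)))), Fick's second law."""
    return cs * (1.0 - math.erf(x_mm / (2.0 * math.sqrt(df_mm2_per_year * t_years))))

def service_life(cover_mm, cs, c_crit, df, t_max=500):
    """First year at which chloride at the cover depth reaches c_crit."""
    for t in range(1, t_max + 1):
        if chloride_at_depth(cover_mm, t, cs, df) >= c_crit:
            return t
    return t_max

# Hypothetical inputs: 60 mm cover, surface chloride 0.6% by binder mass,
# critical content 0.05%, apparent diffusion coefficient 20 mm^2/year.
life = service_life(cover_mm=60, cs=0.6, c_crit=0.05, df=20)
```

The paper's modified model adds time-varying Cs and Df, which this constant-coefficient sketch deliberately omits; making Df decay with time lengthens the predicted life.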
Funding: National Natural Science Foundation of China (41465003, 41665006); China National Special Funding Project for Meteorology (GYHY201406010, GYHY201306071).
Abstract: The differences in the climatology of extratropical transition (ET) of western North Pacific tropical cyclones (TCs) were investigated in this study using the TC best-track datasets of the China Meteorological Administration (CMA), the Japan Meteorological Agency (JMA), and the Joint Typhoon Warning Center (JTWC). The results show that the ET identification, ET completion time, and post-ET duration reported in the JTWC dataset differ greatly from those in the CMA and JMA datasets during 2004-2010. However, the key differences between the CMA and JMA datasets from 1951 to 2010 are the ET identification and the post-ET duration, because of the inconsistent objective ET criteria used by the centers. Further analysis indicates that the annual ET percentage of CMA was lower than that of JMA and exhibited an interannual decreasing trend, while that of JMA showed no trend. Western North Pacific ET events occurred mainly from June to November. The latitude of ET occurrence shifted northward from February to August, followed by a southward shift, and most ET events were observed between 35°N and 45°N. From a regional perspective, TCs tended to undergo ET around Japan and the ocean east of it. TCs that experienced the ET process at higher latitudes were generally more intense at the ET completion time, and TCs completing the ET over land or offshore were weaker than those finishing the ET over the ocean. Most of the TCs weakened 24 h before the completion of ET. In contrast, 21% (27%) of the TCs showed an intensification process during the post-ET period based on the CMA (JMA) dataset. The results presented in this study indicate that consistent ET determination criteria are needed to reduce the uncertainty involved in ET identification among the centers.
Funding: funded by the University Transportation Center for Underground Transportation Infrastructure (UTC-UTI) at the Colorado School of Mines under Grant No. 69A3551747118 from the US Department of Transportation (DOT).
Abstract: Prediction of tunneling-induced ground settlements is an essential task, particularly for tunneling in urban settings. Ground settlements should be limited within a tolerable threshold to avoid damage to aboveground structures. Machine learning (ML) methods are becoming popular in many fields, including tunneling and underground excavation, as a powerful learning and predicting technique. However, the datasets collected from a tunneling project are usually small from the perspective of applying ML methods. Can ML algorithms effectively predict tunneling-induced ground settlements when the available datasets are small? In this study, seven ML methods are utilized to predict tunneling-induced ground settlement using 14 contributing factors measured before or during tunnel excavation. These methods include multiple linear regression (MLR), decision tree (DT), random forest (RF), gradient boosting (GB), support vector regression (SVR), back-propagation neural network (BPNN), and permutation importance-based BPNN (PI-BPNN) models. All methods except BPNN and PI-BPNN are shallow-structure ML methods. The effectiveness of these seven ML approaches on small datasets is evaluated using model accuracy and stability. Model accuracy is measured by the coefficient of determination (R2) on the training and testing datasets, and the stability of a learning algorithm indicates robust predictive performance. A quantile error (QE) criterion is also introduced to assess predictive performance considering underpredictions and overpredictions. Our study reveals that the RF algorithm outperforms all the other models, with the highest model prediction accuracy (0.9) and stability (3.02 × 10^(−27)). Deep-structure ML models do not perform well on small datasets, with relatively low model accuracy (0.59) and stability (5.76). The PI-BPNN architecture is proposed and designed for small datasets, showing better performance than the typical BPNN. Six important contributing factors of ground settlements are identified: tunnel depth, distance between the tunnel face and surface monitoring points (DTM), weighted average soil compressibility modulus (ACM), grouting pressure, penetration rate, and thrust force.
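The abstract does not spell out its QE criterion; one standard way to penalize under- and overpredictions asymmetrically is the pinball (quantile) loss, sketched here alongside R2 on hypothetical settlement values. Both the loss choice and the numbers are assumptions, not the paper's definitions or data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss: weights underprediction by q and
    overprediction by (1 - q)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

settle_true = [5.0, 8.0, 12.0, 6.0]   # hypothetical settlements (mm)
settle_pred = [4.0, 9.0, 11.0, 6.0]
r2 = r_squared(settle_true, settle_pred)
qe = pinball_loss(settle_true, settle_pred, q=0.5)
```

At q = 0.5 the pinball loss reduces to half the mean absolute error; other quantiles penalize the two error directions unequally, which is the point of assessing under- and overprediction separately.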
Abstract: Data-driven algorithms for predicting mechanical properties from small datasets are evaluated in a case study on gear steel hardenability. The limitations of current data-driven algorithms and empirical models are identified. Challenges in analysing small datasets are discussed, and a solution is proposed for handling small datasets with multiple variables. Gaussian methods in combination with novel predictive algorithms are utilized to overcome the challenges in analysing gear steel hardenability data and to gain insight into alloying-element interaction and structure homogeneity. The gained fundamental knowledge, integrated with machine learning, is shown to be superior to the empirical equations in predicting hardenability. Metallurgical property relationships between chemistry, sample size, and hardness are predicted via two optimized machine learning algorithms: neural networks (NNs) and extreme gradient boosting (XGBoost). A comparison is drawn between the algorithms, evaluating their performance on small datasets. The results reveal that XGBoost has the highest potential for predicting hardenability using small datasets with class imbalance and large inhomogeneity issues.
基金supported by the "Strategic Priority Research Program" of the Chinese Academy of Sciences(XDA05100300)the National Basic Research Program of China(2013CB955801)the National Natural Science Foundation of China(41175030)
Abstract: Total Cloud Cover (TCC) over China determined from four climate datasets, including the International Satellite Cloud Climatology Project (ISCCP), the 40-year Re-Analysis Project of the European Centre for Medium-Range Weather Forecasts (ERA-40), Climate Research Unit Time Series 3.0 (CRU3), and ground station datasets, is used to show the spatial and temporal variation of TCC and their differences. It is demonstrated that the four datasets show similar spatial patterns and seasonal variation. The maximum value is derived from ISCCP. The TCC value in North China derived from ERA-40 is 50% larger than that from the station dataset; however, the value is 50% less in South China. The annual TCC of the ISCCP, ERA-40, and ground station datasets shows a decreasing trend during 1984-2002; however, an increasing trend is derived from CRU3. The results of this study imply remarkable differences in TCC derived from surface and satellite observations as well as model simulations. The potential effects of these differences on cloud climatology and associated climatic issues should be carefully considered.
Funding: jointly supported by the National Natural Science Foundation of China [grant numbers 41790473 and 41421004] and the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA19070402].
Abstract: In recent years, long-term continuous sea-ice datasets have been developed, covering the periods before and after the satellite era. How these datasets differ from one another before the satellite era, and whether one is more reliable than another, is important but unclear because the sea-ice record before 1979 is sparse and not continuous. In this letter, two sea-ice datasets are evaluated: the HadISST1 dataset from the Hadley Centre, and the SIBT1850 (Gridded Monthly Sea Ice Extent and Concentration, from 1850 Onward) dataset from the National Snow and Ice Data Center (NSIDC). In view of its substantial importance for climate, the winter sea ice in the Barents and Kara seas (BKS) is the particular focus. A reconstructed BKS sea-ice extent (SIE) is developed using linear regression from the mean observed surface air temperature at two adjacent islands, Novaya Zemlya and Franz Josef Land (the proxy). A validation shows that the proxy is substantially coherent with the BKS sea-ice anomaly in the observations and in the CMIP5 (phase 5 of the Coupled Model Intercomparison Project) historical experiments, indicating that the proxy, and hence the reconstructed BKS SIE, is reasonable. The evaluation results based on the proxy suggest that the sea-ice concentration prior to the satellite era in the NSIDC dataset is more realistic and reliable than that in the Hadley Centre dataset, and is thus more appropriate for use.
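The proxy-based reconstruction described above amounts to a simple linear regression of SIE on the island-mean surface air temperature, fitted over the well-observed era and then applied to earlier proxy values. The sketch below uses hypothetical training values, not the actual BKS series.

```python
def fit_linear(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

# Hypothetical winter-mean SAT anomalies (degC) at the two islands and the
# corresponding satellite-era BKS SIE (10^6 km^2) used to train the fit.
sat = [-2.0, -1.0, 0.0, 1.0, 2.0]
sie = [0.80, 0.72, 0.65, 0.58, 0.50]
slope, intercept = fit_linear(sat, sie)

# Reconstruct SIE for a pre-satellite year from its SAT proxy value.
sie_1950 = slope * (-1.5) + intercept
```

The negative slope encodes the physical relationship the letter exploits: warmer island temperatures accompany reduced winter sea-ice extent.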
Abstract: Hydro-climatological study is difficult in most developing countries due to the paucity of monitoring stations. Gridded climatological data provide an opportunity to extrapolate climate to areas without monitoring stations, based on their ability to replicate the spatio-temporal distribution and variability of observed datasets. Simple correlation and error analyses are not enough to predict the variability and distribution of precipitation and temperature. In this study, the coefficient of determination (R2), root mean square error (RMSE), mean bias error (MBE), and mean wet and dry spell lengths were used to evaluate the performance of three widely used daily gridded precipitation and maximum/minimum temperature datasets, from the Climatic Research Unit (CRU), Princeton University Global Meteorological Forcing (PGF), and Climate Forecast System Reanalysis (CFSR), available over the Niger Delta part of Nigeria. The Standardised Precipitation Index (SPI) was used to assess the confidence of using gridded precipitation products in water resource management. Results of the correlation, error, and spell-length analyses revealed that the CRU and PGF datasets performed much better than the CFSR datasets. SPI values also indicate a good association between station and CRU precipitation products. The CFSR datasets, in comparison with the other data products, overestimated and underestimated the SPI in many years, indicating weak predictive accuracy and hence unreliability for water resource management in the study area. CRU data products performed much better in most of the statistical assessments conducted, which makes the methods used in this study useful for assessing gridded datasets in various hydrological and climatic applications.
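The RMSE and MBE scores used above are straightforward to reproduce; here is a minimal sketch with hypothetical daily rainfall values, not the Niger Delta station or gridded records.

```python
import math

def rmse(obs, grid):
    """Root mean square error of a gridded product against station data."""
    return math.sqrt(sum((g - o) ** 2 for o, g in zip(obs, grid)) / len(obs))

def mbe(obs, grid):
    """Mean bias error: positive when the gridded product overestimates."""
    return sum(g - o for o, g in zip(obs, grid)) / len(obs)

station = [10.0, 0.0, 5.0, 20.0]   # hypothetical daily rainfall (mm)
gridded = [12.0, 0.0, 4.0, 23.0]
rmse_val = rmse(station, gridded)
mbe_val = mbe(station, gridded)
```

RMSE measures the typical magnitude of the mismatch, while MBE exposes its sign; a product can have a small bias yet a large RMSE, which is why the study combines both with spell-length and SPI checks.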