Funding: National Natural Science Foundation of China (No. 61971036); Fundamental Research Funds for the Central Universities (No. 2023CX01011); Beijing Nova Program (No. 20230484361).
Abstract: This paper proposes a method to generate semi-experimental biomedical datasets based on full-wave simulation software. System noise such as antenna port coupling is fully accounted for in the proposed datasets, which makes them more realistic than purely synthetic datasets. Datasets containing different shapes are constructed based on the relative permittivities of human tissues. A back-propagation scheme is then used to obtain rough reconstructions, which are fed into a U-Net convolutional neural network (CNN) to recover high-resolution images. Numerical results show that a network trained on datasets generated by the proposed method achieves satisfactory reconstructions and is promising for real-time biomedical imaging.
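The refinement stage pairs a coarse back-propagation image with a U-Net that learns to sharpen it. The sketch below is a minimal encoder-decoder with one skip connection in PyTorch; the channel counts, depth, and input size are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal U-Net-style refiner: coarse back-propagation image in, refined image out.
# Channel counts, depth, and 64x64 input size are illustrative assumptions.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))  # refined permittivity map

    def forward(self, x):
        e = self.enc(x)                             # encoder features (skip-connection source)
        m = self.mid(self.down(e))                  # bottleneck at half resolution
        u = self.up(m)                              # upsample back to input resolution
        return self.dec(torch.cat([u, e], dim=1))   # concatenate skip features, then decode

# Example: refine a batch of 64x64 coarse reconstructions (random stand-ins).
coarse = torch.randn(4, 1, 64, 64)
refined = TinyUNet()(coarse)
print(refined.shape)  # torch.Size([4, 1, 64, 64])
```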
Funding: The Natural Science Foundation of China (Grant Numbers 72074014 and 72004012).
Abstract: Purpose: Many science, technology and innovation (STI) resources carry several different labels. To assign such labels automatically to an instance of interest, many approaches with good performance on benchmark datasets have been proposed for the multi-label classification task, and several open-source tools implementing these approaches have been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones. The main purpose of this paper is therefore to comprehensively evaluate seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in statement, data quality, and purpose. Additionally, open-source tools designed for multi-label classification have intrinsic differences in their data processing and feature selection, which in turn affects the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of the conclusions through more rigorous control of variables and expanded parameter settings. Practical implications: The Macro F1 and Micro F1 scores observed on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
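Macro F1 and Micro F1, the two headline metrics above, weight labels differently: Macro F1 averages per-label F1 scores, while Micro F1 pools all label decisions before scoring. A minimal scikit-learn sketch with made-up labels, purely for illustration:

```python
# Hedged sketch: Macro vs Micro F1 for a toy multi-label problem (labels are made up).
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform([{"AI"}, {"AI", "energy"}, {"materials"}, {"energy"}])
y_pred = mlb.transform([{"AI"}, {"AI"}, {"materials", "energy"}, {"energy"}])

print("Macro F1:", f1_score(y_true, y_pred, average="macro"))  # mean of per-label F1
print("Micro F1:", f1_score(y_true, y_pred, average="micro"))  # pooled over all label decisions
```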
Abstract: This article analyzes the performance and use of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, are proficient at recognizing patterns associated with fire within images. By training on labeled data, SVMs learn to identify distinctive attributes of fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The article examines the use of SVMs in detail, covering crucial elements such as data preprocessing, feature extraction, and model training, and evaluates accuracy, efficiency, and practical applicability. The knowledge gained from this study aids the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is investigated through a case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. Together, these studies provide an overview of the difficulties faced and the areas requiring further improvement and focus.
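As a hedged illustration of the resolution experiment described above, the sketch below trains an SVM on images resized to two resolutions and compares test accuracy; the data are random placeholders, not the article's fire imagery, and the kernel settings are assumptions.

```python
# Hedged sketch: SVM accuracy vs training-image resolution (random stand-in data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
for res in (16, 32):                          # two resize resolutions to compare
    X = rng.random((400, res * res))          # flattened grayscale images (placeholder)
    y = rng.integers(0, 2, 400)               # 1 = fire, 0 = no fire (placeholder labels)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xtr, ytr)
    print(f"{res}x{res}: accuracy = {accuracy_score(yte, clf.predict(Xte)):.2f}")
```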
Funding: Supported by the National Basic Research Program of China (Grant Nos. 2010CB950400 and 2013CB430301); the National Natural Science Foundation of China (Grant Nos. 41276025 and 41176023); and the R&D Special Fund for Public Welfare Industry (Meteorology) (Grant No. GYHY201106036). The OFES simulation was conducted on the Earth Simulator with the support of JAMSTEC. Also supported by the Data Sharing Infrastructure of Earth System Science-Data Sharing Service Center of the South China Sea and adjacent regions.
Abstract: In this study, the upper ocean heat content (OHC) variations in the South China Sea (SCS) during 1993-2006 were investigated by examining ocean temperatures in seven datasets: World Ocean Atlas 2009 (WOA09) (climatology), the Ishii datasets, the Ocean General Circulation Model for the Earth Simulator (OFES), the Simple Ocean Data Assimilation system (SODA), the Global Ocean Data Assimilation System (GODAS), the China Oceanic ReAnalysis system (CORA), and an ocean reanalysis dataset for the joining area of Asia and the Indian-Pacific Ocean (AIPO1.0). Among these datasets, two are independent of any numerical model, four rely on data assimilation, and one was generated without any data assimilation. The annual cycles revealed by the seven datasets are similar, but the interannual variations differ. Vertical structures of temperature along the 18°N, 12.75°N, and 120°E sections were compared with data collected during open cruises in 1998 and 2005-2008. The results indicate that Ishii, OFES, CORA, and AIPO1.0 are more consistent with the observations. Through systematic comparisons, we found that each dataset has its own shortcomings and advantages in presenting the upper OHC in the SCS.
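Upper OHC is commonly computed as the depth integral of temperature multiplied by seawater density and specific heat capacity. The sketch below integrates a toy temperature profile over the upper 400 m; the depth range, constants, and profile are illustrative assumptions, not values from the study.

```python
# Hedged sketch: upper-ocean heat content from a temperature profile (toy numbers).
import numpy as np

rho = 1025.0   # seawater density, kg m^-3 (assumed constant)
cp = 3985.0    # specific heat capacity, J kg^-1 K^-1 (assumed constant)

z = np.linspace(0.0, 400.0, 41)        # depth levels, m (upper 400 m layer)
temp_c = 28.0 - 0.045 * z              # toy temperature profile, degC
temp_k = temp_c + 273.15               # absolute temperature, K

# Trapezoidal depth integration: OHC = rho * cp * integral(T dz), in J m^-2.
ohc = rho * cp * float(np.sum(0.5 * (temp_k[1:] + temp_k[:-1]) * np.diff(z)))
print(f"0-400 m heat content: {ohc:.3e} J m^-2")
```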
Abstract: Based on C-LSAT2.0, using high- and low-frequency component reconstruction methods combined with observation constraint masking, a reconstructed C-LSAT2.0 with 756 ensemble members covering the 1850s to 2018 has been developed. These ensemble versions have been merged with the ERSSTv5 ensemble dataset, and an upgraded version of the CMST-Interim dataset with 5° × 5° resolution has been developed. The CMST-Interim dataset significantly improves the coverage of global surface temperature data. After reconstruction, the data coverage before 1950 increased from 78%-81% of the original CMST to 81%-89%. The total coverage after 1955 reached about 93%, including more than 98% in the Northern Hemisphere and 81%-89% in the Southern Hemisphere. The reconstruction ensemble experiments with different parameters provide a good basis for a more systematic uncertainty assessment of C-LSAT2.0 and CMST-Interim. In comparison with the original CMST, the global mean surface temperatures are estimated to be cooler in the second half of the 19th century and warmer during the 21st century, which shows that the global warming trend is further amplified. The global warming trends are updated from 0.085 ± 0.004 ℃ (10 yr)^(-1) and 0.128 ± 0.006 ℃ (10 yr)^(-1) to 0.089 ± 0.004 ℃ (10 yr)^(-1) and 0.137 ± 0.007 ℃ (10 yr)^(-1), respectively, since the start and the second half of the 20th century.
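Warming trends like those quoted above are typically least-squares slopes of the annual-mean series, reported per decade with an uncertainty. A minimal sketch with a synthetic series (the numbers are made up, not CMST data):

```python
# Hedged sketch: linear warming trend in degC per decade from an annual series (synthetic data).
import numpy as np
from scipy.stats import linregress

years = np.arange(1901, 2019)
rng = np.random.default_rng(1)
anom = 0.010 * (years - years[0]) + rng.normal(0.0, 0.15, years.size)  # toy anomalies, degC

fit = linregress(years, anom)                       # slope in degC per year
print(f"trend = {10 * fit.slope:.3f} +/- {10 * fit.stderr:.3f} degC per decade")
```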
Funding: Supported by the National Key Basic Research and Development Plan (No. 2015CB953900) and the Natural Science Foundation of China (Nos. 41330960 and 41776032).
Abstract: Version 4 (v4) of the Extended Reconstructed Sea Surface Temperature (ERSST) dataset is compared with its predecessor, the widely used version 3b (v3b). The essential upgrades applied to v4 lead to remarkable differences in the characteristics of the sea surface temperature (SST) anomaly (SSTa) in both the temporal and spatial domains. First, the largest discrepancy in the global mean SSTa values, around the 1940s, is due to ship-observation corrections made to reconcile observations from buckets and engine intake thermometers. Second, differences in global and regional mean SSTa values between v4 and v3b exhibit a downward trend (around -0.032 ℃ per decade) before the 1940s, an upward trend (around 0.014 ℃ per decade) during 1950-2015, and an interdecadal oscillation with one peak around the 1980s and two troughs during the 1960s and 2000s. This does not derive from treatments of the polar or other data-void regions, since the difference of the SSTa does not share those features. Third, the spatial pattern of the ENSO-related variability of v4 exhibits a wider but weaker cold tongue in the tropical Pacific Ocean compared with that of v3b, which could be attributed to differences in gap-filling assumptions, since the latter features satellite observations whereas the former features in situ ones. This intercomparison confirms that the structural uncertainty arising from underlying assumptions on the treatment of diverse SST observations, even within the same SST product family, is the main source of significant SST differences in the temporal domain. Why this uncertainty introduces artificial decadal oscillations remains unknown.
Funding: Funded by the University Transportation Center for Underground Transportation Infrastructure (UTC-UTI) at the Colorado School of Mines under Grant No. 69A3551747118 from the US Department of Transportation (DOT).
Abstract: Prediction of tunneling-induced ground settlements is an essential task, particularly for tunneling in urban settings. Ground settlements should be limited within a tolerable threshold to avoid damage to aboveground structures. Machine learning (ML) methods are becoming popular in many fields, including tunneling and underground excavations, as a powerful learning and predicting technique. However, the datasets collected from a tunneling project are usually small from the perspective of applying ML methods. Can ML algorithms effectively predict tunneling-induced ground settlements when the available datasets are small? In this study, seven ML methods are utilized to predict tunneling-induced ground settlement using 14 contributing factors measured before or during tunnel excavation. These methods include multiple linear regression (MLR), decision tree (DT), random forest (RF), gradient boosting (GB), support vector regression (SVR), back-propagation neural network (BPNN), and permutation importance-based BPNN (PI-BPNN) models. All methods except BPNN and PI-BPNN are shallow-structure ML methods. The effectiveness of these seven ML approaches on small datasets is evaluated using model accuracy and stability. Model accuracy is measured by the coefficient of determination (R²) of the training and testing datasets, and the stability of a learning algorithm indicates robust predictive performance. The quantile error (QE) criterion is also introduced to assess predictive performance while accounting for underpredictions and overpredictions. Our study reveals that the RF algorithm outperforms all the other models, with the highest model prediction accuracy (0.9) and stability (3.02 × 10^(-27)). Deep-structure ML models do not perform well for small datasets, with relatively low model accuracy (0.59) and stability (5.76). The PI-BPNN architecture is proposed and designed for small datasets, showing better performance than the typical BPNN. Six important contributing factors of ground settlements are identified: tunnel depth, the distance between the tunnel face and surface monitoring points (DTM), weighted average soil compressibility modulus (ACM), grouting pressure, penetration rate, and thrust force.
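To make the evaluation concrete, the sketch below fits a random forest on a small synthetic table, reports train/test R², and computes a simple quantile-style error that weights under- and over-predictions differently. The 0.9 quantile weight and the data are illustrative assumptions, not the paper's exact QE definition.

```python
# Hedged sketch: small-dataset regression with R^2 and an asymmetric (quantile-style) error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((120, 14))                            # 14 contributing factors (placeholder)
y = X @ rng.random(14) + rng.normal(0, 0.05, 120)    # toy settlement values

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)
pred = rf.predict(Xte)

def quantile_error(y_true, y_pred, q=0.9):
    """Pinball-style loss: penalizes underprediction more heavily when q > 0.5."""
    e = y_true - y_pred
    return float(np.mean(np.maximum(q * e, (q - 1.0) * e)))

print("train R2:", round(r2_score(ytr, rf.predict(Xtr)), 3))
print("test  R2:", round(r2_score(yte, pred), 3))
print("QE(q=0.9):", round(quantile_error(yte, pred), 4))
```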
Abstract: We present a high-performance, modularly built, open-source software package, OpenIFEM. OpenIFEM is a C++ implementation of the modified immersed finite element method (mIFEM) for solving fluid-structure interaction (FSI) problems. The software is built modularly to perform multiple tasks, including fluid dynamics (incompressible and slightly compressible fluid models), linear and nonlinear solid mechanics, and fully coupled fluid-structure interactions. Most open-source software packages are restricted to certain discretization methods; some are under-tested, under-documented, and lack modularity as well as extensibility. OpenIFEM is designed to include a set of generic classes that users can adapt, so that any fluid and solid solvers can be coupled through the FSI algorithm. In addition, the package utilizes well-developed and tested libraries, and it comes with standard test cases that serve as software and algorithm validation. The software can be built cross-platform, i.e., on Linux, Windows, and Mac OS, using CMake. Efficient parallelization is also implemented for high-performance computing on large problems. OpenIFEM is documented using Doxygen and is publicly available for download on GitHub. It is expected to benefit the future development of FSI algorithms and to be applied to a variety of FSI applications.
Funding: Jointly supported by the National Natural Science Foundation of China (No. 42074060); the Special Fund, Institute of Geophysics, China Earthquake Administration (CEA-IGP) (Nos. DQJB19B29, DQJB20B15, and DQJB22Z01); and the XingHuo Project, CEA (No. XH211103).
Abstract: Seismic phase pickers based on deep neural networks have been used extensively in recent years, demonstrating advantages in both performance and efficiency. However, these pickers are trained with and applied to different data, so a comprehensive benchmark based on a single dataset has been lacking. Here, using the recently released DiTing dataset, we analyzed the performance of seven phase pickers with different network structures; their efficiency was also evaluated on both CPU and GPU devices. Evaluations based on F1-scores reveal that the recurrent neural network (RNN) and EQTransformer exhibit the best performance, likely owing to their large receptive fields. Similar performance is observed among PhaseNet (UNet), UNet++, and the lightweight phase picking network (LPPN); however, the LPPN models are the most efficient. The RNN and EQTransformer have similar speeds, which are slower than those of the LPPN and PhaseNet. UNet++ requires the most computational effort among the pickers. As all of the pickers perform well after being trained on a large-scale dataset, users may choose the one best suited to their application. For beginners, we provide a tutorial on training and validating the pickers using the DiTing dataset. We also provide two sets of models, trained using datasets with 50 Hz and 100 Hz sampling rates, for direct application by end users. All of our models are open-source and publicly accessible.
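F1 for phase picking is usually computed by matching predicted picks to catalog picks within a time tolerance. Below is a minimal sketch of that matching; the 0.5 s tolerance and the pick times are illustrative assumptions, not the benchmark's settings.

```python
# Hedged sketch: precision/recall/F1 for phase picks matched within a time tolerance.
def pick_f1(true_picks, pred_picks, tol=0.5):
    """Greedy one-to-one matching of pick times (seconds) within +/- tol."""
    true_left = list(true_picks)
    tp = 0
    for p in pred_picks:
        hits = [t for t in true_left if abs(t - p) <= tol]
        if hits:
            true_left.remove(min(hits, key=lambda t: abs(t - p)))  # closest unmatched true pick
            tp += 1
    precision = tp / len(pred_picks) if pred_picks else 0.0
    recall = tp / len(true_picks) if true_picks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy P-arrival times: three catalog picks vs four predictions (one false, one missed).
print(pick_f1([10.0, 25.3, 40.1], [10.2, 26.1, 39.9, 55.0]))
```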
Funding: Supported by the National Key R&D Program of China [Grant Number 2020YFB1708300] and the National Natural Science Foundation of China [Grant Number 52075184].
Abstract: Topology optimization (TO), a numerical technique to find the optimal material layout within a given design domain, has attracted interest from researchers in the field of structural optimization in recent years. For beginners, open-source codes are undoubtedly the best way to learn TO, since they elaborate the implementation of a method in detail and make it easy for more people to employ and extend the method. In this paper, we present a summary of various open-source codes and related literature on TO methods, including solid isotropic material with penalization (SIMP), the evolutionary method, the level set method (LSM), moving morphable components/voids (MMC/MMV) methods, multiscale topology optimization methods, etc. We also classify the codes into five levels, from easy to difficult, according to their difficulty, so that beginners can get started and understand the form of the code implementations more quickly.
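Many of the educational SIMP codes surveyed above share the same core step: an optimality-criteria (OC) density update with a bisection on the Lagrange multiplier that enforces the volume constraint. A hedged, standalone sketch of just that update follows; the sensitivities are fabricated inputs, not results of a real finite-element analysis.

```python
# Hedged sketch: SIMP optimality-criteria density update with bisection on the
# volume-constraint multiplier. Sensitivities dc/dv below are fabricated placeholders.
import numpy as np

def oc_update(x, dc, dv, volfrac, move=0.2):
    """One OC step: x are element densities in [0, 1], dc = d(compliance)/dx (<= 0)."""
    l1, l2 = 1e-9, 1e9
    while (l2 - l1) / (l1 + l2) > 1e-3:
        lmid = 0.5 * (l1 + l2)
        xnew = x * np.sqrt(np.maximum(-dc, 0.0) / (dv * lmid))   # OC scaling rule
        xnew = np.clip(xnew, x - move, x + move)                 # move limit
        xnew = np.clip(xnew, 0.001, 1.0)                         # density bounds
        if xnew.mean() > volfrac:    # too much material -> raise the multiplier
            l1 = lmid
        else:
            l2 = lmid
    return xnew

rng = np.random.default_rng(0)
x = np.full(100, 0.4)                     # 100 elements at a 40% volume fraction
dc = -rng.random(100)                     # fake compliance sensitivities
dv = np.ones(100)                         # uniform volume sensitivities
print(oc_update(x, dc, dv, volfrac=0.4).mean())   # stays near the target volume
```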
Abstract: Data-driven algorithms for predicting mechanical properties from small datasets are evaluated in a case study on gear steel hardenability. The limitations of current data-driven algorithms and empirical models are identified. Challenges in analysing small datasets are discussed, and a solution is proposed for handling small datasets with multiple variables. Gaussian methods in combination with novel predictive algorithms are utilized to overcome the challenges in analysing gear steel hardenability data and to gain insight into alloying-element interactions and structure homogeneity. The gained fundamental knowledge, integrated with machine learning, is shown to be superior to empirical equations in predicting hardenability. Metallurgical property relationships between chemistry, sample size, and hardness are predicted via two optimized machine learning algorithms: neural networks (NNs) and extreme gradient boosting (XGBoost). A comparison is drawn between all algorithms, evaluating their performance on small datasets. The results reveal that XGBoost has the highest potential for predicting hardenability using small datasets with class imbalance and large inhomogeneity issues.
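A hedged sketch of the kind of small-data comparison described above: Gaussian process regression and gradient-boosted trees cross-validated on a tiny fabricated composition-hardness table. The columns and values are invented, and the XGBoost model assumes the `xgboost` package is installed.

```python
# Hedged sketch: Gaussian process vs gradient boosting on a small fabricated dataset.
# Assumes the xgboost package; feature columns and targets are placeholders, not steel data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.random((60, 5))                                   # e.g. C, Mn, Cr, Mo content + sample size
y = 30 + 40 * X[:, 0] + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 1.5, 60)  # toy hardness

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
xgb = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)

for name, model in [("GP", gp), ("XGBoost", xgb)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R2 = {scores.mean():.3f}")
```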
Abstract: Scientific research requires the collection of data in order to study, monitor, analyze, describe, or understand a particular process or event. Data collection efforts are often a compromise: manual measurements can be time-consuming and labor-intensive, resulting in data being collected at a low frequency, while automating the data-collection process can reduce labor requirements and increase the frequency of measurements, but at the added expense of electronic data-collecting instrumentation. Rapid advances in electronic technologies have resulted in a variety of new and inexpensive sensing, monitoring, and control capabilities, which offer opportunities for implementation in agricultural and natural-resource research applications. An Open Source Hardware project called Arduino consists of a programmable microcontroller development platform, expansion capability through add-on boards, and a programming development environment for creating custom microcontroller software. All circuit-board and electronic component specifications, as well as the programming software, are open-source and freely available for anyone to use or modify. Inexpensive sensors and the Arduino development platform were used to develop several inexpensive, automated sensing and datalogging systems for use in agricultural and natural-resources-related research projects. Systems were developed and implemented to monitor the soil-moisture status of field crops for irrigation scheduling and crop-water-use studies, to measure daily evaporation-pan water levels for quantifying evaporative demand, and to monitor environmental parameters under forested conditions. These studies demonstrate the usefulness of automated measurements and offer guidance for other researchers in developing inexpensive sensing and monitoring systems to further their research.
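Microcontroller loggers like these typically stream readings over a serial/USB link to a host computer. Below is a hedged host-side sketch that timestamps each incoming line and appends it to a CSV file; it assumes the pyserial package, and the port name, baud rate, and line format produced by the sensor firmware are assumptions.

```python
# Hedged sketch: host-side logger for a microcontroller streaming readings over serial.
# Assumes pyserial; the port, baud rate, and line format are placeholders.
import csv
import time
import serial  # pip install pyserial

PORT, BAUD = "/dev/ttyUSB0", 9600             # adjust to the actual board and OS

with serial.Serial(PORT, BAUD, timeout=2) as ser, open("soil_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        line = ser.readline().decode("ascii", errors="ignore").strip()
        if line:                               # e.g. "0.231,22.5" -> moisture, temperature
            writer.writerow([time.time(), line])
            f.flush()                          # keep the log durable between readings
```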
Funding: Key Laboratory on Natural Disasters for Jiangsu Province (KLME050210).
Abstract: Comparisons of the west Pacific subtropical high with the South Asia High are made using the NCEP/NCAR and ECMWF 500 hPa and 100 hPa monthly boreal geopotential height fields for the period 1961-2000. Discrepancies are found for the time prior to 1980. The west Pacific subtropical high in the NCEP/NCAR data is less intense than in the ECMWF data before 1980, and the range and strength of its variation described by the NCEP/NCAR data are larger than those depicted by the ECMWF data. The same situation appears in the 100 hPa geopotential field. These findings suggest that the interdecadal variation of the two systems as shown by the NCEP/NCAR data may not be true. In addition, the South Asia High center in the NCEP/NCAR data is obviously stronger than in the ECMWF data during 1969, 1979-1991, and 1992-1995. Furthermore, the range is larger from 1992 to 1995.
Funding: Monitoring and Detection of Upper-air Climate Change in China (GYHY200906014).
Abstract: This is a study comparing three tropical cyclone datasets separately compiled by the CMA Shanghai Typhoon Institute (CMA SHI), the Joint Typhoon Warning Center (JTWC), and the Japan Meteorological Agency (JMA). The annual frequencies, observation times, and destructive power index of tropical cyclones over the western North Pacific are investigated as the characteristic quantities. The comparative study yields the following findings: 1) Statistical gaps between the datasets narrow as the intensity of tropical cyclones increases. 2) In terms of interdecadal distribution, the gap between the datasets is relatively large for the 1950s, narrows for the period from the mid-1970s to the 1980s, and widens again in the mid and late 1990s. Additionally, an approach is proposed to correct the wind speed data in the TC Yearbook.
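Destructive-power indices for tropical cyclones are commonly built from the cube of the maximum sustained wind accumulated over a storm's 6-hourly records. The sketch below computes such an index for a toy track; the exact index definition used in the paper may differ, and the wind values are made up.

```python
# Hedged sketch: a power-dissipation-style index, sum of v_max^3 over 6-hourly records.
import numpy as np

six_hourly_vmax = np.array([18.0, 23.0, 33.0, 42.0, 45.0, 38.0, 25.0])  # m/s, toy track
dt = 6 * 3600.0                                    # record interval in seconds
pdi = float(np.sum(six_hourly_vmax ** 3) * dt)     # m^3 s^-2, up to a drag-coefficient factor

print(f"PDI for this track: {pdi:.3e}")
```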
Funding: The National Institutes of Health (NIH R01 HL089315-01 and NIH R01 HL152155, YJW); the Thoracic Surgery Foundation Resident Research Fellowship (YZ); the National Science Foundation Graduate Research Fellowship Program (AMI).
Abstract: Resource-scarce regions with serious COVID-19 outbreaks do not have enough ventilators to support critically ill patients, and these shortages are especially devastating in developing countries. To help alleviate this strain, we have designed and tested the accessible low-barrier in vivo-validated economical ventilator (ALIVE Vent), a COVID-19-inspired, cost-effective, open-source, in vivo-validated solution made from commercially available components. The ALIVE Vent operates using compressed oxygen and air to drive inspiration, while two solenoid valves ensure one-way flow and precise cycle timing. The device was functionally tested and profiled using a variable-resistance and -compliance artificial lung and validated in anesthetized large animals. Our functional test results revealed effective operation under a wide variety of ventilation conditions defined by the American Association for Respiratory Care guidelines for ventilator stockpiling. The large-animal test showed that our ventilator performed similarly to, if not better than, a standard ventilator in maintaining optimal ventilation status. The FiO2, respiratory rate, inspiratory-to-expiratory time ratio, positive end-expiratory pressure, and peak inspiratory pressure were successfully maintained within normal, clinically validated ranges, and the animals recovered without any complications. In regions with limited access to ventilators, the ALIVE Vent can help alleviate shortages, and we have ensured that all materials used are publicly available. While this pandemic has exposed enormous global inequalities in healthcare, innovative, cost-effective solutions aimed at reducing socio-economic barriers, such as the ALIVE Vent, can help enable access to prompt healthcare and life-saving technology on a global scale, beyond COVID-19.
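The valve-cycle timing in such a design follows directly from the set respiratory rate and I:E ratio. A hedged sketch of that arithmetic, with example settings that are not the ALIVE Vent firmware values:

```python
# Hedged sketch: breath-cycle timing from respiratory rate and I:E ratio (example settings).
def cycle_times(resp_rate_bpm: float, ie_ratio: float):
    """Return (inspiratory, expiratory) durations in seconds for one breath, for I:E = 1:ie_ratio."""
    total = 60.0 / resp_rate_bpm            # seconds per breath
    insp = total / (1.0 + ie_ratio)         # inspiration gets 1 part of (1 + ie_ratio)
    return insp, total - insp

insp, exp_ = cycle_times(resp_rate_bpm=15, ie_ratio=2.0)     # 15 breaths/min, I:E = 1:2
print(f"inspiration {insp:.2f} s, expiration {exp_:.2f} s")  # 1.33 s / 2.67 s
```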
Funding: Supported by the National Social Science Foundation of China under the project "Research on the mechanism of developing and utilizing domestic and foreign open-source intelligence under product-oriented thinking" (20BTQ049).
Abstract: In today's society with its advanced Internet, the amount of information increases dramatically with each passing day, which makes the processes of open-source intelligence increasingly complex. It is therefore all the more important to rationalize the operation mode and improve the operational efficiency of open-source intelligence while satisfying users' needs. This paper presents a simulation study of the open-source intelligence process system from the user's perspective. First, the basic concept and development status of open-source intelligence are introduced in detail. Second, six existing intelligence operation process models are summarized and their advantages and disadvantages compared. Based on users' preferences, a theoretical model for open-source intelligence system simulation is constructed from four aspects: intelligence collection, intelligence processing, intelligence analysis, and intelligence delivery. A system dynamics model of the open-source intelligence process system is then constructed on this theoretical basis; it comprises five parts: determination of the system boundary, construction of the causal loop diagram, construction of the stock-flow diagram, writing of the mathematical equations, and a system sensitivity test. Finally, the system simulation results are analyzed. It was found that improving the system of intelligence agencies, opening up government affairs, improving the professional level of intelligence personnel, strengthening communication and cooperation among the personnel of the various intelligence departments, and expressing intelligence products in diverse forms can effectively improve the operational efficiency of the open-source intelligence process system.
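System-dynamics stock-flow models of this kind reduce to coupled rate equations integrated over time. A minimal, hedged Euler-integration sketch with two stocks follows; the stocks, flows, and coefficients are invented for illustration and are not the paper's model.

```python
# Hedged sketch: a two-stock system-dynamics model integrated with Euler steps.
# Stocks, flows, and coefficients are invented for illustration only.
dt, steps = 0.25, 80                 # time step (months) and simulation horizon
raw_info, products = 100.0, 0.0      # stocks: unprocessed information, intelligence products

collection_rate = 12.0               # inflow of raw information per month
processing_frac = 0.15               # fraction of raw information processed per month
delivery_frac = 0.10                 # fraction of products delivered to users per month

for _ in range(steps):
    processed = processing_frac * raw_info           # flow: raw information -> products
    delivered = delivery_frac * products              # flow: products -> users (outflow)
    raw_info += dt * (collection_rate - processed)    # stock update (Euler)
    products += dt * (processed - delivered)

print(f"raw information: {raw_info:.1f}, undelivered products: {products:.1f}")
```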
Funding: Supported by the "Strategic Priority Research Program" of the Chinese Academy of Sciences (XDA05100300); the National Basic Research Program of China (2013CB955801); and the National Natural Science Foundation of China (41175030).
Abstract: Total Cloud Cover (TCC) over China determined from four climate datasets, including the International Satellite Cloud Climatology Project (ISCCP), the 40-year Re-Analysis Project of the European Centre for Medium-Range Weather Forecasts (ERA-40), Climate Research Unit Time Series 3.0 (CRU3), and ground station datasets, is used to show the spatial and temporal variation of TCC and the differences among the datasets. The four datasets show similar spatial patterns and seasonal variation. The maximum value is derived from ISCCP. The TCC value in North China derived from ERA-40 is 50% larger than that from the station dataset; however, the value is 50% less than that in South China. The annual TCC of the ISCCP, ERA-40, and ground station datasets shows a decreasing trend during 1984-2002, whereas an increasing trend is derived from CRU3. The results of this study imply remarkable differences between TCC derived from surface and satellite observations as well as model simulations. The potential effects of these differences on cloud climatology and associated climatic issues should be carefully considered.
Funding: Financial support provided by the National Natural Science Foundation of China (51508272, 11832013, 51878350, and 51678304).
Abstract: In this study, through experimental research and an investigation of large datasets of durability parameters in ocean engineering, the values, ranges, and types of distribution of the durability parameters employed for durability design in ocean engineering in northern China were confirmed. Based on a modified theoretical model of chloride diffusion and reliability theory, the service lives of concrete structures exposed to the splash, tidal, and underwater zones were calculated. Concrete mix proportions meeting the requirement of a service life of 100 or 120 years were designed, and a cover thickness requirement was proposed. In addition, the effects of different time-varying relationships of the boundary condition (Cs) and diffusion coefficient (Df) on the service life were compared; the results showed that the time-varying relationships used in this study (i.e., Cs continuously increases and then remains stable, and Df continuously decreases and then remains stable) are beneficial for the durability design of concrete structures in the marine environment.
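Service-life estimates of this kind typically rest on the error-function solution of Fick's second law, C(x, t) = Cs · [1 - erf(x / (2·sqrt(Df·t)))], with the service life taken as the time for the chloride content at the rebar cover depth to reach a critical threshold. A hedged sketch with illustrative parameter values, not the paper's calibrated inputs or its time-varying Cs and Df:

```python
# Hedged sketch: chloride ingress via the erf solution of Fick's second law, and the
# time for the concentration at the cover depth to reach a critical threshold.
# Parameter values are illustrative, not the paper's calibrated inputs.
import math

Cs = 0.60        # surface chloride content, % by mass of binder (assumed constant here)
Df = 2.0e-13     # apparent chloride diffusion coefficient, m^2/s (assumed constant here)
cover = 0.060    # concrete cover depth, m
Ccr = 0.05       # critical chloride threshold, % by mass of binder

def chloride_at(depth_m: float, t_years: float) -> float:
    t = t_years * 365.25 * 24 * 3600.0
    return Cs * (1.0 - math.erf(depth_m / (2.0 * math.sqrt(Df * t))))

# March forward year by year until the threshold at the cover depth is exceeded.
service_life = next(t for t in range(1, 301) if chloride_at(cover, t) >= Ccr)
print(f"estimated service life: about {service_life} years")
```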
Abstract: Anomaly-based approaches to network intrusion detection suffer in evaluation, comparison, and deployment from the scarcity of adequate publicly available network trace datasets. Moreover, the publicly available datasets are either outdated or were generated in a controlled environment. Given the ubiquity of cloud computing environments in commercial and government internet services, there is a need to assess the impacts of network attacks on cloud data centers. To the best of our knowledge, there is no publicly available dataset that captures the normal and anomalous network traces in the interactions between cloud users and cloud data centers. In this paper, we present an experimental platform designed to represent a practical interaction between cloud users and cloud services and to collect the network traces resulting from this interaction in order to conduct anomaly detection. We use the Amazon Web Services (AWS) platform for conducting our experiments.
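Once such traces are collected, a common baseline for anomaly detection on flow-level features is an isolation forest. A hedged scikit-learn sketch on random placeholder features; the feature columns and the contamination setting are assumptions, not values from the platform.

```python
# Hedged sketch: isolation-forest anomaly detection on flow-level features (placeholder data).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[500, 50, 0.2], scale=[80, 10, 0.05], size=(950, 3))    # bytes, packets, duration
attack = rng.normal(loc=[5000, 400, 0.01], scale=[600, 50, 0.005], size=(50, 3))
X = np.vstack([normal, attack])

clf = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = clf.predict(X)                      # +1 = normal, -1 = anomalous
print("flagged as anomalous:", int((labels == -1).sum()), "of", len(X))
```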
Funding: This work was supported by the National Social Science Foundation (NSSF) project "Research on intelligent recommendation of multi-modal resources for children's graded reading in smart library" (22BTQ033) and the Science and Technology Research and Development Program Project of China Railway Group Limited (Project No. 2021-Special-08).
Abstract: With the rapid development of Open-Source (OS), more and more software projects are maintained and developed in the OS model. These open-source projects depend on and influence each other, gradually forming a huge OS project network, namely an Open-Source Software ECOsystem (OSSECO). Unfortunately, not all OS projects in the open-source ecosystem remain healthy and stable in the long term; many projects go from active to inactive and gradually die. In a tightly connected ecosystem, the death of one project can potentially cause the collapse of the entire ecosystem network. How can such situations be effectively prevented? In this paper, we first identify the basic project characteristics that affect the survival of OS projects at both the project and ecosystem levels through a proportional hazards model. Then, we utilize graph convolutional networks on the ecosystem network to extract the ecosystem environment characteristics of OS projects. Finally, we fuse the basic project characteristics and the environmental characteristics and construct a Hybrid Structured Prediction Model (HSPM) to predict the survival state of OS projects. The experimental results show that HSPM achieves a significant improvement over traditional prediction models. Our work can substantially assist OS project managers in maintaining the health of their projects. It can also provide an essential reference for developers when choosing the right open-source project for their production activities.
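The first stage described above, a proportional-hazards screen of project characteristics, can be prototyped with the lifelines package. The sketch below fits a Cox model on a fabricated project table; it assumes lifelines is installed, and the column names and values are invented for illustration.

```python
# Hedged sketch: Cox proportional-hazards screen of project survival (fabricated data).
# Assumes the lifelines package; columns and values are invented for illustration.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "contributors": rng.poisson(8, n),        # project-level characteristic
    "commit_freq": rng.gamma(2.0, 3.0, n),    # commits per week
    "dependents": rng.poisson(5, n),          # ecosystem-level characteristic
})
risk = 0.5 * np.exp(-0.15 * df["commit_freq"] - 0.05 * df["contributors"])
df["duration"] = rng.exponential(1.0 / risk)             # months until inactivity (toy)
df["inactive"] = (rng.random(n) < 0.7).astype(int)       # 1 = event observed, 0 = censored

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="inactive")
print(cph.summary[["coef", "exp(coef)", "p"]])           # hazard ratios per characteristic
```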