To obtain more stable spectral data for accurate quantitative analysis of multiple elements, especially for large-area in-situ detection of elements in soils, we propose a method for multi-element quantitative analysis of soils using calibration-free laser-induced breakdown spectroscopy (CF-LIBS) based on data filtering. In this study, we analyze a standard soil sample doped with two heavy metal elements, Cu and Cd, with a specific focus on the Cu I 324.75 nm line for filtering the experimental data of multiple sample sets. After data filtering, the relative standard deviation for Cu decreased from 30% to 10%, and the limits of detection (LOD) for Cu and Cd decreased by 5% and 4%, respectively. Through CF-LIBS, a quantitative analysis was conducted to determine the relative content of elements in soils. Using Cu as a reference, the concentration of Cd was accurately calculated. The results show that after data filtering, the average relative error for Cd decreases from 11% to 5%, indicating the effectiveness of data filtering in improving the accuracy of quantitative analysis. Moreover, the content of Si, Fe, and other elements can be accurately calculated using this method. The results for Cd were then used as a further correction to provide a more precise calculation. This approach is of great importance for large-area in-situ detection of heavy metals and trace elements in soil, as well as for rapid and accurate quantitative analysis.
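The abstract does not spell out the filtering rule. One simple way to realize reference-line-based filtering is to discard laser shots whose Cu I 324.75 nm intensity deviates too far from the ensemble mean, which lowers the relative standard deviation (RSD). A minimal sketch with hypothetical function names and synthetic intensities, not the paper's actual procedure:

```python
from statistics import mean, stdev

def relative_std(values):
    """Relative standard deviation (RSD) in percent."""
    return 100.0 * stdev(values) / mean(values)

def filter_by_reference_line(intensities, k=1.0):
    """Keep only shots whose reference-line intensity lies within
    mean +/- k * stdev of the whole set (one possible filtering rule)."""
    m, s = mean(intensities), stdev(intensities)
    return [x for x in intensities if abs(x - m) <= k * s]

# Synthetic Cu I 324.75 nm line intensities with two unstable shots
shots = [100, 102, 98, 101, 99, 150, 55, 103, 97, 100]
kept = filter_by_reference_line(shots, k=1.0)
print(relative_std(shots) > relative_std(kept))  # True: filtering lowers the RSD
```

In practice the same shot mask would then be applied to every wavelength channel before the CF-LIBS calculation, so that all lines are quantified from the same stable subset of spectra.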
Seeing is an important index for evaluating the quality of an astronomical site. To estimate seeing at the Muztagh-Ata site quantitatively as a function of height and time, the European Centre for Medium-Range Weather Forecasts reanalysis database (ERA5) is used. Seeing calculated from ERA5 is consistent with Differential Image Motion Monitor seeing at a height of 12 m. Results show that seeing decays exponentially with height at the Muztagh-Ata site; in 2021 it decayed fastest with height in fall and most slowly in summer. The seeing condition is better in fall than in summer. The median value of seeing at 12 m is 0.89 arcsec, with a maximum of 1.21 arcsec in August and a minimum of 0.66 arcsec in October. The median value of seeing at 12 m is 0.72 arcsec in the nighttime and 1.08 arcsec in the daytime. Seeing is a combination of annual and roughly biannual variations with the same phase as temperature and wind speed, indicating that the variation of seeing with time is influenced by temperature and wind speed. The Richardson number Ri is used to analyze atmospheric stability, and the variations of seeing are consistent with Ri between layers. These quantitative results can provide an important reference for a telescope observation strategy.
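An exponential decay of seeing with height, seeing(h) = eps0 * exp(-h / H), can be fit by linear least squares on the logarithm of the seeing values. A minimal sketch with a synthetic profile; the 0.9 arcsec surface value and 500 m scale height below are illustrative, not values from the paper:

```python
import math

def fit_exponential_decay(heights, seeing):
    """Fit seeing(h) = eps0 * exp(-h / H) by least squares on log(seeing)."""
    n = len(heights)
    ys = [math.log(s) for s in seeing]
    mx = sum(heights) / n
    my = sum(ys) / n
    slope = sum((h - mx) * (y - my) for h, y in zip(heights, ys)) / \
            sum((h - mx) ** 2 for h in heights)
    eps0 = math.exp(my - slope * mx)
    scale_height = -1.0 / slope
    return eps0, scale_height

# Synthetic profile: 0.9 arcsec near the surface, e-folding height of 500 m
heights = [12, 100, 250, 500, 1000]
seeing = [0.9 * math.exp(-h / 500) for h in heights]
eps0, H = fit_exponential_decay(heights, seeing)
print(round(eps0, 2), round(H))  # 0.9 500
```

Fitting this model separately per season would yield the seasonal scale heights whose comparison (fastest decay in fall, slowest in summer) the abstract reports.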
How can we efficiently store and mine dynamically generated dense tensors for modeling the behavior of multidimensional dynamic data? Much of the multidimensional dynamic data in the real world is generated in the form of time-growing tensors. For example, air quality tensor data consists of multiple sensory values gathered from many locations over a long time. Such data, accumulated over time, is redundant and consumes a lot of memory in its raw form. We need a way to efficiently store dynamically generated tensor data that grows over time and to model its behavior on demand between arbitrary time blocks. To this end, we propose a Block Incremental Dense Tucker Decomposition (BID-Tucker) method for efficient storage and on-demand modeling of multidimensional spatiotemporal data. Assuming that tensors arrive in unit blocks where only the time domain changes, our proposed BID-Tucker first slices the blocks into matrices and decomposes them via singular value decomposition (SVD). The SVDs of the time × space sliced matrices are stored instead of the raw tensor blocks to save space. When modeling is required at particular time blocks, the SVDs of the corresponding time blocks are retrieved and incremented for use in Tucker decomposition. The factor matrices and core tensor of the decomposed results can then be used for further data analysis. We compared our proposed BID-Tucker with D-Tucker, which our method extends, and with vanilla Tucker decomposition. We show that BID-Tucker is faster than both D-Tucker and vanilla Tucker decomposition and uses less memory for storage, with comparable reconstruction error. We applied BID-Tucker to model the spatial and temporal trends of air quality data collected in South Korea from 2018 to 2022. We were able to model the spatial and temporal air quality trends, and to verify unusual events such as chronic ozone alerts and large fire events.
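The memory saving from storing truncated SVDs instead of raw time × space blocks can be seen from a simple count of stored numbers: a rank-r SVD of a T × S matrix keeps T·r + r + S·r values instead of T·S. A back-of-the-envelope sketch with hypothetical block sizes (24 hourly steps, 300 stations, rank 5), not figures from the paper:

```python
def raw_storage(time_len, space_len):
    """Numbers stored for a raw time x space sliced matrix."""
    return time_len * space_len

def svd_storage(time_len, space_len, rank):
    """Numbers stored for its rank-r truncated SVD: U (T x r),
    the r singular values, and V (S x r)."""
    return time_len * rank + rank + space_len * rank

# Hypothetical block: 24 hourly steps x 300 monitoring stations, rank 5
print(raw_storage(24, 300))     # 7200
print(svd_storage(24, 300, 5))  # 1625 -> much smaller footprint
```

The saving grows with the number of accumulated blocks, since each new block adds only its small factor matrices; the reconstruction error is governed by how much energy the discarded singular values carried.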
Peanut allergy is a major cause of severe food-induced allergic reactions. Several foods, including cow's milk, hen's eggs, soy, wheat, peanuts, tree nuts (walnuts, hazelnuts, almonds, cashews, pecans, and pistachios), fish, and shellfish, are responsible for more than 90% of food allergies. Here, we provide promising insights from a large-scale data-driven analysis comparing the mechanistic features and biological relevance of the different ingredients present in peanuts, tree nuts (walnuts, almonds, cashews, pecans, and pistachios), and soybean. Additionally, we analysed the chemical composition of peanuts in different processed forms: raw, boiled, and dry-roasted. Using the data-driven approach, we are able to generate new hypotheses to explain why nuclear receptors such as the peroxisome proliferator-activated receptors (PPARs) and their isoforms, and their interaction with dietary lipids, may have a significant effect on allergic response. The results obtained from this study will direct future experimental and clinical studies to understand the role of dietary lipids and PPAR isoforms in exerting pro-inflammatory or anti-inflammatory functions on cells of the innate immunity and in influencing antigen presentation to the cells of the adaptive immunity.
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions, and voice to detect people's attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis that resists noisy labels and correlates distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets and verified the superiority of our method by comparing it with four baselines.
Integrated data and energy transfer (IDET) enables electromagnetic waves to deliver wireless energy simultaneously with data for low-power devices. In this paper, an energy harvesting modulation (EHM) assisted multi-user IDET system is studied, where all the received signals at the users are exploited for energy harvesting without degrading wireless data transfer (WDT) performance. The joint IDET performance is then analysed theoretically by conceiving a practical time-dependent wireless channel. With the aid of the AO-based algorithm, the average effective data rate among users is maximized while guaranteeing the BER and the wireless energy transfer (WET) performance. Simulation results validate and evaluate the IDET performance of the EHM assisted system, and demonstrate that the number of user clusters and IDET time slots should be allocated optimally in order to improve the WET and WDT performance.
Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel's graphical user interface (GUI) for automation can limit scalability and reproducibility compared to Python's scripting capabilities. This paper covers an integration solution that empowers non-programmers to leverage Python's capabilities within the familiar Excel environment, enabling users to perform advanced data analysis and automation tasks without extensive programming knowledge. Based on feedback solicited from non-programmers who tested the integration solution, the case study shows how the solution evaluates ease of implementation, performance, and compatibility of Python with different Excel versions.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sums to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique applied in many settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, in the sense of maximum likelihood or maximum a posteriori (MAP) estimation. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study evaluated how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
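The E/M mechanics described above can be illustrated on a toy linear regression with missing responses: the E-step fills each missing y with its current fitted value, and the M-step refits ordinary least squares on the completed data. A minimal sketch on synthetic data (not the paper's compositional dataset; with missing responses only, this iteration converges to the complete-case fit):

```python
def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def em_regression(xs, ys, iters=20):
    """EM-style iteration: the E-step imputes missing responses (None)
    with the current fitted values; the M-step refits OLS on the
    completed data."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = ols([x for x, _ in obs], [y for _, y in obs])  # initial fit
    for _ in range(iters):
        filled = [y if y is not None else a + b * x for x, y in zip(xs, ys)]
        a, b = ols(xs, filled)
    return a, b

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.1, None, 12.0]  # roughly y = 2x, two values missing
a, b = em_regression(xs, ys)
print(round(b, 1))  # 2.0
```

For compositional data the same loop would operate on log-ratio-transformed values (e.g. after a centered log-ratio transform), since ordinary regression is not directly valid on closed data.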
This research paper compares Excel and the R language for data analysis and concludes that R is more suitable for complex data analysis tasks. R's open-source nature makes it accessible to everyone, and its powerful data management and analysis tools make it suitable for handling complex data analysis tasks. It is also highly customizable, allowing users to create custom functions and packages to meet their specific needs. Additionally, R provides high reproducibility, making it easy to replicate and verify research results, and it has excellent collaboration capabilities, enabling multiple users to work on the same project simultaneously. These advantages make R a more suitable choice for complex data analysis tasks, particularly in scientific research and business applications. The findings of this study will help people understand that R is not merely a language that can handle more data than Excel, and will demonstrate that R is essential to the field of data analysis. They will also help users and organizations make informed decisions regarding their data analysis needs and software preferences.
This study explores the application of Bayesian analysis based on neural networks and deep learning in data visualization. The background is that, with the increasing volume and complexity of data, traditional data analysis methods can no longer meet the needs. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale data sets. The results show that neural networks combined with Bayesian analysis and deep learning can effectively improve the accuracy and efficiency of data visualization and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps to further promote the development and application of data science.
The outbreak of the pandemic caused by Coronavirus Disease 2019 (COVID-19) has affected the daily activities of people across the globe. During the COVID-19 outbreak and the successive lockdowns, Twitter was heavily used, and the number of tweets regarding COVID-19 increased tremendously. Several studies have used Sentiment Analysis (SA) to analyze the emotions expressed in tweets about COVID-19. Therefore, in the current study, a new Artificial Bee Colony (ABC) with Machine Learning-driven SA (ABCML-SA) model is developed for conducting sentiment analysis of COVID-19 Twitter data. The prime focus of the presented ABCML-SA model is to recognize the sentiments expressed in tweets about COVID-19. It involves data pre-processing at the initial stage, followed by n-gram based feature extraction to derive the feature vectors. For identification and classification of the sentiments, the Support Vector Machine (SVM) model is exploited. Finally, the ABC algorithm is applied to fine-tune the parameters of the SVM. To demonstrate the improved performance of the proposed ABCML-SA model, a sequence of simulations was conducted. The comparative assessment results confirmed the effective performance of the proposed ABCML-SA model over other approaches.
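The n-gram feature extraction stage of such a pipeline can be sketched in a few lines; the SVM classification and ABC-based parameter tuning would then operate on top of features like these. A minimal word-level example (the tweet text is invented):

```python
from collections import Counter

def ngram_features(text, n=2):
    """Word-level n-gram counts, usable as a sparse feature vector."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

tweet = "staying home and staying safe during the lockdown"
feats = ngram_features(tweet, n=2)
print(feats["staying safe"])  # 1
```

In a full system, the counters across the corpus would be mapped to a shared vocabulary index to form fixed-length vectors, and the ABC search would tune SVM hyperparameters (such as the regularization constant and kernel parameter) against validation accuracy.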
In the nonparametric data envelopment analysis literature, scale elasticity is evaluated in two alternative ways: using either the technical efficiency model or the cost efficiency model. This evaluation becomes problematic in several situations, for example (a) when input proportions change in the long run, (b) when inputs are heterogeneous, and (c) when firms face ex-ante price uncertainty in making their production decisions. To address these situations, a scale elasticity evaluation was performed using a value-based cost efficiency model. However, this alternative value-based scale elasticity evaluation is sensitive to the uncertainty and variability underlying input and output data. Therefore, in this study, we introduce a stochastic cost-efficiency model based on chance-constrained programming to develop a value-based measure of the scale elasticity of firms facing data uncertainty. An illustrative empirical application to the Indian banking industry, comprising 71 banks over eight years (1998–2005), was made to compare inferences about their efficiency and scale properties. The key findings are as follows. First, the deterministic model and our proposed stochastic model yield distinctly different efficiency and scale elasticity scores at various tolerance levels of the chance constraints. However, both models yield the same results at a tolerance level of 0.5, implying that the deterministic model is a special case of the stochastic model in that it reveals the same efficiency and returns-to-scale characterizations of banks. Second, the stochastic model generates higher efficiency scores for inefficient banks than its deterministic counterpart. Third, public banks exhibit higher efficiency than private and foreign banks. Finally, public and old private banks mostly exhibit either decreasing or constant returns to scale, whereas foreign and new private banks experience either increasing or decreasing returns to scale. Although the application of our proposed stochastic model is illustrative, it can potentially be applied to all firms in information- and distribution-intensive industries with high fixed costs, which have ample potential for reaping scale and scope benefits.
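The finding that both models coincide at a tolerance level of 0.5 is what the standard deterministic equivalent of a normal chance constraint would predict, since the normal quantile at 0.5 is zero. A hedged single-constraint sketch (the cost and budget numbers are hypothetical, and the paper's actual model is a full chance-constrained program, not one constraint):

```python
from statistics import NormalDist

def deterministic_equivalent(mean_cost, std_cost, budget, tolerance):
    """A chance constraint P(cost <= budget) >= 1 - tolerance with normally
    distributed cost is equivalent to the deterministic constraint
        mean_cost + z * std_cost <= budget,   z = Phi^{-1}(1 - tolerance)."""
    z = NormalDist().inv_cdf(1 - tolerance)
    return mean_cost + z * std_cost <= budget

# At tolerance 0.5 the quantile z is 0, so the constraint reduces to the
# purely deterministic mean_cost <= budget.
print(NormalDist().inv_cdf(0.5))                                   # 0.0
print(deterministic_equivalent(10.0, 2.0, 10.0, tolerance=0.5))    # True
print(deterministic_equivalent(10.0, 2.0, 10.0, tolerance=0.05))   # False
```

Tightening the tolerance below 0.5 makes z positive, so the constraint becomes stricter than its deterministic counterpart, which is consistent with the models diverging at other tolerance levels.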
The electrocardiogram (ECG) is a low-cost, simple, fast, and non-invasive test. It reflects the heart's electrical activity and provides valuable diagnostic clues about the health of the entire body. Therefore, ECG has been widely used in various biomedical applications such as arrhythmia detection, disease-specific detection, mortality prediction, and biometric recognition. In recent years, ECG-related studies have been carried out using a variety of publicly available datasets, with many differences in the datasets used, data preprocessing methods, targeted challenges, and modeling and analysis techniques. Here we systematically summarize and analyze ECG-based automatic analysis methods and applications. Specifically, we first review 22 commonly used public ECG datasets and provide an overview of data preprocessing processes. Then we describe some of the most widely used applications of ECG signals and analyze the advanced methods involved in these applications. Finally, we elucidate some of the challenges in ECG analysis and provide suggestions for further research.
Insight into the spatiotemporal distribution patterns of knowledge innovation is receiving increasing attention from policymakers and economic research organizations. Many studies use bibliometric data to analyze the popularity of certain research topics, well-adopted methodologies, influential authors, and the interrelationships among research disciplines. However, visual exploration of the patterns of research topics with an emphasis on their spatial and temporal distribution remains challenging. This study combined a Space-Time Cube (STC) and a 3D glyph to represent complex multivariate bibliographic data, and further implemented the visual design as an interactive interface. The effectiveness, understandability, and engagement of the resulting ST-Map were evaluated by seven experts in geovisualization. The results suggest that three-dimensional visualization is promising for showing both an overview and on-demand details on a single screen.
The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including of clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow of typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable to almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support their implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Big data on product sales are an emerging resource for supporting modular product design that meets customers' diversified requirements for product specification combinations. To better facilitate decision-making in modular product design, correlations among specifications and components, originating from customers' conscious and subconscious preferences, can be investigated using big data on product sales. This study proposes a framework and associated methods for supporting modular product design decisions based on correlation analysis of product specifications and components using big sales data. The correlations of the product specifications are determined by analyzing the collected product sales data. By relating product components to specifications, a matrix measuring the correlation among product components is formed for component clustering. Six rules for supporting modular product design decisions are proposed based on frequency analysis of the specification values per component cluster. A case study of electric vehicles illustrates the application of the proposed method.
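The clustering step above can be sketched as: compute pairwise correlations from per-model specification values in the sales data, then group components whose correlations exceed a threshold. A toy example with invented electric-vehicle data; the greedy grouping below is a simple stand-in for the paper's matrix-based clustering, not its actual algorithm:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def cluster_components(corr, names, threshold=0.8):
    """Greedy clustering: a component joins a cluster when its correlation
    with every current member exceeds the threshold."""
    clusters = []
    for i, _ in enumerate(names):
        for c in clusters:
            if all(abs(corr[i][j]) >= threshold for j in c):
                c.append(i)
                break
        else:
            clusters.append([i])
    return [[names[i] for i in c] for c in clusters]

# Hypothetical sales data: per-model values of three specifications
battery = [40, 60, 80, 100]
range_km = [200, 310, 390, 510]   # strongly tied to battery capacity
seats = [5, 4, 5, 4]              # unrelated to the other two
specs = [battery, range_km, seats]
corr = [[pearson(a, b) for b in specs] for a in specs]
print(cluster_components(corr, ["battery", "range", "seats"]))
# [['battery', 'range'], ['seats']]
```

Battery capacity and driving range cluster together because they co-vary across models, mirroring how correlated specifications would pull their components into the same module.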
As COVID-19 poses a major threat to people's health and the economy, there is an urgent need for forecasting methodologies that can anticipate its trajectory efficiently. In non-stationary time series forecasting tasks, there is frequently a hysteresis of the predicted values relative to the real values. To address this problem, this paper proposes an enhanced Multilayer Deep Time Convolutional Neural Network (MDTCNet) for COVID-19 prediction, which combines a multilayer deep-time convolutional network with a feature fusion network. In particular, the model can capture the deep features and temporal dependencies in uncertain time series, and the features can then be combined using a feature fusion network and a multilayer perceptron. Finally, experimental verification is conducted on the task of predicting real daily confirmed COVID-19 cases in the world and in the United States under uncertainty, realizing short-term and long-term prediction of daily confirmed cases, verifying the effectiveness and accuracy of the suggested prediction method, and reducing the hysteresis of the prediction results.
Distribution networks are important public infrastructure necessary for people's livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supplies needed for daily life. Therefore, considering the requirements of distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts data-driven active fault diagnosis and disaster damage analysis and evaluation, enabling real-time, accurate access to distribution network disaster information. A case study shows that the proposed approach performs an accurate and rapid assessment of cross-sectional risk and that the minimal average annual outage time can be reduced to 3 h/a in the ring network. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
In this case study, we hypothesized that sympathetic nerve activity would be higher during conversation with the PALRO robot, and that conversation would increase cerebral blood flow near Broca's area. The facial expressions of a human subject were recorded, and cerebral blood flow and heart rate variability were measured during interactions with the humanoid robot. These multimodal data were time-synchronized to quantitatively verify the change from the resting baseline using facial expression analysis, cerebral blood flow, and heart rate variability. In conclusion, sympathetic nervous activity was dominant in this subject, suggesting that the subject may have enjoyed and been excited by talking to the robot (normalized High Frequency < normalized Low Frequency: 0.22 ± 0.16 < 0.78 ± 0.16). Cerebral blood flow values were higher during conversation and in the resting state after the experiment than in the resting state before the experiment; talking increased cerebral blood flow in the frontal region. As the subject was left-handed, it was confirmed that the right side of the brain, where Broca's area is located in this subject, was particularly activated (left < right: 0.15 ± 0.21 < 1.25 ± 0.17). In the sections where a "happy" facial emotion was recognized, the examiner-judged "happy" faces and the MTCNN "happy" results were also generally consistent.
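The normalized LF and HF figures quoted above are consistent with the common heart-rate-variability convention of dividing each band power by the combined LF + HF power. A minimal sketch with hypothetical band powers (the 780/220 split is chosen only to reproduce the 0.78/0.22 pattern):

```python
def normalized_lf_hf(lf_power, hf_power):
    """Normalized LF and HF: each band power divided by their sum,
    a common convention in heart-rate-variability analysis."""
    total = lf_power + hf_power
    return lf_power / total, hf_power / total

# Hypothetical spectral band powers (arbitrary units) during conversation
n_lf, n_hf = normalized_lf_hf(780.0, 220.0)
print(round(n_lf, 2), round(n_hf, 2))  # 0.78 0.22 -> LF (sympathetic) dominance
```

Under this convention, n_LF > 0.5 indicates relative dominance of the low-frequency component, which the study interprets as higher sympathetic activity during the conversation.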
To serve as a reference for future outbound tourism research, relevant tourism sectors have conducted in-depth investigations of outbound tourism both domestically and internationally. Studying outbound tourism activities from the viewpoint of tourists makes it possible to examine their development patterns and to create successful marketing tactics based on the rise in the number of outbound tourists. On this basis, this study proposes a data mining technique to examine the variations in travel needs and marketing tactics among various consumer groups. A combined example analysis demonstrates that our data mining analysis is logical and useful. Our data tests demonstrate that the tourism strategy outlined in this paper can increase the number of tourists by piquing their interest, building on the growth in the number of international travellers going overseas.
Funding: supported by the Major Science and Technology Project of Gansu Province (No. 22ZD6FA021-5), the Industrial Support Project of Gansu Province (Nos. 2023CYZC-19 and 2021CYZC-22), and the Science and Technology Project of Gansu Province (Nos. 23YFFA0074, 22JR5RA137 and 22JR5RA151).
Funding: funded by the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS) (grant No. U2031209), and the National Natural Science Foundation of China (NSFC, grant Nos. 11872128, 42174192, and 91952111).
Abstract: Seeing is an important index for evaluating the quality of an astronomical site. To estimate seeing at the Muztagh-Ata site quantitatively as a function of height and time, the European Centre for Medium-Range Weather Forecasts reanalysis database (ERA5) is used. Seeing calculated from ERA5 is consistent with the Differential Image Motion Monitor seeing at a height of 12 m. Results show that seeing decays exponentially with height at the Muztagh-Ata site. In 2021, seeing decayed with height fastest in fall and most slowly in summer. The seeing condition is better in fall than in summer. The median value of seeing at 12 m is 0.89 arcsec; the maximum is 1.21 arcsec in August and the minimum is 0.66 arcsec in October. The median value of seeing at 12 m is 0.72 arcsec in the nighttime and 1.08 arcsec in the daytime. Seeing is a combination of an annual and a roughly biannual variation with the same phase as temperature and wind speed, indicating that the variation of seeing with time is influenced by temperature and wind speed. The Richardson number Ri is used to analyze atmospheric stability, and the variations of seeing are consistent with Ri between layers. These quantitative results can provide an important reference for telescope observation strategies.
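The exponential height dependence reported above can be recovered from a profile by a log-linear least-squares fit. The sketch below assumes the model seeing(h) = A·exp(−h/H) and fits the amplitude A and scale height H; the profile values are synthetic, not ERA5 data.

```python
import numpy as np

# Synthetic seeing profile following an assumed exponential decay law.
h = np.array([12.0, 50.0, 100.0, 200.0, 400.0, 800.0])  # heights (m)
seeing = 0.9 * np.exp(-h / 300.0)                        # arcsec (synthetic)

# Fit log(seeing) = intercept + slope * h, then invert to (A, H).
slope, intercept = np.polyfit(h, np.log(seeing), 1)
A = np.exp(intercept)   # seeing extrapolated to h = 0 (arcsec)
H = -1.0 / slope        # e-folding scale height (m)
```

With real, noisy profiles the same fit would yield A and H with uncertainties, which is how a decay rate per season could be compared.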
Funding: supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2022-0-00369), and by National Research Foundation of Korea grants funded by the Korean government (2018R1A5A1060031 and 2022R1F1A1065664).
Abstract: How can we efficiently store and mine dynamically generated dense tensors for modeling the behavior of multidimensional dynamic data? Much of the multidimensional dynamic data in the real world is generated in the form of time-growing tensors. For example, air quality tensor data consists of multiple sensory values gathered from wide locations for a long time. Such data, accumulated over time, is redundant and consumes a lot of memory in its raw form. We need a way to efficiently store dynamically generated tensor data that increases over time and to model its behavior on demand between arbitrary time blocks. To this end, we propose a Block Incremental Dense Tucker Decomposition (BID-Tucker) method for efficient storage and on-demand modeling of multidimensional spatiotemporal data. Assuming that tensors come in unit blocks where only the time domain changes, our proposed BID-Tucker first slices the blocks into matrices and decomposes them via singular value decomposition (SVD). The SVDs of the time × space sliced matrices are stored instead of the raw tensor blocks to save space. When modeling from data is required at particular time blocks, the SVDs of the corresponding time blocks are retrieved and incremented to be used for Tucker decomposition. The factor matrices and core tensor of the decomposed results can then be used for further data analysis. We compared our proposed BID-Tucker with D-Tucker, which our method extends, and with vanilla Tucker decomposition. We show that BID-Tucker is faster than both D-Tucker and vanilla Tucker decomposition and uses less memory for storage with a comparable reconstruction error. We applied BID-Tucker to model the spatial and temporal trends of air quality data collected in South Korea from 2018 to 2022. We were able to model the spatial and temporal air quality trends, and also to verify unusual events such as chronic ozone alerts and large fire events.
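The storage step can be illustrated in miniature. The sketch below is a simplified assumption based on the abstract, not the full BID-Tucker algorithm: slice one (time × space × feature) block into a matrix, keep a rank-truncated SVD instead of the raw block, and check the space saving and reconstruction error.

```python
import numpy as np

# One incoming time-unit tensor block (sizes and rank are illustrative).
rng = np.random.default_rng(1)
block = rng.normal(size=(24, 10, 5))   # time x space x feature
mat = block.reshape(24, 50)            # time x (space*feature) slicing

# Store a truncated SVD in place of the raw block.
U, s, Vt = np.linalg.svd(mat, full_matrices=False)
r = 8                                  # retained rank (assumed)
stored = (U[:, :r], s[:r], Vt[:r, :])

# On demand, reconstruct (or feed the factors into Tucker updates).
recon = stored[0] * stored[1] @ stored[2]
storage = sum(part.size for part in stored)
rel_err = np.linalg.norm(mat - recon) / np.linalg.norm(mat)
```

Here 600 stored values replace 1200 raw values, at the cost of a bounded reconstruction error; the paper's method additionally increments these SVDs across blocks.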
Abstract: Peanut allergy is a major cause of severe food-induced allergic reactions. Several foods, including cow's milk, hen's eggs, soy, wheat, peanuts, tree nuts (walnuts, hazelnuts, almonds, cashews, pecans, and pistachios), fish, and shellfish, are responsible for more than 90% of food allergies. Here, we provide promising insights from a large-scale data-driven analysis comparing the mechanistic features and biological relevance of different ingredients present in peanuts, tree nuts (walnuts, almonds, cashews, pecans, and pistachios), and soybean. Additionally, we have analyzed the chemical compositions of peanuts in different processed forms: raw, boiled, and dry-roasted. Using the data-driven approach, we are able to generate new hypotheses to explain why nuclear receptors like the peroxisome proliferator-activated receptors (PPARs) and their isoforms, and their interaction with dietary lipids, may have a significant effect on allergic response. The results obtained from this study will direct future experimental and clinical studies to understand the role of dietary lipids and PPAR isoforms in exerting pro-inflammatory or anti-inflammatory functions on cells of the innate immunity and in influencing antigen presentation to cells of the adaptive immunity.
Funding: supported by STI 2030-Major Projects 2021ZD0200400, the National Natural Science Foundation of China (62276233 and 62072405), and the Key Research Project of Zhejiang Province (2023C01048).
Abstract: Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions, and voice to detect people's attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis that resists noisy labels and correlates distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
Funding: supported in part by the MOST Major Research and Development Project (Grant No. 2021YFB2900204), the National Natural Science Foundation of China (NSFC) (Grant Nos. 62201123, 62132004, and 61971102), the China Postdoctoral Science Foundation (Grant No. 2022TQ0056), the Sichuan Science and Technology Program (Grant No. 2022YFH0022), the Sichuan Major R&D Project (Grant No. 22QYCX0168), and the Municipal Government of Quzhou (Grant No. 2022D031).
Abstract: Integrated data and energy transfer (IDET) enables electromagnetic waves to transmit wireless energy at the same time as data delivery for low-power devices. In this paper, an energy harvesting modulation (EHM) assisted multi-user IDET system is studied, where all the received signals at the users are exploited for energy harvesting without degrading the wireless data transfer (WDT) performance. The joint IDET performance is then analysed theoretically by conceiving a practical time-dependent wireless channel. With the aid of an AO-based algorithm, the average effective data rate among users is maximized while ensuring the BER and the wireless energy transfer (WET) performance. Simulation results validate and evaluate the IDET performance of the EHM-assisted system, and demonstrate that the optimal number of user clusters and IDET time slots should be allocated in order to improve the WET and WDT performance.
Abstract: Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel's graphical user interface (GUI) for automation can limit scalability and reproducibility compared to Python's scripting capabilities. This paper covers an integration solution that empowers non-programmers to leverage Python's capabilities within the familiar Excel environment. This enables users to perform advanced data analysis and automation tasks without requiring extensive programming knowledge. Based on feedback solicited from non-programmers who tested the integration solution, a case study evaluates its ease of implementation, performance, and compatibility across Excel versions.
Abstract: Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sums to a constant such as 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique applied in various settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, via maximum likelihood or maximum a posteriori (MAP) estimation. Using the present estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize it. This study examined how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both robust least-squares and ordinary least-squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
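The E/M alternation described above can be shown on a toy regression problem. This is a simplified stand-in, not the paper's compositional-data setup: the E-step replaces each missing x by its expected value under the current fitted line, the M-step refits the line, and the two steps iterate until the estimates settle.

```python
import numpy as np

# Simulate y = 2 + 3x with small noise, then delete 20% of the x values.
rng = np.random.default_rng(2)
x_true = rng.normal(5.0, 1.0, 200)
y = 2.0 + 3.0 * x_true + rng.normal(0.0, 0.1, 200)
x = x_true.copy()
x[:40] = np.nan                                   # missing observations

# Start from mean imputation, then alternate E and M steps.
x_fill = np.where(np.isnan(x), np.nanmean(x), x)
for _ in range(25):
    b1, b0 = np.polyfit(x_fill, y, 1)                       # M-step: refit
    x_fill = np.where(np.isnan(x), (y - b0) / b1, x_fill)   # E-step: impute
```

The fitted slope and intercept recover the generating values far better than a single mean-imputed fit would, which is the behavior the study benchmarks against k-NN and mean imputation.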
Abstract: This research paper compares Excel and the R language for data analysis and concludes that R is more suitable for complex data analysis tasks. R's open-source nature makes it accessible to everyone, and its powerful data management and analysis tools make it suitable for handling complex data analysis tasks. It is also highly customizable, allowing users to create custom functions and packages to meet their specific needs. Additionally, R provides high reproducibility, making it easy to replicate and verify research results, and it has excellent collaboration capabilities, enabling multiple users to work on the same project simultaneously. These advantages make R a more suitable choice for complex data analysis tasks, particularly in scientific research and business applications. The findings of this study will help people understand that R is not merely a language that can handle more data than Excel, and demonstrate that R is essential to the field of data analysis. At the same time, the study will help users and organizations make informed decisions regarding their data analysis needs and software preferences.
Abstract: This study aims to explore the application of Bayesian analysis based on neural networks and deep learning in data visualization. The research background is that, with the increasing volume and complexity of data, traditional data analysis methods can no longer meet current needs. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale datasets. The results show that neural networks combined with Bayesian analysis and deep learning methods can effectively improve the accuracy and efficiency of data visualization, and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps to further promote the development and application of data science.
Funding: The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia has funded this project under Grant No. FP-205-43.
Abstract: The outbreak of the pandemic caused by Coronavirus Disease 2019 (COVID-19) has affected the daily activities of people across the globe. During the COVID-19 outbreak and the successive lockdowns, Twitter was heavily used, and the number of tweets regarding COVID-19 increased tremendously. Several studies used Sentiment Analysis (SA) to analyze the emotions expressed through tweets about COVID-19. Therefore, in the current study, a new Artificial Bee Colony (ABC) with Machine Learning-driven SA (ABCML-SA) model is developed for conducting sentiment analysis of COVID-19 Twitter data. The prime focus of the presented ABCML-SA model is to recognize the sentiments expressed in tweets made upon COVID-19. It involves data pre-processing at the initial stage, followed by n-gram based feature extraction to derive the feature vectors. For identification and classification of the sentiments, the Support Vector Machine (SVM) model is exploited. At last, the ABC algorithm is applied to fine-tune the parameters involved in the SVM. To demonstrate the improved performance of the proposed ABCML-SA model, a sequence of simulations was conducted. The comparative assessment results confirmed the effectual performance of the proposed ABCML-SA model over other approaches.
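The n-gram feature-extraction stage mentioned above can be sketched in a few lines; the tokenization and function names are illustrative, and the SVM and ABC tuning stages are omitted.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(text, n=2):
    """Unigram + bigram counts: a simple sparse feature vector for an SVM."""
    toks = text.lower().split()
    return Counter(toks + ngrams(toks, n))

vec = featurize("Lockdown again but vaccines give real hope")
```

In a full pipeline, these count vectors would be assembled over a vocabulary into a feature matrix and passed to the SVM, whose hyperparameters the ABC algorithm then tunes.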
Abstract: In the nonparametric data envelopment analysis literature, scale elasticity is evaluated in two alternative ways: using either the technical efficiency model or the cost efficiency model. This evaluation becomes problematic in several situations, for example (a) when input proportions change in the long run, (b) when inputs are heterogeneous, and (c) when firms face ex-ante price uncertainty in making their production decisions. To address these situations, a scale elasticity evaluation was performed using a value-based cost efficiency model. However, this alternative value-based scale elasticity evaluation is sensitive to the uncertainty and variability underlying input and output data. Therefore, in this study, we introduce a stochastic cost-efficiency model based on chance-constrained programming to develop a value-based measure of the scale elasticity of firms facing data uncertainty. An illustrative empirical application to the Indian banking industry, comprising 71 banks over eight years (1998–2005), was made to compare inferences about their efficiency and scale properties. The key findings are as follows. First, the deterministic model and our proposed stochastic model yield distinctly different results concerning the efficiency and scale elasticity scores at various tolerance levels of the chance constraints. However, both models yield the same results at a tolerance level of 0.5, implying that the deterministic model is a special case of the stochastic model in that it reveals the same efficiency and returns-to-scale characterizations of banks. Second, the stochastic model generates higher efficiency scores for inefficient banks than its deterministic counterpart. Third, public banks exhibit higher efficiency than private and foreign banks. Finally, public and old private banks mostly exhibit either decreasing or constant returns to scale, whereas foreign and new private banks experience either increasing or decreasing returns to scale. Although the application of our proposed stochastic model is illustrative, it can potentially be applied to all firms in information- and distribution-intensive industries with high fixed costs, which have ample potential for reaping scale and scope benefits.
Funding: Supported by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1909208), the Science and Technology Major Project of Changsha (kh2202004), and the Changsha Municipal Natural Science Foundation (kq2202106).
Abstract: The electrocardiogram (ECG) is a low-cost, simple, fast, and non-invasive test. It can reflect the heart's electrical activity and provide valuable diagnostic clues about the health of the entire body. Therefore, the ECG has been widely used in various biomedical applications such as arrhythmia detection, disease-specific detection, mortality prediction, and biometric recognition. In recent years, ECG-related studies have been carried out using a variety of publicly available datasets, with many differences in the datasets used, data preprocessing methods, targeted challenges, and modeling and analysis techniques. Here we systematically summarize and analyze ECG-based automatic analysis methods and applications. Specifically, we first review 22 commonly used public ECG datasets and provide an overview of data preprocessing processes. We then describe some of the most widely used applications of ECG signals and analyze the advanced methods involved in these applications. Finally, we elucidate some of the challenges in ECG analysis and provide suggestions for further research.
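One preprocessing step common to the pipelines this survey covers is baseline-wander removal. The sketch below is a generic illustration, not taken from the survey: it subtracts a one-second moving average from a synthetic ECG-like signal to suppress slow drift.

```python
import numpy as np

fs = 250                                       # sampling rate, Hz (assumed)
t = np.arange(0, 10, 1.0 / fs)
beat = np.sin(2 * np.pi * 1.2 * t)             # stand-in for cardiac activity
wander = 0.5 * np.sin(2 * np.pi * 0.05 * t)    # slow baseline drift
raw = beat + wander

# Estimate the baseline with a 1-second moving average and subtract it.
win = fs
baseline = np.convolve(raw, np.ones(win) / win, mode="same")
clean = raw - baseline
```

The moving average passes the slow drift but averages out the faster cardiac component, so subtracting it leaves a signal much closer to the beat waveform; real pipelines often use high-pass or median filtering to the same end.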
Abstract: Gaining insight into the spatiotemporal distribution patterns of knowledge innovation is receiving increasing attention from policymakers and economic research organizations. Many studies use bibliometric data to analyze the popularity of certain research topics, well-adopted methodologies, influential authors, and the interrelationships among research disciplines. However, the visual exploration of the patterns of research topics with an emphasis on their spatial and temporal distribution remains challenging. This study combined a Space-Time Cube (STC) and a 3D glyph to represent complex multivariate bibliographic data. We further implemented the visual design by developing an interactive interface. The effectiveness, understandability, and engagement of ST-Map were evaluated by seven experts in geovisualization. The results suggest that it is promising to use three-dimensional visualization to show an overview and on-demand details on a single screen.
Funding: supported by the National Key Research and Development Program of China (2022YFC2702502), the National Natural Science Foundation of China (32170742, 31970646, and 32060152), the Start Fund for Specially Appointed Professor of Jiangsu Province, the Hainan Province Science and Technology Special Fund (ZDYF2021SHFZ051), the Natural Science Foundation of Hainan Province (820MS053), the Start Fund for High-level Talents of Nanjing Medical University (NMUR2020009), the Marshal Initiative Funding of Hainan Medical University (JBGS202103), the Hainan Province Clinical Medical Center (QWYH202175), the Bioinformatics for Major Diseases Science Innovation Group of Hainan Medical University, and the Shenzhen Science and Technology Program (JCYJ20210324140407021).
Abstract: The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including of clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering the field. Here, we review the workflow of a typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable to almost all scRNA-seq datasets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
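The quality-control stage of such a workflow can be sketched on a toy count matrix. The thresholds and the simulated data below are illustrative assumptions, not recommendations from the review: filter cells by detected-gene count, then library-size normalize and log-transform.

```python
import numpy as np

# Simulated cells x genes UMI count matrix (values are illustrative).
rng = np.random.default_rng(3)
counts = rng.poisson(1.0, size=(100, 500))

# QC: drop cells in which too few genes were detected.
genes_per_cell = (counts > 0).sum(axis=1)
qc = counts[genes_per_cell >= 200]            # assumed threshold

# Library-size normalization to counts-per-10k, then log1p transform.
libsize = qc.sum(axis=1, keepdims=True)
norm = np.log1p(1e4 * qc / libsize)
```

After this step each retained cell has the same total in the linearized scale, which is the usual starting point for feature selection and clustering.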
Funding: National Key R&D Program of China (Grant No. 2018YFB1701701), the Sailing Talent Program, the Guangdong Provincial Science and Technologies Program of China (Grant No. 2017B090922008), and a Special Grand Grant from the Tianjin City Government of China.
Abstract: Big data on product sales are an emerging resource for supporting modular product design to meet diversified customers' requirements for product specification combinations. To better facilitate decision-making in modular product design, correlations among specifications and components originating from customers' conscious and subconscious preferences can be investigated using big data on product sales. This study proposes a framework and the associated methods for supporting modular product design decisions based on correlation analysis of product specifications and components using big sales data. The correlations of the product specifications are determined by analyzing the collected product sales data. By building the relations between the product components and specifications, a matrix for measuring the correlation among product components is formed for component clustering. Six rules for supporting the decision-making of modular product design are proposed based on frequency analysis of the specification values per component cluster. A case study of electric vehicles illustrates the application of the proposed method.
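The component-correlation matrix at the center of this approach can be illustrated on a handful of sales records. The component names below are invented; the co-occurrence counts are normalized to a Jaccard-like score, the kind of association matrix that would then feed component clustering.

```python
import numpy as np

# Each sale lists the components configured together (illustrative data).
records = [
    {"battery_L", "motor_A"},
    {"battery_L", "motor_A", "infotainment"},
    {"battery_S", "motor_B"},
    {"battery_S", "motor_B"},
    {"battery_L", "motor_A"},
]
parts = sorted(set().union(*records))
idx = {p: i for i, p in enumerate(parts)}

# Count co-occurrences across sales records.
co = np.zeros((len(parts), len(parts)))
for sale in records:
    for a in sale:
        for b in sale:
            co[idx[a], idx[b]] += 1.0

# Normalize: assoc(a, b) = |a and b| / |a or b| (Jaccard-like score).
diag = np.diag(co).copy()
assoc = co / (diag[:, None] + diag[None, :] - co)
```

Components that always sell together score 1.0 and would land in the same module candidate; components that never co-occur score 0.0.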
Funding: supported by the Major Scientific and Technological Research Project of the Chongqing Education Commission (KJZD-M202000802) and the first batch of Industrial and Informatization Key Special Fund Support Projects in Chongqing in 2022 (2022000537).
Abstract: As COVID-19 poses a major threat to people's health and the economy, there is an urgent need for forecasting methodologies that can anticipate its trajectory efficiently. In non-stationary time series forecasting tasks, there is frequently a hysteresis in the predicted values relative to the real values. To address this problem, this paper proposes an enhanced Multilayer Deep Time Convolutional Neural Network (MDTCNet) for COVID-19 prediction, which combines a multilayer deep time convolutional network with a feature fusion network. In particular, it is possible to capture the deep features and temporal dependencies in uncertain time series, and the features may then be combined using a feature fusion network and a multilayer perceptron. Finally, experimental verification is conducted on the prediction task of real daily confirmed COVID-19 cases in the world and in the United States under uncertainty, realizing short-term and long-term prediction of daily confirmed COVID-19 cases, verifying the effectiveness and accuracy of the suggested prediction method, and reducing the hysteresis of the prediction results.
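Temporal convolutional layers of the kind MDTCNet builds on rest on dilated causal convolution. The sketch below is a generic building block, not the authors' architecture: each output depends only on current and past inputs, spaced by the dilation factor.

```python
import numpy as np

def causal_conv(x, w, dilation):
    """y[t] = sum_j w[j] * x[t - j*dilation], zero-padded on the left."""
    k = len(w)
    pad = np.concatenate([np.zeros((k - 1) * dilation), x])
    return np.array([
        sum(w[j] * pad[t + (k - 1 - j) * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)
y = causal_conv(x, w=np.array([1.0, 1.0]), dilation=2)  # y[t] = x[t] + x[t-2]
```

Stacking such layers with growing dilations (1, 2, 4, …) gives an exponentially large receptive field over past values, which is what lets deep temporal models track long dependencies in the case-count series.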
Abstract: Distribution networks are important public infrastructure necessary for people's livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supplies needed for daily life. Therefore, considering the requirements for distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts data-driven active fault diagnosis and disaster damage analysis and evaluation. Furthermore, the paper realizes real-time, accurate access to distribution network disaster information. The proposed approach performs an accurate and rapid assessment of cross-sectional risk in a case study, in which the minimum average annual outage time in the ring network can be reduced to 3 h/a. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Abstract: In this case study, we hypothesized that sympathetic nerve activity would be higher during conversation with the PALRO robot and that conversation would result in an increase in cerebral blood flow near Broca's area. The facial expressions of a human subject were recorded, and cerebral blood flow and heart rate variability were measured during interactions with the humanoid robot. These multimodal data were time-synchronized to quantitatively verify the change from the resting baseline through facial expression analysis, cerebral blood flow, and heart rate variability. In conclusion, the data for this subject indicated that sympathetic nervous activity was dominant, suggesting that the subject may have enjoyed and been excited while talking to the robot (normalized High Frequency < normalized Low Frequency: 0.22 ± 0.16 < 0.78 ± 0.16). Cerebral blood flow values were higher during conversation and in the resting state after the experiment than in the resting state before the experiment. Talking increased cerebral blood flow in the frontal region. As the subject was left-handed, it was confirmed that the right side of the brain, where the Broca's area is located, was particularly activated (left < right: 0.15 ± 0.21 < 1.25 ± 0.17). In the sections where a "happy" facial emotion was recognized, the examiner-judged "happy" faces and the MTCNN "happy" results were also generally consistent.
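The normalized LF/HF figures quoted above come from a standard frequency-domain heart-rate-variability computation, sketched below with conventional band limits; the RR series is synthetic, not the study's data.

```python
import numpy as np

fs = 4.0                                      # resampled RR series rate, Hz
t = np.arange(0, 300, 1.0 / fs)               # a 5-minute analysis window
# Synthetic RR intervals: a strong 0.10 Hz (LF) and weak 0.25 Hz (HF) rhythm.
rr = (0.8
      + 0.05 * np.sin(2 * np.pi * 0.10 * t)
      + 0.01 * np.sin(2 * np.pi * 0.25 * t))

# Periodogram of the detrended series, then band powers.
freqs = np.fft.rfftfreq(len(rr), 1.0 / fs)
psd = np.abs(np.fft.rfft(rr - rr.mean())) ** 2
lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum()   # low-frequency band
hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum()   # high-frequency band
n_lf, n_hf = lf / (lf + hf), hf / (lf + hf)        # normalized LF and HF
```

A dominant normalized LF, as in the study's 0.78 vs 0.22, is conventionally read as relatively higher sympathetic activity.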
Funding: 2021 Youth Innovation Talents Project of Universities in Guangdong Province, "Cause Analysis and Countermeasure Research on the Difference of Tourism Resources Development and Marketing Weakening in Underdeveloped Regions of Western Guangdong" (Project No. 2021WQNCX241).
Abstract: To serve as a reference for future outbound tourism research, relevant tourism sectors have conducted in-depth investigations of outbound tourism both domestically and internationally. Studying outbound tourism activities from the viewpoint of tourists can reveal development patterns and inform successful marketing tactics based on the rise in the number of outbound tourists. On this basis, this study proposes a data mining technique to examine the variations in travel needs and marketing tactics among various consumer groups. A combined example analysis demonstrates the logic and usefulness of our data mining analysis. Our data tests demonstrate that the tourism strategy outlined in this paper can increase the number of tourists by piquing their interest, based on the rise in the number of international travellers going overseas.
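Segmenting tourists into consumer groups is the kind of mining step the abstract alludes to. Since the abstract does not name its algorithm, plain k-means with k = 2 on two invented features (daily spend and trips per year) stands in for it here.

```python
import numpy as np

# Two illustrative tourist segments: budget and luxury travellers.
rng = np.random.default_rng(4)
budget = rng.normal([100.0, 1.0], [10.0, 0.3], size=(50, 2))
luxury = rng.normal([500.0, 4.0], [30.0, 0.5], size=(50, 2))
X = np.vstack([budget, luxury])

# Lloyd's k-means iterations, seeded with one point from each segment.
centroids = X[[0, 50]].astype(float)
for _ in range(10):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
```

Once the groups are separated, per-cluster statistics (average spend, trip frequency) would drive the differentiated marketing tactics the study discusses.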