An enterprise Business Intelligence (BI) system mines data from an enterprise's existing databases and, through comprehensive processing, analyzes the data according to customer requirements. Data analysis with such a system is efficient and convenient to operate. This paper mainly analyzes the application of an enterprise BI data analysis system in enterprises.
Seeing is an important index for evaluating the quality of an astronomical site. To estimate seeing at the Muztagh-Ata site quantitatively as a function of height and time, the European Centre for Medium-Range Weather Forecasts reanalysis database (ERA5) is used. Seeing calculated from ERA5 is consistent with the Differential Image Motion Monitor seeing at a height of 12 m. Results show that seeing decays exponentially with height at the Muztagh-Ata site. Seeing decays fastest with height in fall 2021 and most slowly in summer. The seeing condition is better in fall than in summer. The median seeing at 12 m is 0.89 arcsec; the maximum is 1.21 arcsec in August and the minimum is 0.66 arcsec in October. The median seeing at 12 m is 0.72 arcsec in the nighttime and 1.08 arcsec in the daytime. Seeing is a combination of annual and roughly biannual variations with the same phase as temperature and wind speed, indicating that the variation of seeing with time is influenced by temperature and wind speed. The Richardson number Ri is used to analyze atmospheric stability, and the variations of seeing are consistent with Ri between layers. These quantitative results can provide an important reference for telescope observation strategy.
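The exponential decay of seeing with height described above can be captured by a simple least-squares fit. The sketch below is illustrative only: the heights, seeing values, and the three-parameter decay model are assumptions, not the paper's ERA5 data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model: seeing(h) = s0 * exp(-h / H) + s_inf
def seeing_profile(h, s0, H, s_inf):
    return s0 * np.exp(-h / H) + s_inf

heights = np.array([12, 50, 100, 200, 400, 800])          # meters
seeing = np.array([0.94, 0.77, 0.61, 0.41, 0.26, 0.20])   # arcsec (synthetic)

# Fit the decay; p0 is a rough initial guess for (s0, H, s_inf)
popt, _ = curve_fit(seeing_profile, heights, seeing, p0=(1.0, 150.0, 0.2))
s0, H, s_inf = popt
print(f"scale height H ≈ {H:.0f} m")  # ≈ 150 m for this synthetic profile
```

The fitted scale height H then quantifies how quickly the turbulence contribution dies off with altitude.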
Peanut allergy is a major contributor to severe food-induced allergic reactions. A small set of foods, including cow's milk, hen's eggs, soy, wheat, peanuts, tree nuts (walnuts, hazelnuts, almonds, cashews, pecans, and pistachios), fish, and shellfish, are responsible for more than 90% of food allergies. Here, we provide promising insights from a large-scale data-driven analysis comparing the mechanistic features and biological relevance of different ingredients present in peanuts, tree nuts (walnuts, almonds, cashews, pecans, and pistachios), and soybean. Additionally, we have analysed the chemical composition of peanuts in different processed forms: raw, boiled, and dry-roasted. Using this data-driven approach, we are able to generate new hypotheses to explain why nuclear receptors such as the peroxisome proliferator-activated receptors (PPARs) and their isoforms, together with their interaction with dietary lipids, may have a significant effect on allergic response. The results obtained from this study will direct future experimental and clinical studies to understand the role of dietary lipids and PPAR isoforms in exerting pro-inflammatory or anti-inflammatory functions on cells of the innate immunity and influencing antigen presentation to cells of the adaptive immunity.
Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel's graphical user interface (GUI) for automation can limit scalability and reproducibility compared to Python's scripting capabilities. This paper covers an integration solution that empowers non-programmers to leverage Python's capabilities within the familiar Excel environment, enabling them to perform advanced data analysis and automation tasks without requiring extensive programming knowledge. Based on feedback solicited from non-programmers who tested the integration solution, the case study evaluates the ease of implementation, performance, and compatibility of Python with different Excel versions.
This research paper compares Excel and the R language for data analysis and concludes that R is more suitable for complex data analysis tasks. R's open-source nature makes it accessible to everyone, and its powerful data management and analysis tools make it suitable for handling complex data analysis tasks. It is also highly customizable, allowing users to create custom functions and packages to meet their specific needs. Additionally, R provides high reproducibility, making it easy to replicate and verify research results, and it has excellent collaboration capabilities, enabling multiple users to work on the same project simultaneously. These advantages make R a more suitable choice for complex data analysis tasks, particularly in scientific research and business applications. The findings of this study will help people understand that R is not just a language that can handle more data than Excel, and they demonstrate that R is essential to the field of data analysis. At the same time, they will help users and organizations make informed decisions regarding their data analysis needs and software preferences.
Reviewing the empirical and theoretical relationships between various parameters is a good way to understand more about contact binary systems. In this investigation, two-dimensional (2D) relationships for P–M_V(system), P–L_(1,2), M_(1,2)–L_(1,2), and q–L_ratio were revisited. The sample comprises 118 contact binary systems with orbital periods shorter than 0.6 days whose absolute parameters were estimated based on the Gaia Data Release 3 parallax. We reviewed previous studies on 2D relationships and updated six parameter relationships. For this purpose, Markov chain Monte Carlo and machine learning methods were used, and the outcomes were compared. For comparison, we selected 22 contact binary systems from eight previous studies that had light curve solutions using spectroscopic data. The results show that these systems are in good agreement with the results of this study.
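A Markov chain Monte Carlo fit of a two-parameter linear relation, of the kind used for the parameter relationships above, can be sketched with a random-walk Metropolis sampler. The data, slope/intercept values, and noise level below are synthetic stand-ins, not the 118-system Gaia sample.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic "period-luminosity"-style relation: logL = a*P + b with scatter
P = rng.uniform(0.2, 0.6, 50)
logL = 2.0 * P - 1.0 + rng.normal(scale=0.05, size=50)

def log_posterior(theta):
    # Flat priors, Gaussian likelihood with known scatter 0.05
    a, b = theta
    resid = logL - (a * P + b)
    return -0.5 * np.sum(resid**2) / 0.05**2

# Random-walk Metropolis: propose a small Gaussian step, accept with
# probability min(1, posterior ratio)
theta = np.array([0.0, 0.0])
lp = log_posterior(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(scale=0.02, size=2)
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

samples = np.array(samples)[5000:]  # discard burn-in
a_med, b_med = np.median(samples, axis=0)
print(a_med, b_med)  # should land near the true values (2.0, -1.0)
```

The posterior medians and quantiles of the chain then give the relation's parameters with credible intervals, which is how MCMC-derived parameter relations are usually reported.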
The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable to almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
In the nonparametric data envelopment analysis literature, scale elasticity is evaluated in two alternative ways: using either the technical efficiency model or the cost efficiency model. This evaluation becomes problematic in several situations, for example (a) when input proportions change in the long run, (b) when inputs are heterogeneous, and (c) when firms face ex-ante price uncertainty in making their production decisions. To address these situations, a scale elasticity evaluation was performed using a value-based cost efficiency model. However, this alternative value-based scale elasticity evaluation is sensitive to the uncertainty and variability underlying input and output data. Therefore, in this study, we introduce a stochastic cost efficiency model based on chance-constrained programming to develop a value-based measure of the scale elasticity of firms facing data uncertainty. An illustrative empirical application to the Indian banking industry, comprising 71 banks over eight years (1998–2005), was made to compare inferences about their efficiency and scale properties. The key findings are as follows. First, the deterministic model and our proposed stochastic model yield distinctly different results concerning the efficiency and scale elasticity scores at various tolerance levels of the chance constraints. However, both models yield the same results at a tolerance level of 0.5, implying that the deterministic model is a special case of the stochastic model in that it reveals the same efficiency and returns-to-scale characterizations of banks. Second, the stochastic model generates higher efficiency scores for inefficient banks than its deterministic counterpart. Third, public banks exhibit higher efficiency than private and foreign banks. Finally, public and old private banks mostly exhibit either decreasing or constant returns to scale, whereas foreign and new private banks experience either increasing or decreasing returns to scale. Although the application of our proposed stochastic model is illustrative, it can potentially be applied to all firms in information- and distribution-intensive industries with high fixed costs, which have ample potential for reaping scale and scope benefits.
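The finding that the deterministic model coincides with the stochastic one at a tolerance level of 0.5 follows directly from the normal-quantile form of a chance constraint, since the standard normal quantile at probability 0.5 is zero. A generic numeric illustration follows (a single Gaussian cost constraint, not the paper's DEA formulation):

```python
from scipy.stats import norm

# Chance constraint P(cost <= budget) >= 1 - alpha, with cost ~ N(mu, sigma^2),
# has the deterministic equivalent: mu + z_{1-alpha} * sigma <= budget.
def deterministic_equivalent(mu, sigma, alpha):
    z = norm.ppf(1 - alpha)  # standard normal quantile
    return mu + z * sigma

mu, sigma = 100.0, 10.0
print(deterministic_equivalent(mu, sigma, 0.5))   # z = 0: reduces to mu itself
print(deterministic_equivalent(mu, sigma, 0.05))  # stricter: mu + 1.645*sigma
```

At alpha = 0.5 the safety margin z*sigma vanishes, so the chance-constrained program collapses to its deterministic counterpart, exactly as the abstract reports.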
The electrocardiogram (ECG) is a low-cost, simple, fast, and non-invasive test. It can reflect the heart's electrical activity and provide valuable diagnostic clues about the health of the entire body. Therefore, the ECG has been widely used in various biomedical applications such as arrhythmia detection, disease-specific detection, mortality prediction, and biometric recognition. In recent years, ECG-related studies have been carried out using a variety of publicly available datasets, with many differences in the datasets used, data preprocessing methods, targeted challenges, and modeling and analysis techniques. Here we systematically summarize and analyze ECG-based automatic analysis methods and applications. Specifically, we first review 22 commonly used public ECG datasets and provide an overview of data preprocessing processes. We then describe some of the most widely used applications of ECG signals and analyze the advanced methods involved in these applications. Finally, we elucidate some of the challenges in ECG analysis and provide suggestions for further research.
As COVID-19 poses a major threat to people's health and the economy, there is an urgent need for forecasting methodologies that can anticipate its trajectory efficiently. In non-stationary time series forecasting tasks, there is frequently a hysteresis in the anticipated values relative to the real values. To address this problem, this paper proposes an enhanced Multilayer Deep Time Convolutional Neural Network (MDTCNet) for COVID-19 prediction, which combines a multilayer deep-time convolutional network with a feature fusion network. In particular, it can capture the deep features and temporal dependencies in uncertain time series, and the features can then be combined using a feature fusion network and a multilayer perceptron. Finally, experimental verification is conducted on the task of predicting real daily confirmed COVID-19 cases in the world and the United States under uncertainty, realizing short-term and long-term prediction of daily confirmed cases, verifying the effectiveness and accuracy of the proposed prediction method, and reducing the hysteresis of the prediction results.
Today, we are living in the era of "big data," where massive amounts of data are used for quantitative decisions and communication management. With the continuous penetration of big data-based intelligent technology into all fields of human life, the enormous commercial value inherent in the data industry has become a crucial force driving the aggregation of new industries. For the publishing industry, the introduction of big data and related intelligent technologies, such as data intelligence analysis and scenario services, into its structure and value system has become an effective path to expanding and reshaping the demand space of publishing products, content decisions, the workflow chain, and marketing direction. Through the integration and reconstruction of big data, cloud computing, artificial intelligence, and other related technologies, a generalized publishing industry pattern dominated by virtual interaction is expected to form in the future.
Law enforcement remains the main strategy used to combat poaching and accounts for a high budget share in protected area management. Studies on the efficiency of wildlife law enforcement in protected areas are limited. This study analyzed the economic efficiency of wildlife law enforcement, in terms of resources used and outputs generated, in three different protected areas (PAs) of the Serengeti ecosystem: Serengeti National Park (SENAPA), Ikorongo/Grumeti Game Reserves (IGGR), and Ikona Wildlife Management Area (IWMA). Three years (2010-2012) of monthly data on wildlife law enforcement inputs and outputs were collected from the respective PA authorities and supplemented with key informant interviews and secondary data. Questionnaire surveys were administered to wildlife law enforcement staff. Shadow prices were estimated for non-marketed inputs, and market prices were used for marketed inputs. Data Envelopment Analysis (DEA) was used to estimate economic efficiency under Variable Returns to Scale (VRS) and Constant Returns to Scale (CRS) assumptions. Results revealed that wildlife law enforcement in all PAs was economically inefficient, with the least inefficiency observed in IWMA. The lower inefficiency in IWMA is likely attributable to the sense of ownership and responsibility created through community-based conservation, which resulted in a decrease in law enforcement costs. A slacks evaluation revealed the potential to reduce fuel consumption, the number of patrol vehicles, rations, and prosecution efforts at different magnitudes across the studied protected areas. There is equal potential to recruit more rangers while maintaining resting time. These findings form the basis for monitoring and evaluation with respect to resource usage to enhance efficiency. It is further recommended to enhance community participation in conservation in SENAPA and IGGR to lower law enforcement costs. Collaboration between protected areas, police, and the judiciary is fundamental to enhancing enforcement efficiency. Despite the age of the dataset, these findings remain relevant since neither conservation policy nor the institutional framework has changed substantially in the last decade.
With the rapid development of the Internet, many enterprises have launched their own network platforms. When users browse, search, and click the products on these platforms, most platforms keep records of these network behaviors; such records are often heterogeneous and are called log data. Effectively analyzing and managing these heterogeneous log data allows enterprises to grasp the behavior characteristics of their platform users in time, make targeted recommendations to users, increase product sales, and accelerate their development. We first design the system following the big data process of collection, storage, analysis, and visualization. We then build a Hadoop cluster to process the log data, adopting HDFS storage technology, YARN resource management technology, and Nginx load-balancing technology, and analyze the log data with MapReduce processing technology and the Hive data warehouse to obtain results. Finally, the obtained results are displayed visually, and a log data analysis system is successfully constructed. Practice has proved that the system effectively realizes the collection, analysis, and visualization of log data and can accurately support product recommendations by enterprises. The system is stable and effective.
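The MapReduce analysis step can be illustrated in miniature with plain Python: map each log line to a (user, product) pair for click events, then reduce by counting keys. The tab-separated log layout here is a hypothetical example, not the system's actual schema.

```python
from collections import defaultdict

# Hypothetical log format: "timestamp<TAB>user_id<TAB>action<TAB>product_id"
logs = [
    "2023-01-01T10:00\tu1\tclick\tp9",
    "2023-01-01T10:01\tu2\tsearch\tp3",
    "2023-01-01T10:02\tu1\tclick\tp3",
    "2023-01-01T10:03\tu1\tclick\tp9",
]

# Map phase: emit a (user, product) key for each click event
pairs = []
for line in logs:
    _, user, action, product = line.split("\t")
    if action == "click":
        pairs.append((user, product))

# Reduce phase: count occurrences of each key, as a MapReduce reducer would
counts = defaultdict(int)
for key in pairs:
    counts[key] += 1

print(dict(counts))  # {('u1', 'p9'): 2, ('u1', 'p3'): 1}
```

At cluster scale, Hadoop distributes the map phase across HDFS blocks and shuffles the keys to reducers; the per-key logic is exactly the same as in this toy version.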
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable classification model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we selected the best-performing model; the prediction error rates were 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of such techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
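The four-model comparison can be reproduced in outline with scikit-learn. The two-feature dataset below is synthetic, standing in for the AQI data, so the error rates will not match the paper's 22%/23%/20%/27%.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic pollutant features (e.g., PM2.5 and ozone, standardized)
# and a binary "unhealthy AQI" label driven by a noisy linear rule
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "logistic": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
}
# Test-set error rate = 1 - accuracy, the comparison metric used above
errors = {name: 1 - m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(errors)
```

Extending this to a full confusion matrix per model (`sklearn.metrics.confusion_matrix`) gives the same per-class breakdown the study relies on.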
For the ASO-S/HXI payload, the accuracy of flare reconstruction relies on important factors such as the alignment of the dual gratings and the precise measurement of the observation orientation. To guarantee optimal functionality of the instrument throughout its life cycle, the Solar Aspect System (SAS) is imperative to ensure that measurements are accurate and reliable. This is achieved by capturing the target motion and applying a physical-model-based inversion algorithm. However, the SAS optical system's inversion model is a typical ill-posed inverse problem due to its optical parameters, so small target sampling errors trigger unacceptable shifts in the solution. To enhance inversion accuracy and make it more robust against observation errors, we suggest dividing the inversion operation into two stages based on the SAS spot motion model. First, the as-rigid-as-possible (ARAP) transformation algorithm calculates the relative rotations and an intermediate variable between the substrates. Second, we solve an inversion linear equation for the relative translation of the substrates, the offset of the optical axes, and the observation orientation. To address the ill-posedness, the Tikhonov method grounded in the discrepancy criterion and the maximum a posteriori (MAP) method founded on the Bayesian framework are utilized. The simulation results show that the ARAP method achieves a solution with a rotational error of roughly ±3 5 (1/2-quantile); both regularization techniques successfully enhance the stability of the solution, and the variance of the error in the MAP method is even smaller: it achieves a translational error of approximately ±18 μm (1/2-quantile), compared with the Tikhonov method's error of around ±24 μm (1/2-quantile). Furthermore, the SAS practical application data indicate the usability of the method in this study. Lastly, this paper discusses the intrinsic interconnections between the regularization methods.
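The stabilizing effect of Tikhonov regularization on an ill-posed linear inversion can be seen with any badly conditioned operator. The Hilbert matrix below is a classic stand-in, not the SAS inversion model, and the noise level and regularization weight are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1)  # Hilbert matrix: condition number ~1e10, severely ill-posed

x_true = rng.normal(size=n)
b = A @ x_true + rng.normal(scale=1e-3, size=n)  # small "sampling error"

# Naive inversion: tiny singular values of A amplify the noise enormously
x_naive = np.linalg.solve(A, b)

# Tikhonov: minimize ||Ax - b||^2 + lam * ||x||^2 via the normal equations
lam = 1e-6
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

err_naive = np.linalg.norm(x_naive - x_true)
err_tik = np.linalg.norm(x_tik - x_true)
print(err_naive, err_tik)  # the regularized error is far smaller
```

The discrepancy criterion mentioned above is one principled way to choose `lam`: increase it until the residual ||Ax - b|| matches the known noise level.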
Very low frequency (VLF) signals propagate between the ground and the ionosphere. At night, multimode interference causes the phase to oscillate with propagation distance, leading to anomalies in the received VLF signal. This study uses VLF signals from the Russian Alpha navigation system received in Qingdao City, Shandong Province, to explore the multimode interference problem in VLF signal propagation. The characteristics of the effect of multimode interference on the phase are analyzed according to the variation of the phase of the VLF signal. However, the phase of VLF signals is also affected by the X-rays and energetic particles released during solar flare eruptions, so both phenomena are studied in this work. It is concluded that X-rays do not affect the phase of VLF signals at night, but energetic particles do affect the phase change, and their influence should be excluded when studying multimode interference phenomena. Using VLF signals for navigation and positioning where GPS is degraded or unavailable is of great practical significance, and for VLF navigation systems, avoiding the influence of multimode interference can improve positioning accuracy.
Attitude is one of the crucial parameters for space objects and plays a vital role in collision prediction and debris removal. Analyzing light curves to determine attitude is the most commonly used method. In photometric observations, outliers may exist in the obtained light curves for various reasons, so preprocessing is required to remove these outliers and obtain high-quality light curves. Through statistical analysis, the causes of outliers can be categorized into two main types: first, the brightness of the object significantly increases due to the passage of a nearby star, referred to as "stellar contamination," and second, the brightness markedly decreases due to cloud cover, referred to as "cloudy contamination." The traditional approach of manually inspecting images for contamination is time-consuming and labor-intensive; we propose machine learning methods as a substitute. Convolutional Neural Networks and SVMs are employed to identify cases of stellar contamination and cloudy contamination, achieving F1 scores of 1.00 and 0.98 on a test set, respectively. We also explore other machine learning methods, such as ResNet-18 and Light Gradient Boosting Machine, and conduct comparative analyses of the results.
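The SVM branch of this classification task can be sketched on synthetic summary features; the real inputs are images or light-curve statistics, so the two features below (mean brightness deviation and event duration) are illustrative assumptions only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 200
# Synthetic features: stellar contamination brightens the light curve
# (positive mean deviation); cloudy contamination dims it (negative).
deviation = np.concatenate([rng.normal(1.0, 0.3, n), rng.normal(-1.0, 0.3, n)])
duration = rng.uniform(1, 10, 2 * n)  # event duration in frames (uninformative here)
X = np.column_stack([deviation, duration])
y = np.array([1] * n + [0] * n)  # 1 = stellar, 0 = cloudy

# Standardize features, then fit an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
preds = clf.predict([[1.2, 5.0], [-0.8, 3.0]])
print(preds)  # brightening event -> stellar (1), dimming event -> cloudy (0)
```

On real data the same pipeline would be trained on extracted light-curve or image features and scored with the F1 metric reported above.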
To address the problem of real-time processing of ultra-wide-bandwidth pulsar baseband data, we designed and implemented a pulsar baseband data processing algorithm (PSRDP) based on GPU parallel computing technology. PSRDP can perform operations such as baseband data unpacking, channel separation, coherent dedispersion, Stokes detection, phase and folding period prediction, and folding integration on GPU clusters. We tested the algorithm using J0437-4715 pulsar baseband data generated by the CASPSR and Medusa backends of the Parkes telescope, and J0332+5434 pulsar baseband data generated by the self-developed backend of the Nan Shan Radio Telescope, and obtained the pulse profiles for each baseband data set. Experimental analysis shows that the pulse profiles generated by the PSRDP algorithm are essentially consistent with the processing results of the Digital Signal Processing Software for Pulsar Astronomy (DSPSR), which verifies the effectiveness of the PSRDP algorithm. Furthermore, using the same baseband data, we compared the processing speed of PSRDP with that of DSPSR, and the results showed that PSRDP was not slower than DSPSR. The theoretical and technical experience gained from the PSRDP algorithm research lays a technical foundation for the real-time processing of ultra-wide-bandwidth pulsar baseband data from the QTT (Qi Tai Radio Telescope).
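The folding-integration step at the end of that pipeline, assigning each sample to a pulse-phase bin and averaging, can be sketched in a few lines. The period, sampling rate, and Gaussian pulse shape below are synthetic choices, not PSRDP's actual implementation.

```python
import numpy as np

def fold(time_series, dt, period, n_bins=64):
    """Fold a 1-D intensity series at a known pulse period.

    Each sample is assigned a phase bin via (t mod period) / period,
    and bins are averaged to build the integrated pulse profile.
    """
    t = np.arange(len(time_series)) * dt
    phase_bins = ((t % period) / period * n_bins).astype(int)
    profile = np.bincount(phase_bins, weights=time_series, minlength=n_bins)
    counts = np.bincount(phase_bins, minlength=n_bins)
    return profile / np.maximum(counts, 1)

# Synthetic pulsar: Gaussian pulse at phase 0.3, period 0.7 s, sampled at 1 ms
dt, period = 1e-3, 0.7
t = np.arange(200_000) * dt
signal = np.exp(-0.5 * (((t % period) / period - 0.3) / 0.02) ** 2)
profile = fold(signal + 0.1, dt, period)  # 0.1 = constant noise floor
print(profile.argmax())  # peak lands near phase 0.3, i.e., bin ≈ 19 of 64
```

In PSRDP this averaging runs on the GPU after coherent dedispersion, with the phase of each sample predicted from a timing model rather than a fixed period.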
Human life would be impossible without adequate air quality. Consistent advancements in practically every aspect of contemporary human life have harmed air quality. Everyday industrial, transportation, and domestic activities release dangerous contaminants into our surroundings. This study investigated two years' worth of air quality and outlier-detection data from two Indian cities. Studies on air pollution have used numerous methodologies, with the various gases treated as a vector whose components are the gas concentration values for each observation performed. In our technique, we use curves to represent the monthly average of daily gas emissions. The approach, which is based on functional depth, was used to find outliers in the gas emissions of the cities of Delhi and Kolkata, and the outcomes were compared with those of the traditional method. In the evaluation and comparison of these models' performances, the functional approach performed well.
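One common choice of functional depth for curve outlier detection is the modified band depth of López-Pintado and Romo; whether the study uses exactly this variant is an assumption. The sketch below applies it to synthetic monthly-average curves, not the Delhi/Kolkata data: the curve with the lowest depth is flagged as the outlier.

```python
import numpy as np

def modified_band_depth(curves):
    """For each curve, the average fraction of time points where it lies
    inside the band formed by every pair of curves in the sample."""
    n, _ = curves.shape
    depth = np.zeros(n)
    for i in range(n):
        inside = 0.0
        for j in range(n):
            for k in range(j + 1, n):
                lo = np.minimum(curves[j], curves[k])
                hi = np.maximum(curves[j], curves[k])
                inside += np.mean((curves[i] >= lo) & (curves[i] <= hi))
        depth[i] = inside / (n * (n - 1) / 2)
    return depth

rng = np.random.default_rng(2)
months = np.linspace(0, 1, 12)
# Ten synthetic monthly-average "emission curves" around a seasonal pattern
curves = np.sin(2 * np.pi * months) + rng.normal(scale=0.1, size=(10, 12))
curves[0] += 3.0  # one grossly shifted curve: the outlier

depth = modified_band_depth(curves)
print(depth.argmin())  # the shifted curve has the lowest depth
```

Thresholding the depths (e.g., flagging curves below a quantile of the depth distribution) then yields the outlying months or stations.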
This article presents a comprehensive analysis of the current state of research on the English translation of Lu You’s poetry, utilizing a data sample comprising research papers published in the CNKI Full-text Database from 2001 to 2022. Employing rigorous longitudinal statistical methods, the study examines the progress achieved over the past two decades. Notably, domestic researchers have displayed considerable interest in the study of Lu You’s English translation works since 2001. The research on the English translation of Lu You’s poetry reveals a diverse range of perspectives, indicating a rich body of scholarship. However, several challenges persist, including insufficient research, limited translation coverage, and a noticeable focus on specific poems such as “Phoenix Hairpin” in the realm of English translation research. Consequently, there is ample room for improvement in the quality of research output on the English translation of Lu You’s poems, as well as its recognition within the academic community. Building on these findings, it is argued that future investigations pertaining to the English translation of Lu You’s poetry should transcend the boundaries of textual analysis and encompass broader theoretical perspectives and research methodologies. By undertaking this shift, scholars will develop a more profound comprehension of Lu You’s poetic works and make substantive contributions to the field of translation studies. Thus, this article aims to bridge the gap between past research endeavors and future possibilities, serving as a guide and inspiration for scholars to embark on a more nuanced and enriching exploration of Lu You’s poetry as well as other Chinese literature classics.
Funding (Muztagh-Ata seeing study): funded by the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS) (grant No. U2031209), and by the NSFC (grant Nos. 11872128, 42174192, and 91952111).
文摘Seeing is an important index to evaluate the quality of an astronomical site.To estimate seeing at the Muztagh-Ata site with height and time quantitatively,the European Centre for Medium-Range Weather Forecasts reanalysis database(ERA5)is used.Seeing calculated from ERA5 is compared consistently with the Differential Image Motion Monitor seeing at the height of 12 m.Results show that seeing decays exponentially with height at the Muztagh-Ata site.Seeing decays the fastest in fall in 2021 and most slowly with height in summer.The seeing condition is better in fall than in summer.The median value of seeing at 12 m is 0.89 arcsec,the maximum value is1.21 arcsec in August and the minimum is 0.66 arcsec in October.The median value of seeing at 12 m is 0.72arcsec in the nighttime and 1.08 arcsec in the daytime.Seeing is a combination of annual and about biannual variations with the same phase as temperature and wind speed indicating that seeing variation with time is influenced by temperature and wind speed.The Richardson number Ri is used to analyze the atmospheric stability and the variations of seeing are consistent with Ri between layers.These quantitative results can provide an important reference for a telescopic observation strategy.
Abstract: Peanut allergy accounts for a major share of severe food-induced allergic reactions. Several foods, including cow's milk, hen's eggs, soy, wheat, peanuts, tree nuts (walnuts, hazelnuts, almonds, cashews, pecans, and pistachios), fish, and shellfish, are responsible for more than 90% of food allergies. Here, we provide promising insights from a large-scale data-driven analysis comparing the mechanistic features and biological relevance of the different ingredients present in peanuts, tree nuts (walnuts, almonds, cashews, pecans, and pistachios), and soybean. Additionally, we have analyzed the chemical composition of peanuts in different processed forms: raw, boiled, and dry-roasted. Using this data-driven approach, we are able to generate new hypotheses to explain why nuclear receptors such as the peroxisome proliferator-activated receptors (PPARs) and their isoforms, and their interaction with dietary lipids, may have a significant effect on the allergic response. The results of this study will direct future experimental and clinical studies on the role of dietary lipids and PPAR isoforms in exerting pro-inflammatory or anti-inflammatory functions on cells of the innate immunity and in influencing antigen presentation to cells of the adaptive immunity.
Abstract: Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel's graphical user interface (GUI) for automation can limit scalability and reproducibility compared with Python's scripting capabilities. This paper presents an integration solution that empowers non-programmers to leverage Python's capabilities within the familiar Excel environment, enabling them to perform advanced data analysis and automation tasks without extensive programming knowledge. Based on feedback solicited from non-programmers who tested the integration solution, a case study evaluates its ease of implementation, performance, and compatibility with different Excel versions.
Abstract: This research paper compares Excel and the R language for data analysis and concludes that R is more suitable for complex data analysis tasks. R's open-source nature makes it accessible to everyone, and its powerful data management and analysis tools make it suitable for handling complex tasks. It is also highly customizable, allowing users to create custom functions and packages to meet their specific needs. Additionally, R provides high reproducibility, making it easy to replicate and verify research results, and it has excellent collaboration capabilities, enabling multiple users to work on the same project simultaneously. These advantages make R the more suitable choice for complex data analysis tasks, particularly in scientific research and business applications. The findings of this study will help people understand that R is not just a language that can handle more data than Excel, and demonstrate that R is essential to the field of data analysis. They will also help users and organizations make informed decisions regarding their data analysis needs and software preferences.
Funding: The Binary Systems of South and North (BSN) project (https://bsnp.info/).
Abstract: Reviewing the empirical and theoretical relationships between various parameters is a good way to learn more about contact binary systems. In this investigation, two-dimensional (2D) relationships for P–MV(system), P–L1,2, M1,2–L1,2, and q–Lratio were revisited. The sample comprises 118 contact binary systems with orbital periods shorter than 0.6 days whose absolute parameters were estimated based on the Gaia Data Release 3 parallax. We reviewed previous studies on 2D relationships and updated six parameter relationships. Markov chain Monte Carlo and machine learning methods were used, and their outcomes were compared. For comparison, we selected 22 contact binary systems from eight previous studies that had light curve solutions using spectroscopic data. The results show that these systems are in good agreement with the results of this study.
Funding: Supported by the National Key Research and Development Program of China (2022YFC2702502); the National Natural Science Foundation of China (32170742, 31970646, and 32060152); the Start Fund for Specially Appointed Professors of Jiangsu Province; the Hainan Province Science and Technology Special Fund (ZDYF2021SHFZ051); the Natural Science Foundation of Hainan Province (820MS053); the Start Fund for High-level Talents of Nanjing Medical University (NMUR2020009); the Marshal Initiative Funding of Hainan Medical University (JBGS202103); the Hainan Province Clinical Medical Center (QWYH202175); the Bioinformatics for Major Diseases Science Innovation Group of Hainan Medical University; and the Shenzhen Science and Technology Program (JCYJ20210324140407021).
Abstract: The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including of clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow of typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable to almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support their implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Abstract: In the nonparametric data envelopment analysis literature, scale elasticity is evaluated in two alternative ways: using either the technical efficiency model or the cost efficiency model. This evaluation becomes problematic in several situations, for example (a) when input proportions change in the long run, (b) when inputs are heterogeneous, and (c) when firms face ex-ante price uncertainty in making their production decisions. To address these situations, scale elasticity can be evaluated using a value-based cost efficiency model. However, this alternative value-based scale elasticity evaluation is sensitive to the uncertainty and variability underlying input and output data. Therefore, in this study, we introduce a stochastic cost-efficiency model based on chance-constrained programming to develop a value-based measure of the scale elasticity of firms facing data uncertainty. An illustrative empirical application to the Indian banking industry, comprising 71 banks over eight years (1998–2005), was made to compare inferences about their efficiency and scale properties. The key findings are as follows. First, the deterministic model and our proposed stochastic model yield distinctly different efficiency and scale elasticity scores at various tolerance levels of the chance constraints. However, both models yield the same results at a tolerance level of 0.5, implying that the deterministic model is a special case of the stochastic model in that it reveals the same efficiency and returns-to-scale characterizations of banks. Second, the stochastic model generates higher efficiency scores for inefficient banks than its deterministic counterpart. Third, public banks exhibit higher efficiency than private and foreign banks. Finally, public and old private banks mostly exhibit either decreasing or constant returns to scale, whereas foreign and new private banks experience either increasing or decreasing returns to scale. Although the application of our proposed stochastic model is illustrative, it can potentially be applied to all firms in information- and distribution-intensive industries with high fixed costs, which have ample potential for reaping scale and scope benefits.
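The paper's DEA and chance-constrained models are beyond a short sketch, but the notion of scale elasticity itself can be illustrated parametrically: for a Cobb-Douglas technology, scale elasticity is the sum of the input elasticities, recoverable by a log-linear least-squares fit. The data and elasticities below are synthetic, not from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic firms: one output produced from two inputs with known elasticities
# y = x1^0.4 * x2^0.3  (true scale elasticity = 0.7, decreasing returns)
x1 = rng.uniform(1.0, 10.0, 200)
x2 = rng.uniform(1.0, 10.0, 200)
y = x1 ** 0.4 * x2 ** 0.3

# Log-linear (Cobb-Douglas) fit: ln y = a + b1 ln x1 + b2 ln x2
X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

# Scale elasticity is the sum of input elasticities; < 1 means
# decreasing returns to scale, > 1 increasing returns.
scale_elasticity = coef[1] + coef[2]
```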
Funding: Supported by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1909208), the Science and Technology Major Project of Changsha (kh2202004), and the Changsha Municipal Natural Science Foundation (kq2202106).
Abstract: The electrocardiogram (ECG) is a low-cost, simple, fast, and non-invasive test. It reflects the heart's electrical activity and provides valuable diagnostic clues about the health of the entire body. ECG has therefore been widely used in various biomedical applications such as arrhythmia detection, disease-specific detection, mortality prediction, and biometric recognition. In recent years, ECG-related studies have been carried out using a variety of publicly available datasets, with many differences in the datasets used, data preprocessing methods, targeted challenges, and modeling and analysis techniques. Here we systematically summarize and analyze ECG-based automatic analysis methods and applications. Specifically, we first review 22 commonly used public ECG datasets and provide an overview of data preprocessing processes. We then describe some of the most widely used applications of ECG signals and analyze the advanced methods involved in these applications. Finally, we elucidate some of the challenges in ECG analysis and provide suggestions for further research.
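As a hedged illustration of one common preprocessing step of the kind surveyed above (this sketch is generic, not a method from any particular paper), a moving-average smoother reduces high-frequency noise in a sampled trace:

```python
def moving_average(signal, window):
    """Simple moving-average smoother, a common first denoising step.

    Pads by repeating edge values so the output length matches the input.
    `window` is assumed to be odd.
    """
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(signal))]

# Toy samples standing in for a noisy trace (values are invented)
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = moving_average(noisy, window=3)
```

Real pipelines typically use band-pass filtering instead, since a plain moving average also attenuates the sharp QRS complex.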
Funding: Supported by the Major Scientific and Technological Research Project of the Chongqing Education Commission (KJZD-M202000802) and the first batch of Industrial and Informatization Key Special Fund Support Projects in Chongqing in 2022 (2022000537).
Abstract: As COVID-19 poses a major threat to people's health and the economy, there is an urgent need for forecasting methodologies that can anticipate its trajectory efficiently. In non-stationary time series forecasting, there is frequently a hysteresis in the predicted values relative to the real values. To address this problem, this paper proposes an enhanced Multilayer Deep Time Convolutional Neural Network (MDTCNet) for COVID-19 prediction, which combines a multilayer deep-time convolutional network with a feature fusion network. In particular, it can capture the deep features and temporal dependencies in uncertain time series, and the features can then be combined using a feature fusion network and a multilayer perceptron. Finally, experimental verification is conducted on the task of predicting real daily confirmed COVID-19 cases, with uncertainty, for the world and the United States. The experiments realize short-term and long-term prediction of daily confirmed cases, verify the effectiveness and accuracy of the suggested method, and reduce the hysteresis of the prediction results.
Abstract: Today, we are living in the era of "big data," where massive amounts of data are used for quantitative decisions and communication management. With the continuous penetration of big data-based intelligent technology into all fields of human life, the enormous commercial value inherent in the data industry has become a crucial force driving the aggregation of new industries. For the publishing industry, introducing big data and related intelligent technologies, such as data intelligence analysis and scenario services, into its structure and value system has become an effective path to expanding and reshaping the demand space of publishing products, content decisions, the workflow chain, and marketing direction. Through the integration and reconstruction of big data, cloud computing, artificial intelligence, and other related technologies, a generalized publishing industry pattern dominated by virtual interaction is expected to form in the future.
Abstract: Law enforcement remains the main strategy used to combat poaching and accounts for a high share of protected area management budgets. Studies on the efficiency of wildlife law enforcement in protected areas are limited. This study analyzed the economic efficiency of wildlife law enforcement, in terms of resources used and output generated, in three protected areas (PAs) of the Serengeti ecosystem: Serengeti National Park (SENAPA), Ikorongo/Grumeti Game Reserves (IGGR), and Ikona Wildlife Management Area (IWMA). Three years (2010-2012) of monthly data on wildlife law enforcement inputs and outputs were collected from the respective PA authorities and supplemented with key informant interviews and secondary data. Questionnaire surveys were administered to wildlife law enforcement staff. Shadow prices were estimated for non-marketed inputs, and market prices were used for marketed inputs. Data Envelopment Analysis (DEA) was used to estimate economic efficiency under Variable Returns to Scale (VRS) and Constant Returns to Scale (CCR) assumptions. Results revealed that wildlife law enforcement in all PAs was economically inefficient, with less inefficiency observed in IWMA. The lower inefficiency in IWMA is likely attributable to the sense of ownership and responsibility created through community-based conservation, which resulted in lower law enforcement costs. A slacks evaluation revealed the potential to reduce fuel consumption, the number of patrol vehicles, rations, and prosecution efforts at different magnitudes across the studied protected areas. There is equal potential to recruit more rangers while maintaining the resting time. These findings form the basis for monitoring and evaluation with respect to resource usage to enhance efficiency. It is further recommended to enhance community participation in conservation in SENAPA and IGGR to lower law enforcement costs. Collaboration between protected areas, the police, and the judiciary is fundamental to enhancing enforcement efficiency. Despite the age of the dataset, these findings remain relevant, since neither conservation policy nor the institutional framework has changed substantially in the last decade.
Funding: Supported by the Huaihua University Science Foundation under Grant HHUY2019-24.
Abstract: With the rapid development of the Internet, many enterprises have launched their own network platforms. When users browse, search, and click on the products of these platforms, most platforms keep records of these network behaviors; these records are often heterogeneous and are called log data. Analyzing and managing these heterogeneous log data effectively allows enterprises to grasp the behavior characteristics of their platform users in time, make targeted recommendations to users, increase product sales, and accelerate their development. We first design the system following the big data process of collection, storage, analysis, and visualization. We then build a Hadoop cluster to process the log data, adopting HDFS storage technology, YARN resource management technology, and Nginx load-balancing technology, and analyze the log data with MapReduce processing technology and the Hive data warehouse to obtain the results. Finally, the results are displayed visually, and a log data analysis system is successfully constructed. Practice has proved that the system effectively realizes the collection, analysis, and visualization of log data and can accurately support enterprises' product recommendations. The system is stable and effective.
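The MapReduce step named above can be sketched in miniature: a map phase emits key-value pairs from raw log lines, a shuffle groups them by key, and a reduce phase aggregates per key. The toy log format below is invented for illustration and is not the system's actual schema:

```python
from collections import defaultdict

# Toy access-log lines: "user_id product_id action"
log_lines = [
    "u1 p10 click",
    "u2 p10 click",
    "u1 p11 search",
    "u3 p10 click",
]

# Map phase: emit (product, 1) for every click event
mapped = [(line.split()[1], 1)
          for line in log_lines if line.split()[2] == "click"]

# Shuffle phase: group emitted values by key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: sum the counts per product
click_counts = {key: sum(values) for key, values in groups.items()}
```

On a real cluster the shuffle is performed by the framework across nodes; the local dictionary here only mimics that grouping.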
Abstract: Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), a widely adopted index of the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we selected the best-performing model; the prediction error rates were 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of such techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
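As a minimal illustration of the KNN approach compared above, the sketch below implements a majority-vote nearest-neighbor classifier on invented toy readings (the features, values, and category labels are hypothetical, not the study's data):

```python
from collections import Counter
import math

def knn_predict(train, labels, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], point))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (PM2.5, ozone) readings labeled by AQI category
train = [(10, 20), (12, 18), (80, 90), (85, 95), (90, 88)]
labels = ["Good", "Good", "Unhealthy", "Unhealthy", "Unhealthy"]

pred = knn_predict(train, labels, point=(15, 22), k=3)
# pred == "Good": two of the three nearest neighbors are "Good"
```

In practice one would also standardize the features first, since KNN distances are sensitive to the scale of each variable.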
Funding: Supported by the Strategic Priority Research Program on Space Science of the Chinese Academy of Sciences (grant No. XDA15320104), with additional contributions from the Purple Mountain Observatory (PMO) of the Chinese Academy of Sciences and the National Space Science Center (NSSC).
Abstract: For the ASO-S/HXI payload, the accuracy of flare reconstruction relies on important factors such as the alignment of the dual gratings and precise measurement of the observation orientation. To guarantee optimal functionality of the instrument throughout its life cycle, the Solar Aspect System (SAS) is imperative for ensuring that measurements are accurate and reliable. This is achieved by capturing the target motion and applying a physical-model-based inversion algorithm. However, the SAS optical system's inversion model is a typical ill-posed inverse problem because of its optical parameters, so small target sampling errors trigger unacceptable shifts in the solution. To enhance inversion accuracy and make it more robust against observation errors, we suggest dividing the inversion into two stages based on the SAS spot motion model. First, the as-rigid-as-possible (ARAP) transformation algorithm calculates the relative rotations and an intermediate variable between the substrates. Second, we solve a linear inversion equation for the relative translation of the substrates, the offset of the optical axes, and the observation orientation. To address the ill-posedness, the Tikhonov method grounded on the discrepancy criterion and the maximum a posteriori (MAP) method founded on the Bayesian framework are utilized. The simulation results show that the ARAP method achieves a solution with a rotational error of roughly ±3.5 (1/2-quantile); both regularization techniques successfully enhance the stability of the solution, and the variance of error in the MAP method is even smaller: it achieves a translational error of approximately ±18 μm (1/2-quantile), compared with the Tikhonov method's error of around ±24 μm (1/2-quantile). Furthermore, practical SAS application data indicate the method's usability. Lastly, this paper discusses the intrinsic interconnections between the regularization methods.
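The Tikhonov step can be illustrated in isolation. The sketch below (with illustrative matrices, not the SAS optical model) solves a regularized least-squares problem via the normal equations (AᵀA + λI)x = Aᵀb and compares it with the unregularized solution, showing how the regularization damps the solution of a nearly singular system:

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Tikhonov-regularized least squares:
    minimize ||A x - b||^2 + lam * ||x||^2,
    solved via the normal equations (A^T A + lam I) x = A^T b.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Ill-conditioned toy system: nearly collinear columns
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
x_true = np.array([1.0, 2.0])
b = A @ x_true + np.array([1e-4, -1e-4, 1e-4])  # small observation noise

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]  # unregularized: unstable
x_tik = tikhonov_solve(A, b, lam=1e-3)       # regularized: damped solution
```

In the paper's setting the regularization parameter is chosen by the discrepancy criterion; here λ is simply fixed for illustration.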
Funding: Supported by the National Natural Science Foundation of China (U1704134).
Abstract: Very low frequency (VLF) signals propagate in the waveguide between the ground and the ionosphere. At night, multimode interference causes the phase to oscillate with distance during propagation, leading to abnormalities in the received VLF signal. This study uses the VLF signal received in Qingdao City, Shandong Province, from the Russian Alpha navigation system to explore the multimode interference problem in VLF signal propagation. The characteristics of the effect of multimode interference on the phase are analyzed from the variation of the phase of the VLF signal. However, the phase of VLF signals is also affected by the X-rays and energetic particles released during solar flare eruptions, so both phenomena are studied in this work. It is concluded that X-rays do not affect the phase of VLF signals at night, but energetic particles do affect the phase change, and their influence should be excluded when studying multimode interference. Using VLF signals for navigation and positioning where GPS is degraded or unavailable is of great practical significance for VLF navigation systems, since avoiding the influence of multimode interference can improve positioning accuracy.
Funding: Funded by the National Natural Science Foundation of China (NSFC, Nos. 12373086 and 12303082), the CAS "Light of West China" Program, the Yunnan Revitalization Talent Support Program of Yunnan Province, and the National Key R&D Program of China, Gravitational Wave Detection Project No. 2022YFC2203800.
Abstract: Attitude is one of the crucial parameters of space objects and plays a vital role in collision prediction and debris removal. Analyzing light curves to determine attitude is the most commonly used method. In photometric observations, outliers may exist in the obtained light curves for various reasons; therefore, preprocessing is required to remove them and obtain high-quality light curves. Through statistical analysis, the causes of outliers can be categorized into two main types: first, the brightness of the object significantly increases when a star passes nearby, referred to as "stellar contamination"; and second, the brightness markedly decreases under cloud cover, referred to as "cloudy contamination." The traditional approach of manually inspecting images for contamination is time-consuming and labor-intensive, so we propose machine learning methods as a substitute. Convolutional Neural Networks and Support Vector Machines (SVMs) are employed to identify cases of stellar contamination and cloudy contamination, achieving F1 scores of 1.00 and 0.98 on a test set, respectively. We also explore other machine learning methods, such as ResNet-18 and the Light Gradient Boosting Machine, and conduct comparative analyses of the results.
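As a toy stand-in for the contamination-filtering step described above (the paper itself uses CNN and SVM classifiers on images; this sketch, with invented magnitudes and thresholds, only illustrates the simpler statistical alternative), a sigma-clipping pass removes points that deviate strongly from the light-curve median:

```python
import statistics

def sigma_clip(mags, n_sigma=3.0):
    """Drop photometric points more than n_sigma standard deviations
    from the median of the light curve."""
    med = statistics.median(mags)
    std = statistics.stdev(mags)
    return [m for m in mags if abs(m - med) <= n_sigma * std]

# Toy light curve with one "stellar contamination" brightening spike
light_curve = [12.01, 12.03, 11.99, 12.02, 9.50, 12.00, 12.04]
cleaned = sigma_clip(light_curve, n_sigma=2.0)
# The 9.50 mag spike is removed; the six nominal points survive
```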
Funding: Supported by the National Key R&D Program of China (Nos. 2021YFC2203502 and 2022YFF0711502), the National Natural Science Foundation of China (NSFC) (12173077 and 12003062), the Tianshan Innovation Team Plan of the Xinjiang Uygur Autonomous Region (2022D14020), the Tianshan Talent Project of the Xinjiang Uygur Autonomous Region (2022TSYCCX0095), the Scientific Instrument Developing Project of the Chinese Academy of Sciences (grant No. PTYQ2022YZZD01), the China National Astronomical Data Center (NADC), the Operation, Maintenance and Upgrading Fund for Astronomical Telescopes and Facility Instruments, budgeted from the Ministry of Finance of China (MOF) and administrated by the Chinese Academy of Sciences (CAS), and the Natural Science Foundation of the Xinjiang Uygur Autonomous Region (2022D01A360).
Abstract: To address the problem of real-time processing of ultra-wide-bandwidth pulsar baseband data, we designed and implemented a pulsar baseband data processing algorithm (PSRDP) based on GPU parallel computing technology. PSRDP can perform operations such as baseband data unpacking, channel separation, coherent dedispersion, Stokes detection, phase and folding period prediction, and folding integration on GPU clusters. We tested the algorithm using J0437-4715 pulsar baseband data generated by the CASPSR and Medusa backends of the Parkes telescope, and J0332+5434 pulsar baseband data generated by the self-developed backend of the Nan Shan Radio Telescope, and obtained the pulse profiles for each data set. Experimental analysis shows that the pulse profiles generated by the PSRDP algorithm are essentially consistent with the results of the Digital Signal Processing Software for Pulsar Astronomy (DSPSR), which verifies the effectiveness of PSRDP. Furthermore, using the same baseband data, we compared the processing speed of PSRDP with that of DSPSR, and the results showed that PSRDP was no slower than DSPSR. The theoretical and technical experience gained from the PSRDP research lays a technical foundation for the real-time processing of ultra-wide-bandwidth pulsar baseband data from the QTT (Qi Tai radio Telescope).
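Coherent dedispersion, one of the PSRDP operations listed above, compensates the frequency-dependent delay imposed by the interstellar medium. A minimal sketch of the delay formula it must remove (using the standard cold-plasma dispersion constant; the DM value for J0437-4715 is its approximate catalog value, quoted here only for illustration):

```python
K_DM = 4.1488e3  # dispersion constant, MHz^2 pc^-1 cm^3 s

def dispersion_delay(dm, f_lo_mhz, f_hi_mhz):
    """Arrival-time delay (s) of the lower frequency relative to the higher:
    delay = K_DM * DM * (f_lo^-2 - f_hi^-2), frequencies in MHz.
    """
    return K_DM * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)

# J0437-4715 has DM ~ 2.64 pc cm^-3; delay across a 1300-1500 MHz band:
delay = dispersion_delay(dm=2.64, f_lo_mhz=1300.0, f_hi_mhz=1500.0)
# ~1.6 ms, i.e. several pulse periods' worth of smearing if uncorrected
```

Coherent dedispersion removes this phase rotation exactly in the voltage (baseband) domain, which is why it must run on the raw baseband data rather than on detected power.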
Abstract: Human life would be impossible without air quality. Consistent advancements in practically every aspect of contemporary human life have harmed air quality. Everyday industrial, transportation, and household activities release dangerous contaminants into our surroundings. This study investigated two years' worth of air quality and outlier detection data from two Indian cities. Studies on air pollution have used many types of methodologies, with various gases being treated as a vector whose components are the gas concentration values of each observation performed. In our technique, we use curves to represent the monthly average of daily gas emissions. The approach, which is based on functional depth, was used to find outliers in the gas emissions of the cities of Delhi and Kolkata, and the outcomes were compared with those of the traditional method. In the evaluation and comparison of performance, the functional approach model performed well.
Abstract: This article presents a comprehensive analysis of the current state of research on the English translation of Lu You's poetry, utilizing a data sample comprising research papers published in the CNKI Full-text Database from 2001 to 2022. Employing rigorous longitudinal statistical methods, the study examines the progress achieved over the past two decades. Notably, domestic researchers have displayed considerable interest in the study of Lu You's English translation works since 2001. The research on the English translation of Lu You's poetry reveals a diverse range of perspectives, indicating a rich body of scholarship. However, several challenges persist, including insufficient research, limited translation coverage, and a noticeable focus on specific poems such as "Phoenix Hairpin" in the realm of English translation research. Consequently, there is ample room for improvement in the quality of research output on the English translation of Lu You's poems, as well as its recognition within the academic community. Building on these findings, it is argued that future investigations pertaining to the English translation of Lu You's poetry should transcend the boundaries of textual analysis and encompass broader theoretical perspectives and research methodologies. By undertaking this shift, scholars will develop a more profound comprehension of Lu You's poetic works and make substantive contributions to the field of translation studies. Thus, this article aims to bridge the gap between past research endeavors and future possibilities, serving as a guide and inspiration for scholars to embark on a more nuanced and enriching exploration of Lu You's poetry as well as other Chinese literature classics.