Funding: Supported by the National Natural Science Foundation of China (Nos. 72204169 and 81825007); the Beijing Outstanding Young Scientist Program (No. BJJWZYJH01201910025030); the Capital's Funds for Health Improvement and Research (No. 2022-2-2045); the National Key R&D Program of China (Nos. 2022YFF1501500, 2022YFF1501501, 2022YFF1501502, 2022YFF1501503, 2022YFF1501504, and 2022YFF1501505); the Youth Beijing Scholar Program (No. 010); the Beijing Laboratory of Oral Health (No. PXM2021_014226_000041); the Beijing Talent Project-Class A: Innovation and Development (No. 2018A12); the National Ten-Thousand Talent Plan, Leadership of Scientific and Technological Innovation; and the National Key R&D Program of China (Nos. 2017YFC1307900 and 2017YFC1307905).
Abstract: Differences among the imaging subgroups of cerebral small vessel disease (CSVD) need to be further explored. First, we use propensity score matching to obtain balanced datasets. Then random forest (RF) is adopted to classify the subgroups, compared with support vector machine (SVM) and extreme gradient boosting (XGBoost), and to select features. The top 10 most important features are entered into a stepwise logistic regression, and odds ratios (OR) and 95% confidence intervals (CI) are obtained. There are 41,290 adult inpatient records with a diagnosis of CSVD. The accuracy and area under the curve (AUC) of RF are close to 0.7, the best classification performance among RF, SVM, and XGBoost. The OR and 95% CI of hematocrit for white matter lesions (WMLs), lacunes, microbleeds, atrophy, and enlarged perivascular spaces (EPVS) are 0.9875 (0.9857−0.9893), 0.9728 (0.9705−0.9752), 0.9782 (0.9740−0.9824), 1.0093 (1.0081−1.0106), and 0.9716 (0.9597−0.9832), respectively. The OR and 95% CI of red cell distribution width for WMLs, lacunes, atrophy, and EPVS are 0.9600 (0.9538−0.9662), 0.9630 (0.9559−0.9702), 1.0751 (1.0686−1.0817), and 0.9304 (0.8864−0.9755). The OR and 95% CI of platelet distribution width for WMLs, lacunes, and microbleeds are 1.1796 (1.1636−1.1958), 1.1663 (1.1476−1.1853), and 1.0416 (1.0152−1.0687). This study proposes a new analytical framework for selecting important clinical markers for CSVD with machine learning based on a common data model, which has low cost, fast speed, large sample size, and continuous data sources.
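
A minimal sketch of the analytical pipeline as described (a matched cohort, RF-based feature ranking, then logistic regression on the top features for ORs and CIs), assuming scikit-learn, pandas, and statsmodels; the file name, the outcome column, and the single-step regression standing in for the stepwise procedure are hypothetical simplifications, not the study's actual data or code.

    # Hypothetical sketch of the described framework: start from an already
    # propensity-matched dataset, rank features with a random forest, then fit
    # a logistic regression on the top-10 features to obtain ORs with 95% CIs.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, roc_auc_score

    df = pd.read_csv("csvd_records_matched.csv")   # hypothetical matched dataset
    y = df["wml"]                                  # e.g. white matter lesions (0/1)
    X = df.drop(columns=["wml"])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0, stratify=y)
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_tr, y_tr)
    print("accuracy:", accuracy_score(y_te, rf.predict(X_te)))
    print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

    # Top-10 features by RF importance, then logistic regression for ORs and CIs.
    top10 = pd.Series(rf.feature_importances_, index=X.columns).nlargest(10).index
    logit = sm.Logit(y, sm.add_constant(X[top10])).fit(disp=0)
    or_ci = np.exp(pd.concat([logit.params, logit.conf_int()], axis=1))
    or_ci.columns = ["OR", "2.5%", "97.5%"]
    print(or_ci.drop(index="const"))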
Abstract: Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, and more. However, their widespread usage emphasizes the critical need to enhance their security posture, to ensure the integrity and reliability of their outputs, and to minimize harmful effects. Prompt injection and training data poisoning attacks are two of the most prominent vulnerabilities in LLMs; they could potentially lead to unpredictable and undesirable behaviors, such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities so that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures to defend against potential risks.
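
The abstract does not reproduce the proposed CVSS extension; for orientation, the sketch below implements the standard CVSS v3.1 base-score arithmetic that such an extension would build on, applied to an illustrative (hypothetical) metric vector for a prompt-injection scenario.

    import math

    # CVSS v3.1 base-metric weights.
    AV  = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}   # Attack Vector
    AC  = {"L": 0.77, "H": 0.44}                          # Attack Complexity
    PRU = {"N": 0.85, "L": 0.62, "H": 0.27}               # Privileges Required (scope unchanged)
    PRC = {"N": 0.85, "L": 0.68, "H": 0.50}               # Privileges Required (scope changed)
    UI  = {"N": 0.85, "R": 0.62}                          # User Interaction
    CIA = {"H": 0.56, "L": 0.22, "N": 0.0}                # Confidentiality / Integrity / Availability

    def roundup(x: float) -> float:
        # Simplified CVSS Roundup to one decimal (the spec adds floating-point guards).
        return math.ceil(x * 10) / 10

    def base_score(av, ac, pr, ui, scope_changed, c, i, a):
        iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
        impact = (7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
                  if scope_changed else 6.42 * iss)
        expl = 8.22 * AV[av] * AC[ac] * (PRC if scope_changed else PRU)[pr] * UI[ui]
        if impact <= 0:
            return 0.0
        raw = 1.08 * (impact + expl) if scope_changed else impact + expl
        return roundup(min(raw, 10))

    # Illustrative (hypothetical) vector for a prompt-injection scenario:
    # AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N
    print(base_score("N", "L", "N", "N", False, "H", "H", "N"))  # 9.1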
Abstract: Multidatabase systems are designed to achieve schema integration and data interoperation among distributed and heterogeneous database systems, but data model heterogeneity and schema heterogeneity make this a challenging task. A multidatabase common data model based on XML, named the XML-based Integration Data Model (XIDM), is first introduced; it is suitable for integrating different types of schemas. An approach to schema mapping based on XIDM in multidatabase systems is then presented. The mappings include global mappings, which deal with horizontal and vertical partitioning between global schemas and export schemas, and local mappings, which handle the transformation between export schemas and local schemas. Finally, the illustration and implementation of schema mappings in a multidatabase prototype, the Panorama system, are discussed. The implementation results demonstrate that XIDM is an efficient model for managing multiple heterogeneous data sources and that the XIDM-based schema-mapping approach behaves well when integrating relational and object-oriented database systems as well as file systems.
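
XIDM's concrete syntax is not given in the abstract; the sketch below only illustrates the general idea of wrapping a local relational schema as an XML export schema that a mediator could integrate, using element and attribute names that are hypothetical rather than XIDM's actual ones.

    # Hypothetical illustration: expose a relational table definition as an
    # XML export schema so a mediator can integrate heterogeneous sources.
    import xml.etree.ElementTree as ET

    def table_to_export_schema(table, columns):
        # Wrap a relational table definition in a (made-up) XML schema element.
        schema = ET.Element("exportSchema", name=table, sourceModel="relational")
        for col, typ in columns.items():
            ET.SubElement(schema, "attribute", name=col, type=typ)
        return schema

    patient = table_to_export_schema("patient", {"id": "int", "name": "string", "age": "int"})
    print(ET.tostring(patient, encoding="unicode"))
    # prints one <exportSchema ...> element containing one <attribute .../> per column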
Funding: Supported by the National Natural Science Foundation of China (71131008 (Key Project) and 71271179).
Abstract: In this review, we highlight some recent methodological and theoretical developments in the estimation and testing of large panel data models with cross-sectional dependence. The paper begins with a discussion of the issues raised by cross-sectional dependence and introduces the concepts of weak and strong cross-sectional dependence. The main attention is then paid to spatial and factor approaches for modeling cross-sectional dependence in both linear and nonlinear (nonparametric and semiparametric) panel data models. Finally, we conclude with some speculations on future research directions.
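
As one concrete instance of the factor approach mentioned above, a common formulation (not necessarily the exact one emphasized in this review) models the dependence through unobserved common factors, with a spatial specification shown for contrast:

    % Panel regression with interactive (common factor) effects: y_{it} depends on
    % regressors x_{it} and unobserved factors f_t loaded heterogeneously via \lambda_i.
    \[
      y_{it} = \beta' x_{it} + \lambda_i' f_t + \varepsilon_{it},
      \qquad i = 1,\dots,N,\; t = 1,\dots,T.
    \]
    % Dependence is "strong" when pervasive common factors affect all units and
    % "weak" when it is local, as in spatial autoregressive panel models such as
    \[
      y_{it} = \rho \sum_{j=1}^{N} w_{ij}\, y_{jt} + \beta' x_{it} + \varepsilon_{it}.
    \]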
Funding: Supported by the Space Core Technology Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2014M1A3A3A02034789); the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2013R1A1A2A10004743); and the Korea Meteorological Administration Research and Development Program under the Weather Information Service Engine (WISE) project, KMA-2012-0001-A.
Abstract: Towards a better understanding of hydrological interactions between the land surface and the atmosphere, land surface models are routinely used to simulate hydro-meteorological fluxes. However, observations available for model forcing to estimate hydro-meteorological fluxes in East Asia are lacking. In this study, the Common Land Model (CLM) was run in offline mode during the summer monsoon period of 2006 in East Asia, with forcing from Asiaflux, the Korea Land Data Assimilation System (KLDAS), and the Global Land Data Assimilation System (GLDAS), at point and regional scales separately. The CLM results were compared with observations from Asiaflux sites. The estimated net radiation showed good agreement, with r = 0.99 at the point scale and 0.85 at the regional scale. The estimated sensible and latent heat fluxes using Asiaflux and KLDAS data showed reasonable agreement, with r = 0.70. The estimated soil moisture and soil temperature showed patterns similar to the observations, although the estimated water fluxes using KLDAS showed larger discrepancies than those using Asiaflux because of scale mismatch. The spatial distributions of hydro-meteorological fluxes from CLM with KLDAS over East Asia were compared with the CLM results forced with GLDAS and with the GLDAS product provided online. The spatial distributions of CLM with KLDAS were analogous to those of CLM with GLDAS and to the standalone GLDAS data. The results indicate that KLDAS is a good potential source of high-spatial-resolution forcing data. Therefore, KLDAS is a promising alternative product, capable of compensating for the lack of observations and for the low-resolution gridded data available for East Asia.
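
A minimal sketch of the kind of point-scale evaluation described above (correlating simulated and observed fluxes), assuming NumPy; the file names and the choice of latent heat flux are hypothetical stand-ins for the Asiaflux observations and the CLM output.

    # Hypothetical evaluation step: Pearson correlation between CLM-simulated
    # and Asiaflux-observed fluxes at a point site.
    import numpy as np

    sim = np.loadtxt("clm_latent_heat_site.txt")   # simulated latent heat flux (W m-2)
    obs = np.loadtxt("asiaflux_latent_heat.txt")   # observed latent heat flux (W m-2)

    mask = ~np.isnan(sim) & ~np.isnan(obs)         # drop missing observations
    r = np.corrcoef(sim[mask], obs[mask])[0, 1]
    bias = np.mean(sim[mask] - obs[mask])
    rmse = np.sqrt(np.mean((sim[mask] - obs[mask]) ** 2))
    print(f"r = {r:.2f}, bias = {bias:.1f} W m-2, RMSE = {rmse:.1f} W m-2")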
Funding: Supported in part by the National Natural Science Foundation of China (No. 61901328); the China Postdoctoral Science Foundation (No. 2019M653558); the Fundamental Research Funds for the Central Universities (No. CJT150101); and the Key Project of the National Natural Science Foundation of China (No. 61631015).
Abstract: Cooperative spectrum monitoring with multiple sensors has been deemed an efficient mechanism for improving monitoring accuracy and enlarging the monitoring area in wireless sensor networks. However, there is redundancy among the spectrum data collected by a sensor node within a data collection period, which may reduce the data uploading efficiency. In this paper, we investigate inter-data commonality detection, which describes how much two data items have in common. We first define the common segment set and divide it into six categories, and then develop a method to measure a common segment set by extracting the commonality between two files. Moreover, because existing algorithms fail to find a good common segment set, we propose the Common Data Measurement (CDM) algorithm, which identifies a good common segment set based on inter-data commonality detection. Theoretical analysis proves that the CDM algorithm achieves a good measurement of the commonality between two strings. In addition, we construct a synthetic dataset generated randomly. Numerical results show that the CDM algorithm performs better at measuring the commonality between two binary files than the Greedy-String-Tiling (GST) algorithm and a simple greedy algorithm.
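
The CDM algorithm itself is not specified in the abstract; as a rough illustration of the underlying task, the sketch below extracts common segments between two byte strings by repeatedly taking longest matching blocks (via Python's difflib), which is close in spirit to the Greedy-String-Tiling baseline mentioned above and is not the paper's algorithm.

    # Illustrative segment extraction: SequenceMatcher recursively takes the
    # longest matching block of two byte strings as a "common segment".
    from difflib import SequenceMatcher

    def common_segments(a: bytes, b: bytes, min_len: int = 4):
        # Return a list of (offset_a, offset_b, length) common segments.
        segments = []
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        for i, j, n in matcher.get_matching_blocks():
            if n >= min_len:                 # ignore tiny incidental matches
                segments.append((i, j, n))
        return segments

    x = b"spectrum-sample-0001:ABCDEFGH:end"
    y = b"header:ABCDEFGH:spectrum-sample-0002"
    for i, j, n in common_segments(x, y):
        print(f"a[{i}:{i+n}] == b[{j}:{j+n}] -> {x[i:i+n]!r}")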
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11801438, 12161072, and 12171388; the Natural Science Basic Research Plan in Shaanxi Province of China under Grant No. 2023-JC-YB-058; and the Innovation Capability Support Program of Shaanxi under Grant No. 2020PT-023.
Abstract: Structural change in panel data is a widespread phenomenon. This paper proposes a fluctuation test to detect a structural change at an unknown date in heterogeneous panel data models with or without common correlated effects. The asymptotic properties of the fluctuation statistics in the two cases are developed under the null and local alternative hypotheses. Furthermore, the consistency of the change-point estimator is proven. Monte Carlo simulation shows that the fluctuation test controls the probability of type I error in most cases, and the empirical power is high for small and moderate sample sizes. An application of the procedure to a real data set is presented.
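
The paper's exact statistic is not given in the abstract. The toy sketch below only conveys the general fluctuation-test idea, namely tracking how a recursively estimated coefficient moves as the sample grows and flagging large excursions, on simulated data; it is not the proposed test.

    # Toy fluctuation-type diagnostic on a simulated series: recursive OLS slope
    # estimates wander little under stability and drift after a structural break.
    import numpy as np

    rng = np.random.default_rng(0)
    T, break_at = 200, 120
    x = rng.normal(size=T)
    beta = np.where(np.arange(T) < break_at, 1.0, 2.0)   # slope shifts at t = 120
    y = beta * x + rng.normal(scale=0.5, size=T)

    full_slope = np.polyfit(x, y, 1)[0]
    recursive = np.array([np.polyfit(x[:t], y[:t], 1)[0] for t in range(30, T + 1)])
    fluctuation = np.sqrt(np.arange(30, T + 1)) * np.abs(recursive - full_slope)

    # Large excursions of the recursive estimates away from the full-sample
    # estimate indicate parameter instability (a structural change).
    print("max scaled fluctuation:", round(fluctuation.max(), 2))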
Abstract: Existing tools for parsing standard-format weather radar base data lack generality and abstraction in their design, which makes radar data parsing and processing inconvenient. To address this problem, this paper designs and builds a base-data model for China's weather radars on top of Unidata's CDM (Common Data Model), providing access to standard-format weather radar base data at the data-model level. Based on Unidata's open-source NetCDF Java library and the IDV (Integrated Data Viewer) visualization software, a set of CDM-based tools for extracting and visually analyzing the contents of standard-format radar base data is developed. Taking a comparison of base reflectivity data in the old and new formats from the Guangzhou radar as an example, the study demonstrates the application of these tools to the evaluation of standard-format Doppler weather radar base data. The results show that this work facilitates the use of standard-format radar base data and promotes its operational application. The results can also be applied in operational and research work related to radar base-data processing and analysis, providing basic support for the application of radar data.
Funding: Supported by the National Social Science Foundation (Grant/Award Number: 21&ZD334).
Abstract: Metadata is data about data, generated mainly for resource organization and description, facilitating the finding, identification, selection, and acquisition of information. With the advancement of technologies, the acquisition of metadata has gradually become a critical step in data modeling and functional operation, leading to the formation of its methodological commons. A series of general operations has been developed to achieve structured description, semantic encoding, and machine-understandable information, including entity definition, relation description, object analysis, attribute extraction, ontology modeling, data cleaning, disambiguation, alignment, mapping, relating, enriching, importing, exporting, service implementation, registry and discovery, monitoring, etc. These operations are not only necessary elements of semantic technologies (including linked data) and knowledge graph technology, but have also developed into common operations and primary strategies for building independent, knowledge-based information systems. In this paper, this family of metadata-related methods is collectively referred to as the "metadata methodological commons", whose best practices are reflected in the various standard specifications of the Semantic Web. In the future construction of a multi-modal metaverse based on Web 3.0, it should play an important role, for example, in building digital twins through knowledge models or in supporting the modeling of the entire virtual world. Manual description and coding obviously cannot keep pace with content production based on UGC (User Generated Contents) and AIGC (AI Generated Contents) in the metaverse era; the automatic processing of semantic formalization must therefore be considered a sure way to adapt the metadata methodological commons to the future needs of the AI era.
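
As a small concrete instance of the operations listed above (entity definition, attribute description, linked-data serialization), the sketch below builds a minimal RDF metadata record with rdflib and Dublin Core terms; the resource URI, the example namespace, and the property choices are illustrative, not prescribed by the paper.

    # Minimal linked-data style metadata record using Dublin Core terms.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/vocab/")      # illustrative namespace
    g = Graph()
    g.bind("dcterms", DCTERMS)

    dataset = URIRef("http://example.org/dataset/radar-2023")  # illustrative URI
    g.add((dataset, RDF.type, EX.Dataset))
    g.add((dataset, DCTERMS.title, Literal("Weather radar base data, 2023")))
    g.add((dataset, DCTERMS.creator, Literal("Example Observatory")))
    g.add((dataset, DCTERMS.issued, Literal("2023-06-01")))
    g.add((dataset, DCTERMS.subject, Literal("Doppler weather radar")))

    print(g.serialize(format="turtle"))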
Abstract: We compare the Hubble diagram calculated from the observed redshift (RS)/magnitude (μ) data of 280 supernovae in the RS range z = 0.0104 to 8.1 with Hubble diagrams inferred on the basis of the exponential tired-light and the Lambda Cold Dark Matter (ΛCDM) cosmological models. We show that the experimentally measured Hubble diagram clearly follows the exponential photon flight time (tS)/RS relation, whilst the data calculated on the basis of the ΛCDM model exhibit poor agreement with the observed data.
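
The abstract does not spell out the relation being tested; one commonly used form of the exponential tired-light model, given here only for orientation and not necessarily the paper's exact parameterization, relates redshift to the photon flight time tS through exponential energy loss at a rate H0:

    % Exponential tired-light: photon energy decays as E(t) = E_0 e^{-H_0 t},
    % so the redshift accumulated over a flight time t_S satisfies
    \[
      1 + z = \frac{E_{\mathrm{emit}}}{E_{\mathrm{obs}}} = e^{H_0 t_S}
      \quad\Longleftrightarrow\quad
      t_S = \frac{\ln(1 + z)}{H_0},
    \]
    % whereas in \Lambda CDM the distance modulus follows from the luminosity
    % distance, \mu = 5 \log_{10}\!\bigl(d_L / 10\,\mathrm{pc}\bigr).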