Funding: Supported by the Meteorological Open Research Foundation for the Huaihe River Basin (HRM201602); the Foundation for Young Scholars of Jiangsu Meteorological Bureau (Q201708, KQ201802); the Science and Technology Innovation Team Foundation for Marine Meteorological Forecast Technology of Lianyungang Meteorological Bureau; the Key Technology R&D Program Project of Lianyungang City (SH1634); and the Special Project for Forecasters of Jiangsu Meteorological Bureau (JSYBY201811, JSYBY201812, JSYBY201810).
Abstract: Based on daily precipitation data for the Lianyungang area from 1951 to 2012 and climate signal data from the National Climate Center and NOAA websites, a model for predicting whether the number of summer rainstorm days in the Lianyungang area will be large was established using the classical C5.0 decision tree algorithm. The data samples from 48 years (about 80% of all samples) were used as the training set, and the training accuracy of the model was 95.83%. The data samples from the remaining 14 years (about 20% of all samples) were used as the test set, and the test accuracy of the model was 85.71%. The results show that the prediction model for the number of summer rainstorm days built with the C5.0 algorithm is accurate and easy to interpret, and that meteorological staff can use it directly. The study also offers a new approach to short-term climate prediction of the number of summer rainstorm days.
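The figures in this abstract can be checked with a few lines of arithmetic. The sketch below assumes one sample per year, which the abstract implies but does not state. The counts of correctly classified years (46 and 12) are not given in the abstract; they are inferred here as the only whole numbers that reproduce the reported percentages.

```python
# Sanity check of the split and accuracy figures quoted in the abstract.
first_year, last_year = 1951, 2012
total_years = last_year - first_year + 1      # 62 years, assuming one sample per year
train_years, test_years = 48, 14              # as stated in the abstract
assert train_years + test_years == total_years

# Exact shares behind the "about 80%" / "about 20%" wording.
train_share = round(100 * train_years / total_years, 1)   # 77.4
test_share = round(100 * test_years / total_years, 1)     # 22.6

# ASSUMPTION: correct-prediction counts are inferred, not reported in the source.
train_correct, test_correct = 46, 12
train_acc = round(100 * train_correct / train_years, 2)   # 95.83
test_acc = round(100 * test_correct / test_years, 2)      # 85.71

print(train_share, test_share, train_acc, test_acc)
```

Under these assumptions, the split is closer to 77%/23% than 80%/20%, and the test accuracy corresponds to two misclassified years out of 14.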
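Abstract: Classification is one of the most widely studied and applied techniques in data research and development. With the rapid economic growth of recent years, classification has come into wide use across many industries and fields, and classifying more accurately and efficiently is the goal of most researchers. Decision trees are widely applied because of their clear logic, rigorous procedure, and combination of quantitative and qualitative analysis, and because they are easy to understand, easy to master, and highly practical. Many algorithms now exist for constructing decision trees, such as ID3, C4.5, and CART. C4.5 improves on ID3: it uses the information gain ratio (Info Gain Ratio) as the criterion for selecting the splitting attribute, which compensates for ID3's bias toward attributes with many values when it selects by information gain. C4.5 nonetheless has some shortcomings. This paper focuses on its time-consuming handling of continuous attributes and improves that procedure to raise the computational efficiency of C4.5 and greatly shorten its running time.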
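The difference between the two splitting criteria mentioned in this abstract can be illustrated with a minimal sketch. It covers discrete attributes only and is not the paper's improved procedure for continuous attributes; the toy data and the names `record_id` and `humidity` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def info_gain(values, labels):
    """ID3 criterion: label entropy minus weighted entropy after the split."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    if split_info == 0:           # attribute takes a single value: no usable split
        return 0.0
    return info_gain(values, labels) / split_info

# Hypothetical toy data: 8 records, binary class label.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
record_id = list(range(8))  # unique per record, so it has many values
humidity = ["high", "high", "low", "low", "high", "low", "high", "high"]

for name, attr in [("record_id", record_id), ("humidity", humidity)]:
    print(name, round(info_gain(attr, labels), 3), round(gain_ratio(attr, labels), 3))
```

On this toy data, information gain prefers `record_id`, which splits every record into its own pure branch, while the gain ratio prefers `humidity` because the split information penalizes the eight-way split.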
Funding: HIV/HCV No Co-Infection (NoCo) Program (IN-US-987-5557)
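Abstract: Objective: To construct a tool for assessing high-risk behaviors for hepatitis C virus (HCV) infection among men who have sex with men (MSM). Methods: The research team first developed the HCV high-risk behavior assessment tool. From December 20, 2019 to January 14, 2020, information on the target population was collected through an online survey on a social software platform in order to evaluate the tool, and the data were analyzed with a decision tree model. Results: All 6 items of the assessment tool were included in the tree model, which had 5 levels and 27 nodes. The model's risk estimate was 0.085 and its prediction accuracy was 91.52%; the index chart and gains chart of the tree model indicated a good fit. The importance ranking showed that the items, in descending order of influence on HCV infection risk among MSM, were: HIV result, drug use, sexually transmitted diseases or related symptoms, condom use, group sex, and traumatic procedures. Conclusion: The HCV high-risk behavior assessment tool developed in this study is simple and easy to use. It can be used to assess high-risk behaviors for HCV infection among MSM and provides a scientific basis for targeted behavioral interventions.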