Funding: supported by the National Natural Science Foundation of China (61304218).
Abstract: For high-reliability, long-life systems, system-level pass/fail data are often scarce. By integrating lower-level data, such as subsystem or component pass/fail test data, Bayesian analysis can improve the precision of the system reliability assessment. When the multi-level pass/fail data overlap, a challenging problem for Bayesian analysis is constructing the likelihood function. Because the computational burden of existing methods makes them infeasible for multi-component systems, this paper proposes an improved Bayesian approach to system reliability assessment with overlapping data. The approach has three steps: first, searching for feasible paths based on the binary decision diagram; second, screening feasible points based on space partition and constraint decomposition; and finally, simplifying the likelihood function. An example of a satellite rolling control system demonstrates the feasibility and efficiency of the proposed approach.
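To make the idea of combining component-level and system-level pass/fail data concrete, the following is a minimal sketch and not the paper's method. It assumes a series system, uniform Beta(1, 1) priors on each component, and non-overlapping data, so the system-level likelihood is a plain binomial. The function name `system_reliability_posterior` and its parameters are hypothetical.

```python
import random


def system_reliability_posterior(component_data, system_data,
                                 n_draws=20000, seed=0):
    """Posterior mean of series-system reliability (illustrative only).

    component_data: list of (passes, trials), one pair per component.
    system_data:    (passes, trials) from system-level tests.
    Assumptions (not from the paper): series structure, Beta(1, 1)
    priors, and system tests independent of the component tests.
    """
    rng = random.Random(seed)
    sys_pass, sys_trials = system_data
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_draws):
        # Draw each component reliability from its Beta posterior and
        # multiply them, since a series system needs every component.
        r_sys = 1.0
        for passes, trials in component_data:
            r_sys *= rng.betavariate(1 + passes, 1 + trials - passes)
        # Importance weight: binomial likelihood of the system-level data
        # (the binomial coefficient cancels after normalisation).
        w = r_sys ** sys_pass * (1.0 - r_sys) ** (sys_trials - sys_pass)
        weighted_sum += w * r_sys
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    components = [(48, 50), (29, 30)]
    print(system_reliability_posterior(components, (0, 0)))    # components only
    print(system_reliability_posterior(components, (20, 20)))  # plus system tests
```

With overlapping data, the system outcome and the component outcomes come from the same test runs, so this simple product of likelihoods no longer holds; that is the case the abstract's three-step approach targets.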
Abstract: A similarity measure for non-overlapped data was designed and compared with the overlapped-data case. An example showed that similarity on overlapped data behaves inconsistently with similarity on non-overlapped data. An illustration with artificial data showed that the conventional similarity measure is not suitable for the non-overlapped case. To overcome this imbalance problem, a similarity measure for non-overlapped data was obtained by considering neighbor information. Different approaches to designing the similarity measure were thus proposed and proved using neighbor information, and the similarity measure was calculated on the artificial-data example. An extension of the similarity measure to intuitionistic fuzzy sets (IFSs), which contain an uncertainty called hesitance, was also presented.
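Abstract: To address the classification errors that class-overlap regions tend to cause in imbalanced data, an improved Borderline-SMOTE oversampling method (IBSM) is proposed that introduces a synthesis factor to improve boundary classification. First, borderline minority-class samples are identified from the distribution of each minority sample's nearest neighbors. Next, a synthesis factor is computed for each borderline sample, and the number of samples to be generated for that sample is updated according to its value. Finally, the top-Z nearest minority-class samples among the neighbors are selected according to the synthesis factor to generate new samples. The proposed method is compared with eight sampling methods using two classifiers, KNN and SVM, on 10 imbalanced KEEL datasets. The results show that the proposed method achieves the best F1, G-mean, and AUC (area under the curve) on most datasets and the best Friedman ranking for F1 and AUC, indicating that it handles the classification of borderline samples in imbalanced data better than the other sampling methods. Setting constraints and an allocation strategy through the synthesis factor may offer ideas for similar research.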
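The borderline-detection and interpolation steps can be sketched as follows. This is the baseline Borderline-SMOTE idea only, not IBSM: the abstract does not define the synthesis factor or the top-Z selection rule, so both are omitted and a fixed `n_new_per_seed` is used instead. The function names, the `(features, label)` sample layout, and the parameters are assumptions made for illustration.

```python
import math
import random


def _nearest(sample, pool, k):
    """Return the k samples in pool closest to sample, excluding itself."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: math.dist(sample[0], p[0]))[:k]


def borderline_smote(samples, minority_label, k=5, n_new_per_seed=1, seed=0):
    """Generate synthetic minority samples near the class boundary.

    samples: list of (features, label), features being a tuple of floats.
    A minority sample counts as borderline when at least half, but not
    all, of its k nearest neighbours belong to another class.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    synthetic = []
    for s in minority:
        neighbours = _nearest(s, samples, k)
        n_other = sum(1 for n in neighbours if n[1] != minority_label)
        if not (k / 2 <= n_other < k):
            continue  # safe sample or isolated noise: not used as a seed
        partners = _nearest(s, minority, k)
        if not partners:
            continue
        for _ in range(n_new_per_seed):
            partner = rng.choice(partners)
            gap = rng.random()  # position on the segment seed -> partner
            point = tuple(a + gap * (b - a) for a, b in zip(s[0], partner[0]))
            synthetic.append((point, minority_label))
    return synthetic


if __name__ == "__main__":
    data = [((x, 0.0), 1) for x in (0.0, 1.0, 2.0, 3.0)]
    data += [((x, 0.0), 0) for x in (3.5, 4.0, 4.5, 5.0, 6.0)]
    print(borderline_smote(data, minority_label=1, k=3, n_new_per_seed=2))
```

In this toy data only the minority point at x = 3.0 sits next to the majority class, so it is the only seed, and every synthetic point lies on a segment between it and another minority point.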
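Abstract: Ocean data assimilation is an effective method that corrects ocean data by using both ocean observations and ocean numerical models, so that the processed data are closer to the true state of the ocean. At high resolution, the parallel assimilation program based on the LASG/IAP climate system ocean model (LICOM), developed by the Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP) and the State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics (LASG), typically involves heavy file reading, communication, and computation. Previous studies optimized these aspects, but because the optimizations stayed at the upper algorithmic level and did not consider the underlying file system or the supercomputer cluster architecture, their effect was limited. To address these problems, this work combines the data and computational characteristics of ocean data assimilation with the architectural characteristics of the supercomputing platform and, building on temporal and spatial locality, proposes a load-balancing strategy based on a computation topology graph, a parallel optimization strategy based on the Lustre file storage architecture and the characteristics of the supercomputer cluster, and a three-layer overlap strategy covering computation, read/communication, and write-back. Finally, the proposed algorithm was tested with a high-resolution dataset on the Tianhe-2 supercomputer cluster. Compared with the existing algorithm, it improved overall assimilation performance by a factor of 18 on 4,000 cores. Tests were also run on the Sugon 7000 supercomputer cluster: on 4,000 DCU accelerator cards, the proposed algorithm improved overall computing performance by about a factor of 8 compared with the existing algorithm.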
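The idea of overlapping reading, computation, and write-back can be illustrated with a small three-stage pipeline. This is only a single-process sketch of the general technique using threads and bounded queues; it does not reflect the paper's MPI/Lustre implementation, and `run_pipeline` together with its `read`, `compute`, and `write` callbacks are hypothetical names.

```python
import queue
import threading


def run_pipeline(block_ids, read, compute, write, depth=2):
    """Run read -> compute -> write as three overlapping stages.

    While block i is being computed, block i + 1 can already be read and
    block i - 1 written back. `depth` bounds how far a stage may run
    ahead of the next one. Error handling is omitted for brevity.
    """
    to_compute = queue.Queue(maxsize=depth)
    to_write = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def reader():
        for block in block_ids:
            to_compute.put(read(block))
        to_compute.put(done)

    def worker():
        while (item := to_compute.get()) is not done:
            to_write.put(compute(item))
        to_write.put(done)

    def writer():
        while (item := to_write.get()) is not done:
            write(item)

    threads = [threading.Thread(target=stage)
               for stage in (reader, worker, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    results = []
    run_pipeline(range(5), read=lambda b: b,
                 compute=lambda x: x * x, write=results.append)
    print(results)
```

Because each stage is a single thread fed by a FIFO queue, blocks are written in the order they were read.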
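Abstract: A new overlapping community detection algorithm for complex-network big data, DOC (detecting overlapping communities over complex network big data), is proposed, with time complexity O(nlog2(n)). The algorithm is based on modularity clustering and graph computing: it applies new update methods for nodes and edges, uses a balanced binary tree to index modularity increments, and builds a new overlapping community detection algorithm on the idea of modularity optimization. Compared with traditional overlapping-node detection algorithms, each node is analyzed far less often, so high identification accuracy can be achieved at a low running time. Tests on complex-network big datasets show that DOC detects overlapping communities effectively and with high accuracy: on large-scale LFR benchmark datasets, the normalized mutual information (NMI) for overlapping community detection reaches up to 0.97, the average F-score for overlapping-node detection is above 0.91, and the running time on complex-network big data is clearly better than that of traditional algorithms.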
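Since DOC is built around modularity, a short sketch of the standard Newman modularity score may help. It covers only the usual definition for a disjoint partition of an undirected graph; the abstract does not describe DOC's update rules, tree index, or handling of overlapping nodes, so none of that is shown. The function name `modularity` and its input layout are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a disjoint partition of an undirected graph.

    edges:        list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community id.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))
```

For this graph, placing each triangle in its own community gives Q = 5/14, while placing all nodes in one community gives Q = 0, which is why modularity-driven methods prefer the split.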