Funding: National Natural Science Foundation of China (Grant No. 50375017); Natural Science Foundation of Beijing (Grant Nos. 3042006 and 3062008); Open Project of the Beijing Key Laboratory of Measurement and Control of Mechanical and Electrical Systems (No. KF20041123206)
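Abstract: Data mining can extract implicit, previously unknown, essential regularities from large volumes of incomplete, noisy, fuzzy, and random real-world data. To discover fault-symptom knowledge effectively during rotating machinery fault diagnosis, data mining techniques and methods are introduced. For rotating machinery, a fault diagnosis knowledge acquisition system was built on the RIPPER algorithm (Repeated Incremental Pruning to Produce Error Reduction). Fault phenomena are collected and organized into fault information samples consisting of fault symptoms, fault types, and related fields, and RIPPER is applied to these samples to produce a file containing the fault diagnosis rule set. This enables the acquisition and automatic updating of the diagnostic system's knowledge. The system can diagnose common rotating machinery faults, which verifies the soundness of the algorithm.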
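The step from fault samples to a rule set can be illustrated with a minimal sketch. It is not the system described in the abstract and not full RIPPER: it shows only sequential covering with rule growing by FOIL information gain, the criterion RIPPER uses when it grows a rule, and it omits RIPPER's incremental pruning and rule-set optimization. The attribute names, values, and samples are hypothetical.

```python
import math

# Hypothetical fault samples: symptom attributes plus a fault-type label.
# Names and values are illustrative only, not taken from the paper.
SAMPLES = [
    ({"dominant_freq": "1x", "axial_vibration": "low", "phase": "stable"}, "unbalance"),
    ({"dominant_freq": "1x", "axial_vibration": "low", "phase": "stable"}, "unbalance"),
    ({"dominant_freq": "2x", "axial_vibration": "high", "phase": "stable"}, "misalignment"),
    ({"dominant_freq": "2x", "axial_vibration": "high", "phase": "unstable"}, "misalignment"),
    ({"dominant_freq": "0.5x", "axial_vibration": "low", "phase": "unstable"}, "oil_whirl"),
    ({"dominant_freq": "1x", "axial_vibration": "high", "phase": "stable"}, "misalignment"),
]


def covers(rule, features):
    """A rule is a dict of attribute -> required value (a conjunction)."""
    return all(features.get(a) == v for a, v in rule.items())


def counts(rule, samples, target):
    """Return (positives, negatives) covered by the rule for the target class."""
    p = sum(1 for f, y in samples if covers(rule, f) and y == target)
    n = sum(1 for f, y in samples if covers(rule, f) and y != target)
    return p, n


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain when coverage changes from (p0, n0) to (p1, n1)."""
    if p0 == 0 or p1 == 0:
        return float("-inf")
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def grow_rule(samples, target):
    """Greedily add the best-gain condition until no negatives are covered."""
    rule = {}
    while True:
        p0, n0 = counts(rule, samples, target)
        if n0 == 0:
            return rule
        candidates = {(a, v) for f, _ in samples for a, v in f.items() if a not in rule}
        best, best_gain = None, 0.0
        for a, v in sorted(candidates):  # sorted for deterministic tie-breaking
            p1, n1 = counts({**rule, a: v}, samples, target)
            gain = foil_gain(p0, n0, p1, n1)
            if gain > best_gain:
                best, best_gain = (a, v), gain
        if best is None:  # no condition improves the rule
            return rule
        rule[best[0]] = best[1]


def learn_rules(samples, target):
    """Sequential covering: learn a rule, drop the positives it covers, repeat."""
    remaining, rules = list(samples), []
    while any(y == target for _, y in remaining):
        rule = grow_rule(remaining, target)
        p, _ = counts(rule, remaining, target)
        if not rule or p == 0:
            break
        rules.append(rule)
        remaining = [(f, y) for f, y in remaining if not (covers(rule, f) and y == target)]
    return rules


if __name__ == "__main__":
    for r in learn_rules(SAMPLES, "unbalance"):
        print("IF", " AND ".join(f"{a}={v}" for a, v in sorted(r.items())), "THEN unbalance")
```

Each learned rule is a conjunction of symptom conditions that implies one fault type, which is the general shape of a diagnosis rule set; a production system would add pruning on held-out data to avoid overfitting noisy samples.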
Funding: Chinese Key National Science and Technology Program in the 12th Five-Year Period, grant 2012ZX10001006-002
Abstract: The epidemiology of HIV-1 varies across regions of the world, and this complexity may leave distinctive footprints in the viral genome. We therefore searched for significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. These patterns distinguished southeastern Asian from non-southeastern Asian sequences with 97.5% accuracy, Chinese from non-Chinese sequences with 98.3% accuracy, African from non-African sequences with 88.4% accuracy, and southern African from non-southern African sequences with 91.2% accuracy. The patterns showed different associations with subtypes and with amino acid positions, and some were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtype B, subtype C, and all subtypes chimeric with CRF01_AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this subtype and at the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Moreover, distinct patterns occur at the single position 21, where glycine, leucine, and isoleucine correspond to subtype C, subtype B, and all recombinant forms chimeric with CRF01_AE, respectively; these patterns also indicate distinct geographic features. Our method widens the scope of signature inference from geographic, genetic, and genomic viewpoints. These findings may provide a valuable reference for epidemiological research or vaccine design.
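How a signature pattern might be applied to an aligned sequence can be sketched as an ordered rule list. Only the association of glycine, leucine, and isoleucine at position 21 with the three groups comes from the abstract; the 1-based column numbering, the label strings, the default class, and the toy sequences are assumptions made for illustration, and the sketch does not reproduce the reported accuracies.

```python
# Ordered rule list: each rule is (conditions, label). Conditions map a
# 1-based alignment column to the required residue (one-letter code).
# Column 21 with G / L / I follows the abstract; the label strings and the
# numbering convention are assumptions.
RULES = [
    ({21: "G"}, "C"),
    ({21: "L"}, "B"),
    ({21: "I"}, "CRF01_AE-like"),
]
DEFAULT_LABEL = "other"  # hypothetical fallback when no rule fires


def classify(aligned_seq, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose conditions all match."""
    for conditions, label in rules:
        if all(len(aligned_seq) >= col and aligned_seq[col - 1] == aa
               for col, aa in conditions.items()):
            return label
    return default


def accuracy(labelled_seqs, rules=RULES):
    """Fraction of sequences whose predicted label equals the given label."""
    hits = sum(1 for seq, label in labelled_seqs if classify(seq, rules) == label)
    return hits / len(labelled_seqs)


def toy(residue_at_21):
    """Build a made-up 24-column aligned sequence ('-' marks a gap)."""
    return "MRVKGIQRN-WQHLWRWGTM" + residue_at_21 + "LGM"


# Hypothetical labelled data; the last entry is deliberately misclassified.
DATA = [
    (toy("G"), "C"),
    (toy("L"), "B"),
    (toy("I"), "CRF01_AE-like"),
    (toy("-"), "other"),
    (toy("G"), "B"),
]

if __name__ == "__main__":
    for seq, label in DATA:
        print(label, "->", classify(seq))
    print("accuracy:", accuracy(DATA))
```

A rule with several conditions would simply list more columns in its dictionary, so multi-position signatures fit the same structure.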