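Traditional offline data analysis methods fall short when handling data with strict timeliness requirements and high traffic volume, whereas online detection models can meet the real-time requirements of data stream analysis. This paper proposes an online detection method based on multi-threshold templates. The method performs online detection with the multi-way search tree change-point detection algorithm TSTKS (Ternary Search Tree and Kolmogorov-Smirnov), and updates the window length according to change-point density, which improves change-point detection accuracy. An equal-quantity grading strategy is used for self-learning, matching, and classification of time-series data, which in turn enables state detection and prediction on large-scale pathological data. Results on simulated data and on pathological data show that the proposed method is effective and classifies accurately, offering a new approach to fast classification of large-scale time-series data.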
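The idea of locating a change point with a Kolmogorov-Smirnov statistic and then adapting the window length to change-point density can be illustrated with a minimal sketch. It is not the TSTKS algorithm: the abstract names a ternary search tree, whereas the sketch below uses a plain exhaustive scan as a stand-in. The function names, the `min_seg` parameter, and the halving/doubling rule in `next_window_length` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # empirical CDF of a at x
        fb = bisect_right(sb, x) / len(sb)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def best_split(window, min_seg=5):
    """Scan every admissible split and return (index, KS statistic) of the best.

    Stand-in only: a linear scan, not the search-tree strategy named in the abstract.
    """
    best_idx, best_d = None, 0.0
    for k in range(min_seg, len(window) - min_seg + 1):
        d = ks_statistic(window[:k], window[k:])
        if d > best_d:
            best_idx, best_d = k, d
    return best_idx, best_d


def next_window_length(length, n_changes, lo=32, hi=1024):
    """Hypothetical density rule: shrink when changes are dense, grow when none."""
    if n_changes >= 2:
        return max(lo, length // 2)
    if n_changes == 0:
        return min(hi, length * 2)
    return length


# Toy signal: eight samples near 0 followed by eight samples near 5.
signal = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.08,
          5.0, 5.1, 4.9, 5.05, 4.95, 5.02, 4.98, 5.08]
idx, d = best_split(signal, min_seg=3)
print(idx, d)
```

On this toy signal the two halves do not overlap at all, so the scan picks the boundary between them. A real detector would also need a significance threshold on the statistic before declaring a change point.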
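To address the slow processing and low accuracy of current state-monitoring techniques for electrical equipment, this paper proposes an online state-monitoring method for electrical equipment based on multiple change-point detection and a template-matching strategy. Building on the buffer model and the sliding-window model, the method uses the multi-way search tree change-point detection algorithm TSTKS (Ternary Search Tree and Kolmogorov-Smirnov) to form feature vectors in the window dimension and in the buffer dimension, and matches templates in both dimensions to identify the operating state of the equipment and to locate the moments at which the state switches. Simulation results on a household refrigerator show that the proposed method is fast and accurate, and can serve as a reference for the field of electrical equipment state monitoring.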
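To address the low accuracy and slow speed of existing detection methods in the analysis of large-scale time-series medical data, this paper proposes a deep-learning-based method for classifying pathological segments of time-series data. Building on the TSTKS (Ternary Search Trees and modified Kolmogorov-Smirnov) algorithm and sliding-window theory, the method uses deep learning to classify pathological data segments quickly and accurately. The classification results are then used as the basis for dynamically adjusting the sliding-window size. Analysis of real epileptic electroencephalogram (EEG) signals shows that the proposed classification method, and the sliding-window adjustment mechanism built on it, offer fast detection and relatively high accuracy, providing a new option for fast analysis of large-scale time-series data.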
The sequence of the ribosomal RNA gene (rDNA) plays an important role in species identification and phylogenetic analysis. However, the only published full-length sequence of a ribosomal gene of green algae is that of Ulva mutabilis. In this study, we amplified the full-length sequence of each ribosomal gene unit of the ribosomal gene of Blidingia minima. The full-length sequence of the ribosomal gene in Blidingia minima is 8676 bp, including the 1759 bp 18S rDNA, 576 bp internal transcribed spacer (ITS) + 5.8S rDNA, 3282 bp 28S rDNA, and 3059 bp intergenic spacer (IGS) region. We then carried out a series of genetic analyses based on the ITS and IGS sequences, to verify whether IGS sequences are useful for studying the genetic diversity of green algae from different locations. We amplified the IGS sequences of Blidingia minima from 10 different locations in the Yellow Sea. Multiple alignments of the IGS sequences of samples from these 10 different sites revealed varying degrees of base differences, and comparative analysis of the ITS sequences revealed that our amplified species was classified as Blidingia minima and distinct from other green algae. In conclusion, our full-length amplified ribosomal gene provides useful information to enrich the data on green algae ribosomal genes and provides an effective molecular marker for the analysis of the interspecies and intraspecies relationships of Blidingia minima.
For the high-dimensional nonparametric Behrens-Fisher problem, in which the data dimension is larger than the sample size, the authors propose two test statistics: a U-statistic Rank-based Test (URT) and a Cauchy Combination Test (CCT). CCT is analogous to the maximum-type test, while URT takes into account the sum of squares of differences of ranked samples in different dimensions, which makes it free of the shapes of the distributions and robust to outliers. The asymptotic distribution of URT is derived, and a closed form for calculating the statistical significance of CCT is given. Extensive simulation studies are conducted to evaluate the finite-sample power of the statistics in comparison with the existing method. The simulation results show that URT is a robust and powerful method, and its practicability and effectiveness are illustrated by an application to gene expression data.
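The abstract does not state the closed form it uses for CCT. As a hedged illustration, the sketch below implements the standard Cauchy combination rule for merging per-dimension p-values, T = Σ wᵢ·tan((0.5 − pᵢ)π) with the combined p-value approximated by 0.5 − arctan(T)/π. Whether the paper uses exactly this form, equal weights, or these per-dimension p-values is an assumption, and the function name `cauchy_combination` is hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the standard Cauchy combination rule.

    Assumes nonnegative weights summing to 1 (equal weights by default).
    """
    n = len(pvalues)
    if weights is None:
        weights = [1.0 / n] * n
    # Map each p-value to a Cauchy-distributed quantity and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


combined = cauchy_combination([0.01, 0.8, 0.6])
print(combined)
```

Like a maximum-type test, the combined value is dominated by the smallest p-values, which is consistent with the abstract's remark that CCT is analogous to the maximum-type test.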
Funding: supported by the Beijing Natural Science Foundation under Grant No. Z180006 and the National Natural Science Foundation of China under Grant No. 11722113.