A maximum test is proposed in lieu of forcing a choice between the two dependent samples t-test and the Wilcoxon signed-ranks test. The maximum test, which requires a new table of critical values, maintains nominal α while guaranteeing the maximum power of the two constituent tests. Its critical values, obtained via Monte Carlo methods, are uniformly smaller than those of the Bonferroni-Dunn adjustment, giving the test superior power when testing for treatment alternatives of a shift in location parameter with data sampled from non-normal distributions.
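The idea of a maximum test with Monte Carlo critical values can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact form of the statistic or the published table, so the sketch assumes the statistic is the larger of the absolute paired t statistic and the absolute Wilcoxon signed-rank statistic standardized by its null mean and variance. The function names (`paired_t`, `wilcoxon_z`, `max_stat`, `mc_critical_value`) and the choice of normal data for the null simulation are hypothetical.

```python
import math
import random
import statistics


def paired_t(diffs):
    """Paired-samples t statistic computed from the within-pair differences."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def wilcoxon_z(diffs):
    """Wilcoxon signed-rank statistic, standardized by its null mean and variance.

    Zero differences are dropped and tied absolute values get average ranks.
    No tie correction is applied to the variance (a simplification).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute values starting at i.
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    null_mean = n * (n + 1) / 4
    null_var = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - null_mean) / math.sqrt(null_var)


def max_stat(diffs):
    """Assumed form of the maximum statistic: the larger of |t| and |z|."""
    return max(abs(paired_t(diffs)), abs(wilcoxon_z(diffs)))


def mc_critical_value(n, alpha=0.05, reps=5000, seed=0):
    """Approximate the upper-alpha critical value of max_stat under the null.

    The null is simulated with standard normal differences (an assumption).
    """
    rng = random.Random(seed)
    sims = sorted(
        max_stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    k = min(reps - 1, int((1 - alpha) * reps))
    return sims[k]
```

To use the sketch, compute `max_stat` on the observed differences and reject the null hypothesis when it exceeds `mc_critical_value(n)` for the same sample size. Because a single critical value is simulated for the joint statistic, no separate multiplicity adjustment is applied to the two constituent tests.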
Feature selection (FS) (or feature dimensional reduction, or feature optimization) is an essential process in pattern recognition and machine learning because it improves classification speed and accuracy and reduces system complexity. FS reduces the number of features extracted in the feature extraction phase by reducing highly correlated features, retaining features with high information gain, and removing features with no weight in classification. In this work, an FS filter-type statistical method is designed and implemented, utilizing a t-test to decrease the convergence between feature subsets by calculating the quality of performance value (QoPV). The approach uses a fitness function to calculate the strength of recognition value (SoRV). The two values are used to rank all features according to the final weight (FW) calculated for each feature subset using a function that prioritizes feature subsets with high SoRV values. An FW is assigned to each feature subset, and those with FWs below a predefined threshold are removed from the feature subset domain. Experiments are implemented on three datasets: Ryerson Audio-Visual Database of Emotional Speech and Song, Berlin, and Surrey Audio-Visual Expressed Emotion. The performance of the F-test and F-score FS methods is compared with that of the proposed method. Tests are also conducted on a system before and after deploying the FS methods. Results demonstrate the comparative efficiency of the proposed method. The complexity of the system is calculated based on the time overhead required before and after FS. Results show that the proposed method can reduce system complexity.
Objective: To take advantage of Epi Info to manage and analyze disease data. Methods: After two real examples were selected for the independent-samples t-test, Epi Info 5.00, 5.01a (Chinese), 6.00, 6.04b, 6.04d, 2000, and 2002 (Chinese) were used to run the independent-samples t-test on them. Intercooled StataT, Microsoft Excel (2002), and SPSS 10.0 for Windows were then used to verify the results. Results: Epi Info 5.00, 5.01a (Chinese), and 6.00 gave the same results as one another, and Epi Info 6.04b, 6.04d, 2000, 2002 (Chinese), Intercooled StataT, Microsoft Excel (2002), and SPSS 10.0 for Windows gave identical results, but the results of the first group differed from those of the second. Conclusion: The t-test results from Epi Info 6.00 and earlier versions were in error. It is therefore important to choose the Epi Info version carefully when managing and analyzing disease data, and to note which Epi Info version was used when consulting articles and using their statistical results.
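A cross-check of this kind can also be done by computing the independent-samples t-test directly from its definition. The sketch below is an assumption-laden illustration, not the procedure used in the study: the abstract does not say which variant of the test the packages computed or what the error in the older versions was, so both the pooled-variance (Student) and the unequal-variance (Welch) forms are shown. The function names and the sample data are hypothetical.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


def welch_t(a, b):
    """Welch's two-sample t statistic with Welch-Satterthwaite df; returns (t, df)."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1
    q2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data, used only to show the calls.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
```

With equal group sizes the two forms give the same t statistic and differ only in the degrees of freedom, so a comparison across software versions should record both values.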
Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. To gain more insight into human evolution, ethnic variation, and population assignment, we propose to find out which SNPs are significant in determining the population groups and then to classify different populations using these relevant SNPs as input features. In this study, we developed a modified t-test ranking measure and applied it to the HapMap genotype data. First, we rank all SNPs and compare the ranking with other feature importance measures, including F-statistics and the informativeness for assignment. Second, we select different numbers of the most highly ranked SNPs as the input to a classifier, such as the support vector machine, to find the feature subset that gives the best classification accuracy. Experimental results showed that the proposed method is very effective in finding SNPs that are significant in determining the population groups, with reduced computational burden and better classification accuracy.
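The rank-then-select step can be sketched as below. The abstract does not define the modified t-test measure, so this illustration substitutes a plain absolute Welch t statistic per feature and handles only two groups, whereas the HapMap data have four. The function names, the 0/1/2 genotype coding, and the toy data are all hypothetical.

```python
import math
import statistics


def abs_welch_t(x, y):
    """Absolute Welch t statistic between two lists of feature values."""
    se2 = statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    diff = abs(statistics.mean(x) - statistics.mean(y))
    if se2 == 0:
        # Both groups are constant: perfectly separated or identical.
        return math.inf if diff > 0 else 0.0
    return diff / math.sqrt(se2)


def rank_features(rows, labels, top_k):
    """Return the indices of the top_k features, highest |t| first.

    rows   : list of samples, each a list of feature values
    labels : list of 0/1 group labels, one per sample
    """
    n_features = len(rows[0])
    scores = []
    for f in range(n_features):
        group0 = [row[f] for row, lab in zip(rows, labels) if lab == 0]
        group1 = [row[f] for row, lab in zip(rows, labels) if lab == 1]
        scores.append(abs_welch_t(group0, group1))
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranked[:top_k]


# Toy genotype matrix (hypothetical): 6 samples, 3 SNPs coded 0/1/2.
rows = [
    [0, 2, 1],
    [0, 2, 0],
    [1, 2, 1],
    [2, 0, 1],
    [2, 1, 0],
    [2, 2, 1],
]
labels = [0, 0, 0, 1, 1, 1]
selected = rank_features(rows, labels, top_k=2)
```

The selected indices would then be used to subset the columns passed to a classifier, and `top_k` varied to find the subset size that gives the best accuracy.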
Objective: To investigate the toxicity difference between raw and processed Pinelliae Rhizoma (Banxia in Chinese, BX), the rhizome of Pinellia ternata, from the perspective of chemical composition. Methods: Sixteen samples of raw and processed BX were prepared and analyzed by UPLC/Q-TOF-MS/MS. The chemical markers discriminating the two groups were investigated by principal component analysis (PCA) and t-test analysis. The chemical markers were preliminarily identified from the accurate mass-to-charge ratio, the MS/MS fragments, and comparison of the corresponding data with references or databases. Results: Liquiritin, liquiritigenin, and lysophosphatidylcholine (LPC) were identified as the characteristic markers. The reduction of LPC in processed BX was one of the main reasons for detoxification, because LPC can induce an inflammatory response. Liquiritin and liquiritigenin showed anti-inflammatory effects and reduced liver injury, so their appearance in processed BX was another reason for detoxification. Conclusion: An approach to explaining how processing reduces toxicity in medicinal plants was proposed. Moreover, the chemical markers of toxicity could be used to differentiate raw material from processed herbs for quality control and safe clinical use.
Funding: Supported by the National Natural Science Foundation of China (No. 81460595).