Abstract: It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits their applicability in this mixed setting. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method uses the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. We conduct simulation studies and real data analyses to demonstrate its finite-sample performance. In summary, the proposed method offers a solution for screening features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
Abstract: In ultra-high-dimensional data, the response variable is often multi-class. This paper therefore proposes a model-free screening method for multi-class responses that uses the Jensen-Shannon divergence to measure the importance of covariates. The method calculates the Jensen-Shannon divergence between the conditional distribution of a covariate given each response class and the unconditional distribution of that covariate, and then weights these divergences by the class probabilities of the response; a larger weighted Jensen-Shannon divergence indicates a more important covariate. We also investigate an adapted version for the case where the number of categories varies across covariates, in which the weighted Jensen-Shannon divergence is adjusted by a logarithmic factor of the number of categories. Theoretical analysis and simulation experiments show that the proposed methods have the sure screening and ranking consistency properties. Finally, simulation and real-data experiments show that the proposed methods are robust in performance and computationally faster than an existing method.
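The weighted Jensen-Shannon index described in this abstract can be sketched as follows. This is a minimal illustration for categorical covariates only, not the authors' implementation: the function names (`weighted_js`, `screen`), the use of natural logarithms, and the choice to apply the adjustment by dividing by the log of the number of categories are all assumptions made for the example.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) in nats; entries with p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def weighted_js(x, y, adjust=False):
    """Class-probability-weighted JS divergence between P(x | y = r) and P(x).

    adjust=True divides by log(number of categories of x); this form of the
    adjustment is an assumption, the abstract only mentions a logarithmic factor.
    """
    cats = np.unique(x)
    marginal = np.array([np.mean(x == c) for c in cats])
    score = 0.0
    for r in np.unique(y):
        in_class = (y == r)
        conditional = np.array([np.mean(x[in_class] == c) for c in cats])
        score += np.mean(in_class) * js(conditional, marginal)
    if adjust and len(cats) > 1:
        score /= np.log(len(cats))
    return score

def screen(X, y, d, adjust=False):
    """Return the indices of the d highest-scoring columns of X, and all scores."""
    scores = np.array([weighted_js(X[:, j], y, adjust) for j in range(X.shape[1])])
    order = np.argsort(-scores, kind="stable")
    return order[:d], scores

# Toy data: column 0 determines the class, column 1 is independent of it.
y = np.array([0, 0, 1, 1, 2, 2])
X = np.column_stack([y, np.array([0, 1, 0, 1, 0, 1])])
selected, scores = screen(X, y, d=1)
```

On the toy data the informative column receives a strictly positive score and the independent column a score of zero, so only column 0 is retained.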
Funding: Supported in part by Veterans Affairs Clinical Merit Medical Research Funds, the ASGE Career Development Award (FWL 1985), and the American College of Gastroenterology Clinical Research Award (FWL 2009).
Abstract: Unsedated colonoscopy is available worldwide, but is not a routine option in the United States (US). We conducted a literature review, supplemented by our experience and expert commentaries, to provide data supporting the use of unsedated colonoscopy for colorectal cancer screening. Medline was searched from 1966 to 2009 to identify relevant articles on the subject. Data were summarized, and co-authors provided critiques as well as accounts of unsedated colonoscopy for screening and surveillance. Diagnostic colonoscopy was initially developed as an unsedated procedure. Procedure-related discomfort led to wide adoption of sedation in the US, although unsedated colonoscopy remains the usual practice elsewhere. The increased use of colonoscopy for colorectal cancer screening in healthy, asymptomatic individuals suggests that a reassessment of the burden of sedation in screening colonoscopy is appropriate in the US, with the aim of lowering costs and minimizing complications for patients. A water method developed to minimize discomfort has shown promise in enhancing the outcomes of unsedated colonoscopy. The use of scheduled, unsedated colonoscopy in the US appears to be feasible for colorectal cancer screening. Studies to assess its applicability in diverse practice settings deserve to be conducted and supported.
Abstract: Aim: To assess the efficacy and limitations of the free/total prostate-specific antigen ratio (f/tPSA) at a single institution in Japan, focusing on the avoidance of unnecessary prostate biopsies. Methods: In total, 631 men between 44 and 93 years old (mean 69.8 years) with elevated PSA underwent power-Doppler ultrasonography-guided transrectal 10-core prostate biopsies at Niigata Cancer Center Hospital, and their histological features were investigated in relation to total PSA (tPSA) and f/tPSA. Results: Prostate cancer (PCa) was detected in 126 of 134 patients (94.3%) with tPSA of 26 ng/mL or higher. The detection rate was 59.4% for tPSA of 21-25 ng/mL, followed by 39.2% for 16-20 ng/mL, 30.0% for 11-15 ng/mL, 20.0% for 4.1-10 ng/mL and 7.6% for ≤4.0 ng/mL. The f/tPSA of the PCa group was significantly lower than that of the group with non-malignant disorders in every tPSA range (mean 0.122 vs. 0.160, P<0.001). Receiver-operating characteristic analyses showed that f/tPSA (AUC: 0.664) performed better than tPSA (AUC: 0.559) in patients with tPSA between 3.0 and 10 ng/mL (P<0.01). Although an f/tPSA cut-off value of 0.250 might miss 1.8% of PCa patients, it potentially spares 9.2% of unnecessary biopsies. Conclusion: f/tPSA is more valuable than tPSA alone for predicting the occurrence of PCa. We recommend 0.250 as the cut-off value for f/tPSA in PCa screening for Asian men with so-called grey-zone tPSA. (Asian J Androl 2006 Jul;8:429-434)
Abstract: Data often contain both categorical and continuous covariates. However, most feature screening methods for ultrahigh-dimensional classification assume that the covariates are continuous, and the feature screening methods applicable to mixed covariates are very limited. To handle this non-trivial situation, we propose a model-free feature screening procedure for ultrahigh-dimensional multi-classification with both categorical and continuous covariates. The proposed method is based on Gini impurity, which is used to evaluate the predictive power of the covariates. Under certain regularity conditions, it is proved that the proposed screening procedure possesses the sure screening and ranking consistency properties. We demonstrate the finite-sample performance of the proposed procedure through simulation studies and illustrate it with real data analysis.
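A Gini-impurity screening index of the kind this abstract describes can be sketched as below. This is an illustrative reading rather than the authors' procedure: the index is taken here to be the reduction in Gini impurity of the response after conditioning on a covariate, and continuous covariates are handled by quantile slicing. Both choices, along with the names `gini_reduction`, `slice_continuous`, `gini_screen` and the parameter `n_slices`, are assumptions, since the abstract does not specify them.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_r p_r^2 of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def gini_reduction(x, y):
    """Impurity of y minus the expected impurity of y within each category of x."""
    within = sum(np.mean(x == c) * gini(y[x == c]) for c in np.unique(x))
    return gini(y) - within

def slice_continuous(x, n_slices=4):
    """Discretize a continuous covariate at its empirical quantiles (assumed step)."""
    edges = np.quantile(x, np.linspace(0, 1, n_slices + 1)[1:-1])
    return np.digitize(x, edges)

def gini_screen(columns, y, is_continuous, d, n_slices=4):
    """Score each covariate and return the indices of the d largest scores."""
    scores = []
    for x, cont in zip(columns, is_continuous):
        x_cat = slice_continuous(x, n_slices) if cont else x
        scores.append(gini_reduction(x_cat, y))
    scores = np.array(scores)
    order = np.argsort(-scores, kind="stable")
    return order[:d], scores

# Toy data: one noise covariate, one informative categorical, one informative continuous.
y = np.array([0, 0, 1, 1])
columns = [
    np.array([0, 1, 0, 1]),          # independent of y
    np.array([0, 0, 1, 1]),          # categorical, determines y
    np.array([0.1, 0.2, 0.9, 1.0]),  # continuous, separates the classes
]
selected, scores = gini_screen(columns, y, [False, False, True], d=2, n_slices=2)
```

With two slices, the continuous covariate is split at its median, so both informative covariates remove all impurity from the response while the noise covariate removes none.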
Abstract: Colorectal cancer (CRC) is one of the most prevalent malignancies in the world. CRC-associated morbidity and mortality are continuously increasing, in part due to a lack of early detection. Existing screening tools such as colonoscopy are invasive and costly, which affects the willingness of patients to participate in screening programs. In recent years, evidence has been accumulating that the interaction of aberrant genetic and epigenetic modifications is the cornerstone of CRC development and progression, altering the function of tumor suppressor genes, DNA repair genes and oncogenes in colonic cells. Beyond improving the understanding of the underlying mechanism(s) of carcinogenesis, this interaction has also allowed the identification of clinical biomarkers, especially epigenetic ones, for the early detection and prognosis of cancer patients. One way to detect these epigenetic biomarkers is through cell-free circulating DNA (circDNA), a blood-based cancer diagnostic test that focuses mainly on the molecular alterations found in tumor cells, such as DNA mutations and DNA methylation. In this brief review, we summarize the current knowledge on circDNA biomarkers, mainly DNA methylation, as potential blood-based tests for early detection of colorectal cancer, and the challenges for validation and global implementation of this emerging technology.