Funding: Supported in part by the National Science Foundation of China under Grant Nos. 61303105 and 61402304; the Humanities and Social Sciences General Project of the Ministry of Education under Grant No. 14YJAZH046; the Beijing Natural Science Foundation under Grant No. 4154065; the Beijing Educational Committee Science and Technology Development Plan under Grant No. KM201410028017; and Beijing Key Disciplines of Computer Application Technology.
Abstract: ESA is an unsupervised approach to word segmentation previously proposed by Wang; it is an iterative process consisting of three phases: Evaluation, Selection and Adjustment. In this article, we propose Ex ESA, an extension of ESA. In Ex ESA, the original approach is extended to a two-pass process, and the ratio of different word lengths is introduced as a third type of information, alongside cohesion and separation. In the Selection phase, a maximum strategy is adopted to determine the best segmentation of a character sequence. In the Adjustment phase, Ex ESA re-evaluates separation information and individual information to correct overestimated frequencies. Additionally, a smoothing algorithm is applied to alleviate data sparseness. Experimental results show that Ex ESA further improves performance and saves time by making better use of the information in unannotated corpora. Moreover, the parameters of Ex ESA can be predicted by a set of empirical formulae or determined in combination with the minimum description length principle.
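The abstract does not specify how the maximum strategy scores candidate segmentations. As a minimal sketch, assuming each candidate word receives a score (standing in for the combined cohesion, separation and word-length information) and that a segmentation's score is the sum of its word scores, the highest-scoring segmentation of a character sequence can be found by dynamic programming. `best_segmentation`, `score` and `max_word_len` are hypothetical names for illustration, not part of Ex ESA.

```python
from typing import Callable, List, Tuple


def best_segmentation(
    text: str,
    score: Callable[[str], float],
    max_word_len: int = 4,
) -> Tuple[float, List[str]]:
    """Return (total score, words) for the segmentation of `text` with maximal summed score.

    `score` is a hypothetical goodness function for a candidate word, supplied
    by the caller; it is an assumption, not the scoring used in Ex ESA.
    """
    n = len(text)
    # best[i] = (best total score for text[:i], start index of the last word)
    best: List[Tuple[float, int]] = [(0.0, 0)] + [(float("-inf"), 0)] * n
    for end in range(1, n + 1):
        # Try every candidate last word text[start:end] up to max_word_len characters.
        for start in range(max(0, end - max_word_len), end):
            candidate = best[start][0] + score(text[start:end])
            if candidate > best[end][0]:
                best[end] = (candidate, start)
    # Follow the back-pointers from the end to recover the chosen words.
    words: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return best[n][0], words[::-1]


if __name__ == "__main__":
    # Toy scores; unseen strings are penalized.
    toy = {"ab": 3.0, "c": 1.0, "a": 1.0, "b": 1.0, "bc": 1.5}
    print(best_segmentation("abc", lambda w: toy.get(w, -5.0)))
```

With these toy scores, "ab|c" (3.0 + 1.0) beats "a|b|c" (3.0) and "a|bc" (2.5), so the sketch returns `(4.0, ['ab', 'c'])`.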
Funding: This research was partially supported by the National Natural Science Funds for Distinguished Young Scholars under Grant No. 70825004; the National Natural Science Foundation of China (NSFC) under Grant No. 10731010; the National Basic Research Program under Grant No. 2007CB814902; Creative Research Groups of China under Grant No. 10721101; Shanghai University of Finance and Economics through Project 211 Phase III; and the Shanghai Leading Academic Discipline Project under Grant No. B803.
Abstract: The receiver operating characteristic (ROC) curve is often used to study and compare two-sample problems in medicine. When more information is available on one treatment than on the other, the estimator of the ROC curve can be improved by taking this auxiliary population information into account. The authors show that the empirical likelihood method can be naturally adapted to make efficient use of the auxiliary information in such problems, and propose a smoothed empirical likelihood estimator of the ROC curve that incorporates auxiliary information in medical studies. The proposed estimators are more efficient than ROC estimators that use no auxiliary information, as measured by asymptotic variance and mean squared error (MSE). Some asymptotic properties of the empirical likelihood estimator of the ROC curve are established, and a simulation study is presented to demonstrate the performance of the proposed estimators.
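For reference, the kind of baseline the proposed estimator is compared against (one that ignores auxiliary information) can be sketched as a kernel-smoothed empirical ROC estimator, ROC(t) = 1 - G_h(F_h^{-1}(1 - t)), where F_h and G_h are Gaussian-kernel-smoothed CDFs of the control and case scores. This is not the authors' empirical likelihood estimator; the bandwidth `h`, the bisection tolerance and the function names are assumptions for illustration.

```python
from statistics import NormalDist
from typing import Sequence

_PHI = NormalDist()  # standard normal; its CDF serves as the integrated kernel


def smoothed_cdf(c: float, sample: Sequence[float], h: float) -> float:
    """Kernel-smoothed empirical CDF at c: the mean of Phi((c - x_i) / h)."""
    return sum(_PHI.cdf((c - x) / h) for x in sample) / len(sample)


def smoothed_roc(
    t: float,
    controls: Sequence[float],
    cases: Sequence[float],
    h: float = 0.5,
    tol: float = 1e-10,
) -> float:
    """Estimate ROC(t) = 1 - G_h(F_h^{-1}(1 - t)) without auxiliary information.

    F_h is the smoothed CDF of the controls and G_h that of the cases.
    F_h^{-1} is found by bisection, since F_h is continuous and increasing.
    """
    if not 0.0 < t < 1.0:
        return t  # ROC(0) = 0 and ROC(1) = 1 by convention
    target = 1.0 - t
    # Bracket the quantile: F_h is ~0 far below the data and ~1 far above it.
    lo = min(controls) - 20 * h
    hi = max(controls) + 20 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(mid, controls, h) < target:
            lo = mid
        else:
            hi = mid
    threshold = 0.5 * (lo + hi)
    # True positive rate at the threshold whose false positive rate is t.
    return 1.0 - smoothed_cdf(threshold, cases, h)


if __name__ == "__main__":
    controls = [0.1, 0.4, 0.5, 0.9, 1.3]
    cases = [0.8, 1.2, 1.5, 1.9, 2.4]
    for t in (0.1, 0.3, 0.5):
        print(t, round(smoothed_roc(t, controls, cases), 3))
```

When the two samples are identical, the estimate reduces to the chance line ROC(t) = t, which is a quick sanity check on the quantile inversion.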
Funding: Supported by the National Natural Science Foundation of China (91339106) and the National High Technology Research and Development Program of China (2014AA021102).
Abstract: Long noncoding RNAs (lncRNAs) play important roles in human diseases, including vascular disease. Given the large number of lncRNAs, however, it remains unknown whether most of them are associated with vascular disease. To address this question, we present a genomic-location-based bioinformatics method for predicting lncRNAs associated with vascular disease. Applying this method to globally screen human lncRNAs, we predicted 3043 putative vascular disease-associated lncRNAs. To test the accuracy of the method, we selected 10 lncRNAs predicted to be implicated in the proliferation and migration of vascular smooth muscle cells (VSMCs) for experimental validation, and eight of the 10 (80%) were confirmed. This result suggests that the method has reliable predictive performance. Finally, the method and the predicted vascular disease-associated lncRNAs may help not only to better understand the roles of lncRNAs in vascular disease but also to identify novel molecules for its diagnosis and therapy.
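The abstract does not describe the prediction rule in detail. One common genomic-location heuristic, assumed here purely for illustration, flags an lncRNA when it lies within a fixed window of a gene already known to be associated with the disease. The `Locus` type, the 10 kb default `window`, the locus names and the function names are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Locus:
    name: str
    chrom: str
    start: int  # 0-based, inclusive (BED-style)
    end: int    # exclusive


def gap(a: Locus, b: Locus) -> int:
    """Number of bases separating two loci on the same chromosome (0 if they touch or overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def predict_associated(
    lncrnas: Iterable[Locus],
    disease_genes: Iterable[Locus],
    window: int = 10_000,
) -> List[str]:
    """Return names of lncRNAs within `window` bp of any known disease gene (hypothetical rule)."""
    genes = list(disease_genes)
    hits: List[str] = []
    for lnc in lncrnas:
        if any(g.chrom == lnc.chrom and gap(lnc, g) <= window for g in genes):
            hits.append(lnc.name)
    return hits


if __name__ == "__main__":
    genes = [Locus("GENE_X", "chr1", 5_000, 6_000)]
    lncs = [
        Locus("LNC_A", "chr1", 1_000, 2_000),    # 3 kb upstream of GENE_X
        Locus("LNC_B", "chr1", 50_000, 51_000),  # 44 kb away
        Locus("LNC_C", "chr2", 1_000, 2_000),    # different chromosome
    ]
    print(predict_associated(lncs, genes))
```

The pairwise scan is O(n·m); for a genome-wide screen, sorting loci by chromosome and position or using an interval tree would scale better.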