This paper proposes a novel approach to comment spam identification based on content analysis. Three main features, namely the number of links, content repetitiveness, and text similarity, are used for comment spam identification. In practice, content repetitiveness is determined by the length and frequency of the longest common substring, and text similarity is calculated using the vector space model. In preliminary experiments on comment spam identification in Chinese and English, precision reached 93% and 82%, respectively. The results show the validity and language independence of this approach. Compared with conventional spam filtering approaches, the method requires no training, no rule sets, and no link relationships. The proposed approach can also handle new comments as well as existing ones.
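The two content measures named in the abstract above, the longest common substring and vector space similarity, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenization, and the use of raw term counts are assumptions made here. Whitespace tokenization in particular would not segment Chinese text.

```python
from collections import Counter
import math


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between raw term-frequency vectors (assumed weighting)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A long shared substring that recurs across many comments would count toward repetitiveness, and a low similarity between a comment and the post it is attached to would count toward off-topic content. How such scores are thresholded or combined is not stated in the abstract.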
Large-eddy simulation (LES) developments and validations are presented for an improved simulation of turbulent internal flows. Numerical methods are proposed according to two competing criteria: numerical qualities (precision and spectral characteristics) and adaptability to complex configurations. First, the methods are tested on academic test cases in order to bridge with fundamental studies. Consistent results are obtained using an adaptable finite volume method with higher-order advection fluxes, implicit grid filtering, and a "low-cost" shear-improved Smagorinsky model. The analysis focuses particularly on mean flow, fluctuations, two-point correlations, and spectra. Moreover, it is shown that exponential averaging is a promising tool for LES implementation in complex geometry with deterministic unsteadiness. Finally, the adaptability of the method is demonstrated by application to a configuration representative of blade-tip clearance flow in a turbomachine.
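Exponential averaging, mentioned in the abstract above, can be illustrated as a running mean that is updated at each time step without storing the signal history. The sketch below is a generic exponentially weighted average under assumed names (`exponential_average`, smoothing factor `c`); how the paper chooses the smoothing factor and couples the average to the subgrid model is not given in the abstract.

```python
def exponential_average(samples, c):
    """Running mean: mean_new = (1 - c) * mean_old + c * sample.

    c is an assumed smoothing factor in (0, 1]; a smaller c gives a longer memory.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    mean = None
    history = []
    for u in samples:
        # The first sample initializes the mean; later samples are blended in.
        mean = u if mean is None else (1.0 - c) * mean + c * u
        history.append(mean)
    return history
```

Because old samples are forgotten at a rate set by `c`, such an average can follow a slowly varying mean, which is consistent with its use in flows with deterministic unsteadiness.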
Statistical inference for generalized mixed-effects state space models (MESSM) is investigated for the case in which the random effects are unknown. Two filtering algorithms are designed, both of which are based on the mixture Kalman filter. These algorithms are particularly useful when the longitudinal measurements are sparse. The authors also propose a globally convergent algorithm for parameter estimation of MESSM, which can be used to locate initial parameter values for local but more efficient algorithms. Simulation examples validate the efficacy of the proposed approaches. A data set from a clinical trial is investigated, and a smaller mean square error is achieved compared with existing results in the literature.
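The mixture Kalman filter named in the abstract above builds on ordinary Kalman predict/update steps. As a point of reference only, here is a sketch of one such step for a scalar linear Gaussian model; it is not the mixture filter or the MESSM itself. The model form, the function name `kalman_step`, and the default noise values are assumptions made for illustration.

```python
def kalman_step(mean, var, y, a=1.0, q=0.1, r=1.0):
    """One predict/update cycle for x_t = a * x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q) and v_t ~ N(0, r). Pass y=None when no observation is
    available at this time point; the update is then skipped.
    """
    # Predict: propagate the state estimate and its variance one step ahead.
    mean_pred = a * mean
    var_pred = a * a * var + q
    if y is None:
        return mean_pred, var_pred
    # Update: blend the prediction with the observation via the Kalman gain.
    gain = var_pred / (var_pred + r)
    return mean_pred + gain * (y - mean_pred), (1.0 - gain) * var_pred
```

Skipping the update when `y` is `None` is one simple way to handle time points without observations: the estimate is carried forward and its variance grows until the next observation arrives.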
We study subspace identification for the continuous-time errors-in-variables model from sampled data. First, a filtering approach is applied to handle the time-derivative problem inherent in continuous-time identification, with a focus on the generalized Poisson moment functional. A total least squares equation based on this filtering approach is derived. Inspired by the idea of discrete-time subspace identification based on principal component analysis, we develop two algorithms that deliver consistent estimates for the continuous-time errors-in-variables model by introducing two different instrumental variables. Order determination and other instrumental variables are discussed. The usefulness of the proposed algorithms is illustrated through numerical simulation.
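The errors-in-variables setting in the abstract above assumes noise on both inputs and outputs, which is why total least squares is used rather than ordinary least squares. The simplest instance of that idea is fitting a straight line by minimizing perpendicular distances, sketched below. This is only an illustration of total least squares in two dimensions under an assumed equal noise variance in both coordinates; it is not the paper's subspace algorithm, and the function name is hypothetical.

```python
import math


def orthogonal_line_fit(xs, ys):
    """Fit y = slope * x + intercept by minimizing perpendicular distances.

    Assumes equal noise variance in x and y (an assumption of this sketch).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    # Centered second moments of the data.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0.0:
        raise ValueError("degenerate data: slope is not identifiable when sxy = 0")
    # Closed-form total least squares slope for a line in the plane.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx
```

Ordinary least squares attributes all error to `y` and tends to underestimate the slope when `x` is also noisy; the perpendicular-distance fit treats both coordinates symmetrically.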
Funding (comment spam identification paper): Supported by the National Natural Science Foundation of China (Nos. 60736044, 60803094)
Funding (mixed-effects state space models paper): Supported by the National Natural Science Foundation of China under Grant No. 71271165
Funding (continuous-time errors-in-variables identification paper): Supported by the National Natural Science Foundation of China (Nos. 60674086 and 60736021) and the Scientific and Technology Plan of Zhejiang Province, China (No. 2007C21173)