Abstract: The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution not only improves the quality of people's mobile experience, but also helps advance the analysis of the short texts found in current mobile applications, such as Webchat and microblogs. Because spam SMSes are characterized by sparsity, transformation, and a real-time nature, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method, which makes use of the quasi-pattern matching results produced during the pattern matching process. The method can learn many new and interesting patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision of 95.18% and a recall of 95.51%.
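To make the multi-level idea concrete, the following Python sketch chains three checks in the order the abstract lists them: symbolic features, text similarity, and pattern matching. It is only an illustration under stated assumptions. The abstract does not say which symbolic features, which similarity measure, which thresholds, or which cascade order the authors use, so the symbol ratio, the character-bigram Jaccard similarity, the threshold values, the sample message in `KNOWN_SPAM`, and the regular expressions in `PATTERNS` are all hypothetical stand-ins.

```python
import re

# Hypothetical data for illustration only; not taken from the paper.
KNOWN_SPAM = ["win a free prize now call 12345678"]
PATTERNS = [re.compile(r"free\s+prize"), re.compile(r"call\s+\d{6,}")]


def symbol_score(text):
    """Fraction of characters that are digits or non-alphanumeric symbols."""
    if not text:
        return 0.0
    odd = sum(
        1 for ch in text
        if ch.isdigit() or not (ch.isalnum() or ch.isspace())
    )
    return odd / len(text)


def bigrams(text):
    """Set of character bigrams after lowercasing and removing whitespace."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity between the bigram sets of two strings."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def classify(text, symbol_threshold=0.4, sim_threshold=0.6):
    """Return (label, level); level names the check that flagged the text."""
    # Level 1: symbolic features (assumed here to be a simple symbol ratio).
    if symbol_score(text) >= symbol_threshold:
        return "spam", "symbolic"
    # Level 2: similarity to known spam (assumed: bigram Jaccard).
    if any(jaccard(text, s) >= sim_threshold for s in KNOWN_SPAM):
        return "spam", "similarity"
    # Level 3: pattern matching (assumed: hand-written regular expressions).
    lowered = text.lower()
    if any(p.search(lowered) for p in PATTERNS):
        return "spam", "pattern"
    return "ham", None
```

For instance, `classify("Claim your free prize today")` passes the first two levels and is caught only by the pattern level, while `classify("See you at lunch tomorrow")` passes all three. Running cheap checks before more expensive ones is a design choice of this sketch, not a claim about the ordering used in the paper.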