Abstract
Email classification commonly uses the Vector Space Model (VSM) to represent emails. However, this model is built only on the frequencies with which independent words appear in the email content and ignores the structural features of the email, so the resulting feature vector cannot represent the content accurately. To address this shortcoming, this paper applies the idea of extracting n-grams with a glue (cohesion) measure to text representation, uses it to assign word weights, and designs an email classification system based on this model. Because the glue measure takes the structural features of the email into account, experiments show that the method improves the classification precision of the system.
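The abstract does not give the glue formula or the weighting rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the symmetric conditional probability (SCP), one commonly used glue measure, for adjacent word pairs, and a hypothetical weighting rule that scales each word's term frequency by the strongest glue of any bigram containing it. The function names `scp_glue` and `glue_weighted_vector` are illustrative and not taken from the paper.

```python
from collections import Counter


def scp_glue(tokens):
    """Return {bigram: glue} for adjacent token pairs.

    Assumption: glue is the symmetric conditional probability,
    SCP(x, y) = p(x, y)^2 / (p(x) * p(y)). With counts taken over the
    same token sequence, the normalizers cancel, leaving
    count(x, y)^2 / (count(x) * count(y)).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (x, y): (c * c) / (unigrams[x] * unigrams[y])
        for (x, y), c in bigrams.items()
    }


def glue_weighted_vector(tokens):
    """Return {word: weight} for one email.

    Hypothetical rule (not from the paper): weight = term frequency
    times (1 + strongest glue of any bigram that contains the word).
    Words that only occur in loosely bound pairs stay close to plain TF.
    """
    tf = Counter(tokens)
    best = {word: 0.0 for word in tf}
    for (x, y), g in scp_glue(tokens).items():
        best[x] = max(best[x], g)
        best[y] = max(best[y], g)
    return {word: tf[word] * (1.0 + best[word]) for word in tf}


if __name__ == "__main__":
    email_tokens = "free money now free money today".split()
    for word, weight in sorted(glue_weighted_vector(email_tokens).items()):
        print(word, weight)
```

In this sketch a word that always occurs inside the same tightly bound pair is weighted more heavily than a word with the same frequency that occurs in looser contexts, which is one way local word order could enter an otherwise frequency-only vector.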
Source
《计算机仿真》
CSCD
2008, No. 2, pp. 121-123 (3 pages)
Computer Simulation