Abstract
Coupled stream data classification is an active and challenging topic in data mining and information science, and it has attracted growing attention from researchers in China and abroad. However, most existing methods extract features from each stream separately and classify on that basis, ignoring the dependencies among features both within a stream and across streams. To address this, and drawing on motif-finding methods from bioinformatics, this paper proposes a classification method based on long-run frequency and inverse document frequency. The method converts each input stream of the coupled stream data into a sequence of signal variations so that motifs can be extracted effectively. It then computes the frequency of each motif together with its long-run frequency and inverse document frequency weights, uses these weights to measure the temporal relationships among motifs from different input streams, and classifies the coupled stream data on the basis of these motif-temporal relationships. Simulation experiments show that the method is effective.
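The abstract names the processing steps but not their formulas, so the Python sketch below is only an illustration under stated assumptions, not the authors' algorithm. It assumes that (1) each step of a stream is symbolized as U (rise), D (fall) or S (steady); (2) a hypothetical "coupled motif" joins the length-k motifs of all streams at the same offset, as one possible way to record which patterns co-occur in time across streams; (3) the long-run frequency of a motif is its count divided by the number of windows in the sample; (4) IDF takes the common smoothed form computed over training samples; and (5) a nearest-centroid rule with cosine similarity stands in for the classifier. All function names are hypothetical.

```python
import math
from collections import Counter


def signal_variation(stream, eps=0.0):
    """Encode each step of a numeric stream as 'U' (rise), 'D' (fall) or 'S' (steady)."""
    out = []
    for prev, cur in zip(stream, stream[1:]):
        d = cur - prev
        out.append("U" if d > eps else "D" if d < -eps else "S")
    return "".join(out)


def coupled_motifs(sample, k=2):
    """Hypothetical cross-stream term: the length-k motifs of all streams at the
    same offset, joined with '|', so each term records co-occurring patterns."""
    codes = [signal_variation(s) for s in sample]
    n = min(len(c) for c in codes) - k + 1
    return ["|".join(c[i:i + k] for c in codes) for i in range(max(n, 0))]


def long_run_frequency(terms):
    """Assumed definition: occurrences of a term divided by the number of windows."""
    total = len(terms)
    return {t: c / total for t, c in Counter(terms).items()} if total else {}


def fit_idf(term_lists):
    """Smoothed IDF over training samples (each sample plays the role of a document)."""
    n_docs = len(term_lists)
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def weight_vector(terms, idf):
    """Long-run frequency times IDF; terms unseen in training get weight 0."""
    return {t: f * idf.get(t, 0.0) for t, f in long_run_frequency(terms).items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(samples, labels, k=2):
    """Stand-in classifier: average the weight vectors of each class (centroids)."""
    term_lists = [coupled_motifs(s, k) for s in samples]
    idf = fit_idf(term_lists)
    sizes = Counter(labels)
    centroids = {}
    for terms, y in zip(term_lists, labels):
        centroid = centroids.setdefault(y, {})
        for t, w in weight_vector(terms, idf).items():
            centroid[t] = centroid.get(t, 0.0) + w / sizes[y]
    return idf, centroids


def classify(sample, idf, centroids, k=2):
    """Return the label whose centroid is most cosine-similar to the sample."""
    vec = weight_vector(coupled_motifs(sample, k), idf)
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))


if __name__ == "__main__":
    # Toy coupled samples with two streams each: "together" streams move in step,
    # "opposite" streams move in anti-phase.
    train_x = [
        ([1, 2, 3, 2, 1, 2, 3], [5, 6, 7, 6, 5, 6, 7]),
        ([0, 1, 2, 1, 0, 1, 2], [2, 3, 4, 3, 2, 3, 4]),
        ([1, 2, 3, 2, 1, 2, 3], [7, 6, 5, 6, 7, 6, 5]),
        ([0, 1, 2, 1, 0, 1, 2], [4, 3, 2, 3, 4, 3, 2]),
    ]
    train_y = ["together", "together", "opposite", "opposite"]
    idf, centroids = train(train_x, train_y)
    print(classify(([3, 4, 5, 4, 3, 4, 5], [1, 2, 3, 2, 1, 2, 3]), idf, centroids))  # prints "together"
```

In this toy setup the two classes differ only in whether the streams move in step or in anti-phase, and the joined motifs encode that relationship directly. It is a design illustration, not a reproduction of the paper's method or experiments.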
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
Peking University Core Journal
2013, No. 2, pp. 190-197 (8 pages)
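Funding
This work was supported by the China Postdoctoral Science Foundation (grant no. 20100481284), the Research Award Fund for Outstanding Young and Middle-aged Scientists of Shandong Province (grant no. BS2012SF024), and the Shandong Provincial Postdoctoral Innovation Fund (grant no. 201003083).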
Keywords
motifs
temporal motifs
coupled stream data
long-run frequency and inverse document frequency