Rule-Based Classifier for Probabilistic Data
Abstract: Classification is an important data mining problem that has been widely studied and applied. However, previous research has mainly addressed classification on certain (deterministic) data. With the widespread use of data acquisition tools such as sensors, probabilistic data are now common, so it is necessary to study classification on such data. First, this paper proposes a new probabilistic data model that considers both the randomness of the probability distribution and the similarity over independent intervals. Second, it defines a new discernible distance to measure the distance between tuples of such probabilistic data. Finally, it proposes a rule-based classification algorithm for probabilistic data; on top of the basic classification algorithm, it introduces a variable-precision variant that reduces sensitivity to noise or perturbation and improves classification accuracy. Experimental results verify the effectiveness of the proposed algorithm.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2013, No. 7, pp. 639-648 (10 pages)
Funding: Fundamental Research Funds for the Central Universities; Research Funds of Renmin University of China, No. 12XNLF07
Keywords: classification; randomness; probabilistic data; discernible distance