Rule-Based Classifier for Probabilistic Data (概率数据上基于规则的分类器)


Abstract: Classification is an important data mining problem that has been widely studied and applied, but previous work has focused mainly on classification over certain (deterministic) data. With the widespread use of data collection tools such as sensors, probabilistic data have become common, so studying classification on such data is necessary. This paper first proposes a new probabilistic data model that accounts for both the randomness of the probability distribution and the similarity between independent intervals. Second, it defines a new discernible distance to measure the distance between tuples of such probabilistic data. Finally, it proposes a rule-based classification algorithm for probabilistic data; on top of the basic algorithm, it introduces a variable-precision variant that reduces sensitivity to noise or perturbation and improves classification accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm.

Authors: Zhao Tingting, Zhao Suyun, Pei Bin, Chen Hong, Li Cuiping

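Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Renmin University of China; School of Information, Renmin University of China; Computer Teaching and Research Section, PLA Army Officer Academy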

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2013, No. 7, pp. 639-648 (10 pages)

Funding: Fundamental Research Funds for the Central Universities; Research Funds of Renmin University of China, No. 12XNLF07

Keywords: classification; randomness; probabilistic data; discernible distance

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]


