A Kernel-Based Classification Method for Nominal Data (times cited: 1)

Abstract: Computing the inner product of nominal data is difficult. It is often regarded as the bottleneck of applying SVMs to such data, and it has held back research on kernel-based classification methods for nominal data. To address this, the paper gives a new definition of a dissimilarity measure for nominal data and a simplified method for computing the inner product of nominal data, and examines the nature of kernel functions in kernel methods and support vector machines. On this basis it proposes a kernel-based nominal data classification (KNDC) method, which is further extended to classifying heterogeneous data composed of nominal and continuous attributes. On nominal datasets KNDC has lower computational complexity than SVM, and its validity is discussed. Simulation experiments on standard datasets show that the method is, to a certain extent, robust to interference from outliers, and that its overall performance is better than that of other methods in the literature. An application study in anomaly detection shows that the method classifies imbalanced data efficiently and has practical value.
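The abstract does not state the paper's new dissimilarity definition or its simplified inner-product formula, so the following is only a minimal illustrative sketch of the general idea, not the KNDC method. It assumes the simplest standard choice for nominal data, the overlap kernel, which counts the attributes on which two records take the same category. That count equals the dot product of the records' one-hot encodings, so it is a valid kernel whose inner product is obtained without building feature vectors. All function names (`overlap_kernel`, `one_hot`, `feature_space_sq_dist`, `nearest_mean_classify`) and the nearest-class-mean classifier are hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Record = Sequence[Hashable]


def overlap_kernel(x: Record, y: Record) -> int:
    """Hypothetical kernel: number of attributes where x and y share a category.

    Equals the dot product of the one-hot encodings of x and y.
    """
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a == b)


def one_hot(x: Record, domains: List[List[Hashable]]) -> List[int]:
    """Explicit one-hot feature map, used only to check the kernel identity.

    A value missing from its domain simply encodes as all zeros.
    """
    vec: List[int] = []
    for value, domain in zip(x, domains):
        vec.extend(1 if value == v else 0 for v in domain)
    return vec


def feature_space_sq_dist(x: Record, y: Record) -> int:
    """Squared distance in feature space via the kernel trick.

    ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y),
    which for the overlap kernel is twice the Hamming distance.
    """
    return overlap_kernel(x, x) + overlap_kernel(y, y) - 2 * overlap_kernel(x, y)


def nearest_mean_classify(x: Record, train: Dict[str, List[Record]]) -> Optional[str]:
    """Hypothetical kernel classifier: assign x to the class whose
    feature-space mean is closest, using kernel evaluations only."""
    best_label, best_dist = None, float("inf")
    for label, samples in train.items():
        n = len(samples)
        if n == 0:
            continue  # skip empty classes to avoid division by zero
        cross = sum(overlap_kernel(x, s) for s in samples) / n
        within = sum(overlap_kernel(s, t) for s in samples for t in samples) / (n * n)
        dist = overlap_kernel(x, x) - 2 * cross + within
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    train = {
        "A": [("red", "round"), ("red", "square")],
        "B": [("blue", "square"), ("blue", "triangle")],
    }
    print(nearest_mean_classify(("red", "triangle"), train))  # prints "A"
```

Here the query's feature-space distance to the class-A mean is 1.5 and to the class-B mean is 2.5, so it is assigned to A. Working only through kernel evaluations is what lets a kernel method handle nominal attributes without a numeric embedding; the paper's own measure presumably differs from this plain overlap count.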

Authors: Li Zhihua, Ren Qiuying, Gu Yan, Wang Shitong

Affiliation: School of Information Engineering, Jiangnan University

Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University Core Journals list), 2010, No. 6, pp. 37-42 (6 pages)

Funding: Young Scientists Fund of the National Natural Science Foundation of China (60704047)

Keywords: kernel-based classification method; nominal data; dissimilarity measure; inner product calculation

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]


