Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets

Abstract

Purpose: Many science, technology and innovation (STI) resources carry several different labels. To assign these labels automatically to an instance of interest, many approaches that perform well on benchmark datasets have been proposed in the literature for the multi-label classification task, and several open-source tools implementing them have been developed. However, the characteristics of real-world multi-label patent and publication datasets do not fully match those of the benchmark datasets. The main purpose of this paper is therefore to evaluate seven multi-label classification methods comprehensively on real-world datasets.

Research limitations: The three real-world datasets differ in statement, data quality, and purpose. In addition, open-source tools for multi-label classification differ intrinsically in how they process data and select features, which in turn affects the performance of a multi-label classification approach. In the near future, we will improve experimental precision and strengthen the validity of the conclusions by controlling variables more rigorously through expanded parameter settings.

Practical implications: The Macro F1 and Micro F1 scores observed on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches based on deep learning offer promising solutions because they can accommodate the hierarchical relationships and interdependencies among labels. With ongoing improvements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly and to reach a level of practical utility in the foreseeable future.

Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
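The Macro F1 and Micro F1 scores mentioned in the abstract respond differently to an unbalanced document-label distribution, which is one reason both are usually reported. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the toy label sets are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]]
) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for multi-label predictions.

    Each document is represented by the set of labels attached to it.
    """
    labels = sorted(set().union(*y_true, *y_pred))
    if not labels:
        return 0.0, 0.0

    per_label = []
    for label in labels:
        tp = fp = fn = 0
        for truth, pred in zip(y_true, y_pred):
            if label in truth and label in pred:
                tp += 1
            elif label in pred:
                fp += 1
            elif label in truth:
                fn += 1
        per_label.append((tp, fp, fn))

    # Macro: average the per-label F1 scores, so rare labels weigh as much
    # as frequent ones.
    macro = sum(f1(*counts) for counts in per_label) / len(labels)

    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1(
        sum(c[0] for c in per_label),
        sum(c[1] for c in per_label),
        sum(c[2] for c in per_label),
    )
    return macro, micro


# Hypothetical toy data: label "A" is frequent, "B" and "C" are rare.
y_true = [{"A", "B"}, {"A"}, {"A"}, {"C"}]
y_pred = [{"A"}, {"A"}, {"A", "B"}, {"A"}]

macro, micro = macro_micro_f1(y_true, y_pred)
print(f"Macro F1 = {macro:.3f}, Micro F1 = {micro:.3f}")
```

In this toy case the frequent label is predicted well and the two rare labels are missed, so the micro score stays much higher than the macro score. A gap in that direction is a common sign of label imbalance.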

Authors: Shuo Xu, Yuefu Zhang, Xin An, Sainan Pi

Affiliations: College of Economics and Management; School of Economics & Management

Source: Journal of Data and Information Science (数据与情报科学学报, English edition), CSCD, 2024, Issue 2, pp. 81-103 (23 pages)

Funding: the Natural Science Foundation of China (Grant Numbers 72074014 and 72004012).

Keywords: Multi-label classification; Real-world datasets; Hierarchical structure; Classification system; Label correlation; Machine learning

Classification code: G353.1 [Cultural Sciences: Information Science]

