Funding: Supported by the National Natural Science Foundation of China (61902222), the Taishan Scholars Program of Shandong Province (tsqn201909109), the Natural Science Excellent Youth Foundation of Shandong Province (ZR2021YQ45), and the Youth Innovation Science and Technology Team Foundation of Shandong Higher School (2021KJ031).
Abstract: Process discovery, one of the most challenging process analysis techniques, aims to uncover business process models from event logs. Many process discovery approaches have been proposed over the past twenty years; however, most of them have difficulty handling multi-instance sub-processes. To address this challenge, we first introduce a multi-instance business process model (MBPM) to support the modeling of processes with multiple sub-process instantiations. The formal semantics of MBPMs are defined using multi-instance Petri nets (MPNs), an extension of Petri nets with distinguishable tokens. Then, a novel process discovery technique is developed to support the discovery of MBPMs from event logs with sub-process multi-instantiation information. In addition, we propose to measure the quality of the discovered MBPMs against the input event logs by transforming an MBPM into a classical Petri net so that existing quality metrics, e.g., fitness and precision, can be used. The proposed discovery approach is implemented as plugins in the ProM toolkit. Based on a cloud resource management case study, we compare our approach with state-of-the-art process discovery techniques. The results demonstrate that our approach outperforms existing approaches in discovering process models with multi-instance sub-processes.
Funding: This work is supported by the Academic Research Project of Henan Police College (Grants HNJY-2021-QN-14 and HNJY202220) and the Key Technology R&D Program of Henan Province (Grant 222102210041).
Abstract: Images have become an essential medium for expressing meaning and disseminating information. Many images are uploaded to the Internet, some of which are pornographic and adversely affect public psychological health. To create a clean and positive Internet environment, network enforcement agencies need an automatic and efficient pornographic image recognition tool. Previous studies on pornographic images mainly rely on convolutional neural networks (CNNs). Because CNNs have many parameters, they must rely on a large labeled training dataset, which is laborious to build. To reduce the effect of the database on recognition performance, many researchers treat pornographic image recognition as a binary classification task. In actual applications, when faced with pornographic images of various features, the performance and recognition accuracy of the network model often decrease. In addition, the pornographic content in an image usually lies in several small local regions that make up only a small proportion of the image. As a strongly supervised learning method, a CNN usually cannot automatically focus on the pornographic area of the image, which affects recognition accuracy. This paper establishes an image dataset with seven classes by crawling pornographic websites and the Baidu Image Library. A weakly supervised pornographic image recognition method based on multiple instance learning (MIL) is proposed. The Squeeze-and-Excitation (SE) module is introduced in feature extraction to strengthen critical information and weaken the influence of non-key and useless information on the recognition result. To meet the requirements of the pooling layer operation in MIL, we introduce an attention mechanism to weight and average instances. The experimental results show that the proposed method achieves better accuracy and F1 scores than other methods.
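The attention-weighted averaging of instances mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `softmax` and `attention_pool` are hypothetical, and the attention scores, which a real model would produce with a small learned network, are simply passed in as numbers.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances: List[List[float]], scores: List[float]) -> List[float]:
    """Pool a bag of instance feature vectors into one bag-level vector.

    Assumption: `scores` holds one raw attention score per instance.
    """
    weights = softmax(scores)
    dim = len(instances[0])
    return [
        sum(w * inst[d] for w, inst in zip(weights, instances))
        for d in range(dim)
    ]


# Two image regions (instances) with 2-dimensional features.
bag = [[1.0, 0.0], [3.0, 2.0]]
uniform = attention_pool(bag, [0.0, 0.0])   # equal scores reduce to a plain mean
focused = attention_pool(bag, [0.0, 5.0])   # a high score pulls the result toward that instance
```

With equal scores the pooling reduces to the ordinary mean of the instances; raising one score shifts the bag representation toward the corresponding region, which is the behavior that lets a bag-level classifier concentrate on small informative areas.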
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60105004 and 60325207. Acknowledgements: The author wants to thank Min-Ling Zhang for running the experiments, Giancarlo Ruffo for providing the code of RELIC, and Nicolas Bredeche for providing the code of RIPPER-MI. A preliminary version of this paper was presented at ECML'03 (the 14th European Conference on Machine Learning).
Abstract: In multi-instance learning, the training set comprises labeled bags that are composed of unlabeled instances, and the task is to predict the labels of unseen bags. This paper studies multi-instance learning from the viewpoint of supervised learning. First, by analyzing some representative learning algorithms, this paper shows that multi-instance learners can be derived from supervised learners by shifting their focus from discrimination on instances to discrimination on bags. Second, considering that ensemble learning paradigms can effectively enhance supervised learners, this paper proposes building multi-instance ensembles to solve multi-instance problems. Experiments on a real-world benchmark test show that ensemble learning paradigms can significantly enhance multi-instance learners.
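One way to picture a multi-instance ensemble is bagging over whole bags: each base learner is trained on a bootstrap sample of the labeled bags, and predictions are combined by majority vote. The sketch below assumes bagging as the ensemble paradigm, which the abstract does not specify, and uses a deliberately simple, hypothetical base learner (`fit_max_threshold`) in place of a real multi-instance algorithm.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Bag = List[List[float]]             # a bag is a list of instance feature vectors
LabeledBag = Tuple[Bag, int]        # a bag together with its 0/1 label
BagClassifier = Callable[[Bag], int]


def fit_max_threshold(sample: List[LabeledBag]) -> BagClassifier:
    """Toy base learner (hypothetical): threshold the largest first feature in a bag."""
    pos = [max(x[0] for x in bag) for bag, y in sample if y == 1]
    neg = [max(x[0] for x in bag) for bag, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample: always predict the only class seen.
        only = 1 if pos else 0
        return lambda bag: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda bag: int(max(x[0] for x in bag) > threshold)


def bagging_ensemble(
    train: List[LabeledBag],
    fit: Callable[[List[LabeledBag]], BagClassifier],
    n_models: int = 25,
    seed: int = 0,
) -> BagClassifier:
    """Train n_models base learners on bootstrap samples of bags; vote at prediction time."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Resample whole bags with replacement, never individual instances.
        sample = [rng.choice(train) for _ in train]
        models.append(fit(sample))

    def predict(bag: Bag) -> int:
        votes = Counter(model(bag) for model in models)
        return votes.most_common(1)[0][0]

    return predict


train = [
    ([[0.1], [0.9]], 1), ([[0.2], [0.8]], 1), ([[0.95], [0.3]], 1), ([[0.85]], 1),
    ([[0.1], [0.2]], 0), ([[0.3]], 0), ([[0.15], [0.25]], 0), ([[0.2], [0.1]], 0),
]
predict = bagging_ensemble(train, fit_max_threshold)
```

The key design point is that resampling happens at the bag level, so each base learner still sees complete bags and the discrimination stays on bags rather than on instances.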
Funding: Supported by the National Natural Science Foundation of China (Nos. 61073117 and 61175046), the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016), the Academic Innovative Research Projects of Anhui University Graduate Students (No. 10117700183), and the 211 Project of Anhui University.
Abstract: Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks of mining from ambiguous data, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The mining aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised problem, and obtain the labels of test bags by predicting the labels of unknown instances. In this paper, we mine multi-instance data from another point of view, i.e., excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, a constructive covering algorithm is first used to restructure the original multi-instance data. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and the sphere neighbors obtained from the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
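The bag-labeling rule stated in the abstract above (a bag is positive if it has at least one positive instance, negative if all of its instances are negative) can be written in a few lines. This is only an illustration of that rule, not of MICkNN; the helper name `bag_label` and the example bags are hypothetical, and in practice the instance labels are hidden, which is exactly what makes positive bags ambiguous.

```python
from typing import Dict, List, Sequence


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """Standard multi-instance rule: positive iff at least one instance is positive."""
    return any(instance_labels)


# Hypothetical bags with (normally unobserved) instance labels.
bags: Dict[str, List[bool]] = {
    "bag_a": [False, False, False],   # all instances negative
    "bag_b": [False, True, False],    # one positive instance is enough
}
labels = {name: bag_label(instances) for name, instances in bags.items()}
```

In `bag_b`, two of the three instances are false positives from the learner's point of view: they inherit a positive bag label without being positive themselves, which is the kind of instance the abstract proposes to exclude.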
Abstract: We investigate the problem of object-oriented (OO) software quality estimation from a multi-instance (MI) perspective. In detail, each set of classes that have an inheritance relation, named a "class hierarchy", is regarded as a bag, while each class in the set is regarded as an instance. The learning task in this study is to estimate the label of unseen bags, i.e., the fault-proneness of untested class hierarchies. A fault-prone class hierarchy contains at least one fault-prone (negative) class, while a non-fault-prone (positive) one has no negative class. Based on the modification records (MRs) of previous project releases and OO software metrics, the fault-proneness of an untested class hierarchy can be predicted. Several selected MI learning algorithms were evaluated on five datasets collected from an industrial software project. Among the MI learning algorithms investigated in the experiments, the kernel method using a dedicated MI-kernel was better than the others at accurately predicting the fault-proneness of the class hierarchies. In addition, compared with a supervised support vector machine (SVM) algorithm, the MI-kernel method still had competitive performance at much lower cost.
Funding: Supported by the PhD Programs Foundation of the Ministry of Education of China (No. 20050698025) and the National Natural Science Foundation of China (No. 60602025).
Abstract: Fusion of multiple instances within a modality to improve biometric verification performance has received considerable attention. In this letter, we present an iris recognition method based on multi-instance fusion, which combines the left and right irises of an individual at the matching score level. When fusing, a novel fusion strategy using a minimax probability machine (MPM) is applied to generate a fused score for the final decision. The experimental results on the CASIA and UBIRIS databases show that the proposed method brings a clear performance improvement over the single-instance method. The comparison among different fusion strategies demonstrates the superiority of the fusion strategy based on MPM.
Funding: National Natural Science Foundation of China (Nos. 61503116, 61402007); Foundation for Young Talents in the Colleges of Anhui Province Committee, China (No. 2013SQRL097ZD); Natural Science Foundation of Anhui Educational Committee, China (No. KJ2014A198); Natural Science Foundation of Anhui Province, China (No. 1408085QF108).
Abstract: Predicting domain-based protein-protein interactions (PPIs) is a problem that has drawn the attention of many researchers in recent years, and it has been studied using many computational approaches from many different perspectives. Existing domain-based methods for predicting PPIs typically infer domain interactions from known interacting sets of proteins. However, these methods are costly and complex to implement. In this paper, a simple and effective prediction model is proposed. In this model, an improved multi-instance learning (MIL) algorithm (MilCaA) is designed that does not need to take domain interactions into consideration to construct MIL bags. Then, the pseudo-amino acid composition (PseAAC) transformation method is used to encode the instances in a multi-instance bag, and principal component analysis (PCA) is used to reduce the feature dimension. Finally, several traditional machine learning and MIL methods are used to verify the proposed model. Experimental results demonstrate that MilCaA performs better than state-of-the-art techniques, including the traditional machine learning methods that are widely used in PPI prediction.
Funding: National Natural Science Foundation of China (No. 62006039).
Abstract: Supervised models for event detection, especially neural models, usually require large-scale human-annotated training data. A data augmentation technique is proposed to improve the performance of event detection by generating paraphrase sentences to enrich the expressions of the original data. Specifically, based on an existing human-annotated event detection dataset, we first automatically build a paraphrase dataset and label it with a designed event annotation alignment algorithm. To alleviate possible wrong labels in the generated paraphrase dataset, a multi-instance learning (MIL) method is adopted for joint training on both the gold human-annotated data and the generated paraphrase dataset. Experimental results on the widely used ACE2005 dataset show the effectiveness of our approach.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61701281, 61573219, and 61876098), the Shandong Provincial Natural Science Foundation (ZR2016FM34 and ZR2017QF009), the Shandong Science and Technology Development Plan (J18KA375), the Shandong Social Science Project (18BJYJ04), and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.
Abstract: In higher education, the initial study period of each course plays a crucial role for students and seriously influences subsequent learning activities. However, given the large number of students per course at universities, it has become impossible for teachers to keep track of the performance of individual students. In this circumstance, an academic early warning system is desirable, one that automatically detects students with learning difficulties (i.e., at-risk students) before a course starts. However, previous studies are not well suited to this purpose for two reasons: 1) they have mainly concentrated on e-learning platforms, e.g., massive open online courses (MOOCs), and relied on data about students' online activities, which is hardly accessible in traditional teaching scenarios; and 2) they have only made performance predictions when a course is in progress or even close to the end. In this paper, for traditional classroom-teaching scenarios, we investigate the task of pre-course student performance prediction, which refers to detecting at-risk students for each course before its commencement. To better represent a student sample and utilize the correlations among courses, we cast the problem as a multi-instance multi-label (MIML) problem. In addition, given the problem of data scarcity, we propose a novel multi-task learning method, MIML-Circle, to predict the performance of students from different specialties in a unified framework. Extensive experiments are conducted on five real-world datasets, and the results demonstrate the superiority of our approach over state-of-the-art methods.
Funding: Supported by the National Key R&D Program of China (No. 2018YFB1004300), the National Natural Science Foundation of China (Nos. 61773198, 61632004, and 61751306), the National Natural Science Foundation of China-Korea Research Foundation Joint Research Project (No. 61861146001), the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX180045).
Abstract: To protect consumers and those who manufacture and sell the products they enjoy, it is important to develop convenient tools that help consumers distinguish an authentic product from a counterfeit one. The advancement of deep learning techniques for fine-grained object recognition creates new possibilities for genuine product identification. In this paper, we develop a Semi-Supervised Attention (SSA) model to work in conjunction with a large-scale multiple-source dataset named YSneaker, which consists of sneakers from various brands and their authentication results, to identify authentic sneakers. Specifically, the SSA model has a self-attention structure for different images of a labeled sneaker, and a novel prototypical loss is designed to exploit unlabeled data within the data structure. The model draws on the weighted average of the output feature representations, where the weights are determined by an additional shallow neural network. This allows the SSA model to focus on the most important images of a sneaker for use in identification. A unique feature of the SSA model is its ability to take advantage of unlabeled data, which can help further minimize intra-class variation for a more discriminative feature embedding. To validate the model, we collect a large number of labeled and unlabeled sneaker images and perform extensive experimental studies. The results show that YSneaker, together with the proposed SSA architecture, can identify authentic sneakers with a high accuracy rate.