Abstract: Given the increasing volume of text documents, automatic summarization is an important tool for quick and optimal use of such sources. Automatic summarization is a text compression process that produces a shorter document so that the important goals and main features of the input document can be accessed quickly. In this study, a novel method is introduced for selective text summarization using a genetic algorithm and the generation of repetitive patterns. An important feature of the proposed method, compared with previous approaches, is that it identifies and extracts the relationships between the main features of the input text and builds repetitive patterns in order to produce and optimize the feature vector used to generate the summary document. The study also attempts to cover all the main requirements of a summary, including lack of ambiguity, high precision, continuity, and consistency. To investigate the efficiency of the proposed algorithm, the results were evaluated with respect to precision and recall. The evaluation showed that the method optimizes the dimensions of the feature vector and generates a sequence of summary sentences that is highly consistent with the main goals and features of the input document.
Funding: Grants to HAR and HP. HAR is supported by a UNSW Scientia Program Fellowship and is a member of the UNSW Graduate School of Biomedical Engineering.
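As a rough illustration of how a genetic algorithm can drive extractive summary selection, the sketch below evolves a binary sentence-selection mask whose fitness rewards coverage of frequent document terms under a length budget. The tokenizer, fitness function, and operator choices are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch (not the paper's exact method): a GA over binary
# sentence-selection masks; fitness rewards coverage of frequent terms
# and penalizes summaries longer than the budget.
import random
from collections import Counter

def tokenize(text):
    return [w.lower().strip(".,;") for w in text.split() if w.strip(".,;")]

def fitness(mask, sentences, doc_counts, budget):
    """Reward coverage of frequent document terms; penalize over-length summaries."""
    chosen = [s for keep, s in zip(mask, sentences) if keep]
    if not chosen:
        return 0.0
    covered = set()
    for s in chosen:
        covered.update(tokenize(s))
    coverage = sum(doc_counts[t] for t in covered)
    over = max(0, len(chosen) - budget)          # sentences beyond the budget
    return coverage - 5.0 * over                 # assumed penalty weight

def ga_summarize(sentences, budget=2, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    doc_counts = Counter(t for s in sentences for t in tokenize(s))
    n = len(sentences)
    pop = [[rng.random() < budget / n for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, sentences, doc_counts, budget), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]            # one-point crossover
            i = rng.randrange(n)
            child[i] = not child[i]              # single-bit mutation
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda m: fitness(m, sentences, doc_counts, budget))
    return [s for keep, s in zip(best, sentences) if keep]

# Usage: pick roughly `budget` sentences from a toy document.
doc = ["Automatic summarization compresses documents.",
       "Genetic algorithms can search the space of sentence subsets.",
       "Unrelated filler sentence about the weather."]
print(ga_summarize(doc, budget=2))
```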
Abstract: Imbalanced datasets, denoted throughout this paper by ID (datasets containing several classes, usually two, where one class has considerably fewer samples than the other(s)), arise in many real-world problems, such as health-care and disease-diagnosis systems, anomaly detection, fraud detection, and stream-based malware detection. They cause problems in classification, including under-training of the minority class(es), over-training of the majority class(es), and bias towards the majority class(es). These datasets have therefore attracted the attention of many researchers, and several solutions have been proposed for dealing with them. The main idea of this study is to resample the borderline samples discovered by Support Vector Data Description (SVDD). There are two kinds of resampling: under-sampling (U-S) and over-sampling (O-S). The main drawback of O-S is that it can cause over-fitting, while the main drawback of U-S is that it can cause significant information loss. To avoid these drawbacks, this study focuses on the samples that are likely to be misclassified, namely the borderline points lying on the border(s) between the majority class(es) and the minority class(es). First, SVDD is used to find the borderline examples; then, resampling is applied to them; next, the base classifier is trained on the newly created dataset. Finally, the method is compared with state-of-the-art methods in terms of Area Under the Curve (AUC), F-measure, and G-mean, and the experimental study shows that it achieves better results.
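A minimal sketch of the described pipeline follows, under stated assumptions: scikit-learn's OneClassSVM (equivalent to SVDD when an RBF kernel is used) serves as the boundary model, borderline points are those with the smallest absolute decision values, and the resampling step is plain random over-sampling of the borderline minority points. The thresholds and the base classifier are illustrative choices, not the paper's configuration.

```python
# Hedged sketch: SVDD-style borderline detection followed by simple
# over-sampling of borderline minority points, then training a base classifier.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.tree import DecisionTreeClassifier

def resample_borderline(X, y, minority_label, borderline_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    svdd = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X)
    # Points with the smallest |decision value| lie closest to the SVDD boundary.
    dist = np.abs(svdd.decision_function(X))
    k = max(1, int(borderline_frac * len(X)))
    borderline = np.argsort(dist)[:k]
    minority_borderline = borderline[y[borderline] == minority_label]
    if minority_borderline.size == 0:
        return X, y
    # Random over-sampling (duplication) of borderline minority points.
    n_extra = np.sum(y != minority_label) - np.sum(y == minority_label)
    extra = rng.choice(minority_borderline, size=max(int(n_extra), 0), replace=True)
    X_new = np.vstack([X, X[extra]])
    y_new = np.concatenate([y, y[extra]])
    return X_new, y_new

# Usage: rebalance a toy imbalanced dataset, then fit any base classifier.
X = np.vstack([np.random.randn(95, 2), np.random.randn(5, 2) + 3.0])
y = np.array([0] * 95 + [1] * 5)
X_res, y_res = resample_borderline(X, y, minority_label=1)
clf = DecisionTreeClassifier(random_state=0).fit(X_res, y_res)
```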
Abstract: To improve the performance and robustness of clustering, clustering ensemble techniques generate and aggregate a number of primary clusterings. Fuzzy clustering ensemble approaches attempt to improve the performance of fuzzy clustering tasks; however, cluster (or clustering) reliability has received little attention in these approaches, which makes them weak when dealing with low-quality base clusterings. In this paper, cluster-unreliability estimation and a local weighting strategy are used to propose a new fuzzy clustering ensemble method that introduces three new fuzzy clustering consensus functions: Reliability-Based weighted co-association matrix Fuzzy C-Means (RBFCM), Reliability-Based Graph Partitioning (RBGP), and Reliability-Based Hyper Clustering (RBHC). The approach works on the basis of fuzzy cluster-unreliability estimation, where unreliability is estimated according to an entropic criterion using the cluster labels of the entire ensemble. To this end, a new metric is defined to estimate fuzzy cluster unreliability; the reliability value of each cluster is then determined using a Reliability-Driven Cluster Indicator (RDCI). The time complexities of RBHC and RBGP are linear in the number of data objects. The performance and robustness of the proposed method are evaluated experimentally on several benchmark datasets, and the results demonstrate its efficiency and suitability.
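One plausible reading of the entropic unreliability criterion is sketched below: a fuzzy cluster's unreliability is measured by how evenly its membership mass spreads over the clusters of the other partitions in the ensemble. The exact RDCI definition belongs to the paper; this normalized-entropy version is only an assumption for illustration.

```python
# Hedged sketch of an entropy-based cluster-unreliability estimate (an
# assumed formulation, not necessarily the paper's RDCI).
import numpy as np

def cluster_unreliability(membership, partitions):
    """membership: (n_objects,) fuzzy memberships of one cluster in [0, 1].
    partitions: list of (n_objects,) integer label vectors from the ensemble."""
    membership = np.asarray(membership, dtype=float)
    total = membership.sum()
    if total == 0:
        return 1.0                      # an empty cluster carries no evidence
    entropies = []
    for labels in partitions:
        labels = np.asarray(labels)
        # Distribute the cluster's membership mass over that partition's clusters.
        mass = np.array([membership[labels == c].sum() for c in np.unique(labels)])
        p = mass / total
        p = p[p > 0]
        h = -(p * np.log(p)).sum()
        entropies.append(h / np.log(len(mass)) if len(mass) > 1 else 0.0)
    return float(np.mean(entropies))    # 0 = consistent (reliable), 1 = maximally spread

def cluster_reliability(membership, partitions):
    return 1.0 - cluster_unreliability(membership, partitions)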
Abstract: An invariant can be described as an essential relationship between program variables, and invariants are very useful in software checking and verification. The tools used to detect them are invariant detectors, of which there are two types: dynamic and static. Daikon is an available tool that implements a dynamic invariant detection algorithm: it runs the tested program several times, gathers the values of its variables, and then detects relationships between the variables through a simple statistical analysis. This method has some drawbacks, one of the biggest being its high runtime. It is observed that the runtime of the Daikon invariant detection tool depends on the ordering of traces in the trace file. A mechanism is therefore proposed to reduce the differences between adjacent traces in the trace file by applying special mutation and crossover techniques in a genetic algorithm (GA). An experiment is run to assess the benefits of this approach, and the findings reveal that, with these improvements, the runtime of the proposed dynamic invariant detection procedure is superior to that of the original approach.
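The sketch below illustrates the general idea with a small permutation genetic algorithm that reorders trace records (modelled here as tuples of variable values) so that adjacent records differ as little as possible; the ordered-crossover and swap-mutation operators are generic choices, not the specific operators proposed in the paper.

```python
# Illustrative sketch: a GA that searches for a trace ordering minimizing the
# number of variable-value changes between consecutive records, on the premise
# that Daikon runs faster when adjacent traces differ little.
import random

def adjacent_diff(order, traces):
    """Total number of variable slots that change between consecutive traces."""
    cost = 0
    for a, b in zip(order, order[1:]):
        cost += sum(x != y for x, y in zip(traces[a], traces[b]))
    return cost

def ordered_crossover(p1, p2, rng):
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    segment = p1[i:j]
    rest = [g for g in p2 if g not in segment]
    return rest[:i] + segment + rest[i:]

def ga_order_traces(traces, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    n = len(traces)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: adjacent_diff(o, traces))
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            child = ordered_crossover(p1, p2, rng)
            a, b = rng.sample(range(n), 2)
            child[a], child[b] = child[b], child[a]   # swap mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda o: adjacent_diff(o, traces))

# Usage: reorder toy traces (tuples of variable values) before feeding the detector.
traces = [(0, 1, 2), (0, 1, 3), (5, 1, 2), (0, 9, 2)]
print(ga_order_traces(traces))
```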
Abstract: With the remarkable growth of textual data sources in recent years, easy, fast, and accurate text processing has become a challenge with significant payoffs. Automatic text summarization is the process of compressing text documents into shorter summaries for easier review of their core contents, which must be done without losing important features and information. This paper introduces a new hybrid method for extractive text summarization with feature selection based on text structure. The major advantage of the proposed method over previous systems is that it models the text structure and the relationships between entities in the input text, which improves sentence feature selection and leads to unambiguous, concise, consistent, and coherent summaries. The paper also presents an evaluation of the method based on precision and recall, showing that it produces summaries consisting of chains of sentences from the original text with the aforementioned characteristics.
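As a hedged illustration of structure-based sentence feature selection and the precision/recall evaluation mentioned above, the sketch below scores sentences with a few simple structural features (position, length, title overlap), extracts the top-scoring ones, and compares the selection against a reference extract. The features and weights are assumptions for illustration, not the paper's model.

```python
# Minimal sketch: score sentences with assumed structural features, take the
# top-k as the extract, and evaluate sentence-level precision and recall.
def sentence_features(sentences, title):
    title_terms = set(title.lower().split())
    feats = []
    n = len(sentences)
    for i, s in enumerate(sentences):
        terms = set(s.lower().split())
        feats.append({
            "position": 1.0 - i / max(n - 1, 1),                  # earlier sentences score higher
            "length": min(len(terms) / 20.0, 1.0),                 # normalized sentence length
            "title_overlap": len(terms & title_terms) / max(len(title_terms), 1),
        })
    return feats

def extract_summary(sentences, title, k=2, weights=(0.3, 0.2, 0.5)):
    w_pos, w_len, w_title = weights                                # assumed weights
    feats = sentence_features(sentences, title)
    scores = [w_pos * f["position"] + w_len * f["length"] + w_title * f["title_overlap"]
              for f in feats]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])                                      # selected indices in document order

def precision_recall(selected, reference):
    """Sentence-level precision and recall of an extract against a reference extract."""
    selected, reference = set(selected), set(reference)
    tp = len(selected & reference)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# Usage: indices of selected sentences vs. a reference extract.
picked = extract_summary(["First overview sentence.", "Minor detail.", "Key method sentence."],
                         title="Key method overview", k=2)
print(picked, precision_recall(picked, reference=[0, 2]))
```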