Purpose – The fuzziness and complexity of evaluation information are common phenomena in practical decision-making problems, and interval neutrosophic sets (INSs) are a powerful tool for dealing with ambiguous information. Similarity measures play an important role in judging the degree of closeness between the ideal solution and each alternative in the decision-making process; the purpose of this paper is to establish a multi-criteria decision-making method based on a similarity measure under INSs. Design/methodology/approach – Based on an extension of existing cosine similarity, this paper first introduces an improved cosine similarity measure between interval neutrosophic numbers, which considers the degrees of the truth membership, the indeterminacy membership, and the falsity membership of the evaluation values. A multi-criteria decision-making method is then established based on the improved cosine similarity measure, in which the ordered weighted averaging (OWA) operator is adopted to aggregate the neutrosophic information related to each alternative. Finally, an example on supplier selection is given to illustrate the feasibility and practicality of the presented decision-making method. Findings – Throughout the research and practice, it was realized that the application field of the proposed similarity measure theory should still be expanded, and the development of interval number theory is one direction for further research. Originality/value – The main contributions of this paper are as follows: this study presents an improved cosine similarity measure under INSs, in which the weights of the three independent components of an interval number are taken into account; the OWA operator is adopted to aggregate the neutrosophic information related to each alternative; and a multi-criteria decision-making method using the proposed similarity is developed under INSs.
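A minimal sketch of how a weighted cosine similarity between two interval neutrosophic numbers could look, treating the lower and upper bounds of the truth, indeterminacy, and falsity memberships as a six-component vector. The paper's exact weighted formulation and the OWA aggregation step are not reproduced here; the `weights` parameter and sample values are illustrative only.

```python
import numpy as np

def inn_cosine_similarity(a, b, weights=None):
    """Cosine similarity between two interval neutrosophic numbers.

    Each number is given as ([T_L, T_U], [I_L, I_U], [F_L, F_U]); the six
    interval endpoints are flattened into a vector and compared by the cosine
    of the angle between the (optionally weighted) vectors."""
    x = np.array(a, dtype=float).ravel()   # [T_L, T_U, I_L, I_U, F_L, F_U]
    y = np.array(b, dtype=float).ravel()
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    num = np.sum(w * x * y)
    den = np.sqrt(np.sum(w * x * x)) * np.sqrt(np.sum(w * y * y))
    return num / den if den > 0 else 1.0   # degenerate all-zero case

# Similarity between an alternative's rating and the ideal rating
alternative = ([0.4, 0.5], [0.2, 0.3], [0.3, 0.4])
ideal       = ([1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
print(inn_cosine_similarity(alternative, ideal))
```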
Broadcasting gateway equipment generally uses a method of simply switching to a spare input stream when a failure occurs in the main input stream. However, when the transmission environment is unstable, problems such as a reduction in the lifespan of the equipment due to frequent switching, as well as interruption, delay, and stoppage of services, may occur. Therefore, a machine learning (ML) method is required that can automatically judge and classify network-related service anomalies and switch multi-input signals without dropping or changing signals, by predicting or quickly determining the time of error occurrence, so that streams can be switched smoothly when problems such as transmission errors arise. In this paper, we propose an intelligent packet switching method based on ML classification, one of the supervised learning approaches, which presents the risk level of abnormal multi-streams occurring in broadcasting gateway equipment based on data. Furthermore, we subdivide the risk levels obtained from the classification technique into probabilities, derive vectorized representative values for each attribute of the collected input data, and continuously update them. The obtained reference vector is used for the switching judgment through its cosine similarity with the input data obtained when a dangerous situation occurs. Broadcasting gateway equipment to which the proposed method is applied can perform more stable and smarter switching than before, addressing the equipment's reliability problems and broadcasting accidents, and can maintain stable video streaming as well.
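A minimal sketch, under assumed attribute values and a hypothetical threshold, of the switching judgment described above: a per-risk-level reference vector is kept as a running mean, and the current input is compared against the "danger" reference by cosine similarity.

```python
import numpy as np

def cosine(u, v):
    """Plain cosine similarity between two attribute vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

class SwitchJudge:
    """Keeps a running reference vector per risk level and flags a switch
    when the current input looks sufficiently like the 'danger' reference."""

    def __init__(self, threshold=0.9):      # threshold is a hypothetical value
        self.reference = {}                 # risk level -> (mean vector, count)
        self.threshold = threshold

    def update(self, risk_level, x):
        mean, n = self.reference.get(risk_level, (np.zeros_like(x, dtype=float), 0))
        self.reference[risk_level] = ((mean * n + x) / (n + 1), n + 1)  # incremental mean

    def should_switch(self, x, danger_level="high"):
        if danger_level not in self.reference:
            return False
        ref, _ = self.reference[danger_level]
        return cosine(x, ref) >= self.threshold

judge = SwitchJudge()
judge.update("high", np.array([0.9, 0.8, 0.7, 0.95]))   # illustrative error-related attributes
print(judge.should_switch(np.array([0.88, 0.82, 0.75, 0.9])))
```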
This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a subset of the original constraints and combines two algorithms: the weighted average and the cosine simplex algorithm. The first approach identifies binding constraints by using the weighted average of each constraint, whereas the second algorithm is based on the cosine similarity between the vector of the objective function and the constraints. These two approaches are complementary, and when used together, they locate the essential subset of initial constraints required for solving medium and large-scale linear programming problems. After reducing the dimension of the linear programming problem using the subset of the essential constraints, the solution method can be chosen from any suitable method for linear programming. The proposed approach was applied to a set of well-known benchmarks as well as more than 2000 random medium and large-scale linear programming problems. The results are promising, indicating that the new approach contributes to the reduction of both the size of the problems and the total number of iterations required. A tree-based classification model also confirmed the need for combining the two approaches. A detailed numerical example, the general numerical results, and the statistical analysis for the decision tree procedure are presented.
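A minimal sketch of the cosine-similarity criterion behind the second algorithm: constraints whose gradients are most aligned with the objective vector are kept as candidate binding constraints. The weighted-average step and the paper's exact selection rule are not reproduced, and `keep_fraction` is an illustrative parameter.

```python
import numpy as np

def select_constraints_by_cosine(A, c, keep_fraction=0.3):
    """Rank constraints a_i x <= b_i by the cosine of the angle between each
    constraint gradient a_i and the objective vector c, and keep the most
    aligned ones as candidate binding constraints."""
    norms = np.linalg.norm(A, axis=1) * np.linalg.norm(c)
    cosines = (A @ c) / np.where(norms == 0, 1.0, norms)
    k = max(1, int(keep_fraction * A.shape[0]))
    keep = np.argsort(-cosines)[:k]          # indices of the k best-aligned constraints
    return np.sort(keep)

# Maximize c^T x subject to A x <= b: keep only the candidate binding rows of A
A = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, 0.0], [0.0, -1.0]])
c = np.array([3.0, 2.0])
print(select_constraints_by_cosine(A, c, keep_fraction=0.5))
```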
The growing collection of scientific data in various web repositories is referred to as Scientific Big Data, as it fulfills the four "V's" of Big Data: volume, variety, velocity, and veracity. This phenomenon has created new opportunities for startups; for instance, the extraction of pertinent research papers from enormous knowledge repositories using innovative methods has become an important task for researchers and entrepreneurs. Traditionally, the contents of the papers are compared to list the relevant papers from a repository. This conventional method results in a long list of papers that is often impossible to interpret productively. Therefore, a novel approach that intelligently utilizes the available data is needed. Moreover, the primary element of the scientific knowledge base is the research article, which consists of various logical sections such as the Abstract, Introduction, Related Work, Methodology, Results, and Conclusion. Thus, this study utilizes these logical sections of research articles, because they hold significant potential for finding relevant papers. In this study, comprehensive experiments were performed to determine the role of the logical-sections-based term indexing method in improving the quality of results (i.e., retrieving relevant papers). We therefore proposed, implemented, and evaluated the logical-sections-based content comparison method against a standard method of indexing terms. The section-based approach outperformed the standard content-based approach in identifying relevant documents across all classified topics of computer science. Overall, the proposed approach extracted 14% more relevant results from the entire dataset. As the experimental results suggest that employing a finer content similarity technique improves the quality of results, the proposed approach has laid the foundation for knowledge-based startups.
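A hedged sketch of what a logical-sections-based content comparison could look like: each paper is indexed per section, and per-section TF-IDF cosine similarities are combined. The section names, weighting, and indexing details are assumptions, not the paper's exact scheme.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

SECTIONS = ["abstract", "introduction", "methodology", "results", "conclusion"]

def section_based_similarity(query, candidate, weights=None):
    """Compare two papers section by section instead of as one blob of text.

    `query` and `candidate` are dicts mapping a logical section name to its
    text; per-section similarities are averaged with optional weights."""
    weights = weights or {s: 1.0 for s in SECTIONS}
    total, weight_sum = 0.0, 0.0
    for s in SECTIONS:
        if not query.get(s) or not candidate.get(s):
            continue
        tfidf = TfidfVectorizer().fit_transform([query[s], candidate[s]])
        total += weights[s] * cosine_similarity(tfidf[0], tfidf[1])[0, 0]
        weight_sum += weights[s]
    return total / weight_sum if weight_sum else 0.0

paper_a = {"abstract": "deep learning for text retrieval", "methodology": "we fine-tune a transformer"}
paper_b = {"abstract": "retrieval of text with neural networks", "methodology": "a transformer is fine-tuned"}
print(section_based_similarity(paper_a, paper_b))
```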
Due to the complexity of the marine environment, underwater acoustic signals are affected by complex background noise during transmission, and underwater acoustic signal denoising has always been a difficult problem in underwater acoustic signal processing. To obtain a better denoising effect, a new underwater acoustic signal denoising method, named BVMD-OFDE-CSST-BVMD-FDE, is proposed; it combines variational mode decomposition optimized by the black widow optimization algorithm (BVMD), a fluctuation-based dispersion entropy threshold improved by the Otsu method (OFDE), a cosine similarity stationary threshold (CSST), a second BVMD stage, and fluctuation-based dispersion entropy (FDE). First, the original signal is decomposed into a series of intrinsic mode functions (IMFs) by BVMD. Next, pure IMFs, mixed IMFs, and noise IMFs are distinguished by OFDE and CSST, and the pure and mixed IMFs are reconstructed to obtain a primary denoised signal. Finally, the primary denoised signal is decomposed into IMFs by BVMD again, the FDE value is used to distinguish noise IMFs from pure IMFs, and the pure IMFs are reconstructed to obtain the final denoised signal. The proposed method has three advantages: (i) BVMD can adaptively select the decomposition layer and penalty factor of VMD; (ii) FDE and cosine similarity (CS) are used as double criteria to distinguish noise IMFs from useful IMFs, and the Otsu and CSST algorithms effectively avoid the error caused by manually selecting thresholds; (iii) the secondary decomposition can make up for the deficiency of the primary decomposition and further remove a small amount of noise. A chaotic signal and a real ship signal are denoised. The experimental results show that the proposed method denoises effectively, improves on the result of the primary decomposition alone, and has good practical value.
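A minimal sketch of the cosine-similarity criterion used alongside the entropy criterion to sort IMFs into noise, mixed, and pure groups; the BVMD decomposition itself, the Otsu step, and the stationary-threshold calculation are not reproduced, and the two thresholds below are illustrative placeholders.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between an IMF and the original signal."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def sort_imfs_by_similarity(imfs, signal, low=0.1, high=0.5):
    """Split IMFs into noise / mixed / pure groups by how strongly each one
    correlates (in the cosine sense) with the original signal.  The two
    thresholds are placeholders; the full method derives them adaptively."""
    groups = {"noise": [], "mixed": [], "pure": []}
    for i, imf in enumerate(imfs):
        s = abs(cosine_similarity(imf, signal))
        key = "noise" if s < low else ("pure" if s > high else "mixed")
        groups[key].append(i)
    return groups

t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(t.size)
imfs = [np.sin(2 * np.pi * 5 * t), 0.3 * np.random.randn(t.size)]  # stand-ins for BVMD output
print(sort_imfs_by_similarity(imfs, signal))
```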
Labeled data is widely used in various classification tasks. However, there is a huge challenge in that labels are often added manually, and wrong labels added by malicious users affect the training effect of the model. The unreliability of labeled data has hindered research. To solve these problems, we propose a framework for Label Noise Filtering and Missing Label Supplement (LNFS), and we take location labels in Location-Based Social Networks (LBSN) as an example to implement our framework. For the problem of label noise filtering, we first use FastText to transform restaurant labels into vectors and then, based on the assumption that the label most similar to all the other labels of a location is the most representative one, use cosine similarity to judge and select the most representative label. For the problem of missing labels, we use simple common-word similarity to judge the similarity of users' comments, and then use the labels of similar restaurants to supplement the missing labels. To optimize the performance of the model, we introduce game theory into our model to simulate the game between malicious users and the model and improve its reliability. Finally, a case study is given to illustrate the effectiveness and reliability of LNFS.
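A minimal sketch of the representative-label selection step: the label whose vector has the highest mean cosine similarity to the other labels of a location is kept. The toy vectors stand in for FastText embeddings.

```python
import numpy as np

def most_representative_label(labels, label_vectors):
    """Pick the label whose embedding has the highest mean cosine similarity
    to all the other labels attached to the same location."""
    V = np.asarray(label_vectors, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)      # unit-normalize rows
    sims = V @ V.T                                        # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)                           # ignore self-similarity
    mean_sim = sims.sum(axis=1) / (len(labels) - 1)
    return labels[int(np.argmax(mean_sim))]

# Toy vectors standing in for FastText embeddings of restaurant labels
labels = ["noodles", "ramen", "car wash"]
vectors = [[0.9, 0.1, 0.0], [0.85, 0.2, 0.05], [0.0, 0.1, 0.95]]
print(most_representative_label(labels, vectors))   # -> a noodle-related label
```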
The occurrence of crimes has been on a constant rise despite the emerging discoveries and advancements in the technological field over the past decade. One of the most tedious tasks is to track a suspect once a crime is committed. As most crimes are committed by individuals who have a history of felonies, it is essential to have a monitoring system that not only detects the face of the person who committed the crime but also establishes their identity. Hence, a smart criminal detection and identification system is proposed that uses the OpenCV Deep Neural Network (DNN) model, which employs a Single Shot Multibox Detector for face detection, and an auto-encoder model in which the encoder part is used for matching the captured facial images with the criminals. After detection and extraction of the face in the image by face cropping, the captured face is compared with the images in the criminal database. The comparison is performed by calculating the similarity value between each pair of images using the cosine similarity metric. After plotting the values in a graph to find the threshold value, we conclude that the confidence rate of the encoder model is 0.75 and above.
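A minimal sketch of the matching step: the captured face's encoder embedding is compared with every stored embedding by cosine similarity, and a match is accepted only at or above the 0.75 confidence rate reported above. The embeddings and database here are toy stand-ins.

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.75   # threshold reported in the text for the encoder model

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def identify(captured_embedding, database):
    """Compare the encoder output of a captured face against every stored
    embedding and return the best match if it clears the threshold."""
    best_name, best_score = None, -1.0
    for name, emb in database.items():
        score = cosine_similarity(captured_embedding, emb)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= CONFIDENCE_THRESHOLD else (None, best_score)

# Toy 4-d embeddings standing in for the auto-encoder's latent vectors
db = {"suspect_A": np.array([0.2, 0.9, 0.1, 0.3]), "suspect_B": np.array([0.8, 0.1, 0.5, 0.2])}
print(identify(np.array([0.22, 0.88, 0.12, 0.31]), db))
```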
To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and thus has considerable side effects on the overall performance of TC algorithms. On the basis of the sparsity characteristic of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of features without any information loss, which greatly improves both the efficiency and the performance of the algorithms. The experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than several classical TC algorithms.
Text classification for low-resource languages is a non-trivial and challenging problem. This paper discusses the process of Urdu news classification and Urdu document similarity. Urdu is one of the most widely spoken languages in Asia. The implementation of computational methodologies for text classification has increased over time. However, Urdu has not been widely explored in research and does not have readily available datasets, which is the primary reason behind the limited research and the slow application of the latest methodologies to Urdu. To overcome these obstacles, a medium-sized dataset with six categories was collected from authentic Pakistani news sources. Urdu is a rich but complex language, and text processing can be challenging for Urdu due to its complex features compared to other languages. A term frequency-inverse document frequency (TF-IDF) based term weighting scheme for extracting features, chi-squared for selecting essential features, and linear discriminant analysis (LDA) for dimensionality reduction have been used. The TF-IDF matrix and the cosine similarity measure have been used to identify similar documents in a collection, and a FastText model has been applied to find the semantic meaning of words in a document. A training-test split evaluation methodology is used for this experimentation, with 70% of the data for training and 30% for testing. State-of-the-art machine learning and deep dense neural network approaches for Urdu news classification have been used. Finally, we trained Multinomial Naïve Bayes, XGBoost, Bagging, and a deep dense neural network. Bagging and the deep dense neural network outperformed the other algorithms. The experimental results show that the deep dense network achieves a 92.0% mean F1 score and Bagging a 95.0% F1 score.
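A minimal sketch of the document-similarity part of the pipeline (TF-IDF weighting plus cosine similarity); the chi-squared feature selection, LDA reduction, FastText model, and classifiers are omitted, and the toy transliterated corpus is only a stand-in for the Urdu news dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the Urdu news collection (transliterated here only
# so the snippet stays self-contained; the real pipeline works on Urdu script).
docs = [
    "hukumat ne naya budget pesh kiya",            # politics/economy style item
    "wazir e khazana ne budget par bayan diya",
    "cricket team ne match jeet liya",             # sports style item
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)                      # TF-IDF term weighting
sims = cosine_similarity(X)                        # pairwise document similarities

query_idx = 0
ranked = sims[query_idx].argsort()[::-1][1:]       # most similar documents, self excluded
print(ranked, sims[query_idx][ranked])
```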
Intelligent seismic facies identification based on deep learning can alleviate the time-consuming and labor-intensive problem of manual interpretation and has been widely applied. Supervised learning can realize facies identification with high efficiency and accuracy; however, it depends on the use of a large amount of well-labeled data. To solve this issue, we propose herein an incremental semi-supervised method for intelligent facies identification. Our method considers the continuity of the lateral variation of strata and uses cosine similarity to quantify the similarity of the seismic data feature domain. The maximum-difference sample in the neighborhood of the currently used training data is then found to reasonably expand the training sets. This process continuously increases the amount of training data and learns its distribution. We integrate old knowledge while absorbing new knowledge to realize incremental semi-supervised learning and achieve the purpose of evolving the network models. In this work, accuracy and the confusion matrix are employed to jointly control the predicted results of the model from both overall and partial aspects. The approach is then applied to a real three-dimensional (3D) dataset, and the obtained values are used to quantitatively evaluate the results. Using unlabeled data, our proposed method acquires more accurate and stable testing results than conventional supervised learning algorithms that only use well-labeled data. A considerable improvement for small-sample categories is also observed. Using less than 1% of the training data, the proposed method can achieve an average accuracy of over 95% on the 3D dataset. In contrast, the conventional supervised learning algorithm achieved only approximately 85%.
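A minimal sketch of the sample-expansion idea: among unlabeled candidates, the one least similar (in the cosine sense) to the current training features is the "maximum-difference" sample to add next. The neighborhood constraint and the incremental retraining loop are omitted, and the random arrays are stand-ins for seismic feature vectors.

```python
import numpy as np

def pick_max_difference_sample(train_features, candidate_features):
    """From the unlabeled candidates, pick the one whose feature vector is
    least similar to its nearest training sample (lowest maximum cosine
    similarity) -- i.e. the 'maximum-difference' sample to add next."""
    T = train_features / np.linalg.norm(train_features, axis=1, keepdims=True)
    C = candidate_features / np.linalg.norm(candidate_features, axis=1, keepdims=True)
    sim_to_train = (C @ T.T).max(axis=1)      # each candidate's closest training sample
    return int(np.argmin(sim_to_train))       # most dissimilar candidate

rng = np.random.default_rng(0)
train = rng.normal(size=(20, 8))              # stand-in for labeled seismic feature vectors
candidates = rng.normal(size=(50, 8))         # neighboring unlabeled traces
print(pick_max_difference_sample(train, candidates))
```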
Text summarization is an essential area in text mining, which has procedures for text extraction. In natural language processing, text summarization maps documents to a representative set of descriptive words; the objective of text extraction is therefore to obtain reduced expressive content from text documents. Text summarization has two main areas: abstractive and extractive summarization. Extractive text summarization has two further approaches: the first applies a sentence score algorithm, and the second follows word embedding principles. All such text extractions have limitations in conveying the basic theme of the underlying documents. In this paper, we employ text summarization by TF-IDF with PageRank keywords, a sentence score algorithm, and Word2Vec word embedding. The study compared these forms of text summarization with the actual text by calculating cosine similarities. Furthermore, TF-IDF-based PageRank keywords are extracted from the other two extractive summarizations. An intersection over these three types of TF-IDF keywords is performed to generate a more representative set of keywords for each text document. This technique generates variable-length keywords according to document diversity instead of selecting fixed-length keywords for each document. This form of abstractive summarization improves metadata similarity to the original text compared to all other forms of summarized text. It also solves the issue of deciding the number of representative keywords for a specific text document. To evaluate the technique, the study used a sample of more than eighteen hundred text documents. The abstractive summarization follows the principles of deep learning to create uniform similarity of the extracted words with the actual text and all other forms of text summarization. The proposed technique provides a stable measure of similarity compared to existing forms of text summarization.
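A minimal sketch of the keyword-intersection step and of the cosine-similarity check against the original text; the three keyword lists below are illustrative stand-ins for the TF-IDF/PageRank, sentence-score, and Word2Vec routes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def keyword_intersection(tfidf_keywords, sentence_score_keywords, word2vec_keywords):
    """Variable-length representative keyword set: the overlap of the three
    keyword lists produced by the three summarization routes."""
    return set(tfidf_keywords) & set(sentence_score_keywords) & set(word2vec_keywords)

def similarity_to_original(original_text, summary_text):
    """Cosine similarity between a summary and the source document."""
    X = TfidfVectorizer().fit_transform([original_text, summary_text])
    return cosine_similarity(X[0], X[1])[0, 0]

original = "text summarization reduces a document to its most descriptive words and sentences"
keywords = keyword_intersection(
    ["summarization", "document", "words"],        # TF-IDF / PageRank route (illustrative)
    ["summarization", "sentences", "document"],    # sentence-score route
    ["summarization", "document", "descriptive"],  # Word2Vec route
)
print(keywords, similarity_to_original(original, " ".join(sorted(keywords))))
```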
The substantial competition among the news industries puts editors under pressure to post news articles which are likely to gain more user attention. Anticipating the popularity of news articles can help editorial teams in making decisions about posting a news article. Article similarity extracted from the articles posted within a small period of time has been found to be a useful feature in existing popularity prediction approaches. This work proposes a new approach to estimate the popularity of news articles by adding semantics to the article-similarity-based approach of popularity estimation. A semantically enriched model is proposed which estimates news popularity by measuring the cosine similarity between document embeddings of the news articles. A Word2vec model has been used to generate distributed representations of the news content. In this work, we define popularity as the number of times a news article is posted on different websites. We collect data from different websites that post news concerning the domain of cybersecurity and estimate the popularity of cybersecurity news. The proposed approach is compared with different models and is shown to outperform them.
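A minimal sketch of the similarity computation: a document embedding is taken as the mean of its word vectors, and two articles are compared by cosine similarity. The toy vectors stand in for a trained Word2vec model.

```python
import numpy as np

def document_embedding(tokens, word_vectors):
    """Average the word vectors of a document's tokens (a common way to get a
    single distributed representation from a Word2vec model)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(next(iter(word_vectors.values())).shape)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for a trained Word2vec model over cybersecurity news
wv = {
    "ransomware": np.array([0.9, 0.1, 0.2]), "attack": np.array([0.8, 0.2, 0.1]),
    "patch":      np.array([0.7, 0.3, 0.2]), "festival": np.array([0.1, 0.9, 0.8]),
}
a = document_embedding("ransomware attack patch".split(), wv)
b = document_embedding("new ransomware attack reported".split(), wv)
print(cosine(a, b))   # high similarity -> the articles likely report the same story
```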
A novel channel attention residual network (CAN) for single-image super-resolution (SISR) is proposed to rescale pixel-wise features by explicitly modeling interdependencies between channels and encoding where the visual attention is located. The backbone of CAN is the channel attention block (CAB). The proposed CAB combines a cosine similarity block (CSB) and a back-projection gating block (BG). The CSB fully considers the global spatial information of each channel and computes the cosine similarity between channels to obtain finer channel statistics than first-order statistics. For further exploration of channel attention, we introduce effective back-projection into the gating mechanism and propose BG. Meanwhile, we adopt local and global residual connections in SISR, which directly convey most low-frequency information to the final SR outputs, while valuable high-frequency components are allocated more computational resources through the channel attention mechanism. Extensive experiments show the superiority of the proposed CAN over state-of-the-art methods on benchmark datasets in both accuracy and visual quality.
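A minimal sketch of the channel-wise cosine-similarity statistic that a CSB-style block could compute, in NumPy rather than as a network layer; the gating, back-projection, and residual parts of CAB are not reproduced.

```python
import numpy as np

def channel_cosine_statistics(feature_map):
    """For a (C, H, W) feature map, flatten each channel to a vector, compute
    the C x C matrix of channel-to-channel cosine similarities, and summarize
    each channel by its mean similarity to the others -- a finer channel
    descriptor than first-order (mean/std) pooling."""
    C = feature_map.shape[0]
    flat = feature_map.reshape(C, -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = flat @ flat.T                      # cosine similarity between every channel pair
    np.fill_diagonal(sim, 0.0)
    return sim.sum(axis=1) / (C - 1)         # one attention statistic per channel

fmap = np.random.default_rng(0).normal(size=(8, 16, 16))   # toy feature map
stats = channel_cosine_statistics(fmap)
print(stats.shape)   # (8,): one statistic per channel, fed to the gating part in the full model
```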
As Artificial Intelligence (AI) tools become essential across industries, distinguishing AI-generated from human-authored text is increasingly challenging. This study assesses the coherence of AI-generated titles and corresponding abstracts in anticipation of rising AI-assisted document production. Our main goal is to examine the correlation between original and AI-generated titles, emphasizing semantic depth and similarity measures, particularly in the context of Large Language Models (LLMs). We examine whether LLMs have transformed research focus, dissemination, and citation patterns across five selected knowledge areas: Business Administration and Management (BAM), Computer Science and Information Technology (CS), Engineering and Material Science (EMS), Medicine and Healthcare (MH), and Psychology and Behavioral Sciences (PBS). We collected 15,000 titles and abstracts, narrowing the selection to 2,000 through a rigorous multi-stage screening process adhering to our study's criteria. The results show that there is insufficient evidence to suggest that LLMs outperform human authors in article title generation, or that articles from the LLM era demonstrate a marked difference in semantic richness and readability compared to those from the pre-LLM era. Instead, the study asserts that LLMs are valuable tools that can assist researchers in generating titles. With an LLM's assistance, researchers can ensure that a title reflects the finalized abstract and core research themes, potentially increasing the impact, accessibility, and readability of the academic work.
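A hedged sketch of one way to score original titles against their AI-generated counterparts; a shared TF-IDF space is used here purely for self-containment, whereas the study's semantic-depth measures would typically rely on richer sentence embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def title_similarities(original_titles, generated_titles):
    """Pairwise similarity between each original title and its AI-generated
    counterpart, computed over a shared TF-IDF space (a simple stand-in for
    the semantic similarity measures used in the study)."""
    vec = TfidfVectorizer().fit(original_titles + generated_titles)
    O, G = vec.transform(original_titles), vec.transform(generated_titles)
    return [cosine_similarity(O[i], G[i])[0, 0] for i in range(len(original_titles))]

originals = ["A survey of deep learning for medical image segmentation"]
generated = ["Deep learning methods for segmenting medical images: a survey"]
print(title_similarities(originals, generated))
```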
It is a significant and challenging task to detect the informative features needed for explainable analysis of high-dimensional data, especially data with a very small number of samples. Feature selection, especially unsupervised feature selection, is the right way to deal with this challenge. Therefore, two unsupervised spectral feature selection algorithms are proposed in this paper. They group features using an advanced self-tuning spectral clustering algorithm based on local standard deviation, so as to detect globally optimal feature clusters as far as possible. Two feature ranking techniques, cosine-similarity-based feature ranking and entropy-based feature ranking, are then proposed, so that the representative feature of each cluster can be detected to form the feature subset on which the explainable classification system will be built. The effectiveness of the proposed algorithms is tested on high-dimensional benchmark omics datasets and compared to peer methods, and statistical tests are conducted to determine whether or not the proposed spectral feature selection algorithms differ significantly from the peer methods. The extensive experiments demonstrate that the proposed unsupervised spectral feature selection algorithms outperform the peer ones in comparison, especially the one based on the cosine similarity feature ranking technique. The statistical test results show that the entropy-feature-ranking-based spectral feature selection algorithm performs best. The detected features demonstrate strong discriminative capabilities in downstream classifiers for omics data, such that the AI system built on them would be reliable and explainable. This is especially significant for building transparent and trustworthy medical diagnostic systems from an interpretable AI perspective.
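A minimal sketch of the cosine-similarity feature-ranking step: within each feature cluster, the feature with the highest mean cosine similarity to the other members is kept as the representative. The cluster labels below are stand-ins for the output of the self-tuning spectral clustering.

```python
import numpy as np

def representative_features(X, cluster_labels):
    """For each feature cluster, rank member features by their mean cosine
    similarity to the other members and keep the top-ranked one as the
    cluster's representative (the cosine-similarity ranking variant)."""
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)   # normalize feature columns
    selected = []
    for c in np.unique(cluster_labels):
        idx = np.where(cluster_labels == c)[0]
        sub = Xn[:, idx]
        sim = sub.T @ sub                                          # similarities between member features
        np.fill_diagonal(sim, 0.0)
        mean_sim = sim.sum(axis=1) / max(len(idx) - 1, 1)
        selected.append(int(idx[int(np.argmax(mean_sim))]))
    return sorted(selected)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))                          # 30 samples, 10 features (toy omics-style data)
clusters = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])    # stand-in for spectral clustering output
print(representative_features(X, clusters))            # one representative feature index per cluster
```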
In recent years, various deep-learning-based speech embedding methods have been proposed and have shown better performance in speaker verification. These new technologies will inevitably promote the development of forensic speaker verification. We propose a new forensic speaker verification method based on embeddings trained with the generalized end-to-end (GE2E) loss. First, a long short-term memory (LSTM) based deep neural network (DNN) is trained as the embedding extractor. The cosine similarity scores between embeddings from same-speaker comparison pairs and from different-speaker comparison pairs are then used to train the within-speaker model and the between-speaker model, respectively. Finally, the cosine similarity scores between the questioned embeddings and the enrolled embeddings are evaluated under these two models to obtain the likelihood ratio (LR) value. On the test-other-500 subset of LibriSpeech, we achieve a new state of the art. All same-speaker comparison pairs and different-speaker comparison pairs yield correct results and can provide considerably strong evidence for courts.
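A minimal sketch of the scoring-and-LR step under an assumed Gaussian form for the two score models; the paper trains its within-speaker and between-speaker models from GE2E-embedding cosine scores, and the calibration scores and embeddings below are toy stand-ins.

```python
import numpy as np
from scipy.stats import norm

def cosine_score(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def likelihood_ratio(questioned, enrolled, same_scores, diff_scores):
    """Score the questioned-vs-enrolled embedding pair and evaluate it under
    simple Gaussian models of same-speaker and different-speaker cosine
    scores (stand-ins for the trained within/between-speaker models)."""
    s = cosine_score(questioned, enrolled)
    p_same = norm.pdf(s, loc=np.mean(same_scores), scale=np.std(same_scores))
    p_diff = norm.pdf(s, loc=np.mean(diff_scores), scale=np.std(diff_scores))
    return p_same / p_diff        # LR > 1 supports the same-speaker hypothesis

rng = np.random.default_rng(0)
same_scores = rng.normal(0.85, 0.05, 500)   # toy calibration scores from same-speaker pairs
diff_scores = rng.normal(0.30, 0.10, 500)   # toy calibration scores from different-speaker pairs
q = rng.normal(size=64)
e = q + rng.normal(0, 0.2, 64)              # toy GE2E-style embeddings
print(likelihood_ratio(q, e, same_scores, diff_scores))
```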