Data is humongous today because of the extensive use of World WideWeb, Social Media and Intelligent Systems. This data can be very important anduseful if it is harnessed carefully and correctly. Useful information can...Data is humongous today because of the extensive use of World WideWeb, Social Media and Intelligent Systems. This data can be very important anduseful if it is harnessed carefully and correctly. Useful information can beextracted from this massive data using the Data Mining process. The informationextracted can be used to make vital decisions in various industries. Clustering is avery popular Data Mining method which divides the data points into differentgroups such that all similar data points form a part of the same group. Clusteringmethods are of various types. Many parameters and indexes exist for the evaluationand comparison of these methods. In this paper, we have compared partitioningbased methods K-Means, Fuzzy C-Means (FCM), Partitioning AroundMedoids (PAM) and Clustering Large Application (CLARA) on secure perturbeddata. Comparison and identification has been done for the method which performsbetter for analyzing the data perturbed using Extended NMF on the basis of thevalues of various indexes like Dunn Index, Silhouette Index, Xie-Beni Indexand Davies-Bouldin Index.展开更多
Underwater direction of arrival(DOA)estimation has always been a very challenging theoretical and practical problem.Due to the serious non-stationary,non-linear,and non-Gaussian characteristics,machine learning based ...Underwater direction of arrival(DOA)estimation has always been a very challenging theoretical and practical problem.Due to the serious non-stationary,non-linear,and non-Gaussian characteristics,machine learning based DOA estimation methods trained on simulated Gaussian noised array data cannot be directly applied to actual underwater DOA estimation tasks.In order to deal with this problem,environmental data with no target echoes can be employed to analyze the non-Gaussian components.Then,the obtained information about non-Gaussian components can be used to whiten the array data.Based on these considerations,a novel practical sonar array whitening method was proposed.Specifically,based on a weak assumption that the non-Gaussian components in adjacent patches with and without target echoes are almost the same,canonical cor-relation analysis(CCA)and non-negative matrix factorization(NMF)techniques are employed for whitening the array data.With the whitened array data,machine learning based DOA estimation models trained on simulated Gaussian noised datasets can be used to perform underwater DOA estimation tasks.Experimental results illustrated that,using actual underwater datasets for testing with known machine learning based DOA estimation models,accurate and robust DOA estimation performance can be achieved by using the proposed whitening method in different underwater con-ditions.展开更多
Contrastive learning is a significant research direction in the field of deep learning.However,existing data augmentation methods often lead to issues such as semantic drift in generated views while the complexity of ...Contrastive learning is a significant research direction in the field of deep learning.However,existing data augmentation methods often lead to issues such as semantic drift in generated views while the complexity of model pre-training limits further improvement in the performance of existing methods.To address these challenges,we propose the Efficient Clustering Network based on Matrix Factorization(ECN-MF).Specifically,we design a batched low-rank Singular Value Decomposition(SVD)algorithm for data augmentation to eliminate redundant information and uncover major patterns of variation and key information in the data.Additionally,we design a Mutual Information-Enhanced Clustering Module(MI-ECM)to accelerate the training process by leveraging a simple architecture to bring samples from the same cluster closer while pushing samples from other clusters apart.Extensive experiments on six datasets demonstrate that ECN-MF exhibits more effective performance compared to state-of-the-art algorithms.展开更多
The proliferation of internet communication channels has increased telecom fraud,causing billions of euros in losses for customers and the industry each year.Fraudsters constantly find new ways to engage in illegal ac...The proliferation of internet communication channels has increased telecom fraud,causing billions of euros in losses for customers and the industry each year.Fraudsters constantly find new ways to engage in illegal activity on the network.To reduce these losses,a new fraud detection approach is required.Telecom fraud detection involves identifying a small number of fraudulent calls from a vast amount of call traffic.Developing an effective strategy to combat fraud has become challenging.Although much effort has been made to detect fraud,most existing methods are designed for batch processing,not real-time detection.To solve this problem,we propose an online fraud detection model using a Neural Factorization Autoencoder(NFA),which analyzes customer calling patterns to detect fraudulent calls.The model employs Neural Factorization Machines(NFM)and an Autoencoder(AE)to model calling patterns and a memory module to adapt to changing customer behaviour.We evaluate our approach on a large dataset of real-world call detail records and compare it with several state-of-the-art methods.Our results show that our approach outperforms the baselines,with an AUC of 91.06%,a TPR of 91.89%,an FPR of 14.76%,and an F1-score of 95.45%.These results demonstrate the effectiveness of our approach in detecting fraud in real-time and suggest that it can be a valuable tool for preventing fraud in telecommunications networks.展开更多
This study aimed to investigate the pollution characteristics, source apportionment, and health risks associated with trace metal(loid)s(TMs) in the major agricultural producing areas in Chongqing, China. We analyzed ...This study aimed to investigate the pollution characteristics, source apportionment, and health risks associated with trace metal(loid)s(TMs) in the major agricultural producing areas in Chongqing, China. We analyzed the source apportionment and assessed the health risk of TMs in agricultural soils by using positive matrix factorization(PMF) model and health risk assessment(HRA) model based on Monte Carlo simulation. Meanwhile, we combined PMF and HRA models to explore the health risks of TMs in agricultural soils by different pollution sources to determine the priority control factors. Results showed that the average contents of cadmium(Cd), arsenic (As), lead(Pb), chromium(Cr), copper(Cu), nickel(Ni), and zinc(Zn) in the soil were found to be 0.26, 5.93, 27.14, 61.32, 23.81, 32.45, and 78.65 mg/kg, respectively. Spatial analysis and source apportionment analysis revealed that urban and industrial sources, agricultural sources, and natural sources accounted for 33.0%, 27.7%, and 39.3% of TM accumulation in the soil, respectively. In the HRA model based on Monte Carlo simulation, noncarcinogenic risks were deemed negligible(hazard index <1), the carcinogenic risks were at acceptable level(10^(-6)<total carcinogenic risk ≤ 10^(-4)), with higher risks observed for children compared to adults. The relationship between TMs, their sources, and health risks indicated that urban and industrial sources were primarily associated with As, contributing to 75.1% of carcinogenic risks and 55.7% of non-carcinogenic risks, making them the primary control factors. Meanwhile, agricultural sources were primarily linked to Cd and Pb, contributing to 13.1% of carcinogenic risks and 21.8% of non-carcinogenic risks, designating them as secondary control factors.展开更多
行人检测在机器人、驾驶辅助系统和视频监控等领域有广泛的应用,该文提出一种基于显著性检测与方向梯度直方图-非负矩阵分解(Histogram of Oriented Gradient-Non-negative Matrix Factorization,HOG-NMF)特征的快速行人检测方法。采用...行人检测在机器人、驾驶辅助系统和视频监控等领域有广泛的应用,该文提出一种基于显著性检测与方向梯度直方图-非负矩阵分解(Histogram of Oriented Gradient-Non-negative Matrix Factorization,HOG-NMF)特征的快速行人检测方法。采用频谱调谐显著性检测提取显著图,并基于熵值门限进行感兴趣区域的提取;组合非负矩阵分解和方向梯度直方图生成HOG-NMF特征;采用加性交叉核支持向量机方法(Intersection Kernel Support Vector Machine,IKSVM)。该算法显著降低了特征维数,在相同的计算复杂度下明显改善了线性支持向量机的检测率。在INRIA数据库的实验结果表明,该方法对比HOG/线性SVM和HOG/RBF-SVM显著减少了检测时间,并达到了满意的检测率。展开更多
文摘Data is humongous today because of the extensive use of World WideWeb, Social Media and Intelligent Systems. This data can be very important anduseful if it is harnessed carefully and correctly. Useful information can beextracted from this massive data using the Data Mining process. The informationextracted can be used to make vital decisions in various industries. Clustering is avery popular Data Mining method which divides the data points into differentgroups such that all similar data points form a part of the same group. Clusteringmethods are of various types. Many parameters and indexes exist for the evaluationand comparison of these methods. In this paper, we have compared partitioningbased methods K-Means, Fuzzy C-Means (FCM), Partitioning AroundMedoids (PAM) and Clustering Large Application (CLARA) on secure perturbeddata. Comparison and identification has been done for the method which performsbetter for analyzing the data perturbed using Extended NMF on the basis of thevalues of various indexes like Dunn Index, Silhouette Index, Xie-Beni Indexand Davies-Bouldin Index.
基金supported by the National Natural Science Foundation of China(No.51279033).
文摘Underwater direction of arrival(DOA)estimation has always been a very challenging theoretical and practical problem.Due to the serious non-stationary,non-linear,and non-Gaussian characteristics,machine learning based DOA estimation methods trained on simulated Gaussian noised array data cannot be directly applied to actual underwater DOA estimation tasks.In order to deal with this problem,environmental data with no target echoes can be employed to analyze the non-Gaussian components.Then,the obtained information about non-Gaussian components can be used to whiten the array data.Based on these considerations,a novel practical sonar array whitening method was proposed.Specifically,based on a weak assumption that the non-Gaussian components in adjacent patches with and without target echoes are almost the same,canonical cor-relation analysis(CCA)and non-negative matrix factorization(NMF)techniques are employed for whitening the array data.With the whitened array data,machine learning based DOA estimation models trained on simulated Gaussian noised datasets can be used to perform underwater DOA estimation tasks.Experimental results illustrated that,using actual underwater datasets for testing with known machine learning based DOA estimation models,accurate and robust DOA estimation performance can be achieved by using the proposed whitening method in different underwater con-ditions.
基金supported by the Key Research and Development Program of Hainan Province(Grant Nos.ZDYF2023GXJS163,ZDYF2024GXJS014)National Natural Science Foundation of China(NSFC)(Grant Nos.62162022,62162024)+3 种基金the Major Science and Technology Project of Hainan Province(Grant No.ZDKJ2020012)Hainan Provincial Natural Science Foundation of China(Grant No.620MS021)Youth Foundation Project of Hainan Natural Science Foundation(621QN211)Innovative Research Project for Graduate Students in Hainan Province(Grant Nos.Qhys2023-96,Qhys2023-95).
文摘Contrastive learning is a significant research direction in the field of deep learning.However,existing data augmentation methods often lead to issues such as semantic drift in generated views while the complexity of model pre-training limits further improvement in the performance of existing methods.To address these challenges,we propose the Efficient Clustering Network based on Matrix Factorization(ECN-MF).Specifically,we design a batched low-rank Singular Value Decomposition(SVD)algorithm for data augmentation to eliminate redundant information and uncover major patterns of variation and key information in the data.Additionally,we design a Mutual Information-Enhanced Clustering Module(MI-ECM)to accelerate the training process by leveraging a simple architecture to bring samples from the same cluster closer while pushing samples from other clusters apart.Extensive experiments on six datasets demonstrate that ECN-MF exhibits more effective performance compared to state-of-the-art algorithms.
基金This research work has been conducted in cooperation with members of DETSI project supported by BPI France and Pays de Loire and Auvergne Rhone Alpes.
文摘The proliferation of internet communication channels has increased telecom fraud,causing billions of euros in losses for customers and the industry each year.Fraudsters constantly find new ways to engage in illegal activity on the network.To reduce these losses,a new fraud detection approach is required.Telecom fraud detection involves identifying a small number of fraudulent calls from a vast amount of call traffic.Developing an effective strategy to combat fraud has become challenging.Although much effort has been made to detect fraud,most existing methods are designed for batch processing,not real-time detection.To solve this problem,we propose an online fraud detection model using a Neural Factorization Autoencoder(NFA),which analyzes customer calling patterns to detect fraudulent calls.The model employs Neural Factorization Machines(NFM)and an Autoencoder(AE)to model calling patterns and a memory module to adapt to changing customer behaviour.We evaluate our approach on a large dataset of real-world call detail records and compare it with several state-of-the-art methods.Our results show that our approach outperforms the baselines,with an AUC of 91.06%,a TPR of 91.89%,an FPR of 14.76%,and an F1-score of 95.45%.These results demonstrate the effectiveness of our approach in detecting fraud in real-time and suggest that it can be a valuable tool for preventing fraud in telecommunications networks.
基金supported by Project of Chongqing Science and Technology Bureau (cstc2022jxjl0005)。
文摘This study aimed to investigate the pollution characteristics, source apportionment, and health risks associated with trace metal(loid)s(TMs) in the major agricultural producing areas in Chongqing, China. We analyzed the source apportionment and assessed the health risk of TMs in agricultural soils by using positive matrix factorization(PMF) model and health risk assessment(HRA) model based on Monte Carlo simulation. Meanwhile, we combined PMF and HRA models to explore the health risks of TMs in agricultural soils by different pollution sources to determine the priority control factors. Results showed that the average contents of cadmium(Cd), arsenic (As), lead(Pb), chromium(Cr), copper(Cu), nickel(Ni), and zinc(Zn) in the soil were found to be 0.26, 5.93, 27.14, 61.32, 23.81, 32.45, and 78.65 mg/kg, respectively. Spatial analysis and source apportionment analysis revealed that urban and industrial sources, agricultural sources, and natural sources accounted for 33.0%, 27.7%, and 39.3% of TM accumulation in the soil, respectively. In the HRA model based on Monte Carlo simulation, noncarcinogenic risks were deemed negligible(hazard index <1), the carcinogenic risks were at acceptable level(10^(-6)<total carcinogenic risk ≤ 10^(-4)), with higher risks observed for children compared to adults. The relationship between TMs, their sources, and health risks indicated that urban and industrial sources were primarily associated with As, contributing to 75.1% of carcinogenic risks and 55.7% of non-carcinogenic risks, making them the primary control factors. Meanwhile, agricultural sources were primarily linked to Cd and Pb, contributing to 13.1% of carcinogenic risks and 21.8% of non-carcinogenic risks, designating them as secondary control factors.
文摘行人检测在机器人、驾驶辅助系统和视频监控等领域有广泛的应用,该文提出一种基于显著性检测与方向梯度直方图-非负矩阵分解(Histogram of Oriented Gradient-Non-negative Matrix Factorization,HOG-NMF)特征的快速行人检测方法。采用频谱调谐显著性检测提取显著图,并基于熵值门限进行感兴趣区域的提取;组合非负矩阵分解和方向梯度直方图生成HOG-NMF特征;采用加性交叉核支持向量机方法(Intersection Kernel Support Vector Machine,IKSVM)。该算法显著降低了特征维数,在相同的计算复杂度下明显改善了线性支持向量机的检测率。在INRIA数据库的实验结果表明,该方法对比HOG/线性SVM和HOG/RBF-SVM显著减少了检测时间,并达到了满意的检测率。