Funding: Project supported by the National Natural Science Foundation of China (No. 62172123), the Postdoctoral Science Foundation of Heilongjiang Province, China (No. LBH-Z19067), the Special Projects for the Central Government to Guide the Development of Local Science and Technology, China (No. ZY20B11), and the Natural Science Foundation of Heilongjiang Province, China (No. QC2018081).
Abstract: The number of mobile application services is growing explosively, which makes it difficult for users to determine which ones are of interest. In particular, new mobile application services emerge continuously, and most of them have not yet been rated when they need to be recommended to users. This is the typical cold-start problem in collaborative filtering recommendation. It can make it difficult for users to locate and acquire the services they actually want, and it makes it hard for service recommendations to satisfy users in terms of accuracy and novelty. To solve this problem, this paper proposes a hybrid recommendation method for mobile application services based on content feature extraction. First, the method extracts service content features using Natural Language Processing techniques such as word segmentation, part-of-speech tagging, and dependency parsing, which improves the accuracy with which service attributes are described and the soundness of the service-similarity calculation. Then, the language representation model Bidirectional Encoder Representations from Transformers (BERT) is used to vectorize the content feature text, and an improved word mover's distance algorithm weighted by Term Frequency-Inverse Document Frequency (TFIDF-WMD) is used to calculate the similarity between mobile application services. Finally, the recommendation is completed by combining this similarity with the item-based collaborative filtering algorithm. The experimental results show that the proposed hybrid method alleviates the cold-start problem to a certain extent and significantly improves the accuracy of the recommendation results.
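The last two steps of this pipeline (TF-IDF weights as word masses for a weighted WMD, then item-based collaborative filtering over content distances) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the smoothed IDF formula, the distance-to-similarity mapping `distance_to_similarity`, the neighbor count `k`, and all helper names are hypothetical. The BERT vectors and the WMD computation itself are not reproduced, so `content_dist` is assumed to be precomputed (for example, TFIDF-WMD values).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF-IDF term weights normalized to sum to 1, usable as word masses in a weighted WMD.

    The smoothed IDF formula is an assumption, not taken from the paper.
    """
    n_docs = len(corpus)
    raw = {}
    for term, count in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0     # smoothed IDF (assumed)
        raw[term] = (count / len(doc)) * idf
    total = sum(raw.values())
    return {term: w / total for term, w in raw.items()}


def distance_to_similarity(d: float) -> float:
    """Hypothetical mapping from a content distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def predict_rating(user_ratings: Dict[str, float],
                   target: str,
                   content_dist: Dict[Tuple[str, str], float],
                   k: int = 2) -> float:
    """Item-based CF score for an unrated (cold-start) target item.

    Neighbors are the items the user has rated; their weights come from content
    distances instead of co-rating statistics, which a new item does not have yet.
    """
    neighbors = []
    for item, rating in user_ratings.items():
        # Distances are assumed symmetric, so look the pair up in either order.
        d = content_dist.get((item, target), content_dist.get((target, item)))
        if d is not None:
            neighbors.append((distance_to_similarity(d), rating))
    top = sorted(neighbors, reverse=True)[:k]  # k most content-similar rated items
    denom = sum(sim for sim, _ in top)
    if denom == 0:
        return float("nan")  # no rated item has a known distance to the target
    return sum(sim * r for sim, r in top) / denom
```

For example, with hypothetical distances of 0.0, 1.0, and 3.0 from rated items a, b, and c (rated 5, 3, and 1) to a new service, the two most similar items give (5 × 1.0 + 3 × 0.5) / 1.5 ≈ 4.33, so the new service receives a score even though nobody has rated it yet.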
Abstract: Successful face recognition based on local binary patterns (LBP) relies on the effective extraction of LBP features and on inferring the similarity between the extracted features. In this paper, we focus on the latter and propose two novel similarity measures, one for local matching methods and one for holistic matching methods. The first is the Earth Mover's Distance with Hamming and Lp ground distance (EMD-HammingLp), a cross-bin dissimilarity measure for LBP histograms. The second is the IMage Hamming Distance (IMHD), a dissimilarity measure for whole LBP images. Experiments on the FERET database show that both proposed measures outperform the state-of-the-art Chi-square similarity measure for comparing LBP features.
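To make the contrast between image-level and histogram-level comparison concrete, the following Python sketch computes basic 3x3 LBP codes, an image-level Hamming dissimilarity, and the Chi-square histogram baseline. The abstract does not give the IMHD formula, so `image_hamming_distance` is an assumption suggested by the name (the bitwise Hamming distance between LBP codes at corresponding pixels, summed over the image); the neighbor ordering and the `>=` threshold are one common LBP convention. EMD-HammingLp is not reproduced because its exact ground distance is not specified here.

```python
from typing import List

Image = List[List[int]]

# Eight neighbors in clockwise order starting at the top-left (one common convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Image) -> Image:
    """Basic 3x3 LBP: bit k is set when neighbor k is >= the center pixel.

    Border pixels are skipped, so the output is (h-2) x (w-2).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        out.append(row)
    return out


def image_hamming_distance(a: Image, b: Image) -> int:
    """Assumed IMHD-style measure: total Hamming distance between aligned LBP codes."""
    return sum(bin(p ^ q).count("1") for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def histogram(codes: Image, bins: int = 256) -> List[int]:
    """Count how often each 8-bit LBP code occurs."""
    h = [0] * bins
    for row in codes:
        for c in row:
            h[c] += 1
    return h


def chi_square(h1: List[int], h2: List[int]) -> float:
    """Chi-square dissimilarity between histograms (the baseline in the abstract)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
```

Because LBP codes are bit strings, the Hamming distance between two codes counts how many of the eight neighbor comparisons disagree, which makes it a natural way to compare codes; the Chi-square measure, by contrast, only compares bin counts and treats all distinct codes as equally different.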
Funding: Beijing University of Posts and Telecommunications, China, and China Telecom, for their cooperation and support for this paper.
Abstract: Behavior targeting (BT) based on individual web-browsing history has become increasingly valuable in precision marketing for many companies because it captures users' interests and preferences. In practice, behavior data collected from different online shopping applications are often inconsistent because they are labeled with different item taxonomies: the same behavior can have different representations, which confuses analysis. To address this issue, we propose a semantic-similarity-based strategy that transforms the heterogeneous behavior extracted from the deep packet inspection (DPI) data of a telecommunication operator into a single standard representation. The Word Mover's Distance algorithm is used to evaluate the semantic similarity between the distributed representations of two web-browsing histories. Moreover, we implement the architecture of the behavior targeting platform on Hadoop, which can process petabyte-scale data every day.
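The mapping from heterogeneous labels to a standard taxonomy can be illustrated with the relaxed WMD (RWMD), a cheap lower bound on WMD that was proposed together with it and drops one of the two flow constraints. This Python sketch swaps that relaxation in for the full WMD the abstract describes; the 2-D embeddings, category names, and helper functions are hypothetical toy choices that only show the mechanics.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]


def nbow(tokens: List[str]) -> Dict[str, float]:
    """Normalized bag-of-words weights (each token's share of the document), as in WMD."""
    weights: Dict[str, float] = {}
    for t in tokens:
        weights[t] = weights.get(t, 0.0) + 1.0 / len(tokens)
    return weights


def relaxed_wmd(doc_a: List[str], doc_b: List[str], emb: Embeddings) -> float:
    """Relaxed WMD, a lower bound on WMD.

    Each word sends all of its mass to its nearest word in the other document
    (only one side's constraint is kept); the larger one-sided cost is returned.
    """
    wa, wb = nbow(doc_a), nbow(doc_b)

    def one_sided(src: Dict[str, float], dst: Dict[str, float]) -> float:
        return sum(w * min(math.dist(emb[s], emb[d]) for d in dst)
                   for s, w in src.items())

    return max(one_sided(wa, wb), one_sided(wb, wa))


def map_to_standard(label: List[str], standard: Dict[str, List[str]],
                    emb: Embeddings) -> str:
    """Map a heterogeneous behavior label to the closest standard category."""
    return min(standard, key=lambda name: relaxed_wmd(label, standard[name], emb))
```

With toy embeddings in which "sports" and "sneakers" lie near "shoes", a label such as ["sports", "sneakers"] maps to a hypothetical "Footwear" category rather than "Electronics". At the data volumes described, a lower bound of this kind is mainly useful for pruning candidate categories before the exact WMD is computed on the few that remain.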
Abstract: Statute recommendation is a subproblem of automated decision systems; it can help legal staff process cases in an intelligent and automated way. In this paper, an improved common-word similarity algorithm is proposed for normalization. In addition, the word mover's distance (WMD) algorithm is applied to similarity measurement and statute recommendation, extending it beyond the classification setting in which it was originally used. Finally, several recommendation strategies that differ from traditional collaborative filtering methods are proposed. The experimental results show that the method achieves a best F-measure of 0.799, and comparative experiments show that WMD achieves better results than the TF-IDF and LDA algorithms.
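WMD is the minimum cost of moving one document's normalized word mass onto another's. In the special case where both keyword lists have the same length and uniform weights, an optimal flow can always be chosen as a one-to-one matching (Birkhoff-von Neumann theorem), so exhaustive search over permutations gives the exact WMD. The Python sketch below uses that special case to rank candidate statutes; the embeddings, keyword lists, and statute names are hypothetical, and the paper's common-word normalization and recommendation strategies are not reproduced.

```python
import itertools
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]


def wmd_uniform(doc_a: List[str], doc_b: List[str], emb: Embeddings) -> float:
    """Exact WMD for two equal-length documents whose tokens each weigh 1/n.

    An optimal flow is a permutation here, so brute force over matchings is exact,
    but it is only feasible for very short keyword lists (n! candidates).
    """
    n = len(doc_a)
    if n == 0 or n != len(doc_b):
        raise ValueError("sketch only handles non-empty documents of equal length")
    cost = [[math.dist(emb[a], emb[b]) for b in doc_b] for a in doc_a]
    best = min(sum(cost[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))
    return best / n  # each matched pair moves mass 1/n


def recommend_statutes(case: List[str], statutes: Dict[str, List[str]],
                       emb: Embeddings, k: int = 1) -> List[str]:
    """Rank candidate statutes by WMD to the case keywords and return the top k."""
    return sorted(statutes, key=lambda s: wmd_uniform(case, statutes[s], emb))[:k]
```

Because the matching is found by search, word order does not matter: ["theft", "property"] and ["property", "theft"] are at distance 0. Documents of unequal length or with non-uniform weights need a general transport (linear programming) solver instead of this brute-force special case.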
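Abstract: Document semantic similarity algorithms based on the Earth Mover's Distance (EMD) do not satisfy the metric axioms, which makes them difficult to apply widely in information retrieval and data mining. To address this problem, this paper proposes a new EMD-based document semantic similarity metric, Mdss_EMD (Metric for document semantic similarity based EMD). First, after analyzing the shortcomings of EMD and of existing improved methods, the concepts of document width and virtual items are introduced. Then, virtual items are added to align the total weights of the document vectors, so that all metric axioms are satisfied. Finally, to improve the adaptability and processing speed of the metric, an elastic design of the similarity distance for virtual items is implemented and the EMD algorithm is simplified. The method extends EMD into metric space, substantially improving its indexing capability and precision. Preliminary experiments show that the overall performance of Mdss_EMD is better than that of the original EMD and other existing similar methods.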
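The virtual-item idea can be sketched as follows: when two documents carry different total weight, the lighter one receives a single virtual item holding the missing mass, after which an ordinary balanced EMD applies. In this Python sketch, `virtual_cost` is a hypothetical fixed distance from the virtual item to every real item; the paper's document-width concept and its elastic design of the virtual-item distance are not reproduced, and the sketch does not by itself establish the metric axioms. Weights are assumed to be integer term counts, and the exact transport cost is computed with a dependency-free successive-shortest-path min-cost flow.

```python
from typing import List

Matrix = List[List[float]]


def transport_cost(supply: List[int], demand: List[int], cost: Matrix) -> float:
    """Exact minimum cost of a balanced transport problem (the EMD numerator),
    solved as min-cost flow with successive shortest paths (Bellman-Ford)."""
    if sum(supply) != sum(demand):
        raise ValueError("supply and demand must have equal totals")
    m, n = len(supply), len(demand)
    s, t = m + n, m + n + 1
    graph: List[list] = [[] for _ in range(m + n + 2)]  # edge = [to, cap, cost, rev]

    def add_edge(u: int, v: int, cap: int, c: float) -> None:
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])  # residual reverse edge

    total_mass = sum(supply)
    for i, a in enumerate(supply):
        add_edge(s, i, a, 0.0)
    for j, b in enumerate(demand):
        add_edge(m + j, t, b, 0.0)
    for i in range(m):
        for j in range(n):
            add_edge(i, m + j, total_mass, cost[i][j])

    flow, total = 0, 0.0
    while flow < total_mass:
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)  # (node, edge index) on the shortest path
        dist[s] = 0.0
        for _ in range(len(graph) - 1):  # Bellman-Ford: residual costs may be negative
            changed = False
            for u, edges in enumerate(graph):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, c, _) in enumerate(edges):
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, (u, k), True
            if not changed:
                break
        push, v = total_mass - flow, t
        while v != s:  # bottleneck capacity along the shortest path
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = t
        while v != s:  # augment and update residual capacities
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        total += push * dist[t]
    return total


def padded_emd(weights_a: List[int], weights_b: List[int], ground: Matrix,
               virtual_cost: float) -> float:
    """Hypothetical Mdss_EMD-style padding: the lighter document gets one virtual
    item carrying the missing mass, at distance `virtual_cost` from every real item."""
    wa, wb = list(weights_a), list(weights_b)
    cost = [list(row) for row in ground]
    diff = sum(wa) - sum(wb)
    if diff > 0:  # document b is lighter: add a virtual column
        wb.append(diff)
        for row in cost:
            row.append(virtual_cost)
    elif diff < 0:  # document a is lighter: add a virtual row
        wa.append(-diff)
        cost.append([virtual_cost] * len(wb))
    return transport_cost(wa, wb, cost)
```

For example, a document with term weights [2, 1] compared with a one-term document of weight 1 (ground distances 0.0 and 1.0) costs 1.0 with `virtual_cost=0.5`: one unit moves to the real term at no cost and the remaining two units go to the virtual item at 0.5 each. Swapping the two documents gives the same value.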