To understand the market opportunity for freeze-dried facial masks and gain insight into consumers' usage behavior and needs, this study evaluated the sensory properties of 10 screened commercial freeze-dried facial mask products, grouped the products by differences in sensory attributes using Principal Component Analysis (PCA) and Agglomerative Hierarchical Clustering (AHC), and selected representative products. Freeze-dried facial mask users then rated their satisfaction with the selected products and completed a survey on usage behavior and cognition. The consumer data were analyzed by AHC to obtain consumer segments and their profiles. The results show that the sensory data and the consumer data, the latter obtained from consumer tests of the representative products selected by PCA and AHC on the sensory data, corroborate each other. Combining sensory data with consumer testing helps clarify the needs of each consumer segment and their reasons to buy.
As social media and online activity continue to pervade all age groups, they serve as a crucial platform for sharing personal experiences and opinions as well as information about attitudes and preferences for certain interests or purchases. This generates a wealth of behavioral data, which, while invaluable to businesses, researchers, policymakers, and the cybersecurity sector, presents significant challenges due to its unstructured nature. Existing tools for analyzing this data often lack the capability to retrieve and process it comprehensively. This paper addresses the need for an advanced analytical tool that ethically and legally collects and analyzes social media data and online activity logs, constructing detailed and structured user profiles. It reviews current solutions, highlights their limitations, and introduces a new approach, the Advanced Social Analyzer (ASAN), that bridges these gaps. The proposed solution's technical aspects, implementation, and evaluation are discussed, with results compared to existing methodologies. The paper concludes by suggesting future research directions to further enhance the utility and effectiveness of social media data analysis.
With the increase in the aging population, the need for elderly care services has diversified, and smart elderly care has become an effective measure to cope with population aging. Based on data from the platform "Guan Hu Tong" of RQ Company in the community of Shaanxi Province in western China, this study mined smart elderly care service data using the recency, frequency and monetary value (RFM) model and the backpropagation (BP) neural network model, constructed user profiles of the elderly, and predicted users' practical demands. The following conclusions were drawn: the oldest users are important target users of smart elderly care service platforms; elderly women living alone rely more on smart elderly care services; and meal delivery and health follow-up services are the most popular among elderly users.
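The RFM step named in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the platform's data schema or scoring rules, so the record layout `(user_id, order_date, amount)` and the function name `rfm_table` are assumptions made purely for illustration.

```python
from datetime import date

def rfm_table(orders, today):
    """Aggregate (user_id, order_date, amount) records into per-user
    (recency_days, frequency, monetary) tuples."""
    table = {}
    for user_id, order_date, amount in orders:
        days_ago = (today - order_date).days
        # Start from this order's age so the first min() is a no-op.
        rec, freq, mon = table.get(user_id, (days_ago, 0, 0.0))
        table[user_id] = (min(rec, days_ago), freq + 1, mon + amount)
    return table

# Hypothetical service orders, not data from the study.
orders = [
    ("u1", date(2023, 1, 10), 30.0),
    ("u1", date(2023, 3, 1), 20.0),
    ("u2", date(2023, 2, 15), 15.0),
]
rfm = rfm_table(orders, today=date(2023, 3, 11))
```

Each tuple could then be binned into scores or fed to a downstream model; how the study does this is not stated in the abstract.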
With the rapid development of the mobile Internet, users generate massive data in different forms on social networks every day, and these social media data reflect different characteristics of users. Integrating multiple heterogeneous sources of information and establishing user profiles from multiple perspectives plays an important role in personalized services, marketing, and recommendation systems. In this paper, we propose Multi-source & Multi-task Learning for User Profiles in Social Network, which integrates multiple social data sources and contains a multi-task learning framework to simultaneously predict various attributes of a user. First, we design a dedicated feature extraction model for each heterogeneous data source. Second, we design a shared layer that fuses the heterogeneous data sources into a general shared representation for multi-task learning. Third, we design a task-specific representation layer for each task to produce its discriminative output. Finally, we design a weighted loss function to improve the learning efficiency and prediction accuracy of each task. Our experimental results on more than 5000 Sina Weibo users demonstrate that our approach outperforms state-of-the-art baselines for inferring the gender, age and region of social media users.
Emotions of users do not converge in a single application but are scattered across diverse applications. Mobile devices are the closest media for handling user data, and they have the advantage of integrating private user information and emotions spread over different applications. In this paper, we first analyze user profiles on a mobile device by describing the problems of a user sentiment profile system in terms of data granularity, media diversity, and server-side solutions. Fine-grained data requires additional data and structural analysis on mobile devices. Media diversity requires standard parameters to integrate user data from various applications. A server-side solution presents a potential risk when handling individual privacy information. To overcome these problems, we propose a general-purpose user profile system based on sentiment analysis that extracts individual emotional preferences by comparing the difference between public and individual data based on particular features. The proposed system is built on a sentiment hierarchy, which is created from unstructured data on mobile devices. It can compensate for the concentration on a single medium and analyze individual private data on mobile devices without invading privacy.
The user's intent to seek online information has been an active area of research in user profiling. User profiling considers user characteristics, behaviors, activities, and preferences to sketch user intentions, interests, and motivations. Determining user characteristics can help capture implicit and explicit preferences and intentions for effective user-centric and customized content presentation. The user's complete online experience in seeking information is a blend of activities such as searching, verifying, and sharing it on social platforms. However, a combination of multiple behaviors in profiling users has yet to be considered. This research takes a novel approach and explores user intent types based on multidimensional online behavior in information acquisition. It explores information search, verification, and dissemination behavior and identifies diverse types of users based on their online engagement using machine learning. The research proposes a generic user profile template that explains user characteristics based on internet experience and uses it as ground truth for data annotation. User feedback on online behavior and practices is collected using a survey. The participants include both males and females from different occupation sectors and of different ages. The collected data is subjected to feature engineering, and the significant features are presented to unsupervised machine learning methods to identify user intent classes or profiles and their characteristics. Different techniques are evaluated, and the K-means clustering method generates five user groups with different user characteristics, with an average silhouette of 0.36 and a distortion score of 1136. Feature averages are computed to identify user intent type characteristics. The user intent classes are then further generalized to create a user intent template with an Inter-Rater Reliability of 75%. This research extracts different user types based on their preferences in online content, platforms, criteria, and frequency. The study also validates the proposed template on user feedback data through an Inter-Rater Agreement process using an external human rater.
With the growing popularity of the World Wide Web, a large volume of user access data has been gathered automatically by Web servers and stored in Web logs. Discovering and understanding user behavior patterns from log files can support personalized Web recommendation services. In this paper, a novel clustering method for log files is presented, called Clustering large Weblog based on Key Path Model (CWKPM), which is based on a user browsing key path model, to obtain user behavior profiles. Compared with the previous Boolean model, the key path model considers the major features of users' access to the Web: ordinal, contiguous and duplicate. Moreover, it has fewer dimensions for clustering. The analysis and experiments show that CWKPM is an efficient and effective approach for clustering large, high-dimensional Web logs.
User profiles are widely used in the age of big data. However, generating and releasing user profiles may cause serious privacy leakage, since a large amount of personal data is collected and analyzed. In this paper, we propose a differentially private user profile construction method, DP-UserPro, which is composed of DP-CLIQUE and private top-κ tag selection. DP-CLIQUE is a differentially private high-dimensional data clustering algorithm based on CLIQUE. The multidimensional tag space is divided into cells, and Laplace noise is added to the count value of each cell. Using breadth-first search, the largest connected dense cells are grouped into a cluster. Then a private top-κ tag selection approach is proposed, based on a score function for each tag, to select the κ most important tags that represent the characteristics of the cluster. Finally, the privacy and utility of DP-UserPro are theoretically analyzed and experimentally evaluated. Comparison experiments against the Tag Suppression algorithm are carried out on two real datasets to measure the False Negative Rate (FNR) and precision. The results show that DP-UserPro outperforms Tag Suppression by 62.5% in the best case and 14.25% in the worst case on FNR, and is about 21.1% better than Tag Suppression on precision, on average.
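The noise-addition step described in the abstract above (Laplace noise on each cell count, followed by selection of dense cells) can be sketched as follows. This is not the DP-CLIQUE algorithm itself: the sensitivity of 1 (each user assumed to fall in exactly one cell), the single privacy budget `epsilon`, the density threshold, and all function names are assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_cell_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace(sensitivity / epsilon) noise to every cell count."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}

def dense_cells(noisy, threshold):
    """Keep the cells whose noisy count reaches the density threshold."""
    return {cell for cell, n in noisy.items() if n >= threshold}

# Hypothetical 2-D tag-space grid: cell coordinates -> number of users.
counts = {(0, 0): 50, (0, 1): 2, (1, 1): 47}
noisy = noisy_cell_counts(counts, epsilon=1.0, seed=0)
candidates = dense_cells(noisy, threshold=10)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less reliable dense-cell detection.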
User profile matching can establish social relationships between different users in a social network. If user profiles are matched in plaintext, users' privacy might be at risk. Although some schemes realize privacy-preserving user profile matching, the resource-limited users or social service providers in these schemes must bear high computational complexity to ensure privacy or to perform matching. To overcome these problems, a novel privacy-preserving user profile matching protocol for social networks is proposed using t-out-of-n servers and the Bloom filter technique, in which the computational complexity for a user is reduced by applying the Chinese Remainder Theorem, matching users can be found with the help of any t matching servers, and the privacy of the user profile is not compromised. Furthermore, if at most t-1 servers collude, our scheme still preserves user profile privacy and user query privacy. Finally, the performance of the proposed scheme is compared with two other schemes, and the results show that our scheme is superior to them.
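The abstract above does not say how the protocol applies the Chinese Remainder Theorem, so the sketch below shows only the theorem itself: recovering a single value from its residues modulo pairwise coprime moduli. The function name `crt` is an illustrative choice, and the code assumes Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

def crt(residues, moduli):
    """Return the unique x in [0, prod(moduli)) with x % m == r for each
    (r, m) pair. The moduli must be pairwise coprime."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        x += r * partial * pow(partial, -1, m)
    return x % total

# Classic example: x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7).
x = crt([2, 3, 2], [3, 5, 7])
```

Packing several small values into one integer in this way is one common reason protocols use the theorem to reduce per-user work; whether this scheme does so is not stated in the abstract.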
With the popularity of social media, there has been increasing interest in user profiling and its applications. This paper presents our system, UIR-SIST, for the User Profiling Technology Evaluation Campaign in SMP CUP 2017. UIR-SIST aims to complete three tasks: keyword extraction from blogs, user interest labeling and user growth value prediction. To this end, we first extract keywords from a user's blog using the blog itself, blogs on the same topic and other blogs published by the same user. Then a unified neural network model based on a convolutional neural network (CNN) is constructed for user interest tagging. Finally, we adopt a stacking model to predict user growth value. We placed sixth, with evaluation scores of 0.563, 0.378 and 0.751 on the three tasks, respectively.
Current search engines provide query-oriented rather than user-oriented searches, and this orientation leaves them unable to meet users' personalized requirements. To solve this problem, a novel method based on probabilistic latent semantic analysis (PLSA) is proposed to convert query-oriented web search into user-oriented web search. First, a user profile, represented as a vector of the user's topics of interest, is created by analyzing the user's click-through data with PLSA. Then the user's queries are mapped into categories based on the user's preferences. Finally, the result list is re-ranked according to the user's interests using a newly proposed method named user-oriented PageRank (UOPR). Experiments on real-life datasets show that the user-oriented search system that adopts PLSA takes user preferences into account and better satisfies a user's personalized information needs.
HIV/AIDS has brought to light the challenge of incorporating the many interactions between living conditions, social characteristics and health service performance into adequate care for PLWHA (people living with HIV/AIDS). The vulnerability of these populations falls under the responsibility of specialized care units, whose assistance does not always match patients' real needs and demands. Therefore, this study aimed to analyze the demographic, social and clinical profiles of PLWHA, as well as their follow-up in SS (Specialized Health Services) in Ribeirão Preto, Brazil. It is a descriptive study conducted by administering structured questionnaires to 253 patients with HIV/AIDS in follow-up during 2012-2013. Variables were analyzed using descriptive statistics. The findings pointed to gender parity, an aging population, low education and a predominance of economic class C. Regarding clinical characteristics, asymptomatic individuals predominated, with no clinical manifestations of AIDS or major comorbidities. The main mode of transmission was sexual contact. The results indicate a need to adapt the assistance provided to the specific needs of PLWHA. Care provision should take an interdisciplinary perspective, targeting the recognition of problems and ensuring comprehensive health care suited to users' needs and demands.
The basic idea behind personalized web search, one of the growing concepts in web technologies, is to deliver search results tailored to user needs. The personalized web search presented in this paper exploits implicit feedback on user satisfaction during the user's browsing history to construct a user profile that stores the web pages the user is highly interested in. A weight is assigned to each page stored in the user's profile; this weight reflects the user's interest in the page. We name this weight the relative rank of the page, since it depends on the user issuing the query. The ranking algorithm provided in this paper is therefore based on the principle that the rank assigned to a page is the sum of two rank values, R_rank and A_rank. A_rank is an absolute rank: it is fixed for all users issuing the same query and depends only on the link structure of the web and on the keywords of the query. Thus, it can be calculated by the PageRank algorithm suggested by Brin and Page in 1998 and used by the Google search engine. R_rank is the relative rank; it is calculated by the methods given in this paper, which depend mainly on recording implicit measures of user satisfaction during the user's previous browsing history.
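The additive principle in the abstract above (final rank = R_rank + A_rank) can be sketched as a re-ranking step. How each value is computed is left to the paper; the dictionaries, page names, and the function name `rerank` below are hypothetical, and pages missing from the user's profile are assumed to have an R_rank of 0.

```python
def rerank(results, a_rank, r_rank):
    """Order result pages by A_rank + R_rank, highest first.

    a_rank: query-dependent absolute rank, identical for every user.
    r_rank: user-specific relative rank taken from the user's profile.
    """
    def score(page):
        return a_rank.get(page, 0.0) + r_rank.get(page, 0.0)
    return sorted(results, key=score, reverse=True)

# Hypothetical values for one query and one user.
a_rank = {"page_a": 0.5, "page_b": 0.4, "page_c": 0.1}
r_rank = {"page_b": 0.3}  # the user has shown interest in page_b only
ordered = rerank(["page_a", "page_b", "page_c"], a_rank, r_rank)
```

Because A_rank is shared by all users, only the small per-user R_rank table has to be stored in each profile.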
User profiles representing users' preferences and interests play an important role in many applications of personalized recommendation. With the rapid growth of social platforms, there is a critical need for efficient solutions that learn user profiles from the information users share on social platforms so as to improve the quality of recommendation services. User profile learning is challenging because of the difficulty of handling data that come from multiple sources, in different formats, and are often associated with uncertainty. In this paper, we introduce an integrated approach that combines advanced machine learning techniques with evidential reasoning based on the Dempster-Shafer theory of evidence for user profiling and recommendation. The developed methods for user profile learning and multi-criteria collaborative filtering are demonstrated with experimental results and analysis that show the effectiveness and practicality of the integrated approach. A proposal for extending multi-criteria recommendation systems by incorporating user profiles learned from different data sources into the recommendation process, so as to provide better recommendation capabilities, is also highlighted.
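The evidential reasoning mentioned in the abstract above rests on Dempster's rule of combination, which merges two bodies of evidence about the same set of hypotheses. The sketch below shows only the standard rule, not the paper's profiling method; the interest labels and mass values are hypothetical.

```python
def combine(m1, m2):
    """Dempster's rule for two mass functions whose keys are frozensets
    of hypotheses and whose values sum to 1."""
    joint = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                joint[inter] = joint.get(inter, 0.0) + wa * wb
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize the non-conflicting mass so it sums to 1.
    return {h: w / (1.0 - conflict) for h, w in joint.items()}

SPORTS = frozenset({"sports"})
MUSIC = frozenset({"music"})
EITHER = SPORTS | MUSIC  # ignorance: the source cannot tell which

# Two hypothetical sources of evidence about one user's interest.
from_posts = {SPORTS: 0.6, EITHER: 0.4}
from_likes = {MUSIC: 0.5, EITHER: 0.5}
fused = combine(from_posts, from_likes)
```

Mass placed on `EITHER` expresses uncertainty explicitly, which is what makes the theory attractive for noisy data from multiple sources.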
User profiling by inferring user traits, such as age and gender, plays an increasingly important role in many real-world applications. Most existing methods for user profiling either use only one type of data or fail to handle noisy information in the data. Moreover, they usually consider the problem from only one perspective. In this paper, we propose a joint user profiling model with hierarchical attention networks (JUHA) to learn informative user representations for user profiling. JUHA performs user profiling based on both inner-user and inter-user features. We explore inner-user features from user behaviors (e.g., purchased items and posted blogs) and inter-user features from a user-user graph (where similar users may be connected to each other). JUHA learns basic sentence and bag representations from multiple separate sources of data (user behaviors) as the first round of data preparation. In this module, convolutional neural networks (CNNs) are introduced to capture word and sentence features related to age and gender, while the self-attention mechanism is exploited to down-weight noisy data. Following this, we build another bag that contains a user-user graph. Inter-user features are learned from this bag using information propagated between linked users in the graph. To acquire more robust data, inter-user features and other inner-user bag representations are joined into each sentence in the current bag to learn the final bag representation. Subsequently, all of the bag representations are integrated by the self-attention mechanism to learn a comprehensive user representation. Our experimental results demonstrate that our approach outperforms several state-of-the-art methods and improves prediction performance.
A new method using support vector data description (SVDD) to distinguish legitimate users from masqueraders based on UNIX user command sequences is proposed. Sliding windows are used to achieve low detection delay. Experiments demonstrate that detection using enriched sequences is better than detection using truncated sequences. As an SVDD profile is composed of a small number of support vectors, our SVDD-based method can achieve computation and storage advantages when the detection performance is similar to that of existing methods.
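The sliding-window step in the abstract above can be sketched as follows. The abstract gives neither the window size nor the step, so both are free parameters here, and the SVDD scoring of each window is not shown; the function name and command list are illustrative.

```python
def sliding_windows(commands, size, step=1):
    """Split a command sequence into overlapping fixed-size windows.

    Returns an empty list when the sequence is shorter than `size`.
    """
    last_start = len(commands) - size
    return [tuple(commands[i:i + size]) for i in range(0, last_start + 1, step)]

# Hypothetical UNIX command stream from one session.
session = ["ls", "cd", "cat", "vi", "make"]
windows = sliding_windows(session, size=3)
```

With a step of 1, a new window is available after every command, so a detector scoring each window can raise an alert without waiting for a full block of new commands.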
Traditional pull-based information services cannot meet the demand of long-term users for timely and appropriate domain information. To solve this problem, an adaptive and active computing paradigm (AACP) for personalized information service in heterogeneous environments is proposed to provide user-centered, push-based, high-quality information service in a timely and proper way. Its motivation is generalized as R4 Service: the right information at the right time in the right way to the right person. On this basis, a formalized algorithmic framework for adaptive user profile management, incremental information retrieval, information filtering, and an active delivery mechanism is discussed in detail. The AACP paradigm serves users in a push-based, event-driven, interest-related, adaptive and active information service mode, which is useful and promising for long-term users who want fresh information without polling various information sources.
Personalization is the adaptation of services to fit the user's interests, characteristics and needs. The key to effective personalization is user profiling. Apart from traditional collaborative and content-based approaches, a number of classification and clustering algorithms have been used to classify user-related information to create user profiles. However, they do not achieve accurate user profiles. In this paper, we present a new clustering algorithm, Multi-Dimensional Clustering (MDC), for user profiling. MDC is a version of the Instance-Based Learner (IBL) algorithm that assigns weights to feature values and considers these weights during clustering. Three feature weighting methods are proposed for MDC, and all three have been tested and evaluated. Simulations were conducted using two user profile datasets: a training set (10,000 instances) and a test set (1000 instances). These datasets reflect each user's personal information, preferences and interests. Additional simulations and comparisons with existing weighted and non-weighted instance-based algorithms were carried out to demonstrate the performance of the proposed algorithm. Experimental results on the user profile datasets demonstrate that the proposed algorithm achieves better clustering accuracy than the other algorithms. This work is based on the doctoral thesis of the corresponding author.
The Chinese Software Developer Network (CSDN) is one of the largest information technology communities and service platforms in China. This paper describes user profiling for CSDN, an evaluation track of SMP Cup 2017. It contains three tasks: (1) user document keyphrase extraction, (2) user tagging and (3) user growth value prediction. In the first task, we treat keyphrase extraction as a classification problem and train a Gradient-Boosting-Decision-Tree model with comprehensive features. In the second task, to deal with class imbalance and capture the interdependency between classes, we propose a two-stage framework: (1) for each class, we independently train a binary classifier to model that class against all of the other classes; (2) we feed the output of the trained classifiers into a softmax classifier, tagging each user with multiple labels. In the third task, we propose a comprehensive architecture to predict user growth value. Our contributions in this paper are summarized as follows: (1) we extract various types of features to identify the key factors in user value growth; (2) we use a semi-supervised method and the stacking technique to extend the labeled data sets and increase the generality of the trained model, resulting in strong performance in our experiments. In the competition, we achieved first place out of 329 teams.
Funding (smart elderly care study): Supported by the Graduate Innovation Funds of Xi'an University of Finance and Economics (Nos. 21YC037, 22YCZ03).
Funding (multi-source and multi-task user profile study): This work is supported by the State Grid Science and Technology Project under Grant Nos. 520613180002 and 62061318C002; the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201714); the Weihai Science and Technology Development Program (2016DXGJMS15); the Key Research and Development Program of Shandong Province (2017GGX90103); the Sanming Science and Technology Project (Grant No. 2015-G-6); the Shandong Province Vocational Education Reform Research Project (Grant No. 2017209); Study and Development of Smart Agriculture Control System Based on Spark Big Data Decision (2017N0029); and the Jiangsu Province Industrial Communication Technology Application Technology Innovation Team Project.
Funding: This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00231, Development of artificial intelligence based video security technology and systems for public infrastructure safety).
Abstract: Emotions of users do not converge in a single application but are scattered across diverse applications. Mobile devices are the closest media for handling user data, and these devices have the advantage of integrating private user information and emotions spread over different applications. In this paper, we first analyze the user profile on a mobile device by describing the problem of the user sentiment profile system in terms of data granularity, media diversity, and server-side solution. Fine-grained data requires additional data and structural analysis in mobile devices. Media diversity requires standard parameters to integrate user data from various applications. A server-side solution presents a potential risk when handling individual privacy information. Therefore, to overcome these problems, we propose a general-purpose user profile system based on sentiment analysis that extracts individual emotional preferences by comparing the difference between public and individual data based on particular features. The proposed system is built on a sentiment hierarchy, which is created using unstructured data on mobile devices. It can compensate for the concentration on a single medium and analyze individual private data on mobile devices without invading privacy.
Abstract: The user's intent to seek online information has been an active area of research in user profiling. User profiling considers user characteristics, behaviors, activities, and preferences to sketch user intentions, interests, and motivations. Determining user characteristics can help capture implicit and explicit preferences and intentions for effective user-centric and customized content presentation. The user's complete online experience in seeking information is a blend of activities such as searching, verifying, and sharing it on social platforms. However, a combination of multiple behaviors in profiling users has yet to be considered. This research takes a novel approach and explores user intent types based on multidimensional online behavior in information acquisition. It explores information search, verification, and dissemination behavior and identifies diverse types of users based on their online engagement using machine learning. The research proposes a generic user profile template that explains the user characteristics based on the internet experience and uses it as ground truth for data annotation. User feedback is based on online behavior and practices collected by using a survey method. The participants include both males and females from different occupation sectors and different ages. The collected data is subjected to feature engineering, and the significant features are presented to unsupervised machine learning methods to identify user intent classes or profiles and their characteristics. Different techniques are evaluated, and the K-Means clustering method successfully generates five user groups observing different user characteristics, with an average silhouette of 0.36 and a distortion score of 1136. Feature average is computed to identify user intent type characteristics. The user intent classes are then further generalized to create a user intent template with an Inter-Rater Reliability of 75%. This research successfully extracts different user types based on their preferences in online content, platforms, criteria, and frequency. The study also validates the proposed template on user feedback data through an Inter-Rater Agreement process using an external human rater.
Abstract: With the growing popularity of the World Wide Web, a large volume of user access data has been gathered automatically by Web servers and stored in Web logs. Discovering and understanding user behavior patterns from log files can support personalized Web recommendation services. In this paper, a novel clustering method for log files is presented, called Clustering large Weblog based on Key Path Model (CWKPM), which is based on a user browsing key path model, to obtain user behavior profiles. Compared with the previous Boolean model, the key path model considers the major features of users' access to the Web: ordinal, contiguous and duplicate. Moreover, for clustering, it has fewer dimensions. The analysis and experiments show that CWKPM is an efficient and effective approach for clustering large and high-dimensional Web logs.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62002098); the Natural Science Foundation of Hebei Province (F2020207001, F2019207061); the Scientific Research Projects of Hebei Education Department (QN2018116); and the Research Foundation of Hebei University of Economics and Business (2018QZ04, 2019JYQ08).
Abstract: User profiles are widely used in the age of big data. However, generating and releasing user profiles may cause serious privacy leakage, since a large amount of personal data is collected and analyzed. In this paper, we propose a differentially private user profile construction method, DP-UserPro, which is composed of DP-CLIQUE and private top-κ tag selection. DP-CLIQUE is a differentially private high-dimensional data clustering algorithm based on CLIQUE. The multidimensional tag space is divided into cells, and Laplace noise is added to the count of each cell. Using breadth-first search, the largest connected dense cells are grouped into a cluster. A private top-κ tag selection approach is then proposed, based on the score function of each tag, to select the κ most important tags that represent the characteristics of the cluster. The privacy and utility of DP-UserPro are theoretically analyzed and experimentally evaluated. Comparison experiments against the Tag Suppression algorithm are carried out on two real datasets to measure the False Negative Rate (FNR) and precision. The results show that DP-UserPro outperforms Tag Suppression on FNR by 62.5% in the best case and 14.25% in the worst case, and that its precision is about 21.1% better than that of Tag Suppression on average.
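The cell-counting and noise-addition step described above can be sketched roughly as follows. This is not the DP-CLIQUE algorithm itself: the function names, the fixed cell width and the assumed sensitivity of 1 are illustrative choices, and the sketch only perturbs occupied cells, whereas a complete differentially private mechanism would also have to handle empty cells.

```python
import random
from collections import Counter


def cell_counts(points, cell_width):
    """Count points per grid cell; a cell is the tuple of per-dimension bin indices."""
    return Counter(tuple(int(x // cell_width) for x in p) for p in points)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cell_counts(points, cell_width, epsilon, rng):
    """Add Laplace(1/epsilon) noise to each occupied cell count.

    Assumes each user contributes one point, so the count sensitivity is 1
    (an assumption of this sketch, not a claim about the paper).
    """
    counts = cell_counts(points, cell_width)
    return {cell: n + laplace_noise(1.0 / epsilon, rng) for cell, n in counts.items()}


# Hypothetical two-dimensional tag scores for three users
points = [(0.1, 0.2), (0.3, 0.4), (0.7, 0.9)]
exact = dict(cell_counts(points, cell_width=0.5))
noisy = noisy_cell_counts(points, cell_width=0.5, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate dense-cell detection.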
Funding: Supported in part by the Natural Science Foundation of Beijing (nos. 4212019, M22002); the National Natural Science Foundation of China (no. 62172005); the Open Research Fund of Key Laboratory of Cryptography of Zhejiang Province (no. ZCL21014); and the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (no. 2019BDKF JJ012).
Abstract: User profile matching can establish social relationships between different users in a social network. If user profiles are matched in plaintext, the users' privacy may be at risk. Although some schemes realize privacy-preserving user profile matching, the resource-limited users or social service providers in these schemes incur high computational complexity to ensure the privacy or matching of the data. To overcome these problems, a novel privacy-preserving user profile matching protocol for social networks is proposed using t-out-of-n servers and the Bloom filter technique. In this protocol, the computational complexity for a user is reduced by applying the Chinese Remainder Theorem, matching users can be found with the help of any t matching servers, and the privacy of the user profile is not compromised. Furthermore, if at most t-1 servers collude, our scheme still provides user profile privacy and user query privacy. Finally, the performance of the proposed scheme is compared with that of two other schemes, and the results show that our scheme is superior to them.
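For readers unfamiliar with the Bloom filter technique named above, here is a minimal plaintext sketch of encoding profile attributes into a Bloom filter and testing membership. The class, its parameters and the sample attributes are assumptions for illustration only; the sketch does not implement the t-out-of-n servers, the Chinese Remainder Theorem step, or any of the protocol's privacy guarantees.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical profile attributes
profile = BloomFilter()
for attribute in ["hiking", "jazz", "python"]:
    profile.add(attribute)
```

Membership tests never produce false negatives, so `"jazz" in profile` is true, while a query for an attribute that was never added is usually, but not always, false.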
Funding: This work is partially supported by the National Natural Science Foundation of China (Grant numbers: 61502115, 61602326, U1636103 and U1536207) and the Fundamental Research Fund for the Central Universities (Grant numbers: 3262017T12, 3262017T18, 3262018T02 and 3262018T58).
Abstract: With the popularity of social media, there has been increasing interest in user profiling and its applications. This paper presents our system, named UIR-SIST, for the User Profiling Technology Evaluation Campaign in SMP CUP 2017. UIR-SIST aims to complete three tasks: keyword extraction from blogs, user interest labeling, and user growth value prediction. To this end, we first extract keywords from a user's blog, drawing on the blog itself, blogs on the same topic, and other blogs published by the same user. Then a unified neural network model is constructed based on a convolutional neural network (CNN) for user interest tagging. Finally, we adopt a stacking model for predicting user growth value. We placed sixth, with evaluation scores of 0.563, 0.378 and 0.751 on the three tasks, respectively.
Funding: The National Natural Science Foundation of China (Nos. 60573090, 60673139).
Abstract: Current search engines provide query-oriented rather than user-oriented searches, and this orientation leaves them unable to meet the personalized requirements of users. To solve this problem, a novel method based on probabilistic latent semantic analysis (PLSA) is proposed to convert query-oriented web search into user-oriented web search. First, a user profile, represented as a vector of the user's topics of interest, is created by analyzing the user's click-through data with PLSA. Then the user's queries are mapped into categories based on the user's preferences. Finally, the result list is re-ranked according to the user's interests using a newly proposed method named user-oriented PageRank (UOPR). Experiments on real-life datasets show that the user-oriented search system that adopts PLSA takes user preferences into account and better satisfies a user's personalized information needs.
Abstract: HIV/AIDS has brought to light the challenge of incorporating the many influences among living conditions, social characteristics and health service performance into adequate care for people living with HIV/AIDS (PLWHA). The vulnerability of these populations is under the responsibility of specialized care units, whose assistance does not always match their real needs and demands. Therefore, this study aimed to analyze the demographic, social and clinical profiles of PLWHA, as well as their follow-up in Specialized Health Services (SS) in Ribeirão Preto, Brazil. It is a descriptive study conducted by applying structured questionnaires to 253 patients with HIV/AIDS in follow-up during 2012-2013. Variables were analyzed using descriptive statistics. The findings pointed to gender parity, an aging population, low education and a predominance of economic class C. Regarding clinical characteristics, there was a predominance of asymptomatic individuals, with no clinical manifestations of AIDS or major comorbidities. The main mode of transmission was sexual contact. The results point to the need to adapt the assistance provided to the specificities inherent to PLWHA. Care provision should adopt an interdisciplinary perspective, targeting the recognition of problems and ensuring comprehensive health care adequate to users' needs and demands.
Abstract: The basic idea behind personalized web search, one of the growing concepts in web technologies, is to deliver search results that are tailored to meet user needs. The personalized web search presented in this paper exploits implicit feedback on user satisfaction during her web browsing history to construct a user profile storing the web pages the user is highly interested in. A weight is assigned to each page stored in the user's profile; this weight reflects the user's interest in the page. We name this weight the relative rank of the page, since it depends on the user issuing the query. The ranking algorithm provided in this paper is therefore based on the principle that the rank assigned to a page is the sum of two rank values, R_rank and A_rank. A_rank is an absolute rank: it is fixed for all users issuing the same query and depends only on the link structure of the web and on the keywords of the query. Thus, it could be calculated by the PageRank algorithm suggested by Brin and Page in 1998 and used by the Google search engine. R_rank is the relative rank; it is calculated by the methods given in this paper, which depend mainly on recording implicit measures of user satisfaction during her previous browsing history.
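The additive ranking principle stated above, rank = R_rank + A_rank, can be illustrated with a short sketch. The function name `rerank`, the page identifiers and the numeric scores are hypothetical; how A_rank and R_rank are actually computed is left to PageRank and to the paper's own methods, and treating pages absent from the profile as having R_rank 0 is an assumption of this example.

```python
def rerank(results, a_rank, r_rank):
    """Sort pages by A_rank + R_rank, highest first.

    Pages missing from the user's profile are given an R_rank of 0.0
    (an assumption of this sketch).
    """
    return sorted(
        results,
        key=lambda page: a_rank[page] + r_rank.get(page, 0.0),
        reverse=True,
    )


# Hypothetical scores: A_rank is shared by all users, R_rank is per user.
a_rank = {"p1": 0.5, "p2": 0.4, "p3": 0.1}
r_rank = {"p2": 0.3}
ordered = rerank(["p1", "p2", "p3"], a_rank, r_rank)
```

Here the user's recorded interest in `p2` lifts it above `p1`, even though `p1` has the higher absolute rank.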
Funding: This work is supported by the University of Information Technology, Vietnam National University Ho Chi Minh City, under Grant No. D1-2023-10.
Abstract: User profiles representing users' preferences and interests play an important role in many applications of personalized recommendation. With the rapid growth of social platforms, there is a critical need for efficient solutions that learn user profiles from the information users share on social platforms so as to improve the quality of recommendation services. The problem of user profile learning is significantly challenging due to the difficulty of handling data that come from multiple sources, in different formats, and are often associated with uncertainty. In this paper, we introduce an integrated approach that combines advanced Machine Learning techniques with evidential reasoning based on the Dempster-Shafer theory of evidence for user profiling and recommendation. The developed methods for user profile learning and multi-criteria collaborative filtering are demonstrated with experimental results and analysis that show the effectiveness and practicality of the integrated approach. A proposal for extending multi-criteria recommendation systems by incorporating user profiles learned from different data sources into the recommendation process, so as to provide better recommendation capabilities, is also highlighted.
Funding: This work was supported in part by the National Key Research and Development Program of China (2016YFB1000901), the Innovative Research Team in University of the Ministry of Education (IRT17R32), and the National Natural Science Foundation of China (Grant Nos. 91746209 and 61906060).
Abstract: User profiling by inferring user personality traits, such as age and gender, plays an increasingly important role in many real-world applications. Most existing methods for user profiling either use only one type of data or fail to handle noisy information in the data. Moreover, they usually consider this problem from only one perspective. In this paper, we propose a joint user profiling model with hierarchical attention networks (JUHA) to learn informative user representations for user profiling. Our JUHA method performs user profiling based on both inner-user and inter-user features. We explore inner-user features from user behaviors (e.g., purchased items and posted blogs), and inter-user features from a user-user graph (where similar users could be connected to each other). JUHA learns basic sentence and bag representations from multiple separate sources of data (user behaviors) as the first round of data preparation. In this module, convolutional neural networks (CNNs) are introduced to capture word and sentence features of age and gender, while the self-attention mechanism is exploited to weaken the noisy data. Following this, we build another bag which contains a user-user graph. Inter-user features are learned from this bag using information propagated between linked users in the graph. To acquire more robust data, inter-user features and other inner-user bag representations are joined into each sentence in the current bag to learn the final bag representation. Subsequently, all of the bag representations are integrated by the self-attention mechanism to learn a comprehensive user representation. Our experimental results demonstrate that our approach outperforms several state-of-the-art methods and improves prediction performance.
Funding: Supported by the National Natural Science Foundation of China (90104005, 66973034, 60473023).
Abstract: A new method using support vector data description (SVDD) to distinguish legitimate users from masqueraders based on UNIX user command sequences is proposed. Sliding windows are used to achieve low detection delay. Experiments demonstrate that detection using enriched sequences is better than detection using truncated sequences. As an SVDD profile is composed of a small number of support vectors, our SVDD-based method can achieve computation and storage advantages when its detection performance is similar to that of existing methods.
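The sliding-window step mentioned above can be sketched as follows: a command stream is cut into overlapping fixed-size windows, each of which could then be scored by a detector as soon as it fills. The function name, window size and sample commands are assumptions for illustration; the SVDD model itself is not implemented here.

```python
def sliding_windows(commands, size, step=1):
    """Yield overlapping windows of `size` consecutive commands.

    With step=1 a new window is available after every command, which is
    what keeps the detection delay low.
    """
    for start in range(0, len(commands) - size + 1, step):
        yield tuple(commands[start:start + size])


# Hypothetical UNIX command sequence for one session
session = ["ls", "cd", "vi", "make"]
windows = list(sliding_windows(session, size=3))
```

Each window would typically be converted into a feature vector (for example, command frequencies) before being compared with the user's trained profile.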
Abstract: Traditional pull-based information services cannot meet the demand of long-term users for timely and appropriate domain information. To solve this problem, an adaptive and active computing paradigm (AACP) for personalized information service in heterogeneous environments is proposed, providing user-centered, push-based, high-quality information service in a timely and proper way. Its motivation is generalized as the R4 Service: the right information at the right time in the right way to the right person. On this basis, a formalized algorithmic framework for adaptive user profile management, incremental information retrieval, information filtering, and an active delivery mechanism is discussed in detail. The AACP paradigm serves users in a push-based, event-driven, interest-related, adaptive and active information service mode, which is useful and promising for long-term users who want to obtain fresh information without polling various information sources.
Abstract: Personalization is the adaptation of services to fit the user's interests, characteristics and needs. The key to effective personalization is user profiling. Apart from traditional collaborative and content-based approaches, a number of classification and clustering algorithms have been used to classify user-related information to create user profiles. However, they are not able to produce accurate user profiles. In this paper, we present a new clustering algorithm, namely Multi-Dimensional Clustering (MDC), for user profiling. The MDC is a version of the Instance-Based Learner (IBL) algorithm that assigns weights to feature values and considers these weights in the clustering. Three feature weighting methods are proposed for the MDC, and all three have been tested and evaluated. Simulations were conducted using two user profile datasets: a training dataset (10,000 instances) and a test dataset (1000 instances). These datasets reflect each user's personal information, preferences and interests. Additional simulations and comparisons with existing weighted and non-weighted instance-based algorithms were carried out to demonstrate the performance of the proposed algorithm. Experimental results on the user profile datasets demonstrate that the proposed algorithm has better clustering accuracy than the other algorithms. This work is based on the doctoral thesis of the corresponding author.
Funding: The work is supported by the National Natural Science Foundation of China (NSFC) under grant numbers 61472400, 91746301 and 61802371. H. Shen is also funded by the K.C. Wong Education Foundation and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.
Abstract: The Chinese Software Developer Network (CSDN) is one of the largest information technology communities and service platforms in China. This paper describes user profiling for CSDN, an evaluation track of SMP Cup 2017. It contains three tasks: (1) user document keyphrase extraction, (2) user tagging and (3) user growth value prediction. In the first task, we treat keyphrase extraction as a classification problem and train a Gradient-Boosting-Decision-Tree model with comprehensive features. In the second task, to deal with class imbalance and capture the interdependency between classes, we propose a two-stage framework: (1) for each class, we independently train a binary classifier to model that class against all of the other classes; (2) we feed the output of the trained classifiers into a softmax classifier, tagging each user with multiple labels. In the third task, we propose a comprehensive architecture to predict user growth value. Our contributions in this paper are summarized as follows: (1) we extract various types of features to identify the key factors in user value growth; (2) we use a semi-supervised method and the stacking technique to extend the labeled data sets and increase the generality of the trained model, resulting in an impressive performance in our experiments. In the competition, we achieved first place out of 329 teams.