Journal literature
16 articles found
Semi-supervised learning based probabilistic latent semantic analysis for automatic image annotation (cited by: 1)
1
Author: 田东平. High Technology Letters (EI, CAS), 2017, No. 4, pp. 367-374, 8 pages.
In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision communities, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since labeled images are often hard to obtain or create in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. In addition, image features with different magnitudes result in different annotation performance. To this end, a Gaussian normalization method is used to normalize the different features extracted from effective image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve on the performance of traditional PLSA for automatic image annotation.
Keywords: automatic image annotation; semi-supervised learning; probabilistic latent semantic analysis (PLSA); transductive support vector machine (TSVM); image segmentation; image retrieval
Metaphor Analysis Method Based on Latent Semantic Analysis
2
Authors: 陶然, 卫亚萍, 杨唐峰. Journal of Donghua University (English Edition) (CAS), 2021, No. 1, pp. 83-90, 8 pages.
Current research on metaphor analysis is generally knowledge-based and corpus-based, which calls for methods of automatic feature extraction and weight calculation. Combining natural language processing (NLP), latent semantic analysis (LSA), and the Pearson correlation coefficient, this paper proposes a metaphor analysis method that extracts the content words from both literal and metaphorical corpora, calculates their correlation degree, and analyzes their relationships. The value of the proposed method was demonstrated through a case study using a corpus with the keyword “飞翔(fly)”. Compared with the Pearson correlation coefficient method, the experiment shows that LSA produces better results with greater significance in correlation degree. It is also found that the number of common words appearing in both the literal and metaphorical word bags decreased with the correlation degree. The case study also revealed that more nouns appear in the literal corpus, and more adjectives and adverbs appear in the metaphorical corpus. The proposed method will help NLP researchers develop the step-by-step calculation tools required for accurate quantitative analysis.
Keywords: latent semantic analysis (LSA); metaphor; natural language processing (NLP); Pearson correlation coefficient
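The abstract above uses the Pearson correlation coefficient as its baseline measure of correlation degree. As a rough illustration only, the coefficient for two equal-length numeric vectors can be sketched as follows; the function name `pearson` and the idea of feeding it word-weight vectors are assumptions for this example, not details taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

The result lies in [-1, 1]; values near 1 indicate that the two weight vectors rise and fall together.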
Generating Markov Logic Networks Rulebase Based on Probabilistic Latent Semantics Analysis
3
Authors: Shan Cui, Tao Zhu, Xiao Zhang, Liming Chen, Lingfeng Mao, Huansheng Ning. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 952-964, 13 pages.
Human Activity Recognition (HAR) has become a subject of concern and plays an important role in daily life. HAR uses sensor devices to collect user behavior data, obtain human activity information, and identify activities. Markov Logic Networks (MLN) are widely used in HAR as an effective combination of knowledge and data. MLN can handle complexity and uncertainty and has good knowledge expression ability. However, MLN structure learning is relatively weak and requires a lot of computing and storage resources. Essentially, the MLN structure is derived from sensor data in the current scene. Assuming that the sensor data can be effectively sliced and the sliced data can be converted into semantic rules, the MLN structure can be obtained. To this end, we propose a rulebase building scheme based on probabilistic latent semantic analysis to provide a semantic rulebase for MLN learning. Such a rulebase can reduce the time required for MLN structure learning. We apply the rulebase building scheme to single-person indoor activity recognition and show that the scheme can effectively reduce the MLN learning time. In addition, we evaluate the parameters of the rulebase building scheme to check its stability.
Keywords: Markov Logic Network (MLN); structure learning; rulebase construction; probabilistic latent semantics
A Two-Stage Feature Selection Method for Text Categorization by Using Category Correlation Degree and Latent Semantic Indexing (cited by: 2)
4
Authors: 王飞, 李彩虹, 王景山, 徐娇, 李廉. Journal of Shanghai Jiaotong University (Science) (EI), 2015, No. 1, pp. 44-50, 7 pages.
To improve the accuracy of text categorization and reduce the dimension of the feature space, this paper proposes a two-stage feature selection method based on a novel category correlation degree (CCD) method and latent semantic indexing (LSI). In the first stage, a novel CCD method is proposed to select the most effective features for text classification, and it is more effective than the traditional feature selection method. In the second stage, document representation requires a high-dimensional feature space and does not take into account the semantic relations between features, which leads to poor categorization accuracy. The LSI method is therefore used to solve these problems by replacing individual terms with statistically derived conceptual indices, which can discover the important correlations between features and reduce the dimension of the feature space. First, each feature in our algorithm is ranked according to its importance for classification using the CCD method. Second, we construct a new semantic space among the features based on the LSI method. The experimental results show that our method can effectively reduce the dimension of the text vector and improve the performance of text categorization.
Keywords: text categorization; feature selection; latent semantic indexing (LSI); category correlation degree (CCD)
Learning Dual-Layer User Representation for Enhanced Item Recommendation
5
Authors: Fuxi Zhu, Jin Xie, Mohammed Alshahrani. Computers, Materials & Continua (SCIE, EI), 2024, No. 7, pp. 949-971, 23 pages.
User representation learning is crucial for capturing different user preferences, but it is also critically challenging because user intentions are latent and dispersed in complex and varied patterns of user-generated data, and thus cannot be measured directly. Text-based data models can learn user representations by mining latent semantics, which helps enhance the semantic function of user representations. However, these techniques only extract common features from historical records and cannot represent changes in user intentions. Sequential features, by contrast, can express user interests and intentions that change over time, but sequential recommendation results based on the user representation of the item lack interpretability in terms of preference factors. To address these issues, we propose a novel model with Dual-Layer User Representation, named DLUR, in which the user's intention is learned from two different layer representations. Specifically, the latent semantic layer adds an interactive layer based on the Transformer to extract keywords and key sentences from the text, which serve as a basis for interpretation. The sequence layer uses the Transformer model to encode the user's preference intention and so clarify changes in the user's intention. This dual-layer user model is therefore more comprehensive than a single text model or sequence model and can effectively improve recommendation performance. Our extensive experiments on five benchmark datasets demonstrate DLUR's performance over state-of-the-art recommendation models. In addition, DLUR's ability to explain recommendation results is demonstrated through some specific cases.
Keywords: user representation; latent semantic; sequential feature; interpretability
Semantic Based Greedy Levy Gradient Boosting Algorithm for Phishing Detection
6
Authors: R. Sakunthala Jenni, S. Shankar. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 5, pp. 525-538, 14 pages.
Distinguishing phishing websites from legitimate ones is considered a great challenge for web service providers because the users of such websites are indistinguishable. Phishing websites also create traffic across the entire network. Another phishing issue is the spread of malware across the entire network, which highlights the demand for detection while massive datasets (i.e., big data) are processed. Despite the application of boosting mechanisms in phishing detection, these methods are prone to significant errors in their output, specifically because all website features are combined in the training stage. The upcoming big data system requires MapReduce, a popular parallel programming model, to process massive datasets. To address these issues, a probabilistic latent semantic and greedy levy gradient boosting (PLS-GLGB) algorithm for website phishing detection using MapReduce is proposed. A feature selection-based model is provided using a probabilistic intersective latent semantic preprocessing model to minimize errors in website phishing detection. Here, the missing data in each URL are identified and discarded before further processing to ensure data quality. Subsequently, with the preprocessed features (URLs), feature vectors are updated by the greedy levy divergence gradient model, which selects the optimal features in the URL and accurately detects the websites. Thus, greedy levy efficiently differentiates between phishing websites and legitimate websites. Experiments are conducted using one of the largest public corpora of a website phish tank dataset. Results show that the PLS-GLGB algorithm for website phishing detection outperforms state-of-the-art phishing detection methods. Significant amounts of detection time and errors are also saved during the detection of website phishing.
Keywords: Web service providers; probabilistic intersective latent semantic; greedy levy; divergence; gradient; phishing detection; big data
Online belief propagation algorithm for probabilistic latent semantic analysis (cited by: 2)
7
Authors: Yun YE, Shengrong GONG, Chunping LIU, Jia ZENG, Ning JIA, Yi ZHANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2013, No. 4, pp. 526-535, 10 pages.
Probabilistic latent semantic analysis (PLSA) is a topic model for text documents, which has been widely used in text mining, computer vision, computational biology and so on. For batch PLSA inference algorithms, the required memory size grows linearly with the data size, and handling massive data streams is very difficult. To process big data streams, we propose an online belief propagation (OBP) algorithm based on the improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments, and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets, and demonstrate that OBP is competitive in both speed and accuracy for online expectation maximization (OEM) in PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
Keywords: probabilistic latent semantic analysis; topic models; expectation maximization; belief propagation
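For orientation, the batch baseline that the abstract contrasts with can be sketched as plain EM for PLSA on a small document-word count matrix. This is a minimal illustration under stated assumptions, not the paper's OBP algorithm: the names `plsa_em` and `log_likelihood`, the random initialization, and the dense list-of-lists layout are all choices made for this example.

```python
import math
import random


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) by batch EM on counts[d][w] (hypothetical helper)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization of both conditional distributions.
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_topics):
                    resp = n_dw * post[z] / norm
                    acc_z_d[d][z] += resp
                    acc_w_z[z][w] += resp
        # M-step: renormalize the expected counts.
        p_z_d = [_normalize(row) for row in acc_z_d]
        p_w_z = [_normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d,w) * log P(w|d), with P(w|d) = sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw:
                p_w_d = sum(pz * p_w_z[z][w] for z, pz in enumerate(p_z_d[d]))
                ll += n_dw * math.log(p_w_d)
    return ll
```

Because every iteration sweeps the full count matrix, memory and time scale with the corpus size, which is the limitation of batch inference that the abstract points to.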
Understanding Research Trends in Android Malware Research Using Information Modelling Techniques (cited by: 1)
8
Authors: Jaiteg Singh, Tanya Gera, Farman Ali, Deepak Thakur, Karamjeet Singh, Kyung-sup Kwak. Computers, Materials & Continua (SCIE, EI), 2021, No. 3, pp. 2655-2670, 16 pages.
Android has dominated the smartphone market for more than a decade and has captured 87.8% of the market share. This popularity has drawn the attention of cybercriminals and malware developers. Malicious applications can steal sensitive information such as contacts, read personal messages, record calls, send messages to premium-rate numbers, cause financial loss, gain access to the gallery, and access the user's geographic location. Numerous surveys on Android security have focused primarily on types of malware attack, their propagation, and techniques to mitigate them. To the best of our knowledge, Android malware literature has never been explored using information modelling techniques. Further, contemporary research trends in Android malware research have never been examined from a semantic point of view. This paper intends to identify the intellectual core of Android malware literature using Latent Semantic Analysis (LSA). An extensive corpus of 843 articles on Android malware and security, published during 2009–2019, was processed using LSA. Subsequently, the truncated Singular Value Decomposition (SVD) technique was used for dimensionality reduction. Later, machine learning methods were deployed to effectively segregate prominent topic solutions with minimal bias. Based on the observed term and document loading matrix values, five core research areas and twenty research trends were identified. Further, potential future research directions are detailed to offer a quick reference for information scientists. The study concludes that Android security is crucial for pervasive Android devices. Static analysis is the most widely investigated core area within Android security research and is expected to remain in trend in the near future. Research trends indicate the need for a faster yet effective model to detect Android applications that cause obfuscation, carry out financial attacks, and steal user information.
Keywords: Android security; research trends; latent semantic analysis; vulnerabilities; malware; machine learning; clustering
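The dimensionality-reduction step named in the abstract, truncated SVD of a term-document matrix, can be illustrated with a small dependency-free sketch. It uses power iteration with deflation, which is only one way to obtain the leading singular triplets and is an assumption of this example rather than the authors' procedure; the helper names are hypothetical as well. In practice a library routine such as `numpy.linalg.svd` would normally be used instead.

```python
import math
import random


def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def truncated_svd(a, k, n_iter=200, seed=0):
    """Return up to k triplets (sigma, u, v), largest singular value first."""
    rng = random.Random(seed)
    a = [row[:] for row in a]  # working copy; it is deflated in place
    n_cols = len(a[0])
    triplets = []
    for _ in range(k):
        at = transpose(a)
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        for _ in range(n_iter):
            # Power iteration on A^T A converges to the top right singular vector.
            v = matvec(at, matvec(a, v))
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0:
                break
            v = [x / norm for x in v]
        u = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0:
            break  # no remaining rank to extract
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-one component sigma * u * v^T.
        for i in range(len(a)):
            for j in range(n_cols):
                a[i][j] -= sigma * u[i] * v[j]
    return triplets


# Toy term-document matrix: rows are terms, columns are documents.
# Documents 0 and 1 share the same terms; document 2 uses different ones.
toy = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 1],
       [0, 0, 1]]
toy_triplets = truncated_svd(toy, k=2)
```

Each `v` holds one latent-dimension coordinate per document (a document loading) and each `u` one per term (a term loading), so documents with similar vocabulary end up with similar coordinates.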
Complex human activities recognition using interval temporal syntactic model (cited by: 1)
9
Authors: 夏利民, 韩芬, 王军. Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, No. 10, pp. 2578-2586, 9 pages.
A novel method based on an interval temporal syntactic model was proposed to recognize human activities in video flow. The method is composed of two parts: feature extraction and activity recognition. A trajectory shape descriptor, speeded up robust features (SURF) and histograms of optical flow (HOF) were proposed to represent human activities, which provide more exhaustive information to describe human activities in terms of shape, structure and motion. In the recognition process, a probabilistic latent semantic analysis (PLSA) model was used to recognize sample activities in the first step. Then, an interval temporal syntactic model, which combines the syntactic model with the interval algebra to model the temporal dependencies of activities explicitly, was introduced to recognize complex activities with a time relationship. Experimental results show the effectiveness of the proposed method in comparison with other state-of-the-art methods on public databases for the recognition of complex activities.
Keywords: trajectory shape descriptor; speeded up robust features (SURF); histograms of optical flow (HOF); PLSA; probabilistic latent semantic analysis; syntactic model
Designing an automated FAQ answering system for farmers based on hybrid strategies (cited by: 1)
10
Authors: Junliang ZHANG, Xuefang ZHU, Guang ZHU. Chinese Journal of Library and Information Science, 2012, No. 4, pp. 21-36, 16 pages.
Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing successful matching between a user's question and a question-answer (QA) pair in the FAQ database. Our approach is based on a combination of multiple factors. Experiments were conducted to test the performance of our method. Findings: Experiments show that the proposed method has higher accuracy. Compared with similarity calculation based on TF-IDF, sentence surface forms, and semantic relations, the proposed method based on hybrid strategies performs better in precision, recall and F-measure. Research limitations: At present, the FAQ answering system can only meet users' demand for text retrieval. In the future, the system needs to be improved to meet users' demand for retrieving images and videos. Practical implications: This FAQ answering system will help farmers use agricultural information resources more efficiently. Originality/value: We design algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate the question surface similarity, the question semantic similarity, and the question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
Keywords: frequently asked question (FAQ) answering system; sentence surface similarity; semantic similarity; latent semantic analysis (LSA); similarity computation based on hybrid strategies; FAQ answering system for farmers
A LSA Based Image Classification Framework Utilizing Relative Spatial Arrangement
11
Authors: Chen Guo, Campbell Wilson, Samar Zutshi. Journal of Electronic Science and Technology (CAS), 2012, No. 2, pp. 119-123, 5 pages.
This paper focuses on the problem of automatic image classification (AIC) by proposing a framework based on latent semantic analysis (LSA) and image region pairs. The novel framework employs relative spatial arrangements for region pairs as the primary feature to capture semantics. The significance of this paper is twofold. Firstly, to the best of our knowledge, this is the first study of the influence of region pairs as well as their relative spatial information in latent semantic analysis as applied to automatic image classification. Secondly, our proposed method for using the relative spatial information of region pairs shows great promise in improving image semantic classification compared with the classical latent semantic analysis method and the 2D string representation algorithm.
Keywords: automatic image classification; latent semantic analysis; spatial relationship
Semantic-Oriented Knowledge Transfer for Review Rating (cited by: 1)
12
Authors: 王波, 张宁, 林泉, 陈松灿, 李玉华. Tsinghua Science and Technology (SCIE, EI, CAS), 2010, No. 6, pp. 633-641, 9 pages.
With the rapid development of Web 2.0, more and more people are sharing their opinions about online products, so there is much product review data. However, it is difficult to compare products directly using ratings because many ratings are based on different scales or are missing altogether. This paper addresses the following question: given textual reviews, how can we automatically determine the semantic orientations of reviewers and then rank different items? Because many reviews lack ratings, it is difficult to collect sufficient rating data for certain categories of products (e.g., movies), but it is easier to find rating data in a different but related category (e.g., books). We refer to this problem as transfer rating, and try to train a better ranking model for items in the category of interest with the help of rating data from another related category. Specifically, we developed a ranking-oriented method called TRate for determining semantic orientations and ranking different items, and formulated it as a regularized algorithm for rating knowledge transfer that bridges the two related categories via a shared latent semantic space. Tests on the Epinion dataset verified its effectiveness.
Keywords: review rating; latent semantic space; transfer rating
Re-examining urban region and inferring regional function based on spatial-temporal interaction (cited by: 1)
13
Authors: Haiyan Tao, Keli Wang, Li Zhuo, Xuliang Li. International Journal of Digital Earth (SCIE, EI), 2019, No. 3, pp. 293-310, 18 pages.
The urban system is shaped by the interactions between different regions and by regions planned by the government, and is then reshaped by human activities and residents' needs. Understanding the changes in regional structure and the dynamics of city function based on residents' movement demand is important for evaluating and adjusting the planning and management of urban services and internal structures. This paper constructs a probabilistic factor model on the basis of probabilistic latent semantic analysis and tensor decomposition, in order to understand higher-order interactive population mobility and its impact on urban structure changes. First, a four-dimensional tensor of time (T) × week (W) × origin (O) × destination (D) was constructed to identify the day-to-day activities in three time modes and the weekly regularity of the weekday/weekend pattern. Then we reclassified the urban regions based on the spatial clustering formed by the space factor matrix and core tensor. Finally, we further analysed the space–time interaction on different time scales to deduce the actual function and connection strength of each region. Our research shows that applying individual-based spatial–temporal data to the study of human mobility and space–time interaction can help to analyse urban spatial structure and understand the actual regional function from a new perspective.
Keywords: tensor decomposition; probabilistic latent semantic analysis; taxi; space–time; administrative district
Do people communicate about their whereabouts? Investigating the relation between user-generated text messages and Foursquare check-in places
14
Authors: Ming Li, Rene Westerholt, Alexander Zipf. Geo-Spatial Information Science (SCIE, CSCD), 2018, No. 3, pp. 159-172, 14 pages.
The social functionality of places (e.g. school, restaurant) partly determines human behaviors and reflects a region's functional configuration. Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces. Assuming their potential impacts on human verbalization behaviors, one possibility is to link the functions of places to verbal representations such as users' postings in location-based social networks (LBSNs). In this study, we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached to check-in places. We investigate Foursquare because its available categorization hierarchy provides rich a-priori semantic knowledge about its check-in places, which enables a reliable verification of the semantic concepts identified from user-generated text snippets. A latent semantic analysis is conducted on a large Foursquare check-in dataset. The results confirm that attached text messages can represent semantic concepts by demonstrating their large correspondence to the official Foursquare venue categorization. To further elaborate the representativeness of text messages, this work also investigates the textual terms to quantify their ability to represent semantic concepts (i.e., representativeness), and investigates the semantic concepts to quantify how well they can be represented by text messages (i.e., representability). The results shed light on featured terms with strong locational characteristics, as well as on distinctive semantic concepts with potentially strong impacts on human verbalizations.
Keywords: text mining; latent semantic analysis; semantic concepts; Foursquare; location-based social network (LBSN)
The effects of person-organization fit on lending behaviors: Empirical evidence from Kiva
15
Authors: Hongke Zhao, Xiaopei Liu, Xi Zhang, Yinyue Wei, Chunli Liu. Journal of Management Science and Engineering, 2022, No. 1, pp. 133-145, 13 pages.
Donation-based crowdfunding, as part of impact investment, plays a vital role in promoting economic development and alleviating poverty. To increase lenders' enthusiasm for lending, some platforms, for example Kiva, have introduced groups to facilitate lending. This study examines how the group environment can affect lenders' behaviors in crowdfunding. It has been found that lenders who join groups contribute 1.2 more loans (about $30-$42) per month than those who do not, but the theoretical mechanism behind these differences is unclear. To understand in depth how the group environment affects lending behaviors, we introduce and develop the Person-Organization fit theory and Free-rider theory in this study. Combining machine-learning techniques with empirical analysis, the results show that the degree to which the motivations of the group and the lender match has a positive effect on lender behavior, i.e., lending to loans, and that this relationship is weakened by free-riding in large groups. In addition, group openness can have different effects on lenders in groups of different sizes. Our research enriches the existing crowdfunding literature and fills a gap in the research on new lending models in crowdfunding, and it will also be useful for crowdfunding platforms in setting the rules for building groups.
Keywords: crowdfunding; group environment; lending behavior; latent semantic indexing; empirical analysis; person-organization fit theory
Question-answering system based on concepts and statistics
16
Authors: LIN Hongfei, YANG Zhihao, ZHAO Jing. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2007, No. 1, pp. 23-28, 6 pages.
Question-answering systems provide short answers using available information. This paper presents an implementation mechanism for a question-answering system based on concepts and statistics. The system determines the question and focuses on the answer types, making different conceptual expansions for different questions. It applies the latent semantic indexing (LSI) method to retrieve relevant passages. It uses matching algorithms to find a match between questions and sentences stored in a database. It also extracts answers from a frequently asked questions (FAQ) database by finding matching or similar sentences. The answering ability of the system has been improved with the use of LSI and FAQ. The question-answering system introduced in Chinese universities is a developed and proven system capable of precise results.
Keywords: question-answering system; concept expansion; latent semantic analysis; similarity of sentence; passage match