To further enhance the efficiencies of search engines,achieving capabilities of searching,indexing and locating the information in the deep web,latent semantic analysis is a simple and effective way.Through the latent...To further enhance the efficiencies of search engines,achieving capabilities of searching,indexing and locating the information in the deep web,latent semantic analysis is a simple and effective way.Through the latent semantic analysis of the attributes in the query interfaces and the unique entrances of the deep web sites,the hidden semantic structure information can be retrieved and dimension reduction can be achieved to a certain extent.Using this semantic structure information,the contents in the site can be inferred and the similarity measures among sites in deep web can be revised.Experimental results show that latent semantic analysis revises and improves the semantic understanding of the query form in the deep web,which overcomes the shortcomings of the keyword-based methods.This approach can be used to effectively search the most similar site for any given site and to obtain a site list which conforms to the restrictions one specifies.展开更多
In recent years,multimedia annotation problem has been attracting significant research attention in multimedia and computer vision areas,especially for automatic image annotation,whose purpose is to provide an efficie...In recent years,multimedia annotation problem has been attracting significant research attention in multimedia and computer vision areas,especially for automatic image annotation,whose purpose is to provide an efficient and effective searching environment for users to query their images more easily. In this paper,a semi-supervised learning based probabilistic latent semantic analysis( PLSA) model for automatic image annotation is presenred. Since it's often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect,a transductive support vector machine( TSVM) is exploited to enhance the quality of the training image data. Then,different image features with different magnitudes will result in different performance for automatic image annotation. To this end,a Gaussian normalization method is utilized to normalize different features extracted from effective image regions segmented by the normalized cuts algorithm so as to reserve the intrinsic content of images as complete as possible. Finally,a PLSA model with asymmetric modalities is constructed based on the expectation maximization( EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve performance of traditional PLSA for the task of automatic image annotation.展开更多
Current research on metaphor analysis is generally knowledge-based and corpus-based,which calls for methods of automatic feature extraction and weight calculation.Combining natural language processing(NLP),latent sema...Current research on metaphor analysis is generally knowledge-based and corpus-based,which calls for methods of automatic feature extraction and weight calculation.Combining natural language processing(NLP),latent semantic analysis(LSA),and Pearson correlation coefficient,this paper proposes a metaphor analysis method for extracting the content words from both literal and metaphorical corpus,calculating correlation degree,and analyzing their relationships.The value of the proposed method was demonstrated through a case study by using a corpus with keyword“飞翔(fly)”.When compared with the method of Pearson correlation coefficient,the experiment shows that the LSA can produce better results with greater significance in correlation degree.It is also found that the number of common words that appeared in both literal and metaphorical word bags decreased with the correlation degree.The case study also revealed that there are more nouns appear in literal corpus,and more adjectives and adverbs appear in metaphorical corpus.The method proposed will benefit NLP researchers to develop the required step-by-step calculation tools for accurate quantitative analysis.展开更多
加密型勒索软件通过加密用户文件来勒索赎金.现有的基于第一条加密应用编程接口(Application Programming Interface,API)的早期检测方法无法在勒索软件执行加密行为前将其检出.由于不同家族的勒索软件开始执行其加密行为的时刻各不相同...加密型勒索软件通过加密用户文件来勒索赎金.现有的基于第一条加密应用编程接口(Application Programming Interface,API)的早期检测方法无法在勒索软件执行加密行为前将其检出.由于不同家族的勒索软件开始执行其加密行为的时刻各不相同,现有的基于固定时间阈值的早期检测方法仅能将少量勒索软件在其执行加密行为前准确检出.为进一步提升勒索软件检测的及时性,本文在分析多款勒索软件运行初期调用动态链接库(Dynamic Link Library,DLL)和API行为的基础上,提出了一个表征软件从开始运行到首次调用加密相关DLL之间的时间段的概念——运行初始阶段(Initial Phase of Operation,IPO),并提出了一个以软件在IPO内产生的API序列为检测对象的勒索软件早期检测方法,即基于API潜在语义的勒索软件早期检测方法(Ransomware Early Detection Method based on API Latent Semantics,REDMALS).REDMALS采集IPO内的API序列后,采用TF-IDF(Term Frequency-Inverse Document Frequency)算法以及潜在语义分析(Latent Semantic Analysis,LSA)算法对采集的API序列生成特征向量及提取潜在的语义结构,再运用机器学习算法构建检测模型用于勒索软件检测.实验结果显示运用随机森林算法的REDMALS在构建的变种测试集和未知测试集上可分别获得97.7%、96.0%的准确率,且两个测试集中83%和76%的勒索软件样本可在其执行加密行为前被检出.展开更多
文摘To further enhance the efficiencies of search engines,achieving capabilities of searching,indexing and locating the information in the deep web,latent semantic analysis is a simple and effective way.Through the latent semantic analysis of the attributes in the query interfaces and the unique entrances of the deep web sites,the hidden semantic structure information can be retrieved and dimension reduction can be achieved to a certain extent.Using this semantic structure information,the contents in the site can be inferred and the similarity measures among sites in deep web can be revised.Experimental results show that latent semantic analysis revises and improves the semantic understanding of the query form in the deep web,which overcomes the shortcomings of the keyword-based methods.This approach can be used to effectively search the most similar site for any given site and to obtain a site list which conforms to the restrictions one specifies.
基金Supported by the National Program on Key Basic Research Project(No.2013CB329502)the National Natural Science Foundation of China(No.61202212)+1 种基金the Special Research Project of the Educational Department of Shaanxi Province of China(No.15JK1038)the Key Research Project of Baoji University of Arts and Sciences(No.ZK16047)
文摘In recent years,multimedia annotation problem has been attracting significant research attention in multimedia and computer vision areas,especially for automatic image annotation,whose purpose is to provide an efficient and effective searching environment for users to query their images more easily. In this paper,a semi-supervised learning based probabilistic latent semantic analysis( PLSA) model for automatic image annotation is presenred. Since it's often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect,a transductive support vector machine( TSVM) is exploited to enhance the quality of the training image data. Then,different image features with different magnitudes will result in different performance for automatic image annotation. To this end,a Gaussian normalization method is utilized to normalize different features extracted from effective image regions segmented by the normalized cuts algorithm so as to reserve the intrinsic content of images as complete as possible. Finally,a PLSA model with asymmetric modalities is constructed based on the expectation maximization( EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve performance of traditional PLSA for the task of automatic image annotation.
基金Fundamental Research Funds for the Central Universities of Ministry of Education of China(No.19D111201)。
文摘Current research on metaphor analysis is generally knowledge-based and corpus-based,which calls for methods of automatic feature extraction and weight calculation.Combining natural language processing(NLP),latent semantic analysis(LSA),and Pearson correlation coefficient,this paper proposes a metaphor analysis method for extracting the content words from both literal and metaphorical corpus,calculating correlation degree,and analyzing their relationships.The value of the proposed method was demonstrated through a case study by using a corpus with keyword“飞翔(fly)”.When compared with the method of Pearson correlation coefficient,the experiment shows that the LSA can produce better results with greater significance in correlation degree.It is also found that the number of common words that appeared in both literal and metaphorical word bags decreased with the correlation degree.The case study also revealed that there are more nouns appear in literal corpus,and more adjectives and adverbs appear in metaphorical corpus.The method proposed will benefit NLP researchers to develop the required step-by-step calculation tools for accurate quantitative analysis.
文摘加密型勒索软件通过加密用户文件来勒索赎金.现有的基于第一条加密应用编程接口(Application Programming Interface,API)的早期检测方法无法在勒索软件执行加密行为前将其检出.由于不同家族的勒索软件开始执行其加密行为的时刻各不相同,现有的基于固定时间阈值的早期检测方法仅能将少量勒索软件在其执行加密行为前准确检出.为进一步提升勒索软件检测的及时性,本文在分析多款勒索软件运行初期调用动态链接库(Dynamic Link Library,DLL)和API行为的基础上,提出了一个表征软件从开始运行到首次调用加密相关DLL之间的时间段的概念——运行初始阶段(Initial Phase of Operation,IPO),并提出了一个以软件在IPO内产生的API序列为检测对象的勒索软件早期检测方法,即基于API潜在语义的勒索软件早期检测方法(Ransomware Early Detection Method based on API Latent Semantics,REDMALS).REDMALS采集IPO内的API序列后,采用TF-IDF(Term Frequency-Inverse Document Frequency)算法以及潜在语义分析(Latent Semantic Analysis,LSA)算法对采集的API序列生成特征向量及提取潜在的语义结构,再运用机器学习算法构建检测模型用于勒索软件检测.实验结果显示运用随机森林算法的REDMALS在构建的变种测试集和未知测试集上可分别获得97.7%、96.0%的准确率,且两个测试集中83%和76%的勒索软件样本可在其执行加密行为前被检出.
文摘提出了一种基于WordNet本体标注和概率潜在语义分析(PLSA,ProbabilisticLatent Semantic Analysis)的语义Web服务发现方法OntoPLSA.首先使用WordNet本体标注Web服务的操作名、参数以及用户请求,以经过标注后的输出参数集合为词汇集,服务描述文档集合为文档集,组成词汇-文档矩阵,以该矩阵为输入,使用PLSA方法对服务集进行分类,并将用户请求带入PLSA模型,确定其所属的类;然后在类中以标注后的输出参数为键,含有这个输出的服务的列表为键值,建立一个映射表,查找与用户请求的输出相似的映射表键,进而找出对应的键值,即服务列表;最后根据QoS(Quality of Service)和用户请求中的输入参数确定满足条件的服务结果集合.在415个Web服务组成的数据集上的测试结果表明,性能较其他方法有优势,召回率和R准确率也得到了改善.