In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision communities, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Furthermore, because image features with different magnitudes lead to different annotation performance, a Gaussian normalization method is utilized to normalize the features extracted from image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model significantly improves the performance of traditional PLSA on the task of automatic image annotation.
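The Gaussian normalization step above can be sketched as a per-dimension z-score, a common reading of Gaussian normalization; the function and the sample feature values below are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, stdev

def gaussian_normalize(vectors):
    # Z-score normalize each feature dimension across region feature
    # vectors so that features of different magnitudes contribute
    # comparably to the annotation model.
    dims = list(zip(*vectors))                      # one tuple per dimension
    mus = [mean(d) for d in dims]
    sigmas = [stdev(d) if len(set(d)) > 1 else 1.0 for d in dims]
    return [[(x - m) / s for x, m, s in zip(v, mus, sigmas)]
            for v in vectors]

# Two feature dimensions with very different magnitudes:
feats = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normed = gaussian_normalize(feats)
# each dimension now has zero mean and unit (sample) standard deviation
```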
Social media platforms provide new value for markets and research companies. This article explores the use of social media data to enhance customer value propositions. The case study involves a company that develops wearable Internet of Things (IoT) devices and services for stress management. Netnography and semantic annotation for recognizing and categorizing the context of tweets are conducted to gain a better understanding of users' stress management practices. The aim is to analyze the tweets about stress management practices and to identify the context from the tweets. Thereafter, we map the tweets on pleasure and arousal to elicit customer insights. We analyzed a case study of a marketing strategy on the Twitter platform. Participants in the marketing campaign shared photos and texts about their stress management practices. Machine learning techniques were used to evaluate and estimate the emotions and contexts of the tweets posted by the campaign participants. The computational semantic analysis of the tweets was compared to the text analysis of the tweets. The content analysis of tweet images alone resulted in 96% accuracy in detecting tweet context, while that of the textual content of tweets yielded an accuracy of 91%. Semantic tagging by Ontotext was able to detect the correct tweet context with an accuracy of 50%.
Current research on metaphor analysis is generally knowledge-based and corpus-based, which calls for methods of automatic feature extraction and weight calculation. Combining natural language processing (NLP), latent semantic analysis (LSA), and the Pearson correlation coefficient, this paper proposes a metaphor analysis method for extracting the content words from both literal and metaphorical corpora, calculating their correlation degree, and analyzing their relationships. The value of the proposed method was demonstrated through a case study using a corpus with the keyword "飞翔 (fly)". When compared with the method of the Pearson correlation coefficient alone, the experiment shows that LSA can produce better results with greater significance in correlation degree. It is also found that the number of common words appearing in both the literal and metaphorical word bags decreased with the correlation degree. The case study also revealed that more nouns appear in the literal corpus, while more adjectives and adverbs appear in the metaphorical corpus. The proposed method will help NLP researchers develop the step-by-step calculation tools required for accurate quantitative analysis.
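The correlation-degree computation against which LSA is compared uses the Pearson correlation coefficient; a minimal sketch, with made-up word weights, is:

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation between two equal-length lists of word weights.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical weights of shared content words in the literal vs.
# metaphorical corpus:
literal = [3.0, 1.0, 4.0, 2.0]
metaphorical = [2.5, 1.5, 3.5, 2.0]
r = pearson(literal, metaphorical)   # correlation degree of the two corpora
```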
Among Chinese color words, two have a particularly high degree of semantic shift and cross-grammatical-category use: "white" and "black". These two words generally rank in the top three of the human color perception system, and we believe that the most typical basic color terms are the most likely to undergo semantic shift and to cross grammatical categories; however, owing to differences in history, culture, and other aspects of cognition, the order in which they cross grammatical categories differs across languages. On this basis, this paper takes a cognitive semantic analysis of the English and Chinese basic color terms "black" and "white" as its research topic, in the hope that an in-depth study of this aspect will prove beneficial.
A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and for different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, in a way bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Probabilistic latent semantic analysis (PLSA) is a topic model for text documents, which has been widely used in text mining, computer vision, computational biology, and so on. For batch PLSA inference algorithms, the required memory size grows linearly with the data size, and handling massive data streams is very difficult. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments, and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets, and demonstrate that OBP is competitive in both speed and accuracy with online expectation maximization (OEM) in PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
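The memory-saving idea, processing the stream one small segment at a time and discarding each segment afterwards, can be sketched as follows; the segment size and the generator interface are choices of this sketch, not details from the paper.

```python
def stream_segments(stream, seg_size):
    # Yield fixed-size segments of a (possibly unbounded) data stream,
    # so each segment can be processed and then dropped from memory.
    seg = []
    for item in stream:
        seg.append(item)
        if len(seg) == seg_size:
            yield seg
            seg = []
    if seg:                 # flush the final partial segment
        yield seg

# Process documents segment by segment; only one segment is ever held.
segments = list(stream_segments(range(7), 3))
# segments == [[0, 1, 2], [3, 4, 5], [6]]
```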
The study of binary code evolution is crucial for understanding vulnerability repair and malicious code variants. Research on code evolution has focused on the source code level, whereas very few works have tackled this problem at the binary code level. In this paper, a binary code evolution analysis framework is proposed to automatically locate evolution areas and identify evolution semantics with concrete semantic differences. Differencing of the binary function domain was applied based on function similarity. Trace alignment was used to find evolution blocks, instruction classification semantics were utilized to identify evolution operations, and evolution semantics were extracted in combination with function domain elements. The experimental results show that the binary code evolution analysis framework can correctly locate binary code evolution areas and identify all concrete semantic evolutions.
Focusing on the problem of goal event detection in soccer videos, a novel method based on the Hidden Markov Model (HMM) and a semantic rule is proposed. Firstly, an HMM for a goal event is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new shot feature, the semantic observation weight. The test video is detected based on the HMM and the NSWS rule, respectively. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused by optimal weights at the decision level to obtain the final result. Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall, which demonstrates its effectiveness.
The main focus of the article is the semantic analysis and genesis of the words that, to a certain extent, create the lexical base of the modern Azerbaijani language and belong to the root system of the language. The goal is to restore words which have gone through deformation and flexion for thousands of years to their initial forms. The concept of stem cells in genetics has also been utilized as an analogy, because the author believes that languages are living organisms too, with words and elements functioning as stem cells. Thus, the principal idea is that the linguistic units and words entering the organic system of a language are derivatives of the aforementioned linguistic stem cells. The stem words and concepts, the original elements of a language, are determined first, and all the following analyses are built upon them. Such studies contain a wide range of comparativist investigations as well. Examples from the Ancient Greek and Latin languages have also been used as objects of comparison. The discovery of such words will yield not only linguistic information but also objective historical information on different aspects, which can be considered one of the main reasons this kind of study is very significant.
Individuals, local communities, environmental associations, private organizations, and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality, illegal waste disposal, water contamination, and general pollution. Environmental complaints represent expressions of dissatisfaction with these issues. Since managing a large number of complaints is time-consuming, text mining may be useful for automatically extracting information on stakeholder priorities and concerns. This paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems: the online claim submission system of the Regional Agency for Prevention, Environment and Energy (Arpae) ("Contact Arpae"), and Arpae's internal platform for environmental pollution ("Environmental incident reporting portal") in the Emilia-Romagna Region, Italy. We evaluated a total of 2477 records and classified this information based on the claim topic (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation) and geographical distribution. Then, this paper used natural language processing to extract keywords from the dataset, and classified keywords ranking higher in Term Frequency-Inverse Document Frequency (TF-IDF) based on the driver, pressure, state, impact, and response (DPSIR) framework. This study provides a systemic approach to understanding the interaction between people and the environment in different geographical contexts and to building sustainable and healthy communities. The results showed that most complaints come from the public and are associated with air pollution and odor. Factories (particularly foundries and ceramic industries) and farms are identified as the drivers of environmental issues. Citizens believed that environmental issues mainly affect human well-being. Moreover, the keywords "odor", "report", "request", "presence", "municipality", and "hours" were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Keywords connecting odor (classified as an impact) and air pollution (classified as a state) were the most important (such as "odor-burnt plastic" and "odor-acrid"). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: "odor-factory" and "odors-farms". The proposed approach has several theoretical and practical implications: text mining may quickly and efficiently address citizen needs, providing the basis toward automating (even partially) the complaint process; and the DPSIR framework might support the planning and organization of information and the identification of stakeholder concerns and priorities, as well as metrics and indicators for their assessment. Therefore, integration of the DPSIR framework with the text mining of environmental complaints might generate a comprehensive environmental knowledge base as a prerequisite for a wider exploitation of analysis to support decision-making processes and environmental management activities.
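The TF-IDF ranking step used to select complaint keywords can be sketched as below; the tokenized toy complaints and the smoothing-free idf form are assumptions of this illustration.

```python
from math import log
from collections import Counter

def tf_idf(docs):
    # Score each term in each tokenized document by term frequency
    # times inverse document frequency across the collection.
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency
    return [{t: (c / len(d)) * log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

# Toy complaints (already tokenized and translated):
docs = [["odor", "factory", "night"],
        ["odor", "farm"],
        ["noise", "factory"]]
scores = tf_idf(docs)
# "night" occurs in only one complaint, so in docs[0] it outranks
# "odor", which occurs in two.
```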
Android has been dominating the smartphone market for more than a decade and has managed to capture 87.8% of the market share. Such popularity of Android has drawn the attention of cybercriminals and malware developers. Malicious applications can steal sensitive information like contacts, read personal messages, record calls, send messages to premium-rate numbers, cause financial loss, gain access to the gallery, and access the user's geographic location. Numerous surveys on Android security have primarily focused on types of malware attacks, their propagation, and techniques to mitigate them. To the best of our knowledge, the Android malware literature has never been explored using information modelling techniques, nor have contemporary research trends in Android malware research been examined from a semantic point of view. This paper identifies the intellectual core of the Android malware literature using Latent Semantic Analysis (LSA). An extensive corpus of 843 articles on Android malware and security, published during 2009-2019, was processed using LSA. Subsequently, the truncated Singular Value Decomposition (SVD) technique was used for dimensionality reduction. Later, machine learning methods were deployed to effectively segregate prominent topic solutions with minimal bias. Based on the observed term and document loading matrix values, five core research areas and twenty research trends were identified. Further, potential future research directions have been detailed to offer a quick reference for information scientists. The study concludes that Android security is crucial for pervasive Android devices. Static analysis is the most widely investigated core area within Android security research and is expected to remain in trend in the near future. Research trends indicate the need for a faster yet effective model to detect Android applications causing obfuscation, financial attacks, and theft of user information.
A novel method based on an interval temporal syntactic model was proposed to recognize human activities in video streams. The method is composed of two parts: feature extraction and activity recognition. A trajectory shape descriptor, speeded-up robust features (SURF), and histograms of optical flow (HOF) were proposed to represent human activities, providing more exhaustive information describing their shape, structure, and motion. In the recognition process, a probabilistic latent semantic analysis (PLSA) model was first used to recognize sample activities. Then, an interval temporal syntactic model, which combines the syntactic model with interval algebra to explicitly model the temporal dependencies of activities, was introduced to recognize complex activities with temporal relationships. Experimental results show the effectiveness of the proposed method in comparison with other state-of-the-art methods on public databases for the recognition of complex activities.
Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing successful matching between a user's question and a question-answer (QA) pair in the FAQ database. Our approach combines multiple factors. Experiments were conducted to test the performance of our method. Findings: Experiments show that the proposed method achieves higher accuracy. Compared with similarity calculation based on TF-IDF, sentence surface forms, or semantic relations alone, the proposed method based on hybrid strategies delivers superior precision, recall, and F-measure values. Research limitations: The FAQ answering system is currently only capable of meeting users' demand for text retrieval. In the future, the system needs to be improved to support retrieving images and videos. Practical implications: This FAQ answering system will help farmers utilize agricultural information resources more efficiently. Originality/value: We designed algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate question surface similarity, question semantic similarity, and question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
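The hybrid-strategy combination can be sketched as a weighted sum of several similarity factors. The Jaccard surface measure, the default equal weights, and the toy tokens below are assumptions of this sketch; the paper's actual factors (surface, semantic, and LSA-based question-answer similarity) and their weights differ.

```python
def jaccard(a, b):
    # Surface-form overlap between two tokenized sentences.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def hybrid_similarity(user_q, faq_q, extra_sims=(), weights=None):
    # Weighted sum of a surface similarity and any further similarity
    # signals (e.g. semantic or question-answer similarity scores).
    sims = [jaccard(user_q, faq_q), *extra_sims]
    weights = weights or [1.0 / len(sims)] * len(sims)
    return sum(w * s for w, s in zip(weights, sims))

score = hybrid_similarity(["how", "to", "treat", "wheat", "rust"],
                          ["how", "to", "prevent", "wheat", "rust"],
                          extra_sims=[0.9],      # e.g. an LSA-based score
                          weights=[0.4, 0.6])
```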
Detection of personality using emotions is a research domain in artificial intelligence. At present, some agents can keep a human's profile for interaction and adapt themselves according to the person's preferences. However, a more effective method of interaction is to detect the person's personality by understanding the emotions and context of the subject. The idea behind adding personality to cognitive agents is an attempt to maximize adaptability on the basis of behavior. In daily life, humans socially interact with each other by analyzing the emotions and context of interaction from audio or visual input. This paper presents a conceptual personality model for cognitive agents that can determine personality and behavior from text input, using the context subjectivity of the given data and the emotions obtained from a particular situation/context. The proposed work consists of the Jumbo Chatbot, which can chat with humans. In this social interaction, the chatbot predicts human personality by understanding the emotions and context of the interacting human. Currently, the Jumbo Chatbot uses the BFI technique to interact with a human. The accuracy of the proposed work varies and improves as the agent gains more interaction experience.
This paper focuses on the problem of automatic image classification (AIC) by proposing a framework based on latent semantic analysis (LSA) and image region pairs. The novel framework employs the relative spatial arrangements of region pairs as the primary feature to capture semantics. The significance of this paper is twofold. Firstly, to the best of our knowledge, this is the first study of the influence of region pairs, as well as their relative spatial information, in latent semantic analysis as applied to automatic image classification. Secondly, our proposed method for using the relative spatial information of region pairs shows great promise in improving image semantic classification compared with the classical latent semantic analysis method and the 2D string representation algorithm.
Philosophical analysis is commonly assumed to involve decomposing the meaning of a sentence or an expression into a set of conceptually basic constituent parts. This essay challenges this traditional view by examining the potential semantic roles that classifier phrases play in Chinese. It is suggested that the conceptual resources necessary for justifying claims about the semantic status of natural language classifier phrases should be informed in part by methods that accommodate ontogenic and evolutionary contexts. Evidence is provided for the view that many Chinese classifiers (but not all) are features regimented in the grammar of Chinese that have no functional role in normal adult communication, but which play an ontogenetic role in the child's development of linguistic competency. Furthermore, it is suggested that this ontogenetic role has features in common with the phylogenetic processes by which Chinese or its classical variants came about, as a later product of mechanisms that evolved in the species in accordance with varying demands for successful communication.
Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts, and document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
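Profile-to-profile similarity computed from word vectors can be sketched with cosine similarity; the element-wise pairing of profile slots, the averaging, and the toy embeddings are simplifications assumed here, not the paper's exact distance computation.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two word vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def profile_similarity(profile_a, profile_b, vectors):
    # Average cosine similarity over aligned semantic-profile elements
    # (purpose vs. purpose, methodology vs. methodology, ...).
    sims = [cosine(vectors[a], vectors[b])
            for a, b in zip(profile_a, profile_b)]
    return sum(sims) / len(sims)

# Toy embeddings standing in for trained word vectors:
vectors = {"classification": [1.0, 0.2], "clustering": [0.9, 0.3],
           "svm": [0.2, 1.0], "kmeans": [0.1, 0.9]}
sim = profile_similarity(["classification", "svm"],
                         ["clustering", "kmeans"], vectors)
```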
This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of Vector Space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle with accurately capturing document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS's impact on model accuracy using information retrieval metrics like Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS's effectiveness in reducing the inverted index size, crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. Results reveal OWS's superiority, achieving 54% Recall and 81% MAP, and a notable 38% reduction in the inverted index size. This highlights OWS's potential in optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS's capabilities in information retrieval methodologies.
An urban system is shaped by the interactions between different regions and regions planned by the government, then reshaped by human activities and residents' needs. Understanding the changes in regional structure and the dynamics of city functions based on residents' movement demand is important for evaluating and adjusting the planning and management of urban services and internal structures. This paper constructs a probabilistic factor model on the basis of probabilistic latent semantic analysis and tensor decomposition, for the purpose of understanding higher-order interactive population mobility and its impact on urban structure changes. First, a four-dimensional tensor of time (T) × week (W) × origin (O) × destination (D) was constructed to identify day-to-day activities in three time modes and the weekly regularity of the weekday/weekend pattern. Then we reclassified the urban regions based on the spatial clustering formed by the space factor matrix and core tensor. Finally, we further analysed the space-time interaction at different time scales to deduce the actual function and connection strength of each region. Our research shows that the application of individual-based spatial-temporal data to human mobility and space-time interaction studies can help analyse urban spatial structure and understand actual regional functions from a new perspective.
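Constructing the T × W × O × D tensor amounts to counting trips per (time-of-day, day-of-week, origin, destination) cell; a sparse-dictionary sketch (the sparse storage and the toy trip records are our assumptions) is:

```python
from collections import defaultdict

def build_od_tensor(trips):
    # Accumulate trip records into a sparse 4-way count tensor indexed
    # by (time slot, day of week, origin region, destination region).
    tensor = defaultdict(int)
    for t, w, o, d in trips:
        tensor[(t, w, o, d)] += 1
    return tensor

trips = [(8, "Mon", "A", "B"), (8, "Mon", "A", "B"), (18, "Mon", "B", "A")]
tensor = build_od_tensor(trips)
# tensor[(8, "Mon", "A", "B")] == 2  (morning commute counted twice)
```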
The social functionality of places (e.g. school, restaurant) partly determines human behaviors and reflects a region's functional configuration. Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces. Assuming their potential impacts on human verbalization behaviors, one possibility is to link the functions of places to verbal representations such as users' postings in location-based social networks (LBSNs). In this study, we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached to check-in places. We investigate Foursquare because its categorization hierarchy provides rich a priori semantic knowledge about its check-in places, which enables a reliable verification of the semantic concepts identified from user-generated text snippets. A latent semantic analysis is conducted on a large Foursquare check-in dataset. The results confirm that attached text messages can represent semantic concepts, as demonstrated by their large correspondence to the official Foursquare venue categorization. To further elaborate the representativeness of text messages, this work also investigates the textual terms to quantify their ability to represent semantic concepts (i.e., representativeness), and the semantic concepts to quantify how well they can be represented by text messages (i.e., representability). The results shed light on featured terms with strong locational characteristics, as well as on distinctive semantic concepts with potentially strong impacts on human verbalizations.
Funding: Supported by the National Program on Key Basic Research Project (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Funding: This work was supported by Taif University Researchers Supporting Project number (TURSP-2020/292), Taif University, Taif, Saudi Arabia. This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.
Abstract: Social media platforms provide new value for markets and research companies. This article explores the use of social media data to enhance customer value propositions. The case study involves a company that develops wearable Internet of Things (IoT) devices and services for stress management. Netnography and semantic annotation for recognizing and categorizing the context of tweets are conducted to gain a better understanding of users' stress management practices. The aim is to analyze the tweets about stress management practices and to identify the context from the tweets. Thereafter, we map the tweets onto pleasure and arousal to elicit customer insights. We analyzed a case study of a marketing strategy on the Twitter platform. Participants in the marketing campaign shared photos and texts about their stress management practices. Machine learning techniques were used to evaluate and estimate the emotions and contexts of the tweets posted by the campaign participants. The computational semantic analysis of the tweets was compared to the text analysis of the tweets. Content analysis of the tweet images alone detected tweet context with 96% accuracy, while that of the textual content of tweets yielded an accuracy of 91%. Semantic tagging by Ontotext detected the correct tweet context with an accuracy of 50%.
Funding: Fundamental Research Funds for the Central Universities of Ministry of Education of China (No. 19D111201).
Abstract: Current research on metaphor analysis is generally knowledge-based and corpus-based, which calls for methods of automatic feature extraction and weight calculation. Combining natural language processing (NLP), latent semantic analysis (LSA), and the Pearson correlation coefficient, this paper proposes a metaphor analysis method that extracts content words from both literal and metaphorical corpora, calculates their correlation degree, and analyzes their relationships. The value of the proposed method was demonstrated through a case study using a corpus with the keyword "飞翔 (fly)". Compared with the Pearson correlation coefficient alone, the experiment shows that LSA produces better results with greater significance in correlation degree. It was also found that the number of common words appearing in both the literal and metaphorical word bags decreased with the correlation degree. The case study further revealed that more nouns appear in the literal corpus, while more adjectives and adverbs appear in the metaphorical corpus. The proposed method will help NLP researchers develop the step-by-step calculation tools required for accurate quantitative analysis.
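The correlation step named above can be illustrated with a plain implementation of the Pearson coefficient over two terms' frequency vectors; the vectors below are invented stand-ins, not taken from the corpus:

```python
# Hedged sketch of the Pearson correlation used to relate literal and
# metaphorical term frequencies; the input vectors are illustrative only.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

A coefficient near +1 indicates the two frequency profiles rise and fall together; near -1, they move in opposition.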
Abstract: Among Chinese color words, two exhibit a particularly high degree of category conversion and grading: "白 (white)" and "黑 (black)". These two terms generally rank among the top three in the human color perception system. We argue that the most typical basic color terms are the most likely to undergo conversion and grading, but because of differences in history, culture, and other aspects of cognition, their cross-grammatical-category orders differ between languages. On this basis, this paper takes a cognitive semantic analysis of the basic color terms "black" and "white" in English and Chinese as its research topic, in the hope that an in-depth study of this aspect will prove beneficial.
Funding: This research was supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
Abstract: A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and for different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, in a way bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Abstract: Probabilistic latent semantic analysis (PLSA) is a topic model for text documents, which has been widely used in text mining, computer vision, computational biology, and so on. For batch PLSA inference algorithms, the required memory size grows linearly with the data size, and handling massive data streams is very difficult. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments, and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets, and demonstrate that OBP is competitive in both speed and accuracy with online expectation maximization (OEM) for PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
Funding: The research leading to these results received funding from the Advanced Industrial Internet Security Platform Program of Zhijiang Laboratory (No. 2018FD0ZX01), the National Natural Science Foundation of China (No. 61802435), and the Key Research Projects of Henan Colleges (No. 21A520054).
Abstract: The study of binary code evolution is crucial for understanding vulnerability repair and malicious code variants. Research on code evolution has focused on the source code level, whereas very little work has been done to tackle this problem at the binary code level. In this paper, a binary code evolution analysis framework is proposed to automatically locate evolution areas and identify evolution semantics with concrete semantic differences. Differencing of binary function domains was applied based on function similarity. Trace alignment was used to find evolution blocks, instruction classification semantics were utilized to identify evolution operations, and evolution semantics were extracted in combination with function domain elements. The experimental results show that the binary code evolution analysis framework can correctly locate binary code evolution areas and identify all concrete semantic evolutions.
Funding: Supported by the National Natural Science Foundation of China (No. 61072110), the Industrial Tackling Project of Shaanxi Province (2010K06-20), and the Natural Science Foundation of Shaanxi Province (SJ08F15).
Abstract: Focusing on the problem of goal event detection in soccer videos, a novel method based on a Hidden Markov Model (HMM) and a semantic rule is proposed. First, an HMM for the goal event is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new shot feature, the semantic observation weight. The test video is detected using the HMM and the NSWS rule, respectively. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused with optimal weights at the decision level, yielding the final result. Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall, which demonstrates its effectiveness.
Abstract: The main focus of this article is the semantic analysis and genesis of the words that, to a certain extent, form the lexical base of the modern Azerbaijani language and belong to the language's system of roots. The goal is to restore words that have gone through deformation and flexion over thousands of years to their initial forms. The concept of stem cells in genetics is used as an analogy, because the author believes that languages are living organisms too and that they have words and elements functioning as stem cells. Thus, the principal idea is that the linguistic units and words entering the organic system of a language are derivations of the aforementioned linguistic stem cells. The stem words and concepts, the original elements of a language, are determined first, and all subsequent analyses are built upon them. Such studies also contain a wide range of comparativist investigations; examples from Ancient Greek and Latin are used as objects of comparison. The discovery of such words will yield not only linguistic information but also objective historical information on different aspects, which is one of the main reasons this kind of study is significant.
Abstract: Individuals, local communities, environmental associations, private organizations, and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality, illegal waste disposal, water contamination, and general pollution. Environmental complaints represent expressions of dissatisfaction with these issues. Because managing a large number of complaints is time-consuming, text mining may be useful for automatically extracting information on stakeholder priorities and concerns. This paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems: the online claim submission system of the Regional Agency for Prevention, Environment and Energy (Arpae) ("Contact Arpae"), and Arpae's internal platform for environmental pollution ("Environmental incident reporting portal") in the Emilia-Romagna Region, Italy. We evaluated a total of 2477 records and classified this information based on claim topic (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation) and geographical distribution. This paper then used natural language processing to extract keywords from the dataset, and classified the keywords ranking highest in Term Frequency-Inverse Document Frequency (TF-IDF) according to the driver, pressure, state, impact, and response (DPSIR) framework. This study provides a systematic approach to understanding the interaction between people and the environment in different geographical contexts and helps build sustainable and healthy communities. The results showed that most complaints come from the public and are associated with air pollution and odor. Factories (particularly foundries and ceramic industries) and farms are identified as the drivers of environmental issues. Citizens believed that environmental issues mainly affect human well-being. Moreover, the keywords "odor", "report", "request", "presence", "municipality", and "hours" were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Keywords connecting odor (classified as an impact) and air pollution (classified as a state) were the most important (such as "odor-burnt plastic" and "odor-acrid"). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: "odor-factory" and "odors-farms". The proposed approach has several theoretical and practical implications: text mining may quickly and efficiently address citizen needs, providing a basis for automating (even partially) the complaint process; and the DPSIR framework can support the planning and organization of information and the identification of stakeholder concerns and priorities, as well as metrics and indicators for their assessment. Therefore, integrating the DPSIR framework with text mining of environmental complaints could generate a comprehensive environmental knowledge base as a prerequisite for wider exploitation of the analysis to support decision-making processes and environmental management activities.
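The TF-IDF keyword-ranking step named above can be sketched in a few lines of plain Python; the three "complaint" snippets are invented stand-ins, not actual Arpae records:

```python
import math

# Hedged sketch of TF-IDF keyword ranking as used to surface complaint
# terms; the three one-line "complaints" are invented examples.

def tfidf_rank(docs):
    """Return, per document, its terms sorted by descending TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(docs)
    df = {}  # document frequency of each term
    for toks in tokenized:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    rankings = []
    for toks in tokenized:
        tf = {}
        for t in toks:
            tf[t] = tf.get(t, 0) + 1
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t])
                  for t, c in tf.items()}
        rankings.append(sorted(scores, key=scores.get, reverse=True))
    return rankings

complaints = ["odor factory complaint",
              "noise traffic complaint",
              "odor farm complaint"]
ranked = tfidf_rank(complaints)
```

Terms that appear in every record (here, "complaint") get an IDF of zero and sink to the bottom, while record-specific terms such as "factory" rise to the top, which is exactly the filtering behavior the study relies on.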
Funding: National Research Foundation of Korea grant funded by the Korean Government (Ministry of Science & ICT), NRF-2020R1A2B5B02002478, through Dr. Kyung-sup Kwak.
Abstract: Android has dominated the smartphone market for more than a decade and has managed to capture 87.8% of the market share. Such popularity of Android has drawn the attention of cybercriminals and malware developers. Malicious applications can steal sensitive information such as contacts, read personal messages, record calls, send messages to premium-rate numbers, cause financial loss, gain access to the gallery, and access the user's geographic location. Numerous surveys on Android security have primarily focused on types of malware attacks, their propagation, and techniques to mitigate them. To the best of our knowledge, the Android malware literature has never been explored using information modelling techniques, nor have contemporary research trends in Android malware research been promulgated from a semantic point of view. This paper identifies the intellectual core of the Android malware literature using Latent Semantic Analysis (LSA). An extensive corpus of 843 articles on Android malware and security, published during 2009-2019, was processed using LSA. Subsequently, the truncated Singular Value Decomposition (SVD) technique was used for dimensionality reduction. Machine learning methods were then deployed to effectively segregate prominent topic solutions with minimal bias. Based on the observed term and document loading matrix values, five core research areas and twenty research trends were identified. Further, potential future research directions are detailed to offer a quick reference for information scientists. The study concludes that Android security is crucial for pervasive Android devices. Static analysis is the most widely investigated core area within Android security research and is expected to remain in trend in the near future. Research trends indicate the need for a faster yet effective model to detect Android applications that perform obfuscation, financial attacks, and user-information theft.
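The LSA-plus-truncated-SVD pipeline the survey relies on can be sketched with NumPy. The tiny term-document matrix below is invented; a real run would use the 843-article corpus:

```python
import numpy as np

# Hedged sketch of LSA dimensionality reduction via truncated SVD,
# the technique named in the abstract. The matrix is a toy example.

def lsa_project(term_doc, k):
    """Return rank-k term and document coordinates from truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Scale term coordinates by singular values; keep only k dimensions.
    return U[:, :k] * s[:k], Vt[:k, :]

A = np.array([[1., 0.], [0., 1.], [1., 1.]])  # 3 terms x 2 documents
terms_k, docs_k = lsa_project(A, k=2)
```

In practice k is chosen much smaller than the vocabulary size, so the retained dimensions act as latent topics; with k equal to the matrix rank, as here, the factorization reconstructs the input exactly.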
Funding: Project (50808025) supported by the National Natural Science Foundation of China; Project (20090162110057) supported by the Doctoral Fund of Ministry of Education, China.
Abstract: A novel method based on an interval temporal syntactic model was proposed to recognize human activities in video streams. The method is composed of two parts: feature extraction and activity recognition. A trajectory shape descriptor, speeded up robust features (SURF), and histograms of optical flow (HOF) were proposed to represent human activities, providing more exhaustive information about their shape, structure, and motion. In the recognition process, a probabilistic latent semantic analysis model (PLSA) was first used to recognize sample activities. Then, an interval temporal syntactic model, which combines the syntactic model with interval algebra to explicitly model the temporal dependencies of activities, was introduced to recognize complex activities with temporal relationships. Experimental results show the effectiveness of the proposed method in comparison with other state-of-the-art methods on public databases for the recognition of complex activities.
基金jointly supported by the National Social Science Foundation of China(Grant Nos.:08ATQ003 and 10&ZD134)
Abstract: Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing successful matching between a user's question and a question-answer (QA) pair in the FAQ database. Our approach is based on a combination of multiple factors. Experiments were conducted to test the performance of our method. Findings: Experiments show that the proposed method has higher accuracy. Compared with similarity calculation based on TF-IDF, sentence surface forms, and semantic relations, the proposed method based on hybrid strategies performs better in precision, recall, and F-measure. Research limitations: The FAQ answering system is currently only capable of meeting users' demand for text retrieval. In the future, the system needs to be improved to support retrieval of images and videos. Practical implications: This FAQ answering system will help farmers utilize agricultural information resources more efficiently. Originality/value: We designed algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate question surface similarity, question semantic similarity, and question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
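The hybrid strategy amounts to a weighted combination of similarity factors. A minimal sketch follows; Jaccard word overlap stands in for the paper's surface/semantic/LSA components, and the weights and example sentences are invented:

```python
# Hedged sketch of a hybrid FAQ-matching score: a weighted sum of
# question-question and question-answer similarity. Jaccard word overlap
# is a stand-in for the paper's LSA-based components; the weights are
# invented, not the paper's tuned values.

def jaccard(a, b):
    """Word-overlap similarity between two whitespace-tokenized strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def hybrid_score(user_q, faq_q, faq_a, weights=(0.7, 0.3)):
    """Combine question similarity and question-answer similarity."""
    return (weights[0] * jaccard(user_q, faq_q)
            + weights[1] * jaccard(user_q, faq_a))

score = hybrid_score("how to plant rice",
                     "how to plant rice",
                     "use seedlings in spring")
```

The design choice is that a QA pair can still match when the user's wording overlaps the stored answer rather than the stored question, which a single-factor score would miss.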
Abstract: Detecting personality from emotions is a research domain in artificial intelligence. At present, some agents can keep a human's profile for interaction and adapt themselves according to the person's preferences. However, a more effective method of interaction is to detect the person's personality by understanding the emotions and context of the subject. The idea of adding personality to cognitive agents began as an attempt to maximize adaptability on the basis of behavior. In daily life, humans interact socially with each other by analyzing the emotions and context of the interaction from audio or visual input. This paper presents a conceptual personality model for cognitive agents that can determine personality and behavior from text input, using the context subjectivity of the given data and the emotions obtained from a particular situation/context. The proposed work consists of the Jumbo Chatbot, which can chat with humans. In this social interaction, the chatbot predicts human personality by understanding the emotions and context of the interacting human. Currently, the Jumbo Chatbot uses the BFI technique to interact with a human. The accuracy of the proposed work varies and improves as the agent gains more interaction experience.
Abstract: This paper focuses on the problem of automatic image classification (AIC) by proposing a framework based on latent semantic analysis (LSA) and image region pairs. The novel framework employs relative spatial arrangements of region pairs as the primary feature to capture semantics. The significance of this paper is twofold. First, to the best of our knowledge, this is the first study of the influence of region pairs, as well as their relative spatial information, in latent semantic analysis as applied to automatic image classification. Second, our proposed method for using the relative spatial information of region pairs shows great promise in improving image semantic classification compared with the classical latent semantic analysis method and the 2D string representation algorithm.
文摘Philosophical analysis is commonly assumed to involve decomposing the meaning of a sentence or an expression into a set of conceptually basic constituent parts. This essay challenges this traditional view by examining the potential semantic roles that classifier phrases play in Chinese. It is suggested that the conceptual resources necessary for justifying claims about the semantical status of natural language classifier phrases should be informed in part by methods that accommodate ontogenic and evolutionary contexts. Evidence is provided for the view that many Chinese classifiers (but not all) are features regimented in the grammar of Chinese that have no functional role in normal adult communication, but which play an ontogenetic role in the child's development of linguistic competency. Furthermore, it is suggested that this ontogenetic role has features in common with the phylogenetic processes by which Chinese or its classical variants came about, as a later product of mechanisms that evolved in the species in accordance with varying demands for successful communication.
基金supported by the Foundation of the State Key Laboratory of Software Development Environment(No.SKLSDE-2015ZX-04)
Abstract: Long-document semantic measurement has great significance for many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transitions. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of the word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
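Comparing profiles through word vectors can be sketched by averaging each profile's term embeddings and taking cosine similarity. This is a generic illustration of the idea, not the paper's joint embedding model, and the 2-d embeddings are invented toy values:

```python
import numpy as np

# Hedged sketch of profile comparison via word vectors: average each
# profile's term embeddings, then compare with cosine similarity.
# The 2-d embeddings below are invented toy values.

def profile_vector(terms, emb):
    """Represent a semantic profile as the mean of its term embeddings."""
    return np.mean([emb[t] for t in terms], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

emb = {"svm": np.array([1.0, 0.0]),
       "kernel": np.array([0.8, 0.2]),
       "poetry": np.array([0.0, 1.0])}
ml_profile = profile_vector(["svm", "kernel"], emb)
lit_profile = profile_vector(["poetry"], emb)
```

Two papers with related methodology terms yield nearby profile vectors (cosine near 1), while unrelated domains fall apart, which is the behavior the profile-distance measurement depends on.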
Abstract: This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of vector space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle to accurately capture document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS's impact on model accuracy using information retrieval metrics such as Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS's effectiveness in reducing the inverted index size, which is crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. The results reveal OWS's superiority: it achieves 54% Recall and 81% MAP, along with a notable 38% reduction in inverted index size. This highlights OWS's potential for optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS's capabilities in information retrieval methodologies.
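For context, the BM25 baseline that OWS is compared against can be sketched as follows. This is the standard BM25 formulation, not the paper's OWS scheme, and the documents are invented:

```python
import math

# Hedged sketch of BM25, one of the baseline weighting schemes named in
# the abstract. Standard k1/b parameterization; toy documents.

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula."""
    toks = [d.lower().split() for d in docs]
    n_docs = len(docs)
    avgdl = sum(len(t) for t in toks) / n_docs
    df = {}  # document frequency per term
    for t in toks:
        for term in set(t):
            df[term] = df.get(term, 0) + 1
    scores = []
    for t in toks:
        s = 0.0
        for term in query.lower().split():
            tf = t.count(term)
            if tf == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(t) / avgdl)  # length normalization
            s += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(s)
    return scores

docs = ["retrieval model index",
        "cooking recipe pasta",
        "retrieval retrieval index"]
scores = bm25_scores("retrieval", docs)
```

Note the saturation behavior: a second occurrence of the query term raises the score, but by less than the first, which is the term-frequency dampening that distinguishes BM25 from raw tf-idf.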
Funding: National Natural Science Foundation (grant number 41371499) and Guangdong Province Natural Science Foundation research team project (2014A030312010).
Abstract: An urban system is shaped by the interactions between the different regions planned by the government, and is then reshaped by human activities and residents' needs. Understanding the changes in regional structure and the dynamics of city functions based on residents' movement demand is important for evaluating and adjusting the planning and management of urban services and internal structures. This paper constructs a probabilistic factor model, on the basis of probabilistic latent semantic analysis and tensor decomposition, for understanding higher-order interactive population mobility and its impact on urban structural change. First, a four-dimensional tensor of time (T) × week (W) × origin (O) × destination (D) was constructed to identify day-to-day activities in three time modes and the weekly regularity of weekday/weekend patterns. We then reclassified the urban regions based on the spatial clustering formed by the space factor matrix and core tensor. Finally, we further analyzed the space-time interaction on different time scales to deduce the actual function and connection strength of each region. Our research shows that applying individual-based spatial-temporal data to human mobility and space-time interaction studies can help analyze urban spatial structure and understand actual regional functions from a new perspective.
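Constructing the T × W × O × D tensor from individual trip records can be sketched with NumPy; the dimension sizes and trip tuples below are invented for illustration:

```python
import numpy as np

# Hedged sketch of building the time x week x origin x destination
# mobility tensor described in the abstract. Sizes and trips are toy
# values, not from the study's dataset.

def build_mobility_tensor(trips, n_time, n_week, n_regions):
    """trips: iterable of (time_slot, week_mode, origin, destination)."""
    tensor = np.zeros((n_time, n_week, n_regions, n_regions))
    for t, w, o, d in trips:
        tensor[t, w, o, d] += 1  # count one trip in its (T, W, O, D) cell
    return tensor

trips = [(0, 0, 1, 2), (0, 0, 1, 2), (1, 1, 2, 0)]
X = build_mobility_tensor(trips, n_time=2, n_week=2, n_regions=3)
```

A tensor decomposition (e.g. the core tensor and factor matrices the study uses) would then be applied to X to extract the time, week, and region factors.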
Funding: Supported by the German Research Foundation (DFG) through the priority program "Volunteered Geographic Information: Interpretation, Visualisation and Social Computing" (SPP 1894).
Abstract: The social functionality of places (e.g. school, restaurant) partly determines human behaviors and reflects a region's functional configuration. Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces. Assuming their potential impact on human verbalization behaviors, one possibility is to link the functions of places to verbal representations such as users' postings in location-based social networks (LBSNs). In this study, we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached to check-in places. We investigate Foursquare because its categorization hierarchy provides rich a-priori semantic knowledge about its check-in places, which enables reliable verification of the semantic concepts identified from user-generated text snippets. A latent semantic analysis is conducted on a large Foursquare check-in dataset. The results confirm that attached text messages can represent semantic concepts, demonstrating their large correspondence to the official Foursquare venue categorization. To further elaborate the representativeness of text messages, this work also investigates the textual terms to quantify their ability to represent semantic concepts (i.e., representativeness), and the semantic concepts to quantify how well they can be represented by text messages (i.e., representability). The results shed light on featured terms with strong locational characteristics, as well as on distinctive semantic concepts with potentially strong impacts on human verbalizations.