Latent semantic analysis (LSA) offers a simple and effective way to further improve the efficiency of search engines in searching, indexing, and locating information in the deep web. Through latent semantic analysis of the attributes in query interfaces and the unique entrances of deep web sites, hidden semantic structure can be recovered and dimension reduction achieved to a certain extent. Using this semantic structure, the contents of a site can be inferred and the similarity measures among deep web sites revised. Experimental results show that latent semantic analysis refines and improves the semantic understanding of query forms in the deep web, overcoming the shortcomings of keyword-based methods. The approach can be used to find the site most similar to any given site and to obtain a list of sites that satisfies user-specified restrictions.
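As a sketch of the LSA machinery the abstract relies on, latent site vectors can be obtained from a truncated SVD of a term-by-site matrix; the matrix below is toy data, not the paper's deep web corpus, and the choice of two latent dimensions is illustrative.

```python
# A minimal LSA sketch: truncated SVD exposes hidden semantic structure
# and lets site similarity be measured in the reduced latent space.
import numpy as np

# Rows = query-interface attributes, columns = deep-web sites (toy data).
term_site = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])

# Keep k latent dimensions (the dimension reduction step).
k = 2
U, s, Vt = np.linalg.svd(term_site, full_matrices=False)
site_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a site in latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sites 0 and 1 share attributes, so their latent similarity is high;
# sites 0 and 2 share nothing, so it is near zero.
sim_01 = cosine(site_vectors[0], site_vectors[1])
sim_02 = cosine(site_vectors[0], site_vectors[2])
```

The revised similarity measure the abstract mentions would compare these latent vectors instead of raw keyword overlap.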
In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision communities, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a semi-supervised probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since labeled images are often hard to obtain or create in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Because image features of different magnitudes yield different annotation performance, a Gaussian normalization method is applied to the features extracted from image regions segmented by the normalized cuts algorithm, preserving the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is trained with the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model significantly improves on traditional PLSA for automatic image annotation.
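Gaussian normalization of heterogeneous features is commonly implemented as per-feature z-scoring; the sketch below assumes that reading of the abstract's normalization step and uses invented feature values rather than real region features.

```python
# Per-feature Gaussian (z-score) normalization: features on very
# different scales are rescaled to zero mean and unit variance so
# that no single feature dominates the model.
import numpy as np

# Rows = image regions, columns = raw features on different scales (toy data).
features = np.array([
    [100.0, 0.01, 5.0],
    [200.0, 0.03, 7.0],
    [150.0, 0.02, 6.0],
])

mean = features.mean(axis=0)
std = features.std(axis=0)
normalized = (features - mean) / std  # each column: zero mean, unit variance
```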
Because of everyone's involvement in social networks, social networks are full of massive multimedia data, and events are released and disseminated through them in the form of multi-modal and multi-attribute heterogeneous data. There have been numerous studies on social network search. Considering the spatio-temporal features of messages and the social relationships among users, we summarize an overall social network search framework from the perspective of semantics, based on existing research. For social network search, the acquisition and representation of spatio-temporal data is the basis; semantic analysis and modeling of social network cross-media big data is an important component; deep semantic learning of social networks is the key research field; and the indexing and ranking mechanism is an indispensable part. This paper reviews current studies in these fields, presents the main challenges of social network search, and concludes with an outlook on its prospects and future work.
Social media platforms provide new value for markets and research companies. This article explores the use of social media data to enhance customer value propositions. The case study involves a company that develops wearable Internet of Things (IoT) devices and services for stress management. Netnography and semantic annotation for recognizing and categorizing the context of tweets are used to gain a better understanding of users' stress management practices. The aim is to analyze tweets about stress management practices and to identify their context. Thereafter, we map the tweets onto pleasure and arousal dimensions to elicit customer insights. We analyzed a case study of a marketing strategy on the Twitter platform, in which campaign participants shared photos and texts about their stress management practices. Machine learning techniques were used to estimate the emotions and contexts of the tweets posted by the campaign participants, and the computational semantic analysis of the tweets was compared to their text analysis. Content analysis of tweet images alone detected tweet context with 96% accuracy, while analysis of the textual content of tweets yielded 91% accuracy. Semantic tagging by Ontotext detected the correct tweet context with 50% accuracy.
Current research on metaphor analysis is generally knowledge-based and corpus-based, which calls for methods of automatic feature extraction and weight calculation. Combining natural language processing (NLP), latent semantic analysis (LSA), and the Pearson correlation coefficient, this paper proposes a metaphor analysis method that extracts content words from both literal and metaphorical corpora, calculates their correlation degree, and analyzes their relationships. The value of the proposed method is demonstrated through a case study using a corpus built around the keyword "飞翔 (fly)". Compared with the Pearson correlation coefficient alone, the experiment shows that LSA produces better results with greater significance in correlation degree. It is also found that the number of common words appearing in both the literal and metaphorical word bags decreases with the correlation degree. The case study further reveals that more nouns appear in the literal corpus, while more adjectives and adverbs appear in the metaphorical corpus. The proposed method will help NLP researchers develop the step-by-step calculation tools required for accurate quantitative analysis.
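The Pearson-correlation component of the method can be sketched directly; the word-weight vectors below are hypothetical stand-ins for the corpus-derived weights, not values from the paper's corpus.

```python
# Pearson correlation between the weights of shared content words in a
# literal and a metaphorical corpus: measures how strongly the two
# weight profiles co-vary.
import numpy as np

# Weight of each shared content word in the two corpora (toy data).
literal_weights = np.array([0.9, 0.7, 0.2, 0.1])
metaphor_weights = np.array([0.8, 0.6, 0.3, 0.2])

def pearson(x, y):
    # Center both vectors, then take the cosine of the centered vectors,
    # which is exactly the Pearson correlation coefficient.
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

correlation = pearson(literal_weights, metaphor_weights)
```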
Among Chinese color words, two exhibit a comparatively high degree of category conversion and grade shift: "白" (white) and "黑" (black). These two terms generally rank near the top of the human color-perception system, and we hold that typical basic color terms are the most likely to undergo category conversion and grade shift, although differences in history, culture, and other aspects of cognition give them different cross-grammatical-category orderings. On this basis, this paper takes as its topic a cognitive semantic analysis of the basic color terms "black" and "white" in English and Chinese, in the hope that an in-depth study of this aspect will prove beneficial.
A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis, to be further used in several applications with different objectives. This research extends traditional approaches to DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics that can be effectively utilized in dozens of practical applications for various purposes, bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. The approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts with complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Probabilistic latent semantic analysis (PLSA) is a topic model for text documents that has been widely used in text mining, computer vision, computational biology, and beyond. For batch PLSA inference algorithms, the required memory grows linearly with the data size, making massive data streams very difficult to handle. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets and demonstrate that it is competitive in both speed and accuracy with online expectation maximization (OEM) for PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
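As background for the batch baseline that OBP makes stream-friendly, one EM iteration of standard PLSA can be sketched as follows; the count matrix and topic number are illustrative, and the real OBP additionally processes the stream segment by segment with belief propagation on the factor graph.

```python
# Batch PLSA via EM on a word-by-document count matrix (toy data).
# Parameters: p(w|z) word-given-topic and p(z|d) topic-given-document.
import numpy as np

rng = np.random.default_rng(0)
counts = np.array([[4, 0, 1], [3, 1, 0], [0, 5, 2], [1, 4, 3]], dtype=float)
n_words, n_docs, n_topics = counts.shape[0], counts.shape[1], 2

# Random positive initialisation, columns normalised to distributions.
p_w_z = rng.random((n_words, n_topics)); p_w_z /= p_w_z.sum(axis=0)
p_z_d = rng.random((n_topics, n_docs)); p_z_d /= p_z_d.sum(axis=0)

for _ in range(50):
    # E-step: posterior p(z|w,d) for every word-document pair.
    joint = p_w_z[:, :, None] * p_z_d[None, :, :]        # shape: w x z x d
    posterior = joint / joint.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from expected counts.
    expected = counts[:, None, :] * posterior            # shape: w x z x d
    p_w_z = expected.sum(axis=2); p_w_z /= p_w_z.sum(axis=0)
    p_z_d = expected.sum(axis=0); p_z_d /= p_z_d.sum(axis=0)
```

OBP's memory saving comes from running such updates per segment and discarding each segment afterwards, rather than holding all counts at once.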
Semantic video analysis plays an important role in machine intelligence and pattern recognition. In this paper, a semantic recognition framework based on the Hidden Markov Model (HMM) is proposed to analyze events in compressed videos according to six low-level features. After a detailed analysis of video events, the global motion pattern and five foreground features (the foreground being the principal part of a video) are employed as the observations of the Hidden Markov Model to classify events in videos. Applications of the proposed framework to several video event detection tasks demonstrate its promise for semantic video analysis.
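The basic HMM computation underlying such event classification, scoring an observation sequence under an event model with the forward algorithm, can be sketched as follows; all matrices are toy values rather than the six low-level features used in the paper.

```python
# HMM forward algorithm: compute p(observation sequence | event model).
# An event classifier would score the sequence under each event's HMM
# and pick the model with the highest likelihood.
import numpy as np

start = np.array([0.6, 0.4])                 # initial state distribution
trans = np.array([[0.7, 0.3], [0.4, 0.6]])   # state transition matrix
emit = np.array([[0.9, 0.1], [0.2, 0.8]])    # p(observation | state)
obs = [0, 0, 1]                              # quantised low-level feature codes

# Forward recursion: alpha[i] = p(obs so far, current state = i).
alpha = start * emit[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ trans) * emit[:, o]
likelihood = float(alpha.sum())  # p(obs | model)
```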
Since the 1930s, the semantic analysis method has been of great methodological significance in the historical evolution and transmission of the philosophy of science, serving as a platform for debate, mutual reference, and interpenetration between scientific realism and anti-realism. Today, in historical retrospect, we see that we need a new understanding of the importance of the semantic analysis method: we must appreciate the fundamental role of contextualized semantic structure in interpreting scientific theories, and make clear that determining and expressing reference lies at the heart of the semantic analysis methodology of theoretical interpretation, and that "two-dimensional" semantic analysis represents a necessary trend in current research on semantic analysis methodology. On this basis, we can reestablish the methodological position of the semantic analysis method in research on scientific realism in terms of form, content, structure, and system, an achievement of great strategic significance for the development of contemporary scientific realism.
As a distinctive type of research in the philosophy of science, biological theory has always attracted scholarly attention, in part because it differs from the traditional scientific paradigm. To explore the root of this difference, we need to examine closely the theoretical foundations and structure of biology. The complexity of the multiple contexts and semantic structures expressed in the theoretical structure specific to biology requires that research on the theoretical foundation of biology incorporate semantic analysis. It is particularly necessary to conduct semantic decomposition of the theories themselves and study their semantic correlations. In doing so, we can discern the rationality of the biological model as a special form of scientific explanation distinct from that of physics and chemistry.
The study of binary code evolution is crucial for understanding vulnerability repair and malicious code variants. Research on code evolution has focused on the source code level, whereas very little work has tackled the problem at the binary code level. In this paper, a binary code evolution analysis framework is proposed to automatically locate evolution areas and identify evolution semantics with concrete semantic differences. Differencing of binary function domains is performed based on function similarity; trace alignment is used to find evolution blocks; instruction classification semantics are utilized to identify evolution operations; and evolution semantics are extracted in combination with function domain elements. The experimental results show that the framework correctly locates binary code evolution areas and identifies all concrete semantic evolutions.
Software testing is a critical phase: misconceptions about ambiguities in the requirements during specification affect the testing process, making it difficult to identify all faults in software. As requirements change continuously, irrelevancy and redundancy increase during testing. These challenges reduce fault detection capability, so the testing process needs to be improved based on changes in the requirements specification. In this research, we developed a model that resolves testing challenges through requirement prioritization and prediction in an agile environment. The objective is to identify the most relevant and meaningful requirements through semantic analysis for correct change analysis. We then compute the similarity of requirements through case-based reasoning, which predicts requirements for reuse and restricts attention to error-prone requirements. Afterward, the apriori algorithm maps requirement frequency to select relevant test cases, based on how frequently test cases are reused, to increase the fault detection rate. The proposed model was evaluated experimentally. The results showed that requirement redundancy and irrelevancy improved thanks to semantic analysis, which correctly predicted the requirements, increasing the fault detection rate and yielding high user satisfaction. The predicted requirements are mapped to test cases, increasing the fault detection rate after changes. The model improves the redundancy and irrelevancy of requirements by more than 90% compared with other clustering methods and the analytic hierarchy process, achieving an 80% fault detection rate at an early stage. It thus provides guidelines for practitioners and researchers; in future work, we will provide a working prototype of this model as a proof of concept.
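The apriori-style frequency step can be illustrated with plain pair-support counting; the change-request transactions and the 0.5 support threshold below are hypothetical, chosen only to show how frequent requirement pairs would be identified.

```python
# Support counting for requirement pairs across change requests:
# pairs whose support meets the threshold are "frequent" and would
# drive test-case selection in the described model.
from itertools import combinations
from collections import Counter

change_requests = [
    {"login", "audit"},
    {"login", "export"},
    {"login", "audit", "export"},
    {"audit"},
]
min_support = 0.5  # assumed threshold, not from the paper

pair_counts = Counter()
for req_set in change_requests:
    for pair in combinations(sorted(req_set), 2):
        pair_counts[pair] += 1

n = len(change_requests)
frequent_pairs = {p for p, c in pair_counts.items() if c / n >= min_support}
```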
This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of vector space information retrieval (IR) models, which have traditionally relied on weighting schemes such as tf-idf and BM25. These conventional methods often struggle to capture document relevance accurately, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research evaluates OWS's impact on model accuracy using information retrieval metrics such as Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS's effectiveness in reducing the inverted index size, which is crucial for model efficiency. We compare OWS-based retrieval models against models using other schemes, including tf-idf variants and BM25Delta. The results reveal OWS's superiority, achieving 54% Recall and 81% MAP, along with a notable 38% reduction in inverted index size. This highlights OWS's potential for optimizing retrieval processes and underscores the need for further research in this underexplored area to fully leverage OWS's capabilities in information retrieval methodologies.
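The orbit weighting scheme itself is specific to the paper, but the tf-idf baseline it is compared against can be sketched in a few lines; the documents and the log-scaled idf variant below are illustrative choices, not the study's corpus or exact formula.

```python
# A minimal tf-idf weighting sketch: the conventional baseline that
# schemes like OWS and BM25 are measured against.
import math

docs = [
    "deep web search engine",
    "vector space retrieval model",
    "search engine index size",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)                     # term frequency
    df = sum(1 for d in tokenized if term in d)         # document frequency
    idf = math.log(len(tokenized) / df)                 # inverse doc frequency
    return tf * idf

# "search" appears in two of three documents, "retrieval" in only one,
# so "retrieval" receives the larger idf boost within its document.
w_search = tf_idf("search", tokenized[0])
w_retrieval = tf_idf("retrieval", tokenized[1])
```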
Focusing on the problem of goal event detection in soccer videos, a novel method based on the Hidden Markov Model (HMM) and semantic rules is proposed. First, an HMM for goal events is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new shot feature, the semantic observation weight. The test video is detected with the HMM and the NSWS rule separately. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused with optimal weights at the decision level to obtain the final result. Experimental results show that the proposed method achieves 96.43% precision and 100% recall, demonstrating its effectiveness.
The main focus of the article is the semantic analysis and genesis of words that, to a certain extent, form the lexical base of the modern Azerbaijani language and belong to its system of roots. The goal is to restore words that have undergone deformation and inflection over thousands of years to their initial forms. The concept of stem cells in genetics is also used as an analogy, because the author believes that languages are living organisms too, with words and elements that function as stem cells. The principal idea is thus that the linguistic units and words entering the organic system of a language are derivations of these linguistic stem cells. The stem words and concepts, the original elements of a language, are determined first, and all subsequent analyses are built upon them. Such studies encompass a wide range of comparativist investigations as well; examples from Ancient Greek and Latin are used as objects of comparison. The discovery of such words yields not only linguistic information but also objective historical information on various matters, which is one of the main reasons this kind of study is significant.
A global view of firewall policy conflicts is important for administrators optimizing a policy, yet appropriate global conflict analysis for firewall policies has been lacking; existing methods focus on local conflict detection. In this paper we study a global conflict detection algorithm. We present a semantic model that captures more complete classifications of the policy using the knowledge concept from rough set theory. Based on this model, we present a formal model of global conflicts and represent it with an OBDD (Ordered Binary Decision Diagram). We then develop the GFPCDA (Global Firewall Policy Conflict Detection Algorithm) to detect global conflicts. In experiments, we evaluated the usability of our semantic model by eliminating the false positives and false negatives that an incomplete policy semantic model causes in a classical algorithm, and compared that algorithm with GFPCDA. The results show that GFPCDA detects conflicts more precisely and independently, and performs better.
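The basic notion of a policy conflict that GFPCDA generalizes can be sketched as a pairwise overlap check: two rules conflict when their match conditions overlap but their actions differ. The single-dimension port ranges below are a deliberate simplification of real 5-tuple rules, and the paper's OBDD encoding is not reproduced here.

```python
# Pairwise (local) firewall-rule conflict detection on port ranges.
# Real policies match on full 5-tuples; this sketch keeps one dimension.

def overlaps(r1, r2):
    # Two closed ranges overlap iff each starts before the other ends.
    return r1["lo"] <= r2["hi"] and r2["lo"] <= r1["hi"]

def conflicts(rules):
    found = []
    for i in range(len(rules)):
        for j in range(i + 1, len(rules)):
            a, b = rules[i], rules[j]
            if overlaps(a, b) and a["action"] != b["action"]:
                found.append((i, j))
    return found

rules = [
    {"lo": 80, "hi": 80, "action": "allow"},      # allow HTTP
    {"lo": 1, "hi": 1024, "action": "deny"},      # deny all low ports
    {"lo": 2000, "hi": 3000, "action": "allow"},  # unrelated range
]
conflict_pairs = conflicts(rules)  # rules 0 and 1 overlap with opposite actions
```

A global analysis in the paper's sense goes further, classifying how whole rule sets interact rather than checking isolated pairs.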
To improve motion graph based motion synthesis, semantic control is introduced. Hybrid motion features, including both numerical and user-defined semantic relational features, are extracted to encode the characteristic aspects of the character's poses in the given motion sequences. Motion templates are then automatically derived from the training motions to capture the spatio-temporal characteristics of an entire class of semantically related motions. The data streams of motion documents are automatically annotated with semantic motion class labels by matching their respective motion class templates. Finally, semantic control is introduced into motion graph based human motion synthesis. Motion synthesis experiments demonstrate the effectiveness of the approach, which gives users a higher level of semantically intuitive control and high quality in human motion synthesis from a motion capture database.
Individuals, local communities, environmental associations, private organizations, and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality, illegal waste disposal, water contamination, and general pollution. Environmental complaints represent expressions of dissatisfaction with these issues. Because managing a large number of complaints is time-consuming, text mining may be useful for automatically extracting information on stakeholder priorities and concerns. This paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems in the Emilia-Romagna Region, Italy: the online claim submission system of the Regional Agency for Prevention, Environment and Energy (Arpae) ("Contact Arpae") and Arpae's internal platform for environmental pollution ("Environmental incident reporting portal"). We evaluated a total of 2477 records and classified them by claim topic (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation) and geographical distribution. We then used natural language processing to extract keywords from the dataset and classified the keywords ranking highest in Term Frequency-Inverse Document Frequency (TF-IDF) according to the driver, pressure, state, impact, and response (DPSIR) framework. This study provides a systemic approach to understanding the interaction between people and the environment in different geographical contexts and helps build sustainable and healthy communities. The results showed that most complaints come from the public and concern air pollution and odor. Factories (particularly foundries and ceramic industries) and farms are identified as the drivers of environmental issues. Citizens believed that environmental issues mainly affect human well-being. Moreover, the keywords "odor", "report", "request", "presence", "municipality", and "hours" were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Keywords connecting odor (classified as an impact) and air pollution (classified as a state) were the most important (such as "odor-burnt plastic" and "odor-acrid"). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: "odor-factory" and "odors-farms". The proposed approach has several theoretical and practical implications: text mining may quickly and efficiently address citizen needs, providing a basis for automating (even partially) the complaint process; and the DPSIR framework can support the planning and organization of information, the identification of stakeholder concerns and priorities, and metrics and indicators for their assessment. Integrating the DPSIR framework with text mining of environmental complaints might therefore generate a comprehensive environmental knowledge base as a prerequisite for wider exploitation of such analysis to support decision-making and environmental management activities.
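The degree-centrality result can be illustrated with a minimal keyword co-occurrence network: keywords become nodes, and co-occurrence within the same complaint becomes an edge. The complaints below are invented examples, not records from the Arpae systems.

```python
# Build a keyword co-occurrence network from complaints and compute
# degree centrality: the fraction of other nodes a keyword links to.
from itertools import combinations
from collections import defaultdict

complaints = [
    ["odor", "factory", "air pollution"],
    ["odor", "farm"],
    ["odor", "air pollution", "municipality"],
]

adjacency = defaultdict(set)
for keywords in complaints:
    for a, b in combinations(sorted(set(keywords)), 2):
        adjacency[a].add(b)
        adjacency[b].add(a)

n = len(adjacency)
degree_centrality = {k: len(v) / (n - 1) for k, v in adjacency.items()}
top_keyword = max(degree_centrality, key=degree_centrality.get)
```

In this toy network "odor" co-occurs with every other keyword, mirroring the paper's finding that it is among the most central concepts.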
To address the problem that current search engines provide query-oriented rather than user-oriented search, an orientation that prevents them from meeting users' personalized requirements, a novel method based on probabilistic latent semantic analysis (PLSA) is proposed to convert query-oriented web search into user-oriented web search. First, a user profile, represented as a vector of the user's topics of interest, is created by analyzing the user's click-through data with PLSA; then the user's queries are mapped into categories based on the user's preferences; and finally the result list is re-ranked according to the user's interests using a newly proposed method named user-oriented PageRank (UOPR). Experiments on real-life datasets show that the user-oriented search system adopting PLSA takes user preferences into full consideration and better satisfies a user's personalized information needs.
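The re-ranking idea can be sketched by blending a query-independent score (such as PageRank) with the similarity between each result's topic vector and the user's interest vector. All vectors below are toy data, and the 0.5/0.5 blend is an assumed weighting, not the paper's UOPR formula.

```python
# Interest-based re-ranking sketch: combine PageRank with cosine
# similarity to the user's topics-of-interest vector.
import numpy as np

user_interest = np.array([0.9, 0.1, 0.0])  # hypothetical interest vector

results = [
    {"url": "a", "pagerank": 0.5, "topics": np.array([0.1, 0.1, 0.8])},
    {"url": "b", "pagerank": 0.3, "topics": np.array([0.8, 0.1, 0.1])},
]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for r in results:
    # Assumed equal blend of authority and personal relevance.
    r["score"] = 0.5 * r["pagerank"] + 0.5 * cosine(user_interest, r["topics"])

reranked = sorted(results, key=lambda r: r["score"], reverse=True)
# "b" matches the user's interests, so it overtakes "a" despite lower PageRank.
```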
Funding for the automatic image annotation study: the National Program on Key Basic Research Project (No. 2013CB329502); the National Natural Science Foundation of China (No. 61202212); the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038); and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Funds: This work was supported by Taif University Researchers Supporting Project number (TURSP-2020/292), Taif University, Taif, Saudi Arabia. This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.
Abstract: Social media platforms provide new value for markets and research companies. This article explores the use of social media data to enhance customer value propositions. The case study involves a company that develops wearable Internet of Things (IoT) devices and services for stress management. Netnography and semantic annotation for recognizing and categorizing the context of tweets are conducted to gain a better understanding of users' stress management practices. The aim is to analyze tweets about stress management practices and to identify their context. Thereafter, we map the tweets onto pleasure and arousal dimensions to elicit customer insights. We analyzed a case study of a marketing strategy on the Twitter platform, in which campaign participants shared photos and texts about their stress management practices. Machine learning techniques were used to estimate the emotions and contexts of the tweets posted by the participants, and the computational semantic analysis of the tweets was compared with their text analysis. Content analysis of tweet images alone detected tweet context with 96% accuracy, while analysis of the textual content of tweets yielded 91% accuracy. Semantic tagging by Ontotext detected the correct tweet context with 50% accuracy.
Funds: Fundamental Research Funds for the Central Universities of the Ministry of Education of China (No. 19D111201).
Abstract: Current research on metaphor analysis is generally knowledge-based and corpus-based, which calls for methods of automatic feature extraction and weight calculation. Combining natural language processing (NLP), latent semantic analysis (LSA), and the Pearson correlation coefficient, this paper proposes a metaphor analysis method that extracts content words from both literal and metaphorical corpora, calculates their degree of correlation, and analyzes their relationships. The value of the proposed method is demonstrated through a case study using a corpus built around the keyword "飞翔 (fly)". Compared with the Pearson correlation coefficient alone, the experiment shows that LSA produces better results with greater significance in correlation degree. It is also found that the number of common words appearing in both the literal and metaphorical word bags decreases with the correlation degree. The case study further reveals that more nouns appear in the literal corpus, while more adjectives and adverbs appear in the metaphorical corpus. The proposed method will help NLP researchers develop the step-by-step calculation tools required for accurate quantitative analysis.
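As an illustration of the LSA step described above, dimension reduction is conventionally done with a truncated SVD of the term-document matrix, after which document (or word) similarity is measured in the reduced space. A minimal sketch with hypothetical names, not the paper's tooling:

```python
import numpy as np

def lsa(term_doc, k):
    """Latent semantic analysis: rank-k truncated SVD of a term-document
    matrix. Returns document vectors in the k-dimensional latent space,
    one column per document."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return np.diag(s[:k]) @ vt[:k]

def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Two documents sharing vocabulary end up close in the latent space even after the reduction, which is what makes LSA-based correlation degrees more robust than raw co-occurrence counts.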
Abstract: Among color words, two exhibit a particularly high degree of semantic transfer and gradation across grammatical categories: "white" and "black". These two sets of words generally rank among the top three in the human color perception system, and we argue that such typical basic color terms are the most likely to shift and grade across categories. However, differences in history, culture, and other aspects of cognition lead to different orders of cross-grammatical-category transfer. On this basis, this paper takes a cognitive semantic analysis of the basic color terms "black" and "white" in English and Chinese as its research topic, in the hope that this in-depth study will benefit further work in this area.
Funds: This research was supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
Abstract: A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications with different objectives. This research extends the traditional approaches to DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics that can be effectively utilized in dozens of practical applications for various purposes, thereby bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. The approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Abstract: Probabilistic latent semantic analysis (PLSA) is a topic model for text documents that has been widely used in text mining, computer vision, computational biology, and other fields. For batch PLSA inference algorithms, the required memory size grows linearly with the data size, so handling massive data streams is very difficult. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation of PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document datasets and demonstrate that OBP is competitive in both speed and accuracy with online expectation maximization (OEM) for PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
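OBP itself runs belief propagation over a PLSA factor graph and is not reproduced here; as a simpler illustration of the segment-by-segment idea, the sketch below folds each stream segment into a running table of expected word-topic counts with EM and then discards the segment, so memory stays constant in the number of segments processed. This is a generic online-EM-style sketch with hypothetical names, not the OBP algorithm:

```python
import numpy as np

def online_plsa(segments, n_topics, n_words, inner_iters=20, seed=0):
    """Process a stream of (n_docs, n_words) count-matrix segments,
    keeping only the global P(word|topic) and accumulated expected
    counts between segments."""
    rng = np.random.default_rng(seed)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    ssw = np.zeros_like(p_w_z)                  # running expected counts
    for counts in segments:
        n_docs = counts.shape[0]
        # Fold the segment in: estimate P(topic|doc) with p_w_z held fixed.
        p_z_d = np.full((n_docs, n_topics), 1.0 / n_topics)
        for _ in range(inner_iters):
            joint = p_z_d[:, :, None] * p_w_z[None, :, :]
            resp = joint / joint.sum(axis=1, keepdims=True)
            weighted = counts[:, None, :] * resp
            p_z_d = weighted.sum(axis=2)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        # Merge this segment's expected counts, then discard the segment.
        ssw += weighted.sum(axis=0)
        p_w_z = (ssw + 1e-12) / (ssw + 1e-12).sum(axis=1, keepdims=True)
    return p_w_z
```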
Funds: Supported in part by the National Natural Science Foundation of China (No. 60572045), the Ph.D. Program Foundation of the Ministry of Education of China (No. 20050698033), and a Cooperation Project (2005.7-2007.6) with Microsoft Research Asia.
Abstract: Semantic video analysis plays an important role in machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework for compressed videos is proposed to analyze video events according to six low-level features. After a detailed analysis of video events, the global motion pattern and five foreground features (the principal parts of the videos) are employed as the observations of the HMM to classify events in videos. Applications of the proposed framework to several video event detection tasks demonstrate its promise for semantic video analysis.
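As a toy illustration of HMM-based event classification, one common scheme is to train one HMM per event class and label a feature sequence with the class whose model assigns it the highest likelihood, computed by the forward algorithm. The sketch below assumes discrete observation symbols; all names and parameters are hypothetical, not the paper's models:

```python
import numpy as np

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    via the forward algorithm with per-step scaling to avoid underflow."""
    alpha = start * emit[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        s = alpha.sum()
        logp += np.log(s)
        alpha /= s
    return logp

def classify(obs, models):
    """Pick the event whose HMM makes the observed sequence most likely.
    models maps event name -> (start, trans, emit)."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))
```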
Abstract: Since the 1930s, the semantic analysis method has been of great methodological significance in the historical evolution and transmission of the philosophy of science, serving as a platform for debate, mutual reference, and interpenetration between scientific realism and anti-realism. Today, in historical retrospect, we need to gain a new understanding of the importance of the semantic analysis method, appreciate the fundamental nature of contextualized semantic structure in interpreting scientific theories, and make it clear that determining and expressing reference is at the heart of the semantic analysis methodology of theoretical interpretation, and that "two-dimensional" semantic analysis represents a necessary trend in current research on semantic analysis methodology. On this basis, we can reestablish the methodological position of the semantic analysis method in research on scientific realism in terms of form, content, structure, and system, an achievement that will be of great strategic significance for the development of contemporary scientific realism.
Abstract: As a distinct type of research in the philosophy of science, biological theory has always attracted scholarly attention, in part because it differs from the traditional scientific paradigm. To explore the root of this difference, we need to examine closely the theoretical foundations and structure of biology. The complexity of the multiple contexts and semantic structures expressed in the theoretical structure specific to biology requires that research on the theoretical foundations of biology incorporate semantic analysis. It is particularly necessary to conduct semantic decomposition of the theories themselves and to study their semantic correlations. In doing so, we can discern the rationality of the biological model as a special form of scientific explanation distinct from that of physics and chemistry.
Funds: The research leading to these results has received funding from the Advanced Industrial Internet Security Platform Program of Zhijiang Laboratory (No. 2018FD0ZX01), the National Natural Science Foundation of China (No. 61802435), and the Key Research Projects of Henan Colleges (No. 21A520054).
Abstract: The study of binary code evolution is crucial for understanding vulnerability repair and malicious code variants. Research on code evolution has focused on the source code level, whereas very little work has tackled the problem at the binary code level. In this paper, a binary code evolution analysis framework is proposed to automatically locate evolution areas and identify evolution semantics with concrete semantic differences. Differencing of binary function domains is applied based on function similarity. Trace alignment is used to find evolution blocks, instruction classification semantics are utilized to identify evolution operations, and evolution semantics are extracted in combination with function domain elements. The experimental results show that the framework can correctly locate binary code evolution areas and identify all concrete semantic evolutions.
Abstract: Software testing is a critical phase, but misconceptions about ambiguities in the requirements during specification affect the testing process, making it difficult to identify all faults in software. As requirements change continuously, irrelevancy and redundancy increase during testing. These challenges reduce fault detection capability and create a need to improve the testing process based on changes in the requirements specification. In this research, we developed a model that resolves testing challenges through requirement prioritization and prediction in an agile environment. The research objective is to identify the most relevant and meaningful requirements through semantic analysis for correct change analysis, and then to compute the similarity of requirements through case-based reasoning, which predicts requirements for reuse and restricts attention to error-prone requirements. Afterward, the apriori algorithm maps requirement frequency to select relevant test cases, based on frequently reused versus unused test cases, to increase the fault detection rate. The proposed model was evaluated experimentally. The results showed that semantic analysis reduced requirement redundancy and irrelevancy and correctly predicted requirements, increasing the fault detection rate and yielding high user satisfaction. The predicted requirements are mapped into test cases, increasing the fault detection rate after changes. The model improves the redundancy and irrelevancy of requirements by more than 90% compared with other clustering methods and the analytic hierarchy process, achieving an 80% fault detection rate at an earlier stage. It thus provides guidelines for practitioners and researchers. In the future, we will provide a working prototype of this model as a proof of concept.
Abstract: This study introduces the Orbit Weighting Scheme (OWS), a novel approach aimed at enhancing the precision and efficiency of vector space information retrieval (IR) models, which have traditionally relied on weighting schemes like tf-idf and BM25. These conventional methods often struggle to accurately capture document relevance, leading to inefficiencies in both retrieval performance and index size management. OWS proposes a dynamic weighting mechanism that evaluates the significance of terms based on their orbital position within the vector space, emphasizing term relationships and distribution patterns overlooked by existing models. Our research focuses on evaluating OWS's impact on model accuracy using information retrieval metrics such as Recall, Precision, Interpolated Average Precision (IAP), and Mean Average Precision (MAP). Additionally, we assess OWS's effectiveness in reducing the inverted index size, which is crucial for model efficiency. We compare OWS-based retrieval models against others using different schemes, including tf-idf variations and BM25Delta. The results reveal OWS's superiority, achieving 54% Recall and 81% MAP, and a notable 38% reduction in the inverted index size. This highlights OWS's potential for optimizing retrieval processes and underscores the need for further research in this underrepresented area to fully leverage OWS's capabilities in information retrieval methodologies.
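OWS itself is the paper's contribution and is not reproduced here, but the conventional tf-idf baseline it is compared against can be sketched in a few lines. Names are hypothetical, and the idf shown is the plain log(N/df) variant; production schemes typically add smoothing:

```python
import math
from collections import Counter

def tfidf(docs):
    """Classic tf-idf weights for a list of tokenised documents.

    tf is the within-document relative frequency; idf is log(N/df).
    Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))                 # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c / len(d) * idf[t] for t, c in Counter(d).items()}
            for d in docs]
```

A term appearing in every document gets idf = log(1) = 0, i.e. no discriminative weight, which is exactly the behavior weighting-scheme research such as OWS tries to refine.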
Funds: Supported by the National Natural Science Foundation of China (No. 61072110), the Industrial Tackling Project of Shaanxi Province (No. 2010K06-20), and the Natural Science Foundation of Shaanxi Province (No. SJ08F15).
Abstract: Focusing on the problem of goal event detection in soccer videos, a novel method based on the Hidden Markov Model (HMM) and semantic rules is proposed. First, an HMM for the goal event is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new shot feature, the semantic observation weight. The test video is detected with the HMM and the NSWS rule, respectively. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused with optimal weights at the decision level to obtain the final result. Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall, which shows its effectiveness.
Abstract: The main focus of this article is the semantic analysis and genesis of the words that form, to a certain extent, the lexical base of the modern Azerbaijani language and belong to the language's system of roots. The goal is to restore words that have undergone deformation and flexion over thousands of years to their initial forms. The concept of stem cells in genetics is also used as an analogical method, because the author believes that languages are living organisms too, with words and elements functioning as stem cells. The principal idea is that the linguistic units and words entering the organic system of a language are derivations of these linguistic stem cells. The stem words and concepts, the original elements of a language, are determined first, and all subsequent analyses are built upon them. Such studies also involve a wide range of comparativist investigation; examples from Ancient Greek and Latin are used as objects of comparison. The discovery of such words provides not only linguistic information but also objective historical information on various aspects, which is one of the main reasons this kind of study is significant.
Funds: Supported by the National Natural Science Foundation of China under Grant No. 61170295, the Project of the National Ministry under Grant No. A2120110006, the Co-Funding Project of the Beijing Municipal Education Commission under Grant No. JD100060630, the Beijing Education Committee General Program under Grant No. KM201211232010, and the National Natural Science Foundation of China under Grant No. 61370065.
Abstract: A global view of firewall policy conflicts is important for administrators seeking to optimize a policy. Appropriate global conflict analysis for firewall policies has been lacking, as existing methods focus on local conflict detection. In this paper we study the global conflict detection algorithm. We present a semantic model that captures more complete classifications of the policy using the knowledge concept from rough set theory. Based on this model, we present a formal model of global conflicts and represent it with an OBDD (Ordered Binary Decision Diagram). We then develop the GFPCDA (Global Firewall Policy Conflict Detection Algorithm) to detect global conflicts. In experiments, we evaluated the usability of our semantic model by eliminating the false positives and false negatives that an incomplete policy semantic model causes in a classical algorithm, and we compared that algorithm with GFPCDA. The results show that GFPCDA detects conflicts more precisely and independently, and has better performance.
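The paper's OBDD-based detection is not reproduced here, but the kind of conflict being detected can be illustrated with a naive pairwise check for shadowing: a rule can never fire because an earlier rule with a different action covers every packet it matches. A toy sketch, assuming a hypothetical representation of rules as dictionaries of inclusive field ranges:

```python
def covers(r1, r2):
    """True if every packet matched by rule r2 is also matched by r1.
    Rules are dicts mapping field name -> (lo, hi) inclusive ranges."""
    return all(r1[f][0] <= r2[f][0] and r2[f][1] <= r1[f][1] for f in r2)

def shadowing_conflicts(rules):
    """Report (earlier, later) index pairs where a later rule is fully
    shadowed by an earlier rule with a different action."""
    conflicts = []
    for i, (match_i, act_i) in enumerate(rules):
        for j in range(i):
            match_j, act_j = rules[j]
            if act_j != act_i and covers(match_j, match_i):
                conflicts.append((j, i))
                break                      # first shadowing rule suffices
    return conflicts
```

This pairwise view is local; the point of a global analysis like GFPCDA is to account for the combined effect of all preceding rules, which pairwise checks miss.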
Funds: Project (60801053) supported by the National Natural Science Foundation of China; Project (4082025) supported by the Beijing Natural Science Foundation, China; Project (20070004037) supported by the Doctoral Foundation of China; Projects (2009JBM135, 2011JBM023) supported by the Fundamental Research Funds for the Central Universities of China; Project (151139522) supported by the Hongguoyuan Innovative Talent Program of Beijing Jiaotong University, China; Project (YB20081000401) supported by the Beijing Excellent Doctoral Thesis Program, China; Project (2006CB303105) supported by the National Basic Research Program of China.
Abstract: To improve motion graph based motion synthesis, semantic control is introduced. Hybrid motion features, including both numerical and user-defined semantic relational features, are extracted to encode the characteristic aspects of the character's poses in the given motion sequences. Motion templates are then automatically derived from the training motions to capture the spatio-temporal characteristics of an entire class of semantically related motions. The data streams of motion documents are automatically annotated with semantic motion class labels by matching their respective motion class templates. Finally, semantic control is introduced into motion graph based human motion synthesis. Motion synthesis experiments demonstrate the effectiveness of the approach, which gives users a higher level of semantically intuitive control and produces high-quality human motion synthesized from a motion capture database.
Abstract: Individuals, local communities, environmental associations, private organizations, and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality, illegal waste disposal, water contamination, and general pollution. Environmental complaints represent expressions of dissatisfaction with these issues. Because managing a large number of complaints is time-consuming, text mining may be useful for automatically extracting information on stakeholder priorities and concerns. This paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems in the Emilia-Romagna Region, Italy: the online claim submission system of the Regional Agency for Prevention, Environment and Energy (Arpae) ("Contact Arpae") and Arpae's internal platform for environmental pollution ("Environmental incident reporting portal"). We evaluated a total of 2477 records and classified this information by claim topic (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation) and geographical distribution. We then used natural language processing to extract keywords from the dataset and classified the keywords ranking highest in Term Frequency-Inverse Document Frequency (TF-IDF) under the driver, pressure, state, impact, and response (DPSIR) framework. This study provides a systemic approach to understanding the interaction between people and the environment in different geographical contexts and helps build sustainable and healthy communities. The results showed that most complaints come from the public and concern air pollution and odor. Factories (particularly foundries and ceramic industries) and farms are identified as the drivers of environmental issues, and citizens believe that environmental issues mainly affect human well-being. Moreover, the keywords "odor", "report", "request", "presence", "municipality", and "hours" were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Keywords connecting odor (classified as an impact) and air pollution (classified as a state) were the most important (such as "odor-burnt plastic" and "odor-acrid"). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: "odor-factory" and "odors-farms". The proposed approach has several theoretical and practical implications: text mining may quickly and efficiently address citizen needs, providing a basis for automating (even partially) the complaint process, and the DPSIR framework can support the planning and organization of information and the identification of stakeholder concerns and priorities, as well as metrics and indicators for their assessment. Therefore, integrating the DPSIR framework with text mining of environmental complaints may generate a comprehensive environmental knowledge base as a prerequisite for wider exploitation of such analysis to support decision-making processes and environmental management activities.
Funds: The National Natural Science Foundation of China (Nos. 60573090, 60673139).
Abstract: Current search engines provide query-oriented rather than user-oriented search, and this improper orientation prevents them from meeting users' personalized requirements. To solve this problem, a novel method based on probabilistic latent semantic analysis (PLSA) is proposed to convert query-oriented web search into user-oriented web search. First, a user profile, represented as a vector of the user's topics of interest, is created by analyzing the user's click-through data with PLSA; then the user's queries are mapped into categories based on the user's preferences; and finally the result list is re-ranked according to the user's interests with a newly proposed method named user-oriented PageRank (UOPR). Experiments on real-life datasets show that a user-oriented search system adopting PLSA takes considerable account of user preferences and better satisfies a user's personalized information needs.
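The UOPR formula is specific to the paper, but the general re-ranking step it describes, mixing the engine's original score with the similarity between the user's topic-interest vector and each document's topic vector, can be sketched as follows. This uses a simple linear interpolation with hypothetical names, not the authors' formula:

```python
import math

def rerank(results, user_profile, doc_topics, alpha=0.5):
    """Re-rank (doc, engine_score) pairs by blending the engine score
    with the cosine similarity between the user's topic-interest vector
    and each document's topic vector."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    scored = [(alpha * score + (1 - alpha) * cos(user_profile, doc_topics[doc]), doc)
              for doc, score in results]
    return [doc for _, doc in sorted(scored, reverse=True)]
```

With `alpha = 1.0` this degenerates to the engine's original ranking; lowering `alpha` shifts weight toward the personalization signal.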