Funding: Supported by the National Natural Science Foundation of China under Grants No.90920005 and No.61003192; the Key Project of Philosophy and Social Sciences Research, Ministry of Education under Grant No.08JZD0032; the Program of Introducing Talents of Discipline to Universities under Grant No.B07042; the Natural Science Foundation of Hubei Province under Grants No.2011CDA034 and No.2009CDB145; the Chenguang Program of Wuhan Municipality under Grant No.201050231067; and the self-determined research funds of CCNU from the colleges' basic research and operation of MOE under Grants No.CCNU10A02009 and No.CCNU10C01005.
Abstract: This paper focuses on semantic knowledge acquisition from blogs with the proposed tag-topic model. The model extends the Latent Dirichlet Allocation (LDA) model by adding a tag layer between the document and the topic. Each document is represented by a mixture of tags; each tag is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. After parameter estimation, the tags are used to describe the underlying topics, so the latent semantic knowledge within the topics can be represented explicitly. The tags are treated as concepts, and the top-N words from the top topics are selected as related words of the concepts. PMI-IR is then employed to compute the relatedness between each tag-word pair, and noisy words with low correlation are removed to improve the quality of the semantic knowledge. Experimental results show that the proposed method can effectively capture semantic knowledge, especially polysemy and synonymy.
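The PMI-IR filtering step above can be sketched as follows. This is a minimal illustration that uses raw co-occurrence counts in place of search-engine hit counts; the function names, threshold, and toy counts are assumptions, not details from the paper.

```python
import math

def pmi(pair_count, count_a, count_b, total):
    """Pointwise mutual information: log p(a,b) / (p(a) * p(b))."""
    return math.log((pair_count / total) / ((count_a / total) * (count_b / total)))

def filter_related_words(tag, candidates, cooc, counts, total, threshold=1.0):
    """Keep candidate words whose PMI with the tag exceeds the threshold;
    low-correlation (noisy) words are dropped."""
    kept = []
    for w in candidates:
        pair = cooc.get((tag, w), 0)
        if pair and pmi(pair, counts[tag], counts[w], total) > threshold:
            kept.append(w)
    return kept
```

For example, with 100 observations, counts `{"bank": 10, "river": 8, "money": 5}` and co-occurrences `{("bank","river"): 6, ("bank","money"): 1}`, the pair (bank, river) has PMI log 7.5 ≈ 2.01 and survives, while (bank, money) has PMI log 2 ≈ 0.69 and is filtered out.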
Funding: The National Natural Science Foundation of China (No.61502528).
Abstract: Mining the penetration testing semantic knowledge hidden in vast amounts of raw penetration testing data is of vital importance for automated penetration testing. Association rule mining, a data mining technique, has been studied and explored for a long time, but few studies have focused on knowledge discovery in the penetration testing area. Our experiments reveal that the long-tail distribution of penetration testing data nullifies the effectiveness of association rule mining algorithms based on frequent patterns. To address this problem, a Bayesian inference based penetration semantic knowledge mining algorithm is proposed. First, a directed bipartite graph model, a kind of Bayesian network, is constructed to formalize penetration testing data. Then, we adopt the maximum likelihood estimation method to optimize the model parameters and decompose the large Bayesian network into smaller networks based on conditional independence of variables for improved solution efficiency. Finally, irrelevant variable elimination is adopted to extract penetration semantic knowledge from the conditional probability distribution of the model. Experimental results show that the proposed method can discover penetration semantic knowledge from raw penetration testing data effectively and efficiently.
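The maximum-likelihood parameter estimation above can be illustrated with a toy conditional probability table; the event names and observations below are hypothetical, and this sketch deliberately omits the bipartite-graph decomposition and irrelevant-variable elimination described in the abstract.

```python
from collections import Counter

def mle_cpt(samples):
    """Maximum-likelihood estimate of P(effect | cause) from (cause, effect)
    pairs: each conditional probability is a normalized co-occurrence count."""
    joint = Counter(samples)
    marginal = Counter(cause for cause, _ in samples)
    return {(c, e): n / marginal[c] for (c, e), n in joint.items()}

# Hypothetical penetration-testing observations: (precondition, outcome)
samples = [("ftp_open", "exploit_ok"), ("ftp_open", "exploit_ok"),
           ("ftp_open", "exploit_fail"), ("ssh_open", "exploit_fail")]
cpt = mle_cpt(samples)  # e.g. P(exploit_ok | ftp_open) = 2/3
```

Counting and normalizing is exactly the closed-form MLE for a discrete Bayesian-network node given fully observed data, which is what makes the decomposition into small networks cheap to fit.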
Funding: The National Basic Research Program of China (973 Program) (No.2004CB318104) and the Knowledge Innovation Program of the Chinese Academy of Sciences (No.13CX04).
Abstract: A concept-based approach is expected to resolve word sense ambiguities in information retrieval and to apply the semantic importance of concepts, instead of term frequency, to representing the contents of a document. Accordingly, a formalized document framework is proposed. The framework expresses the meaning of a document through concepts of high semantic importance and consists of two parts: the "domain" information and the "situation & background" information of a document. A document-extracting algorithm and a two-stage smoothing method are also proposed; the quantification of the similarity between the query and the document framework depends on the smoothing method. Experiments on the TREC-6 collection demonstrate the feasibility and effectiveness of the proposed approach in information retrieval tasks: the average recall-level precision of the model using the proposed approach is about 10% higher than that of traditional models.
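The abstract does not spell out its two-stage smoothing method; a common two-stage formula from language-model retrieval (Dirichlet prior smoothing followed by Jelinek-Mercer interpolation with the collection model) can be sketched as a stand-in. The parameter defaults `mu` and `lam` here are illustrative assumptions, not values from the paper.

```python
def two_stage_prob(word, doc_counts, doc_len, coll_prob, mu=2000.0, lam=0.1):
    """Two-stage smoothed language-model probability p(w|d):
    stage 1 applies Dirichlet prior smoothing with the collection model,
    stage 2 interpolates the result with the collection model again."""
    dirichlet = (doc_counts.get(word, 0) + mu * coll_prob) / (doc_len + mu)
    return (1 - lam) * dirichlet + lam * coll_prob
```

Because both stages mix in the collection probability, unseen query words still receive non-zero probability, which keeps the query-document similarity score well defined.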
Funding: Supported by National Science Council research grants.
Abstract: Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, reduces new product development time, increases market success, and reduces potential patent infringement. Thus, it is beneficial to automatically and systematically extract information from patent documents in order to improve knowledge sharing and collaboration among R&D team members. In this research, patents are summarized using a combined ontology-based and TF-IDF concept clustering approach. The ontology captures the general knowledge and core meaning of patents in a given domain. The proposed methodology then extracts, clusters, and integrates the content of a patent to derive a summary and a cluster tree diagram of key terms. Patents from the International Patent Classification (IPC) codes B25C, B25D, B25F (categories for power hand tools) and B24B, C09G and H011 (categories for chemical mechanical polishing) are used as case studies to evaluate the compression ratio, retention ratio, and classification accuracy of the summarization results. The evaluation uses statistics to represent the summary generation and its compression ratio, the ontology-based keyword extraction retention ratio, and the summary classification accuracy. The results show that the ontology-based approach yields about the same compression ratio as previous non-ontology-based research but yields on average an 11% improvement in retention ratio and a 14% improvement in classification accuracy.
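The TF-IDF side of the concept clustering step can be sketched as below; the tokenized mini-corpus and the particular weighting variant (raw term frequency times log inverse document frequency) are assumptions rather than the paper's exact formulation.

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF scores for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: (c / len(doc)) * math.log(n / df[w])
                       for w, c in tf.items()})  # tf * idf
    return scores

scores = tfidf([["drill", "motor"], ["drill", "chuck"]])
```

A term like "drill" that appears in every patent scores zero (log 1 = 0), while domain-specific terms such as "motor" keep positive weight, which is what lets the clustering surface key terms rather than boilerplate vocabulary.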
Funding: Supported by the National Natural Science Foundation of China (Nos. 61773239, 91748115 and 61603213), the Natural Science Foundation of Shandong Province (No. ZR2015FM007), and the Taishan Scholars Program of Shandong Province.
Abstract: With the development of artificial intelligence and robotics, research on service robots has made significant progress in recent years. A service robot is required to perceive users and the environment in unstructured domestic settings and, based on this perception, to understand the situation and discover service tasks, so that it can assist humans with home service or health care more accurately and proactively. Humans can focus on the salient things within a mass of observed information, and they can use semantic knowledge to make plans based on their understanding of the environment. Through an intelligent space platform, we attempt to apply this process to service robots. A selective-attention-guided, proactive semantic cognition algorithm in intelligent space is proposed in this paper, specifically designed to provide robots with the cognition needed for performing service tasks. First, an attention selection model is built based on saliency computing and key areas; the area that is highly relevant to the service task can be located and is referred to as the focus of attention (FOA). Second, a recognition algorithm for the FOA is proposed based on a neural network; common objects and user behavior are recognized in this step. Finally, a unified semantic knowledge base and a corresponding reasoning engine are proposed using the recognition results. Experiments in a real-life scenario demonstrate that our approach is able to mimic the recognition process in humans and to make robots understand the environment and discover service tasks based on their own cognition. In this way, service robots can act smarter and achieve better service efficiency in their daily work.
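The attention-selection step, picking the area most relevant to the current service task as the focus of attention, can be sketched as a weighted argmax. The region structure and the multiplicative combination of saliency and task relevance here are illustrative assumptions; the paper's actual model is built on saliency computing over key areas.

```python
def select_foa(regions):
    """Pick the focus of attention (FOA): the candidate region maximizing
    visual saliency weighted by its relevance to the current service task."""
    return max(regions, key=lambda r: r["saliency"] * r["relevance"])

# Hypothetical candidate regions for a "fetch drink" task
regions = [
    {"name": "table",  "saliency": 0.9, "relevance": 0.2},
    {"name": "cup",    "saliency": 0.6, "relevance": 0.9},
    {"name": "window", "saliency": 0.8, "relevance": 0.1},
]
foa = select_foa(regions)  # the cup wins despite lower raw saliency
```

Weighting saliency by task relevance is what makes the attention proactive: a visually striking but task-irrelevant region (the window) loses to a moderately salient, highly relevant one (the cup).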
Funding: Georgia Research Alliance for funding the Brain Decoding Initiative (2007–present); Yunnan Province Department of Science and Technology for supporting our work.
Abstract: The BRAIN project recently announced by President Obama reflects the unrelenting human quest to crack the brain code, the patterns of neuronal activity that define who we are and what we are. While the Brain Activity Mapping proposal has rightly emphasized the need to develop new technologies for measuring every spike from every neuron, it may be helpful to consider both the theoretical and experimental aspects that would accelerate our search for the organizing principles of the brain code. Here we share several insights and lessons from a similar proposal, the Brain Decoding Project, which we initiated in 2007. We provide a specific example from our initial mapping of real-time memory traces in one part of the memory circuit, the CA1 region of the mouse hippocampus. We show how innovative behavioral tasks and appropriate mathematical analyses of large datasets can play equally, if not more, important roles in uncovering the specific-to-general feature-coding cell assembly mechanism by which episodic memory, semantic knowledge, and imagination are generated and organized. Our own experiences suggest that the bottleneck of the BRAIN project lies not merely in developing additional new technologies, but also in the lack of efficient avenues for disseminating cutting-edge platforms and decoding expertise to the neuroscience community. Therefore, we propose that, in order to harness the unique insights and extensive knowledge of investigators working in diverse neuroscience subfields, ranging from perception and emotion to memory and social behaviors, the BRAIN project should create a set of International and National Brain Decoding Centers at which cutting-edge recording technologies and expertise in analyzing large datasets can be made readily available to the entire community of neuroscientists, who can apply and schedule time to perform cutting-edge research.