Funding: The National Natural Science Foundation of China (No. 50674086); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Science and Technology Fund of China University of Mining and Technology (No. 2007B016).
Abstract: An association rule mining method based on semantic relativity is proposed to address the large number of candidate itemsets and the high time complexity of traditional association rule mining. The method uses the semantic relativity of ontology concepts to describe complex relationships within a domain, and filters out candidate itemsets with low semantic relativity, thereby reducing the number of candidate itemsets. In computing semantic relativity, the ontology hierarchy is treated as a directed acyclic graph rather than a tree, and not only direct hierarchical relationships but also indirect hierarchical relationships and other typical semantic relationships are taken into account. Experimental results show that the proposed method effectively reduces the number of candidate itemsets and improves the efficiency of association rule mining.
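The filtering step described above can be illustrated with a small Apriori-style candidate generator that discards semantically unrelated itemsets. This is a minimal sketch, not the authors' method: the relatedness measure (1 / (1 + shortest path length) over an undirected view of the ontology DAG), the min-over-pairs aggregation, the threshold, and all concept names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency view of an ontology DAG plus extra semantic relations."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, src, dst):
    """Breadth-first shortest path length between two concepts; None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relatedness(graph, a, b):
    """Assumed measure: 1 / (1 + path length), or 0.0 for unconnected concepts."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def itemset_relatedness(graph, itemset):
    """Assumed aggregation: an itemset is as related as its least related pair."""
    return min(relatedness(graph, a, b) for a, b in combinations(sorted(itemset), 2))


def generate_candidates(frequent, k, graph, threshold):
    """Apriori join and subset pruning, followed by semantic filtering."""
    candidates = set()
    for x, y in combinations(frequent, 2):
        union = x | y
        if len(union) != k:
            continue
        # Standard Apriori pruning: every (k-1)-subset must itself be frequent.
        if not all(frozenset(s) in frequent for s in combinations(union, k - 1)):
            continue
        # Extra step: drop candidates whose items are semantically unrelated.
        if itemset_relatedness(graph, union) >= threshold:
            candidates.add(union)
    return candidates


# Hypothetical ontology: "cheese" has two parents, so the hierarchy is a DAG, not a tree.
edges = [("milk", "dairy"), ("cheese", "dairy"), ("cheese", "snack"),
         ("dairy", "food"), ("bread", "bakery"), ("bakery", "food"),
         ("battery", "electronics")]
graph = build_graph(edges)
frequent_1 = {frozenset([i]) for i in ("milk", "cheese", "bread", "battery")}
candidates = generate_candidates(frequent_1, 2, graph, threshold=0.3)
print(candidates)
```

Without the semantic filter the join yields all six 2-itemsets; with a threshold of 0.3 only {milk, cheese} survives (path length 2, relatedness 1/3), while milk/bread (path length 4, relatedness 0.2) and every pair involving battery (unconnected, 0.0) are pruned before any support counting.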
Abstract: Computational techniques have long been adopted in medical and biological systems. The development and application of computational methods will undoubtedly help us better understand biomedical and biological functions. Biomedical and biological experiments and simulations have produced large datasets. For researchers to gain knowledge from the original data, a nontrivial transformation is necessary; this transformation is regarded as a critical link in the chain of knowledge acquisition, sharing, and reuse. Challenges encountered include how to represent human knowledge efficiently and effectively in formal computing models, how to exploit semantic text mining techniques rather than traditional syntactic text mining, and how to handle security issues during knowledge sharing and reuse. This paper summarizes the state of the art in these research directions, with the aim of introducing readers to the major computing themes applicable to medical and biological research.
Funding: Supported by the National Basic Research Program of China (973 Program) (2007CB310801); the Specialized Research Fund for the Doctoral Program of Higher Education of China (20070486064); the Natural Science Foundation of Hubei Province (2007ABA038); and the Programme of Introducing Talents of Discipline to Universities (B07037).
Abstract: This paper presents a cross-media semantic mining model (CSMM) based on object semantics. The model obtains object-level semantic information according to the maximum probability principle. Semantic templates are then trained and constructed with STTS (Semantic Template Training System) and serve as the bridge from various low-level media features to object semantics. Furthermore, we propose a two-layer metadata structure to store and manage the mined low-level features and high-level semantics efficiently. The model has broad applications in many domains, such as intelligent retrieval engines, medical diagnosis, and multimedia design.
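The two ideas in the abstract above, choosing an object label by maximum probability and storing low-level features and high-level semantics in separate metadata layers, can be sketched as follows. This is a hypothetical illustration only: the record layout, the feature names, and the template score (exp of the negative squared distance to a template's feature vector, normalized into probabilities) are assumptions, not the CSMM or STTS implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LowLevelLayer:
    media_type: str                 # e.g. "image" or "audio"
    features: dict                  # feature name -> numeric value


@dataclass
class SemanticLayer:
    object_label: str
    probability: float


@dataclass
class MediaRecord:
    """Hypothetical two-layer metadata record: low-level layer plus object semantics."""
    media_id: str
    low: LowLevelLayer
    high: list = field(default_factory=list)


def template_score(features, template):
    """Assumed similarity: exp(-squared Euclidean distance) to a template's feature vector."""
    dist2 = sum((features.get(k, 0.0) - v) ** 2 for k, v in template.items())
    return math.exp(-dist2)


def assign_object_semantics(features, templates):
    """Maximum-probability principle: normalize scores and keep the most probable label."""
    scores = {label: template_score(features, t) for label, t in templates.items()}
    total = sum(scores.values())
    probs = {label: s / total for label, s in scores.items()}
    best = max(probs, key=probs.get)
    return SemanticLayer(best, probs[best])


# Hypothetical trained templates and one image's extracted features.
templates = {"sky": {"blueness": 0.9, "texture": 0.1},
             "grass": {"blueness": 0.1, "texture": 0.8}}
record = MediaRecord("img-001", LowLevelLayer("image", {"blueness": 0.85, "texture": 0.2}))
record.high.append(assign_object_semantics(record.low.features, templates))
print(record.high[0])
```

Keeping the two layers in separate fields means the low-level features can be re-labelled later (for example, after retraining the templates) without touching the raw extraction results.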
Abstract: The integration of two fast-developing research areas, the Semantic Web and Web Mining, is known as Semantic Web Mining. The rapid growth in the amount of Semantic Web data has made it an attractive target for researchers applying data mining techniques. This paper presents a detailed survey of ongoing research in this new area. It shows the benefits of Semantic Web Mining and the obstacles faced by researchers, and proposes a number of approaches for dealing with the very complex and heterogeneous information and knowledge produced by Semantic Web technologies.
Abstract: Based on the definition of a component ontology, an effective component classification mechanism and a facet named component relationship are proposed. An application-domain-oriented, hierarchical component organization model is then established. Finally, a hierarchical component semantic network (HCSN) described in the Ontology Interchange Language (OIL) is presented and its function is described. Using HCSN together with other component retrieval algorithms based on component descriptions, one can find information about the other components related to a key component, as well as their assembly or composition modes. Based on HCSN, a component directory library is catalogued and a prototype system is constructed. The prototype system shows that organizing a component library according to this model guarantees the reliability of component assembly during program mining.
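A hierarchical component network of the kind described above can be approximated with a category tree plus a list of relationship facets per component. The sketch below is a plain-Python stand-in, not an OIL encoding of HCSN; the class, method, category, component, and facet names are all hypothetical.

```python
from collections import defaultdict


class ComponentNetwork:
    """Hypothetical hierarchical component semantic network (plain Python, not OIL)."""

    def __init__(self):
        self.parent = {}                    # node -> parent category (None at the root)
        self.components = set()             # leaf nodes that are actual components
        self.relations = defaultdict(list)  # component -> [(relation facet, component)]

    def add_category(self, name, parent=None):
        self.parent[name] = parent

    def add_component(self, name, category):
        self.parent[name] = category
        self.components.add(name)

    def relate(self, source, facet, target):
        """Record a component-relationship facet, e.g. how two components assemble."""
        self.relations[source].append((facet, target))

    def ancestors(self, name):
        """Walk up the hierarchy from a node to the root category."""
        chain, node = [], self.parent.get(name)
        while node is not None:
            chain.append(node)
            node = self.parent.get(node)
        return chain

    def siblings(self, name):
        """Other components classified under the same category."""
        cat = self.parent.get(name)
        return sorted(c for c in self.components if c != name and self.parent[c] == cat)

    def related(self, key):
        """Components linked to `key` through relationship facets, with the facet name."""
        return list(self.relations.get(key, []))


net = ComponentNetwork()
net.add_category("Software")
net.add_category("DataAccess", "Software")
net.add_category("UserInterface", "Software")
net.add_component("JdbcConnector", "DataAccess")
net.add_component("OrmMapper", "DataAccess")
net.add_component("GridView", "UserInterface")
net.relate("OrmMapper", "composed-with", "JdbcConnector")
net.relate("OrmMapper", "used-by", "GridView")
print(net.related("OrmMapper"))
```

Given a key component, `related` returns the linked components together with the facet that says how they assemble, while `ancestors` and `siblings` expose its place in the classification hierarchy.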
Abstract: In this paper, a finite state machine approach is used to find the semantic similarity of two sentences. The approach combines bi-directional logic with a semantic ordering approach. Its core is the bi-directional logic of artificial intelligence, which is implemented with a slightly modified finite state machine algorithm. Keywords play a crucial role in finding semantic similarity: with the keyword approach, the algorithm can determine similarity at the sentence level. The algorithm is designed specifically for Nepali texts. A finite state machine is built from the polarities of the individual keywords, and its final state determines the polarity of the sentence. If two sentences are both negatively polarized, they are said to be coherent; likewise, if both are positive, they are said to be coherent; otherwise they are not. Coherence (similarity) is measured with a contextual concept: the semantic approach in this research is entirely context-based, and two sentences are said to be semantically similar if they share the same context. The algorithm achieves an overall accuracy of 90.16%.
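The polarity step described above, where keyword polarities drive a finite state machine whose final state gives the sentence polarity, can be sketched as below. This is an assumption-laden illustration: the three-word lexicon is hypothetical (the paper's Nepali lexicon is not given), and the transition rule "a negative keyword flips the state" is a common polarity-composition heuristic, not the paper's bi-directional logic or semantic ordering.

```python
# Hypothetical keyword lexicon (the paper's actual Nepali lexicon is not given):
# राम्रो = "good", नराम्रो = "bad", छैन = "is not" (negation).
KEYWORD_POLARITY = {"राम्रो": "pos", "नराम्रो": "neg", "छैन": "neg"}

# Assumed transition table: a negative keyword flips the state, a positive one keeps it.
TRANSITIONS = {
    ("POS", "pos"): "POS", ("POS", "neg"): "NEG",
    ("NEG", "pos"): "NEG", ("NEG", "neg"): "POS",
}


def sentence_polarity(sentence, start="POS"):
    """Run the finite state machine over the sentence's keywords; return the final state."""
    state = start
    for token in sentence.split():
        symbol = KEYWORD_POLARITY.get(token)
        if symbol is not None:          # non-keywords leave the state unchanged
            state = TRANSITIONS[(state, symbol)]
    return state


def coherent(a, b):
    """Two sentences are coherent when both end in the same polarity state."""
    return sentence_polarity(a) == sentence_polarity(b)


s1 = "खाना राम्रो छ"       # "The food is good"
s2 = "खाना राम्रो छैन"     # "The food is not good"
s3 = "खाना नराम्रो छ"      # "The food is bad"
print(sentence_polarity(s1), sentence_polarity(s2), coherent(s2, s3))  # → POS NEG True
```

Encoding the machine as an explicit transition table keeps the polarity rules separate from the traversal code, so a different rule set can be swapped in without changing `sentence_polarity`.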
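Abstract: With the continuous development of big data and artificial intelligence technologies, large language models (LLMs) have shown great potential in fields such as knowledge mining and document integration. Through methods such as knowledge graph construction, text classification, and information retrieval, this paper discusses the architecture of large language models and their applications in different scenarios, and examines knowledge extraction and integration in depth. It studies how to improve the efficiency of collaborative multi-document processing through standardized structures and semantic fusion techniques. Drawing on the analysis of practical cases, it demonstrates the effectiveness of large language models in complex knowledge systems, as a reference for the practical use of large language models.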
Funding: Project (41001226) supported by the National Natural Science Foundation of China; Project (2009CB226107) supported by the National Basic Research Program of China; Project (2010B170006) supported by the Natural Science Foundation of the Education Department of Henan Province, China; Project (KLM201007) supported by the Key Laboratory of Mine Spatial Information Technologies, National Administration of Surveying, Mapping and Geoinformation.
Abstract: The digital mine is the only path for the development of China's mining industry. Owing to the lack of appropriate standards and norms, and to differing understandings of the digital mine among academics and industry insiders, the meaning of "digital mine" remains unclear. Starting from the nature of mining and setting aside the perspectives of individual specialized fields, this paper constructs a formal ontology for the digital mine and proposes four levels for it. The ontology clarifies the conceptual world of the digital mine, clearly defines the meanings of its concepts and relations, provides a reference for building digital mine standards, and offers a unified semantic framework for integrating heterogeneous mine data. It can also provide formal reasoning knowledge for digital mine expert systems and improve intelligence and automation when machines automatically interpret and process mine spatial data.
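To illustrate how a formal ontology can provide reasoning knowledge and a shared semantic framework for heterogeneous data, the sketch below infers types through transitive "is_a" links. It is hypothetical throughout: the concepts, relation names, and individuals are invented for illustration and do not reproduce the paper's ontology or its four levels.

```python
from collections import defaultdict

# Hypothetical ontology fragment as (subject, predicate, object) triples.
TRIPLES = [
    ("MainShaft", "is_a", "Shaft"),
    ("Shaft", "is_a", "MineOpening"),
    ("Roadway", "is_a", "MineOpening"),
    ("MineOpening", "is_a", "MineObject"),
    ("Shaft_01", "instance_of", "MainShaft"),   # e.g. from a survey database
    ("Roadway_07", "instance_of", "Roadway"),   # e.g. from a ventilation system
    ("Shaft_01", "connects", "Roadway_07"),
]


def superclasses(concept, triples):
    """Transitive closure of is_a: every concept that subsumes `concept`."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "is_a":
            parents[s].add(o)
    result, stack = set(), [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def types_of(individual, triples):
    """Asserted types plus all inferred superclasses."""
    direct = {o for s, p, o in triples if s == individual and p == "instance_of"}
    inferred = set(direct)
    for c in direct:
        inferred |= superclasses(c, triples)
    return inferred


def instances_of(concept, triples):
    """All individuals that are (directly or by inference) instances of `concept`."""
    individuals = {s for s, p, o in triples if p == "instance_of"}
    return sorted(i for i in individuals if concept in types_of(i, triples))


print(instances_of("MineOpening", TRIPLES))
```

A query for the general concept `MineOpening` returns records that were asserted under different, more specific concepts, which is the kind of integration a shared ontology is meant to support.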
Abstract: The goal of this project is to use Semantic Web technologies and data mining for disease diagnosis, assisting health care professionals in choosing possible medications and drugs to prescribe (drug recommendation) according to the features of the patient. Numerous decision support systems (DSS) and expert systems support medical collaboration, for example in specific or general differential diagnosis. However, a medical recommendation system that uses both Semantic Web technologies and data mining had not yet been developed, which motivated this work. Several systems do provide references on interactions between medicines or active ingredients, but their final goal is not drug recommendation using these technologies. With this project we aim to give doctors an assistant for making better recommendations. Patients will also be able to use the system for explanations of drugs, food interactions, and the side effects of the corresponding drugs.
Abstract: Global changes have taken place at breakneck speed in many fields with the Web 2.0 era, which can be described as the new Internet trend. Web pages, once static structures, have become dynamic pages created by users, and in this sense they have been democratized. Social media, which emerged in this era, provide businesses with a large flow of data and present new opportunities for developing effective strategies. Today's Internet hosts many blogs containing customers' opinions about the products/services they own. This environment, which in a way globalizes customer opinions, is a new medium worth examining because it increases business-customer interaction, and as a conduit for opinions it provides businesses with text data that can be analyzed for Customer Relationship Management. Businesses should therefore follow blogs to see how their products/services are received by customers, and should treat this as an important task on which they can conduct effective analyses. To this end, the study proposes a model that assigns the opinions expressed in Turkish blogs to polarities. Opinion mining methods were used in the model: to capture a general view of products/services, a methodology was devised that assigns the text-based opinion data in Turkish blogs to poles. The success of the model's pole assignment is evaluated with the precision measure.
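The evaluation described above, assigning each blog text to a pole and scoring the assignment with precision, can be sketched with a toy lexicon-based classifier. This is not the study's model: the four-word Turkish lexicon, the sum-of-polarities rule, the example posts, and their gold labels are all hypothetical, and the example deliberately includes a negation the simple rule gets wrong.

```python
# Hypothetical polarity lexicon (glosses: güzel "nice", harika "great",
# kötü "bad", berbat "awful"); the study's actual resources are not described.
LEXICON = {"güzel": 1, "harika": 1, "kötü": -1, "berbat": -1}


def assign_pole(text):
    """Sum keyword polarities and map the total to a pole."""
    # Tokens are matched as written; str.lower() is not Turkish-aware (I/ı, İ/i).
    score = sum(LEXICON.get(tok.strip(".,!?"), 0) for tok in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def precision(predicted, gold, pole):
    """Share of texts assigned to `pole` whose gold label is also `pole`."""
    assigned = [g for p, g in zip(predicted, gold) if p == pole]
    if not assigned:
        return 0.0
    return sum(g == pole for g in assigned) / len(assigned)


posts = ["Telefon harika, kamerası güzel.",   # gold: positive
         "Kargo berbat, ürün kötü.",           # gold: negative
         "Fiyatı güzel değil.",                # gold: negative (negation is missed)
         "Servis harika!"]                     # gold: positive
gold = ["positive", "negative", "negative", "positive"]
predicted = [assign_pole(p) for p in posts]
print(precision(predicted, gold, "positive"), precision(predicted, gold, "negative"))
```

Here three posts are assigned to the positive pole but only two of them are truly positive, so positive-pole precision is 2/3, while the single negative assignment is correct (precision 1.0).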
Abstract: Individuals, local communities, environmental associations, private organizations, and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality, illegal waste disposal, water contamination, and general pollution. Environmental complaints are expressions of dissatisfaction with these issues. Because managing a large number of complaints is time-consuming, text mining may be useful for automatically extracting information on stakeholder priorities and concerns. This paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems in the Emilia-Romagna Region, Italy: the online claim submission system of the Regional Agency for Prevention, Environment and Energy (Arpae) ("Contact Arpae"), and Arpae's internal platform for environmental pollution ("Environmental incident reporting portal"). We evaluated a total of 2477 records and classified them by claim topic (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation) and geographical distribution. We then used natural language processing to extract keywords from the dataset, and classified the keywords ranking highest in Term Frequency-Inverse Document Frequency (TF-IDF) according to the driver, pressure, state, impact, and response (DPSIR) framework. This study provides a systematic approach to understanding the interaction between people and the environment in different geographical contexts and to building sustainable and healthy communities. The results showed that most complaints come from the public and are associated with air pollution and odor. Factories (particularly foundries and ceramic industries) and farms were identified as the drivers of environmental issues. Citizens believed that environmental issues mainly affect human well-being. Moreover, the keywords "odor", "report", "request", "presence", "municipality", and "hours" were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Keywords connecting odor (classified as impact) and air pollution (classified as state) were the most important (such as "odor-burnt plastic" and "odor-acrid"). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: "odor-factory" and "odors-farms". The proposed approach has several theoretical and practical implications: text mining may quickly and efficiently address citizen needs, providing a basis for automating (even partially) the complaint process; and the DPSIR framework might support the planning and organization of information and the identification of stakeholder concerns and priorities, as well as metrics and indicators for their assessment. Therefore, integrating the DPSIR framework with text mining of environmental complaints might generate a comprehensive environmental knowledge base as a prerequisite for wider exploitation of analysis to support decision-making processes and environmental management activities.
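The pipeline summarized above (TF-IDF keyword weighting, then a keyword network ranked by degree and betweenness centrality) can be sketched with the standard library alone. Several choices here are assumptions rather than the study's settings: the TF-IDF variant (raw count times log(N/df)), linking keywords that co-occur in the same complaint, unnormalized betweenness via Brandes' algorithm, and the tiny tokenized complaint set, which is invented and is not the Arpae data.

```python
import math
from collections import Counter, defaultdict, deque
from itertools import combinations


def tf_idf(docs):
    """One {term: score} dict per tokenized document; tf = raw count, idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cooccurrence_graph(docs, vocab):
    """Undirected keyword network: link two keywords that occur in the same complaint."""
    graph = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(set(doc) & vocab), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(int)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        while stack:               # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends


# Hypothetical, already tokenized complaints (not the Arpae data).
docs = [["odor", "factory", "night"], ["odor", "farm"],
        ["odor", "burnt", "plastic"], ["noise", "night"]]
weights = tf_idf(docs)
vocab = {t for w in weights for t in w}   # a real pipeline would keep only top-ranked terms
graph = cooccurrence_graph(docs, vocab)
print(degree_centrality(graph)["odor"], betweenness_centrality(graph)["odor"])
```

In this toy network "odor" has both the highest degree and the highest betweenness, because most paths between otherwise unrelated keywords (factory, farm, plastic) pass through it; a frequent term also receives a lower IDF weight than a rare one such as "farm".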