In this paper, we propose a knowledge-based rule management system for data cleaning. The system combines features of rule-based systems and rule-based data cleaning frameworks. Its advantages are threefold. First, it proposes a strong, unified rule form based on first-order structure that permits the representation and management of all types of rules, together with their quality, via a set of characteristics. Second, it increases the quality of the rules, which conditions the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule- and knowledge-based systems. Since several research works have shown that data cleaning is driven by domain knowledge rather than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data in order to better determine the main components of the proposed system. To illustrate the system, we also present a first experiment with a case study in the health sector, where we demonstrate how the system improves data quality. The autonomy, extensibility, and platform independence of the proposed rule management system facilitate its incorporation into any system concerned with data quality management.
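A unified rule form of this kind can be sketched as a small data structure pairing a first-order condition with a correction action and a quality characteristic. The sketch below is purely illustrative (the abstract does not give the system's actual rule schema); the class and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch of a unified rule form: an antecedent (first-order
# condition over a record), a consequent (correction applied when the rule
# fires), and a quality characteristic used to manage the rule.
@dataclass
class CleaningRule:
    name: str
    antecedent: Callable[[Dict], bool]
    consequent: Callable[[Dict], Dict]
    confidence: float = 1.0  # quality characteristic of the rule

    def apply(self, record: Dict) -> Dict:
        return self.consequent(record) if self.antecedent(record) else record

# Example in the spirit of the health-sector case study: flag an
# implausible (negative) age as missing rather than keeping bad data.
age_rule = CleaningRule(
    name="plausible_age",
    antecedent=lambda r: r.get("age", 0) < 0,
    consequent=lambda r: {**r, "age": None},
    confidence=0.95,
)

print(age_rule.apply({"patient": "p1", "age": -4}))  # → {'patient': 'p1', 'age': None}
```

Keeping the quality characteristic on the rule itself is what lets a rule manager rank, audit, or retire rules, which is the management aspect the system emphasizes.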
Aiming at the relation linking task for question answering over knowledge bases, and especially the multi-relation linking task for complex questions, a relation linking approach based on a multi-attention recurrent neural network (RNN) model is proposed that works for both simple and complex questions. First, vector representations of questions are learned by a bidirectional long short-term memory (Bi-LSTM) model at the word and character levels, and named entities in questions are labeled by a conditional random field (CRF) model. Candidate entities are generated from a dictionary, disambiguation of candidate entities is performed using predefined rules, and named entities mentioned in questions are linked to entities in the knowledge base. Next, questions are classified as simple or complex by a machine learning method. Starting from the identified entities, one-hop relations in the knowledge base are collected as candidate relations for simple questions, and two-hop relations for complex questions. Finally, the multi-attention Bi-LSTM model is used to encode the questions and candidate relations, compare their similarity, and return the candidate relation with the highest similarity as the result of relation linking. Notably, a Bi-LSTM model with one attention mechanism is adopted for simple questions, and a Bi-LSTM model with two attention mechanisms for complex questions. Experimental results show that, built on the effective entity linking method, the Bi-LSTM model with the attention mechanism improves relation linking for both simple and complex questions and outperforms existing relation linking methods based on graph algorithms or linguistic understanding.
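The final ranking step described above can be sketched in a few lines: encode the question and each candidate relation, score them by similarity, and return the highest-scoring candidate. In this toy sketch, normalized bag-of-words vectors stand in for the paper's attention Bi-LSTM encoders, and the vocabulary and relation names are invented for illustration.

```python
import numpy as np

# Toy vocabulary; the real model learns word- and character-level embeddings.
VOCAB = ["who", "directed", "film", "director", "birth", "place"]

def encode(text: str) -> np.ndarray:
    """Normalized bag-of-words vector (stand-in for a Bi-LSTM encoder)."""
    v = np.array([text.lower().split().count(w) for w in VOCAB], float)
    n = np.linalg.norm(v)
    return v / n if n else v

def rank_relations(question: str, candidates: list[str]) -> str:
    """Return the candidate relation most similar to the question."""
    q = encode(question)
    scores = [float(encode(c) @ q) for c in candidates]  # cosine similarity
    return candidates[int(np.argmax(scores))]

print(rank_relations("who directed the film", ["film director", "birth place"]))
```

Swapping the encoder for the attention Bi-LSTM changes the representation, not the ranking logic, which is why the same pipeline serves both the one-attention (simple) and two-attention (complex) settings.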
Analysis results for the average annual sea level of the Caspian Sea obtained from ground and satellite observations, together with solar activity characteristics, magnetic field data, and length-of-day data, are presented. Spectra of these processes were investigated, and approximation models were built for them. Previously assumed statistical relationships between space-geophysical processes and Caspian Sea level (CSL) changes were confirmed. A close connection was revealed between the low-frequency models of the solar and geomagnetic activity parameters and the CSL changes. Predictions extending into the next decades showed a high probability of an increase in the CSL and a decrease in the compared space-geophysical parameters.
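The spectral step of such an analysis, estimating the dominant period of an annual series, can be sketched with a discrete Fourier transform. The series below is synthetic (a planted 60-year cycle plus noise), not the actual CSL record, and stands in only to show the mechanics.

```python
import numpy as np

# Synthetic annual series with a ~60-year cycle plus noise (illustrative only).
years = np.arange(1900, 2020)  # 120 annual samples
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * years / 60.0) + 0.1 * rng.standard_normal(years.size)

# Amplitude spectrum of the demeaned series; frequencies in cycles per year.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(years.size, d=1.0)

# Skip the zero-frequency bin, then read off the dominant period.
dominant_period = 1.0 / freqs[1:][np.argmax(spectrum[1:])]
print(round(dominant_period, 1))  # → 60.0
```

Low-frequency approximation models like those in the study would then be fit to the few dominant spectral components rather than to the raw series.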
In 2015, the 2030 Agenda for Sustainable Development was adopted to end poverty, protect the planet, and ensure that all people enjoy peace and prosperity. The following year, 17 Sustainable Development Goals (SDGs) officially came into force. In 2015, GEO (Group on Earth Observations) declared its support for the implementation of the SDGs. The GEO Global Earth Observation System of Systems (GEOSS) required a change of paradigm, moving from a data-centric approach to a more knowledge-driven one. To this end, the GEO System-of-Systems (SoS) framework may refer to the well-known Data-Information-Knowledge-Wisdom (DIKW) paradigm. In the context of an Earth Observation (EO) SoS, a set of main elements are recognized as connecting links for generating knowledge from EO and non-EO data, e.g. social and economic datasets. These elements are Essential Variables (EVs), Indicators and Indexes, and Goals and Targets. Their generation and use require the development of a SoS knowledge base (KB) whose management process has evolved the GEOSS Software Ecosystem into a GEOSS Social Ecosystem; this process includes collecting, formalizing, publishing, accessing, using, and updating knowledge. The ConnectinGEO project analysed the knowledge necessary to recognize, formalize, access, and use EVs. The analysis identified GEOSS gaps and provided recommendations to support global decision-making within and across different domains.
1 Introduction Information technology has been playing an ever-increasing role in geoscience. Sophisticated database platforms are essential for the storage, analysis, and exchange of geological Big Data (Feblowitz, 2013; Zhang et al., 2016; Teng et al., 2016; Tian and Li, 2018). The United States has built an information-sharing platform for state-owned scientific data as a national strategy.
In this paper we address the problem of determining the most suitable candidates for an M&A (Merger & Acquisition) scenario involving banks and financial institutions. During the pre-merger period of an M&A, a number of candidates may be available for the merger or acquisition, but not all of them may be suitable. The normal practice is to carry out a due diligence exercise to identify the candidates that should lead to the optimum increase in shareholder value and customer satisfaction post-merger. The due diligence ought to be able to distinguish candidates that are unsuitable for merger, those that are relatively suitable, and those that are most suitable. Toward this objective, we propose a Fuzzy Data Mining framework in which fuzzy cluster analysis is used to assess the advisability of merging two banks or other financial institutions. Subsequently, we propose the orchestration/composition of the business processes of two banks into a consolidated business process during the M&A scenario. Our paper discusses the modeling of individual business processes with UML and the consolidation of the individual business process models by means of our proposed knowledge-based approach.
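Fuzzy cluster analysis fits this task because it assigns each candidate a degree of membership in every cluster, grading candidates from unsuitable to most suitable rather than forcing a hard split. The sketch below implements standard fuzzy c-means over two hypothetical bank indicators (the feature choice and the numbers are assumptions, not the paper's data).

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means: returns membership matrix U and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-9
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# Hypothetical candidates: [capital adequacy ratio %, non-performing assets %].
banks = np.array([[14.0, 2.1], [13.5, 2.4], [6.0, 9.8], [5.5, 10.5]])
U, centers = fuzzy_c_means(banks)
print(np.round(U, 2))  # graded memberships: two clearly separated groups
```

A due diligence step would then read the membership degrees directly: values near 1 in the "healthy" cluster mark the most suitable merger candidates, while intermediate degrees flag the relatively suitable ones.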
Freebase is a large collaborative knowledge base and database of general, structured information for public use. Its structured data was harvested from many sources, including individual, user-submitted wiki contributions. Its aim is to create a global resource so that people (and machines) can access common information more effectively; that information is mostly available in English. In this research work, we develop a technique for creating a Freebase for the Bengali language. The number of Bengali articles on the internet is growing day by day, so a structured data store in Bengali has become necessary. It consists of different types of concepts (topics) and relationships between those topics. These cover areas such as popular culture (e.g. films, music, books, sports, television), location information (restaurants, geolocations, businesses), scholarly information (linguistics, biology, astronomy), birthplaces of notable people (poets, politicians, actors, actresses), and general knowledge (Wikipedia). Such a resource will be helpful for relation extraction and other Natural Language Processing (NLP) work on the Bengali language. In this work, we describe the technique for creating the Bengali Freebase and assemble a collection of Bengali data. We apply the SPARQL query language to extract information from Bengali-language resources such as Wikidata, which is typically stored in RDF (Resource Description Framework) triple format.
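The RDF triple format underlying such a store is simply (subject, predicate, object) facts. The sketch below illustrates Bengali triples with a plain pattern matcher standing in for a full SPARQL engine; the specific entities and predicates are invented examples, not the paper's dataset.

```python
# Illustrative Bengali (subject, predicate, object) triples.
triples = [
    ("রবীন্দ্রনাথ_ঠাকুর", "পেশা", "কবি"),            # Rabindranath Tagore, occupation, poet
    ("রবীন্দ্রনাথ_ঠাকুর", "জন্মস্থান", "কলকাতা"),     # Tagore, birth place, Kolkata
    ("সত্যজিৎ_রায়", "পেশা", "চলচ্চিত্র_পরিচালক"),    # Satyajit Ray, occupation, film director
]

def query(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard,
    analogous to a SPARQL variable such as ?p or ?o."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogous to: SELECT ?p ?o WHERE { :রবীন্দ্রনাথ_ঠাকুর ?p ?o }
print(len(query(s="রবীন্দ্রনাথ_ঠাকুর")))  # → 2
```

Against Wikidata itself one would issue the equivalent SPARQL over its public endpoint; the triple-pattern semantics are the same.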
Based on the concept and research status of big data, we analyze and examine the importance of constructing a knowledge system of nursing science for the development of the nursing discipline in the context of big data. We propose that it is necessary to establish big data centers for nursing science in order to share resources, unify language standards, improve professional nursing databases, and establish a knowledge system structure.
Recent text generation methods frequently learn node representations from graph-based data, such as knowledge graphs, via global or local aggregation. Since all nodes are connected directly, global node representation encoding enables direct communication between two distant nodes while disregarding graph topology. Local node representation encoding, which captures the graph structure, considers the connections between nearby nodes but misses out on long-range relations. A quantum-like approach to learning better contextualised node embeddings is proposed using a fusion model that combines both encoding strategies. In various experiments, our methods significantly improve on two graph-to-text datasets compared to state-of-the-art models.
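The global/local distinction above can be made concrete with a minimal fusion sketch: a "global" view that mixes all nodes uniformly (ignoring topology) and a "local" view that averages over graph neighbours, combined by a fusion weight. This is a generic illustration of the two encoding strategies, not the paper's quantum-like model; the gate value and toy graph are assumptions.

```python
import numpy as np

A = np.array([[0, 1, 0],     # adjacency of a 3-node path graph
              [1, 0, 1],
              [0, 1, 0]], float)
H = np.array([[1.0, 0.0],    # initial node features
              [0.0, 1.0],
              [1.0, 1.0]])

# Global view: every node attends uniformly to all nodes (topology ignored).
global_enc = H.mean(axis=0, keepdims=True).repeat(len(H), axis=0)

# Local view: mean over graph neighbours (topology-aware, short-range only).
deg = A.sum(axis=1, keepdims=True)
local_enc = (A @ H) / np.maximum(deg, 1.0)

gate = 0.5                   # fusion weight; learned in a real model
fused = gate * global_enc + (1.0 - gate) * local_enc
print(fused.shape)           # → (3, 2)
```

The fused embedding keeps the local view's structural signal while the global term restores the long-range mixing that pure neighbourhood aggregation misses.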
Funding: the relation linking study was supported by the National Natural Science Foundation of China (No. 61502095).
Funding: the GEOSS knowledge base work was supported by the European Commission, Directorate-General for Research and Innovation [ConnectinGEO grant #641538, ECOPOTENTIAL grant #641762, ERA-PLANET/GEOEssential grant #689443].
Funding: the geoscience information platform study was granted by the National Science & Technology Major Projects of China (Grant No. 2016ZX05033).
Funding: the nursing science big data study was supported by the National Natural Science Foundation of China (No. 71573162).
Funding: the graph-to-text generation study was supported by the National Natural Science Foundation of China under Grant 62077015, by the Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province (Zhejiang Normal University, Zhejiang, China), by the Key Research and Development Program of Zhejiang Province (No. 2021C03141), and by the National Key R&D Program of China under Grant 2022YFC3303600.