City regions often have great diversity in form and function. To better understand the role of each region, the relations between city regions need to be carefully studied. In this work, the human mobility relations between regions of Shanghai are explored based on mobile phone data. By formulating the regions as nodes in a network and the commuting between each pair of regions as link weights, the distribution of node degrees and the spatial structures of communities in this relation network are studied. Statistics show that regions located in urban centers and traffic hubs have significantly larger degrees. Moreover, two kinds of spatial structures of communities are found. In most communities, nodes are spatially neighboring. However, in the communities that cover traffic hubs, nodes are often located along corridors.
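The network formulation in this abstract (regions as nodes, commuting volume as link weights) can be illustrated with a minimal sketch. The region names, the flow counts, and the `weighted_degree` helper are hypothetical and are not taken from the paper; the sketch only shows how a weighted node degree could be computed from pairwise commuting counts.

```python
from collections import defaultdict


def weighted_degree(flows):
    """Sum the link weights incident to each node.

    `flows` maps an unordered region pair (a, b) to a commuting count.
    Trips that start and end in the same region are ignored.
    """
    strength = defaultdict(float)
    for (a, b), weight in flows.items():
        if a == b:
            continue  # self-loops do not connect two regions
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


# Hypothetical commuting counts between three regions.
example_flows = {
    ("hub", "north"): 120,
    ("hub", "south"): 80,
    ("north", "south"): 10,
}
example_degrees = weighted_degree(example_flows)
print(sorted(example_degrees.items(), key=lambda item: -item[1]))
```

In this toy input the region labelled `hub` ends up with the largest weighted degree, which mirrors the kind of pattern the abstract reports for urban centers and traffic hubs.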
An important factor in daily medical diagnosis and treatment is the caregiving physician's understanding of patients' emotional states. However, patients usually avoid voicing their emotions when describing their somatic symptoms and complaints to a non-psychiatrist doctor. Clinicians, for their part, usually lack the required expertise (or time) and struggle to mine the patients' various verbal and non-verbal emotional signals. As a result, in many cases there is an emotion recognition barrier between the clinician and the patients, making all patients seem the same except for their different somatic symptoms. In particular, we aim to identify and combine the approaches of three major disciplines (psychology, linguistics, and data science) for detecting emotions from verbal communication, and we propose an integrated solution for emotion recognition support. Such a platform may give emotional guides and indices to the clinician based on verbal communication at the consultation time.
In many practical situations, some of the attribute values for an object may be interval and set-valued. This paper introduces interval and set-valued information systems and decision systems. According to the semantic relation of attribute values, interval and set-valued information systems can be classified into two categories: disjunctive (Type 1) and conjunctive (Type 2) systems. In this paper, we mainly focus on the semantic interpretation of Type 1. We then define a new fuzzy preference relation and construct a fuzzy rough set model for interval and set-valued information systems. Moreover, based on the new fuzzy preference relation, the concepts of the significance measure of condition attributes and the relative significance measure of condition attributes are given for interval and set-valued decision information systems by introducing the fuzzy positive region and the dependency degree. On this basis, a heuristic algorithm for calculating fuzzy positive region reduction in interval and set-valued decision information systems is given. Finally, we give an illustrative example to substantiate the theoretical arguments. The results help to gain more insight into the meaning of fuzzy rough set theory, and they provide a new perspective for studying the attribute reduction problem in decision systems.
Developing and optimizing fuzzy relation equations is of great relevance in system modeling, which involves the analysis of numerous fuzzy rules. As each rule varies with respect to its level of influence, it is advocated that the performance of a fuzzy relation equation is strongly related to a subset of fuzzy rules obtained by removing those without significant relevance. In this study, we establish a novel framework for developing granular fuzzy relation equations that concerns the determination of an optimal subset of fuzzy rules. The subset of rules is selected by maximizing the performance of the obtained solutions. The originality of this study lies in the following. Starting with developing granular fuzzy relation equations, an interval-valued fuzzy relation is determined based on the selected subset of fuzzy rules (the subset of rules is transformed to interval-valued fuzzy sets, and subsequently the interval-valued fuzzy sets are utilized to form interval-valued fuzzy relations), which can be used to represent the fuzzy relation of the entire rule base with high performance and efficiency. Then, particle swarm optimization (PSO) is implemented to solve a multi-objective optimization problem, in which not only is an optimal subset of rules selected but a parameter ε specifying a level of information granularity is also determined. A series of experimental studies is performed to verify the feasibility of this framework and quantify its performance. A visible improvement of particle swarm optimization (about 78.56% of the encoding mechanism of particle swarm optimization, or 90.42% of particle swarm optimization with an exploration operator) is gained over the method conducted without using the particle swarm optimization algorithm.
This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain in data classification, a measure called relation redundancy is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
Data redundancy exists in both traditional and temporal databases, and the volume of temporal data is increasing rapidly. We put forward a compressed storage strategy for temporal data that combines existing compression techniques in order to resolve data redundancy during temporal data storage. Temporal data in the slow-acting domain and the momentary-acting domain are accessed using an independent clock method and a mutual clock method, respectively. We also propose a grid storage strategy to address the rapid growth of temporal data.
MatBase is a prototype data and knowledge base management expert intelligent system based on the Relational, Entity-Relationship, and (Elementary) Mathematical Data Models. Dyadic relationships are quite common in data modeling. Besides their relational-type constraints, they often exhibit mathematical properties that are not covered by the Relational Data Model. This paper presents and discusses the MatBase algorithm that assists database designers in discovering all non-relational constraints associated with them, as well as its algorithm for enforcing them, thus providing a significantly higher degree of data quality.
Funding: Supported by the National Natural Science Foundation of China (61972136), the Hubei Provincial Department of Education Outstanding Youth Scientific Innovation Team Support Foundation (T201410, T2020017), the Natural Science Foundation of Xiaogan City (XGKJ2022010095, XGKJ2022010094), and the Science and Technology Research Project of Education Department of Hubei Province (No. Q20222704).
Abstract: Effective data communication is a crucial aspect of the Social Internet of Things (SIoT) and continues to be a significant research focus. This paper proposes a data forwarding algorithm based on Multidimensional Social Relations (MSRR) in SIoT to solve this problem. The proposed algorithm separates message forwarding into intra- and cross-community forwarding by analyzing interest traits and social connections among nodes. Three new metrics are defined: the intensity of node social relationships, node activity, and community connectivity. Within the community, messages are sent by determining which node is most similar to the sender by weighing the strength of social connections and node activity. When a node performs cross-community forwarding, the message is forwarded to the most reasonable relay community by measuring the node activity and the connection between communities. The proposed algorithm was compared to three existing routing algorithms in simulation experiments. Results indicate that the proposed algorithm substantially improves message delivery efficiency while lessening network overhead and enhancing connectivity and coordination in the SIoT context.
Funding: Science and Technology Innovation 2030-Major Project of "New Generation Artificial Intelligence" granted by Ministry of Science and Technology, Grant Number 2020AAA0109300.
Abstract: In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of semantic interaction information between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three innovative components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to significantly improve the model’s performance in tackling domain-specific relational triple extraction tasks. We first propose an innovative attention interaction module. This method significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that effectively combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to reevaluate challenging samples, thereby improving the model’s adaptability in specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets to alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Abstract: To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. Minimal connectable units are joined according to semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed. In the worst case, the query decomposing algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, and the experimental results show that when the length of the query is less than 8, the query processing algorithms provide satisfactory performance.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 71401052), the National Social Science Foundation of China (Grant No. 17BGL156), and the Key Project of the National Social Science Foundation of China (Grant No. 14AZD024).
Abstract: Identification of security risk factors for small reservoirs is the basis for implementation of early warning systems. The manner of identification of the factors for small reservoirs is of practical significance when data are incomplete. The existing grey relational models have some disadvantages in measuring the correlation between categorical data sequences. To this end, this paper introduces a new grey relational model to analyze heterogeneous data. In this study, a set of security risk factors for small reservoirs was first constructed based on theoretical analysis, and heterogeneous data of these factors were recorded as sequences. The sequences were regarded as random variables, and the information entropy and conditional entropy between sequences were measured to analyze the relational degree between risk factors. Then, a new grey relational analysis model for heterogeneous data was constructed, and a comprehensive security risk factor identification method was developed. A case study of small reservoirs in Guangxi Zhuang Autonomous Region in China shows that the model constructed in this study is applicable to security risk factor identification for small reservoirs with heterogeneous and sparse data.
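The entropy-based comparison of categorical sequences mentioned in this abstract can be illustrated with a short sketch. The abstract does not give the paper's formulas, so the normalized measure below, (H(X) - H(X|Y)) / H(X), is an assumption chosen for illustration, and the function names and sample sequences are hypothetical. Only the definitions of Shannon entropy and conditional entropy are standard.

```python
from collections import Counter
from math import log2


def entropy(seq):
    """Shannon entropy H(X) of a categorical sequence, in bits."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) of two equally long sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    y_counts = Counter(y)
    return -sum((c / n) * log2(c / y_counts[yv]) for (_, yv), c in joint.items())


def relational_degree(x, y):
    """Share of the uncertainty in x that is removed by knowing y.

    Assumed illustrative measure, not the model from the paper.
    Returns 0.0 when x is constant, because there is nothing to explain.
    """
    hx = entropy(x)
    if hx == 0:
        return 0.0
    return (hx - conditional_entropy(x, y)) / hx


# Hypothetical categorical observations for three risk factors.
dam_state = ["good", "good", "poor", "poor"]
leakage = ["none", "none", "severe", "severe"]   # moves together with dam_state
inspector = ["a", "b", "a", "b"]                 # unrelated to dam_state

print(relational_degree(dam_state, leakage))
print(relational_degree(dam_state, inspector))
```

Because the measure works on value frequencies only, it applies to categorical sequences without requiring a numeric distance between categories, which is the difficulty the abstract attributes to existing grey relational models.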
Funding: Supported by the Taiwan Ministry of Economic Affairs and Institute for Information Industry under the project titled "Fundamental Industrial Technology Development Program (1/4)".
Abstract: For a transaction processing system to operate effectively and efficiently in cloud environments, it is important to distribute huge amounts of data while guaranteeing the ACID (atomic, consistent, isolated, and durable) properties. Moreover, database partition and migration tools can help transplant conventional relational database systems to the cloud environment rather than rebuilding a new system. This paper proposes a database distribution management (DBDM) system, which partitions or replicates the data according to the transaction behaviors of the application system. The principal strategy of DBDM is to keep together the data used in a single transaction, thus avoiding massive transmission of records in join operations. The proposed system has been implemented successfully. The preliminary experiments show that DBDM performs database partition and migration effectively. In addition, the DBDM system is modularly designed so that it can adapt to different database management systems (DBMSs) or different partition algorithms.
Abstract: We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system, and parallel access method in detail.
Funding: Supported by Universiti Putra Malaysia Grant Scheme (Putra Grant) (GP/2020/9692500).
Abstract: Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from a relational database to a NoSQL database. A number of schema transformation techniques have been proposed to improve the data transformation process, and they resulted in better query processing time when compared to the relational database query processing time. However, these approaches produced redundant tables in the resulting schema, which in turn consume a large amount of unnecessary storage and produce high query processing time due to the generated schema with redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from a relational database to a column-oriented database is proposed. The proposed schema transformation technique is based on a combination of the denormalization approach, data access patterns, and a multiple-nested schema. In order to validate the proposed work, the proposed technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, and the query processing time and the storage size are compared. Based on the experimental results, the proposed transformation technique showed significant improvement in terms of query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
Abstract: In this paper, an entity-relation data model for integrating spatio-temporal data is designed. With this design, spatio-temporal data can be stored effectively and spatio-temporal analysis can be realized easily.
Abstract: In this paper, the authors present the development of a data modelling tool that visualizes the transformation process of an "Entity-Relationship" Diagram (ERD) into a relational database schema. The authors' focus is the design of a tool for educational purposes and its implementation in an e-learning database course. The tool presents two stages of database design. The first stage is to draw the ERD graphically and validate it; the drawing is done by a learner. In the second stage, the system automatically transforms the ERD into a relational database schema by using common rules. Thus, the learner can understand more easily how to apply the theoretical material. A detailed description of the system functionalities and the conversion algorithm is given. Finally, the user interface and usage aspects are presented.
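The abstract refers to "common rules" for turning an ERD into a relational schema without listing them. As a rough illustration, the sketch below applies three widely taught textbook rules: each entity becomes a table keyed by its identifier, a 1:N relationship adds a foreign key on the "many" side, and an M:N relationship becomes a junction table. The class names, the `to_schema` function, and the sample entities are hypothetical and are not the tool's actual algorithm; self-referencing relationships are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    name: str
    key: str
    attributes: List[str]


@dataclass
class Relationship:
    name: str
    left: str          # for "1:N", the entity on the "one" side
    right: str         # for "1:N", the entity on the "many" side
    cardinality: str   # "1:N" or "M:N"


def to_schema(entities, relationships):
    """Map entities and binary relationships to table descriptions."""
    keys = {e.name: e.key for e in entities}
    tables = {
        e.name: {"columns": [e.key] + list(e.attributes), "pk": [e.key], "fks": []}
        for e in entities
    }
    for r in relationships:
        left_fk = f"{r.left.lower()}_{keys[r.left]}"
        right_fk = f"{r.right.lower()}_{keys[r.right]}"
        if r.cardinality == "1:N":
            # The "many" side carries a foreign key to the "one" side.
            tables[r.right]["columns"].append(left_fk)
            tables[r.right]["fks"].append((left_fk, r.left))
        elif r.cardinality == "M:N":
            # A junction table holds one foreign key per participant.
            tables[r.name] = {
                "columns": [left_fk, right_fk],
                "pk": [left_fk, right_fk],
                "fks": [(left_fk, r.left), (right_fk, r.right)],
            }
        else:
            raise ValueError(f"unsupported cardinality: {r.cardinality}")
    return tables


# Hypothetical ERD with one 1:N and one M:N relationship.
erd_entities = [
    Entity("Department", "id", ["name"]),
    Entity("Employee", "id", ["name"]),
    Entity("Project", "id", ["title"]),
]
erd_relationships = [
    Relationship("WorksIn", "Department", "Employee", "1:N"),
    Relationship("AssignedTo", "Employee", "Project", "M:N"),
]
schema = to_schema(erd_entities, erd_relationships)
for table_name, spec in schema.items():
    print(table_name, spec["columns"], "PK:", spec["pk"])
```

Returning plain dictionaries rather than SQL text keeps the mapping rules visible on their own; generating `CREATE TABLE` statements from this structure would be a separate formatting step.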
Abstract: Cone-subconvexlike set-valued maps and generalized cone-subconvexlike set-valued maps in topological vector spaces are defined by using the relative interior of the ordering cone. The relationships between the two classes of set-valued maps are investigated, and some of their properties are shown. A Gordan-type alternative theorem under the assumption of generalized cone-subconvexlikeness of set-valued maps is proved by applying convex separation theorems involving relative interiors in infinite-dimensional spaces. Finally, a necessary optimality condition is established for a general kind of set-valued vector optimization in the sense of weak E-minimizers.
Abstract: To improve the utilization of residential electricity consumption data, which contain information on users' electricity consumption habits, a mining algorithm model for residential electricity consumption behaviors is constructed. First, the collected data are divided by attribute into global data and phase data, and appropriate global variables are selected to mine the user's electricity consumption patterns in the near future using the system clustering algorithm. Then, based on the theory of grey relational analysis, the phase data are combined with the power modes to analyze in depth the potential characteristics of residential electricity consumption behaviors, verifying the ability of the latest power mode to predict household electricity consumption in the coming few days and the effect of the dominant phase variables on peak load shifting. Finally, using the actual data of a single family, the proposed data mining algorithm is shown to effectively explore the family's electricity consumption habits and characteristics.
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Funding: Project (71303269) supported by the National Natural Science Foundation of China; Project (14ZZD006) supported by the Economics Major Research Task of Fostering, China
Abstract: City regions often have great diversity in form and function. To better understand the role of each region, the relations between city regions need to be carefully studied. In this work, the human mobility relations between regions of Shanghai are explored based on mobile phone data. By formulating the regions as nodes in a network and the commuting between each pair of regions as link weights, the distribution of node degree and the spatial structures of communities in this relation network are studied. Statistics show that regions located in urban centers and traffic hubs have significantly larger degrees. Moreover, two kinds of spatial structures of communities are found. In most communities, nodes are spatially neighboring. However, in the communities that cover traffic hubs, nodes are often located along corridors.
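The network construction in the abstract above (regions as nodes, commuting volume as link weight) can be sketched as follows. The abstract does not say whether "degree" means the number of linked regions or the summed commuting weight, so this sketch computes both. The function name `degree_and_strength` and the sample flows are hypothetical, and the network is assumed to be undirected.

```python
from collections import defaultdict


def degree_and_strength(flows):
    """Compute per-region degree and weighted degree (strength).

    flows: {(region_a, region_b): commuters} for an undirected network,
    with each pair of regions listed once.
    """
    degree = defaultdict(int)      # number of distinct linked regions
    strength = defaultdict(float)  # total commuting weight on incident links
    for (a, b), weight in flows.items():
        for node in (a, b):
            degree[node] += 1
            strength[node] += weight
    return dict(degree), dict(strength)


# Made-up commuting counts, for illustration only.
flows = {
    ("hub", "north"): 120.0,
    ("hub", "south"): 80.0,
    ("north", "south"): 10.0,
    ("hub", "east"): 50.0,
}
degree, strength = degree_and_strength(flows)
```

In this toy network the region labelled `hub` has the largest degree and strength, which mirrors the kind of pattern the abstract reports for urban centers and traffic hubs.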
Abstract: An important factor in daily medical diagnosis and treatment is the caregiving physician's understanding of patients' emotional states. However, patients usually avoid voicing their emotions when describing their somatic symptoms and complaints to a non-psychiatrist doctor. Clinicians, on the other hand, usually lack the required expertise (or time) and have difficulty picking up patients' various verbal and non-verbal emotional signals. As a result, in many cases there is an emotion recognition barrier between the clinician and the patients, making all patients seem the same except for their different somatic symptoms. In particular, we aim to identify and combine the approaches of three major disciplines (psychology, linguistics, and data science) for detecting emotions from verbal communication, and we propose an integrated solution for emotion recognition support. Such a platform may give the clinician emotional guides and indices based on verbal communication during the consultation.
Abstract: In many practical situations, some of the attribute values for an object may be interval and set-valued. This paper introduces interval and set-valued information systems and decision systems. According to the semantic relation of attribute values, interval and set-valued information systems can be classified into two categories: disjunctive (Type 1) and conjunctive (Type 2) systems. In this paper, we mainly focus on the semantic interpretation of Type 1. We then define a new fuzzy preference relation and construct a fuzzy rough set model for interval and set-valued information systems. Moreover, based on the new fuzzy preference relation, the concepts of the significance measure and the relative significance measure of condition attributes are given for interval and set-valued decision information systems by introducing the fuzzy positive region and the dependency degree. On this basis, a heuristic algorithm for calculating fuzzy positive region reduction in interval and set-valued decision information systems is given. Finally, we give an illustrative example to substantiate the theoretical arguments. The results help to gain more insight into the meaning of fuzzy rough set theory and provide a new perspective for studying the attribute reduction problem in decision systems.
Funding: Supported by the National Natural Science Foundation of China (62006184, 62076189, 61873277).
Abstract: Developing and optimizing fuzzy relation equations are of great relevance in system modeling, which involves the analysis of numerous fuzzy rules. As each rule varies with respect to its level of influence, it is advocated that the performance of a fuzzy relation equation is strongly related to a subset of fuzzy rules obtained by removing those without significant relevance. In this study, we establish a novel framework for developing granular fuzzy relation equations that concerns the determination of an optimal subset of fuzzy rules. The subset of rules is selected by maximizing the performance of the obtained solutions. The originality of this study lies in the following. Starting with developing granular fuzzy relation equations, an interval-valued fuzzy relation is determined based on the selected subset of fuzzy rules (the subset of rules is transformed to interval-valued fuzzy sets, and subsequently the interval-valued fuzzy sets are used to form interval-valued fuzzy relations), which can be used to represent the fuzzy relation of the entire rule base with high performance and efficiency. Then, particle swarm optimization (PSO) is implemented to solve a multi-objective optimization problem, in which not only is an optimal subset of rules selected but a parameter ε specifying the level of information granularity is also determined. A series of experimental studies are performed to verify the feasibility of this framework and quantify its performance. A visible improvement of particle swarm optimization (about 78.56% of the encoding mechanism of particle swarm optimization, or 90.42% of particle swarm optimization with an exploration operator) is gained over the method conducted without using the particle swarm optimization algorithm.
Funding: Supported by the National Natural Science Foundation of China (No. 70231010/70321001) and the Bilateral Scientific and Technological Cooperation between China and Flanders (No. 174B0201)
Abstract: This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain in data classification, a measure called relation redundancy is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
Abstract: Data redundancy exists in both traditional and temporal databases, and the volume of temporal data is increasing rapidly. To address data redundancy in temporal data storage, we put forward a compressed storage strategy for temporal data that combines existing compression techniques. Temporal data of the slow-acting domain and of the momentary-acting domain are accessed using the independent clock method and the mutual clock method, respectively. We also propose a grid storage strategy to cope with the rapid growth of temporal data.
Abstract: MatBase is a prototype data and knowledge base management expert intelligent system based on the Relational, Entity-Relationship, and (Elementary) Mathematical Data Models. Dyadic relationships are quite common in data modeling. Besides their relational-type constraints, they often exhibit mathematical properties that are not covered by the Relational Data Model. This paper presents and discusses the MatBase algorithm that assists database designers in discovering all non-relational constraints associated with them, as well as its algorithm for enforcing them, thus providing a significantly higher degree of data quality.
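The abstract above does not enumerate the mathematical properties it has in mind. As an assumption for illustration only, the sketch below checks a few standard properties of a dyadic (binary) relation stored as a set of pairs, which is the kind of constraint a plain relational schema does not express. The function names are hypothetical and this is not the MatBase algorithm.

```python
def is_reflexive(rel, domain):
    """Every element of the domain is related to itself."""
    return all((x, x) in rel for x in domain)


def is_irreflexive(rel, domain):
    """No element of the domain is related to itself."""
    return all((x, x) not in rel for x in domain)


def is_symmetric(rel):
    """Whenever (x, y) is in the relation, so is (y, x)."""
    return all((y, x) in rel for (x, y) in rel)


def is_asymmetric(rel):
    """Whenever (x, y) is in the relation, (y, x) is not."""
    return all((y, x) not in rel for (x, y) in rel)


def is_transitive(rel):
    """Whenever (x, y) and (y, w) are in the relation, so is (x, w)."""
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)


# Made-up example: a strict "reports to (directly or indirectly)" relation.
domain = {1, 2, 3}
reports_to = {(1, 2), (2, 3), (1, 3)}
properties = {
    "reflexive": is_reflexive(reports_to, domain),
    "irreflexive": is_irreflexive(reports_to, domain),
    "symmetric": is_symmetric(reports_to),
    "asymmetric": is_asymmetric(reports_to),
    "transitive": is_transitive(reports_to),
}
```

Checks of this kind run on existing data can only suggest candidate constraints; whether a property should be enforced is a decision for the database designer.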