Effective data communication is a crucial aspect of the Social Internet of Things (SIoT) and continues to be a significant research focus. This paper proposes a data forwarding algorithm based on Multidimensional Social Relations (MSRR) in SIoT to solve this problem. The proposed algorithm separates message forwarding into intra- and cross-community forwarding by analyzing interest traits and social connections among nodes. Three new metrics are defined: the intensity of node social relationships, node activity, and community connectivity. Within the community, messages are sent by determining which node is most similar to the sender by weighing the strength of social connections and node activity. When a node performs cross-community forwarding, the message is forwarded to the most reasonable relay community by measuring the node activity and the connection between communities. The proposed algorithm was compared to three existing routing algorithms in simulation experiments. Results indicate that the proposed algorithm substantially improves message delivery efficiency while lessening network overhead and enhancing connectivity and coordination in the SIoT context.
In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of semantic interaction information between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to improve the model's performance on domain-specific relational triple extraction tasks. We first propose an attention interaction module, which enhances the semantic interaction between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to reevaluate challenging samples, thereby improving the model's adaptability in specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets to alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
Span-based models for the joint entity and relation extraction task have been studied extensively. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Identification of security risk factors for small reservoirs is the basis for implementation of early warning systems. The manner of identification of the factors for small reservoirs is of practical significance when data are incomplete. The existing grey relational models have some disadvantages in measuring the correlation between categorical data sequences. To this end, this paper introduces a new grey relational model to analyze heterogeneous data. In this study, a set of security risk factors for small reservoirs was first constructed based on theoretical analysis, and heterogeneous data of these factors were recorded as sequences. The sequences were regarded as random variables, and the information entropy and conditional entropy between sequences were measured to analyze the relational degree between risk factors. Then, a new grey relational analysis model for heterogeneous data was constructed, and a comprehensive security risk factor identification method was developed. A case study of small reservoirs in Guangxi Zhuang Autonomous Region in China shows that the model constructed in this study is applicable to security risk factor identification for small reservoirs with heterogeneous and sparse data.
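The entropy-based step in the abstract above can be illustrated with a minimal sketch. It computes the information entropy and conditional entropy of two aligned categorical sequences. The abstract does not give the paper's relational-degree formula, so `dependence_ratio` is a hypothetical stand-in (the share of H(X) explained by Y), and the factor names in the usage note are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy H(X), in bits, of a categorical sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = H(X, Y) - H(Y) for two aligned sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return entropy(list(zip(xs, ys))) - entropy(ys)


def dependence_ratio(xs, ys):
    """Hypothetical relational degree: fraction of H(X) removed by knowing Y.

    This is an assumption made for illustration, not the paper's formula.
    """
    hx = entropy(xs)
    if hx == 0:
        return 0.0
    return (hx - conditional_entropy(xs, ys)) / hx
```

For example, with the invented sequences `dam_type = ["earth", "earth", "stone", "stone"]` and `leakage = ["yes", "yes", "no", "no"]`, the two factors determine each other and `dependence_ratio(dam_type, leakage)` is 1.0; for unrelated sequences it falls toward 0.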
For a transaction processing system to operate effectively and efficiently in cloud environments, it is important to distribute huge amounts of data while guaranteeing the ACID (atomic, consistent, isolated, and durable) properties. Moreover, database partition and migration tools can help transplant conventional relational database systems to the cloud environment rather than rebuilding a new system. This paper proposes a database distribution management (DBDM) system, which partitions or replicates the data according to the transaction behaviors of the application system. The principal strategy of DBDM is to keep together the data used in a single transaction and thus avoid massive transmission of records in join operations. The proposed system has been implemented successfully. Preliminary experiments show that DBDM performs database partition and migration effectively. The DBDM system is also modularly designed so that it can adapt to different database management systems (DBMSs) or different partition algorithms.
We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system, and parallel access method in detail.
Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from relational databases to NoSQL databases. A number of schema transformation techniques have been proposed to improve the data transformation process, and they resulted in better query processing time than the relational database. However, these approaches produce redundant tables in the resulting schema, which in turn consume a large amount of unnecessary storage and lead to high query processing time because of the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from relational database to column-oriented database is proposed. The proposed schema transformation technique is based on a combination of the denormalization approach, data access patterns, and a multiple-nested schema. To validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, and the query processing time and storage size are compared. Based on the experimental results, the proposed transformation technique shows significant improvement in query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
Developing and optimizing fuzzy relation equations is of great relevance in system modeling, which involves the analysis of numerous fuzzy rules. As each rule varies in its level of influence, it is advocated that the performance of a fuzzy relation equation is strongly related to a subset of fuzzy rules obtained by removing those without significant relevance. In this study, we establish a novel framework for developing granular fuzzy relation equations that concerns the determination of an optimal subset of fuzzy rules. The subset of rules is selected by maximizing the performance of the obtained solutions. The originality of this study lies in the following. Starting with the development of granular fuzzy relation equations, an interval-valued fuzzy relation is determined based on the selected subset of fuzzy rules (the subset of rules is transformed into interval-valued fuzzy sets, which are subsequently used to form interval-valued fuzzy relations); this relation can represent the fuzzy relation of the entire rule base with high performance and efficiency. Then, particle swarm optimization (PSO) is used to solve a multi-objective optimization problem in which not only is an optimal subset of rules selected but a parameter ε specifying a level of information granularity is also determined. A series of experimental studies is performed to verify the feasibility of this framework and quantify its performance. A visible improvement (about 78.56% with the encoding mechanism of PSO, or 90.42% for PSO with an exploration operator) is gained over the method that does not use the PSO algorithm.
Existing traditional and temporal databases contain redundant data, and the volume of temporal data is increasing rapidly. We propose a compressed storage strategy for temporal data that builds on existing compression techniques to reduce data redundancy during storage. Temporal data in slow-acting domains and momentary-acting domains are accessed using the independent clock method and the mutual clock method, respectively. We also propose a grid storage strategy to cope with the rapid growth of temporal data.
This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain in data classification, a measure called relation redundancy is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
City regions often have great diversity in form and function. To better understand the role of each region, the relations between city regions need to be carefully studied. In this work, the human mobility relations between regions of Shanghai are explored based on mobile phone data. By formulating the regions as nodes in a network and the commuting between each pair of regions as link weights, the distribution of node degree and the spatial structures of communities in this relation network are studied. Statistics show that regions located in urban centers and traffic hubs have significantly larger degrees. Moreover, two kinds of spatial structures of communities are found. In most communities, nodes are spatially neighboring. However, in the communities that cover traffic hubs, nodes are often located along corridors.
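The network formulation in the abstract above (regions as nodes, commuting volume as link weights) can be sketched as follows. This is only an illustration under stated assumptions: the input format `(origin, destination, commuters)`, the choice to merge both directions into one undirected weight, and the use of weighted degree are all assumptions, and the region names in the usage note are invented.

```python
from collections import defaultdict


def build_network(flows):
    """Aggregate directed commuting flows into undirected link weights.

    flows: iterable of (origin, destination, commuters) tuples (assumed format).
    Trips that start and end in the same region are ignored.
    """
    weights = defaultdict(int)
    for origin, destination, commuters in flows:
        if origin != destination:
            weights[frozenset((origin, destination))] += commuters
    return dict(weights)


def weighted_degree(weights):
    """Sum of link weights attached to each region (its weighted degree)."""
    degree = defaultdict(int)
    for pair, weight in weights.items():
        for region in pair:
            degree[region] += weight
    return dict(degree)
```

With the invented flows `[("hub", "A", 50), ("A", "hub", 30), ("hub", "B", 40), ("A", "B", 5)]`, the weighted degrees are 120 for `hub`, 85 for `A`, and 45 for `B`, which mirrors the abstract's observation that hub regions carry larger degrees.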
In this paper, an entity-relation data model for integrating spatio-temporal data is designed. With this design, spatio-temporal data can be stored effectively and spatio-temporal analysis can be realized easily.
Association rule mining is an important issue in data mining. This paper proposes a binary-based method to efficiently generate candidate frequent itemsets and their corresponding support counts, which needs only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, the improved algorithm BFDM is proposed. Theoretical analysis and experiments show that BFDM is effective and efficient.
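The general idea of counting support with bitwise operations can be sketched as below. This is not the paper's BFDM algorithm, whose details the abstract does not give; it is a minimal illustration that assumes each transaction is encoded as an integer bitmask, and all function names are hypothetical.

```python
def encode(transaction, item_index):
    """Encode a transaction (iterable of items) as an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def support(itemset_mask, transaction_masks):
    """Count transactions that contain every item of the itemset (AND test)."""
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)


def join(mask_a, mask_b):
    """Build a candidate itemset as the union of two itemsets (OR)."""
    return mask_a | mask_b


def differ_by_two_items(mask_a, mask_b):
    """True when XOR leaves exactly two bits, i.e. two k-itemsets share k-1 items."""
    return bin(mask_a ^ mask_b).count("1") == 2
```

For instance, with `item_index = {"bread": 0, "milk": 1, "eggs": 2}` and the transactions `["bread", "milk"]`, `["bread", "eggs"]`, `["bread", "milk", "eggs"]`, and `["milk"]`, the candidate `join(encode(["bread"], item_index), encode(["milk"], item_index))` has a support count of 2.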
As data mining is applied more and more widely in computer systems, quality assurance testing of data mining software is receiving increasing attention. However, because of the "oracle" problem, traditional testing methods are not well suited to application programs in the data mining field. In this paper, a software testing method for the data mining field is proposed based on metamorphic testing; an association rules algorithm is taken as the specific case, and metamorphic relations are constructed for the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
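Metamorphic testing sidesteps the oracle problem by checking relations between outputs of related inputs instead of checking one output against a known answer. The sketch below illustrates this for frequent itemset mining. The two relations shown are common textbook choices and are not claimed to be the ones the paper constructs; the brute-force miner stands in for the program under test.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Program under test: brute-force itemsets with support count >= min_support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            count = sum(1 for t in transactions if set(candidate) <= set(t))
            if count >= min_support:
                result[candidate] = count
    return result


def check_permutation_relation(transactions, min_support, seed=0):
    """Relation 1: reordering the transactions must not change the output."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return frequent_itemsets(transactions, min_support) == frequent_itemsets(shuffled, min_support)


def check_duplication_relation(transactions, min_support):
    """Relation 2: doubling the data and the threshold doubles every count."""
    base = frequent_itemsets(transactions, min_support)
    doubled = frequent_itemsets(list(transactions) * 2, 2 * min_support)
    return doubled == {itemset: 2 * count for itemset, count in base.items()}
```

Neither check needs to know the correct set of frequent itemsets in advance; a violation of either relation signals a fault in the miner.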
To improve the utilization of residential electricity consumption data, which contains information on users' electricity consumption habits, a mining algorithm model for residential electricity consumption behaviors is constructed. First, the collected data are divided by attribute into global data and phase data, and appropriate global variables are selected to mine the user's electricity consumption patterns in the near future using the system clustering algorithm. Then, based on the theory of grey relational analysis, the phase data are combined with the power modes to analyze in depth the potential characteristics of residential electricity consumption behaviors, which verifies the ability of the latest power mode to predict household electricity consumption in the coming few days and the effect of the dominant phase variables on peak load shifting. Finally, tests on the actual data of a certain family show that the proposed data mining algorithm can effectively explore the electricity consumption habits and characteristics of the family.
On the basis of the temperature and salinity survey data of July 1975 and the historical data of relevant hydrological and meteorological stations, this paper discusses the change pattern and cause of the upwelling and its relation to the fisheries in the Southern Fujian-Taiwan Shoal Fishing Ground. Being important for the benefit of fisheries, the above points have received ...
MatBase is a prototype data and knowledge base management expert intelligent system based on the Relational, Entity-Relationship, and (Elementary) Mathematical Data Models. Dyadic relationships are quite common in data modeling. Besides their relational-type constraints, they often exhibit mathematical properties that are not covered by the Relational Data Model. This paper presents and discusses the MatBase algorithm that assists database designers in discovering all non-relational constraints associated with them, as well as its algorithm for enforcing them, thus providing a significantly higher degree of data quality.
We present the results of an investigation of the relation between space-weather parameters and cosmic ray (CR) intensity modulation using algorithm-selected Forbush decreases (FDs) from the Moscow (MOSC) and Apatity (APTY) neutron monitor (NM) stations during solar cycle 23. Our FD location program detected 408 and 383 FDs from the MOSC and APTY NM stations, respectively. A coincidence computer code employed in this work detected 229 FDs that were observed at the same Universal Time (UT) at the two stations. Out of the 229 simultaneous FDs, we formed a subset of 139 large FDs (magnitude ≤ -4%) at the MOSC station. We performed a two-dimensional regression analysis between the FD magnitudes and the space-weather data on the two samples. We find that there were significant space-weather disturbances at the time of the CR flux depressions. The correlation between the space-weather parameters and decreases in galactic cosmic ray (GCR) intensity at the two NM stations is statistically significant. The implications of the present space-weather data on CR intensity depressions are highlighted.
Funding (SIoT data forwarding paper): supported by the National Natural Science Foundation of China (61972136), the Hubei Provincial Department of Education Outstanding Youth Scientific Innovation Team Support Foundation (T201410, T2020017), the Natural Science Foundation of Xiaogan City (XGKJ2022010095, XGKJ2022010094), and the Science and Technology Research Project of Education Department of Hubei Province (No. Q20222704).
Funding (domain-specific relational triple extraction paper): Science and Technology Innovation 2030 Major Project of “New Generation Artificial Intelligence”, granted by the Ministry of Science and Technology, Grant Number 2020AAA0109300.
Funding (span-based joint entity and relation extraction paper): supported by the National Key Research and Development Program [2020YFB1006302].
Funding (small reservoir risk factor paper): supported by the National Natural Science Foundation of China (Grant No. 71401052), the National Social Science Foundation of China (Grant No. 17BGL156), and the Key Project of the National Social Science Foundation of China (Grant No. 14AZD024).
Funding (DBDM paper): supported by the Taiwan Ministry of Economic Affairs and Institute for Information Industry under the project titled "Fundamental Industrial Technology Development Program (1/4)".
Funding (relational to column-oriented transformation paper): supported by Universiti Putra Malaysia Grant Scheme (Putra Grant) (GP/2020/9692500).
Funding (granular fuzzy relation equations paper): supported by the National Natural Science Foundation of China (62006184, 62076189, 61873277).
Funding: Supported by the National Natural Science Foundation of China (No. 70231010/70321001) and the Bilateral Scientific and Technological Cooperation between China and Flanders (No. 174B0201)
Abstract: This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain in data classification, a measure called relation redundancy is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
Funding: Project (71303269) supported by the National Natural Science Foundation of China; Project (14ZZD006) supported by the Economics Major Research Task of Fostering, China
Abstract: City regions often have great diversity in form and function. To better understand the role of each region, the relations between city regions need to be carefully studied. In this work, the human mobility relations between regions of Shanghai are explored based on mobile phone data. By formulating the regions as nodes in a network and the commuting between each pair of regions as link weights, the distribution of node degree and the spatial structures of communities in this relation network are studied. Statistics show that regions located in urban centers and traffic hubs have significantly larger degrees. Moreover, two kinds of spatial structures of communities are found. In most communities, nodes are spatially neighboring. However, in the communities that cover traffic hubs, nodes are often located along corridors.
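As a rough illustration of the network formulation described in this abstract, the sketch below treats regions as nodes and commuting counts as undirected link weights, then sums the weights incident to each node. The region names, the flow values, and the function name `weighted_degrees` are hypothetical, and the abstract does not say whether the authors use weighted or unweighted degree; weighted degree is an assumption here.

```python
from collections import defaultdict


def weighted_degrees(flows):
    """Return the weighted degree of every region.

    flows maps an unordered pair (region_a, region_b) to the commuting
    volume between the two regions (hypothetical input format).
    """
    degree = defaultdict(float)
    for (a, b), weight in flows.items():
        # An undirected link contributes its weight to both endpoints.
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


# Made-up toy flows: one hub region and two ordinary regions.
flows = {
    ("hub", "north"): 120,
    ("hub", "south"): 90,
    ("north", "south"): 10,
}
deg = weighted_degrees(flows)
ranked = sorted(deg, key=deg.get, reverse=True)
```

In this toy input the hub region ends up with the largest degree, which mirrors the qualitative observation in the abstract without reproducing any of its data.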
Abstract: In this paper, an entity-relation data model for integrating spatio-temporal data is designed. With this design, spatio-temporal data can be stored effectively and spatio-temporal analysis can be easily realized.
Funding: Supported by the National Natural Science Foundation of China (70371015)
Abstract: Association rule mining is an important issue in data mining. This paper proposes a binary-system-based method that efficiently generates candidate frequent itemsets and their corresponding support counts, and needs only operations such as "and", "or" and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, the improved algorithm BFDM is proposed. Theoretical analysis and experiments show that BFDM is effective and efficient.
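The binary encoding idea can be illustrated with a minimal sketch, assuming each transaction is encoded as an integer bitmask with one bit per item. This is not the BFDM algorithm itself, whose details the abstract does not give; it only shows how a bitwise "and" can test itemset containment and yield support counts. All function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def encode(items, item_index):
    """Encode a collection of items as an integer bitmask (one bit per item)."""
    mask = 0
    for item in items:
        mask |= 1 << item_index[item]
    return mask


def support(candidate_mask, transaction_masks):
    """Count the transactions that contain every item of the candidate."""
    return sum(
        1 for t in transaction_masks
        if (t & candidate_mask) == candidate_mask
    )


def frequent_itemsets(transactions, min_support):
    """Enumerate frequent itemsets by brute force over bitmask candidates."""
    items = sorted({item for t in transactions for item in t})
    item_index = {item: pos for pos, item in enumerate(items)}
    masks = [encode(t, item_index) for t in transactions]
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = support(encode(combo, item_index), masks)
            if count >= min_support:
                result[combo] = count
                found = True
        if not found:
            # No frequent itemset of this size means none of a larger size.
            break
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = frequent_itemsets(transactions, min_support=2)
```

The brute-force enumeration keeps the example short; a real miner would prune candidates level by level rather than testing every combination.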
Abstract: As data mining is applied ever more widely in computer systems, quality assurance testing of data mining software is receiving increasing attention. However, because of the "oracle" problem, traditional testing methods are not well suited to application programs in the field of data mining. In this paper, a software testing method for the data mining field is proposed based on metamorphic testing; an association rules algorithm is taken as the specific case, and metamorphic relations are constructed on the algorithm. Experiments show that the method achieves the testing target and can feasibly be applied to other domains.
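To make the idea of a metamorphic relation concrete, here is a minimal sketch under stated assumptions. The abstract does not list the relations the authors constructed, so the two below (permutation invariance and duplication scaling) are illustrative choices, and `mine` is a hypothetical stand-in for the program under test.

```python
import random
from itertools import combinations


def mine(transactions, min_support):
    """Stand-in program under test: brute-force frequent itemset miner."""
    items = sorted({item for t in transactions for item in t})
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= set(t))
            if count >= min_support:
                found[combo] = count
    return found


def mr_permutation(transactions, min_support, seed=0):
    """Relation 1: reordering the transactions must not change the output."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    return mine(shuffled, min_support) == mine(transactions, min_support)


def mr_duplication(transactions, min_support):
    """Relation 2: duplicating every transaction and doubling the threshold
    must keep the same itemsets and double every support count."""
    base = mine(transactions, min_support)
    doubled = mine(list(transactions) * 2, 2 * min_support)
    return doubled == {itemset: 2 * count for itemset, count in base.items()}


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
permutation_ok = mr_permutation(data, 2)
duplication_ok = mr_duplication(data, 2)
```

Neither check needs to know the correct output for `data`; each only compares two runs of the same program, which is how metamorphic testing sidesteps the oracle problem.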
Abstract: To improve the utilization of residential electricity consumption data, which contain information on users' electricity consumption habits, a mining algorithm model for residential electricity consumption behaviors is constructed. First, the collected data are divided by attribute into global data and phase data, and appropriate global variables are selected to mine the user's recent electricity consumption patterns with a system clustering algorithm. Then, based on the theory of grey relational analysis, the phase data are combined with the power modes to analyze in depth the potential characteristics of residential electricity consumption behaviors, verifying the ability of the latest power mode to predict household electricity consumption over the coming few days and the effect of the dominant phase variables on peak load shifting. Finally, tests on the actual data of one family show that the proposed data mining algorithm can effectively reveal the electricity consumption habits and characteristics of the family.
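For readers unfamiliar with grey relational analysis, the sketch below computes the commonly used grey relational grade between a reference series and several comparison series, with min-max normalization and a distinguishing coefficient of 0.5. The abstract does not state which formulation or coefficient the authors use, so both are assumptions, and the series values are invented purely for illustration.

```python
def normalize(series):
    """Min-max normalize a series to [0, 1]; a constant series maps to zeros."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]
    return [(x - low) / (high - low) for x in series]


def grey_relational_grades(reference, candidates, rho=0.5):
    """Return the grey relational grade of each candidate series.

    rho is the distinguishing coefficient (0.5 is a conventional choice).
    """
    ref = normalize(reference)
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, normalize(series))]
        for name, series in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every candidate coincides with the reference after normalization.
        return {name: 1.0 for name in deltas}
    grades = {}
    for name, ds in deltas.items():
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coefficients) / len(coefficients)
    return grades


# Invented toy series: series_a tracks the reference, series_b does not.
reference = [2.0, 3.0, 5.0, 4.0]
candidates = {
    "series_a": [20.0, 25.0, 35.0, 30.0],
    "series_b": [5.0, 1.0, 2.0, 4.0],
}
grades = grey_relational_grades(reference, candidates)
```

A higher grade indicates that a candidate series follows the shape of the reference more closely, which is the sense in which a "dominant" variable could be picked out.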
Abstract: On the basis of temperature and salinity survey data from July 1975 and historical data from relevant hydrological and meteorological stations, this paper discusses the change pattern and cause of the upwelling and its relation to the fisheries in the Southern Fujian-Taiwan Shoal Fishing Ground. Being important for the benefit of fisheries, the above points have received
Abstract: MatBase is a prototype data and knowledge base management expert intelligent system based on the Relational, Entity-Relationship, and (Elementary) Mathematical Data Models. Dyadic relationships are quite common in data modeling. Besides their relational-type constraints, they often exhibit mathematical properties that are not covered by the Relational Data Model. This paper presents and discusses the MatBase algorithm that assists database designers in discovering all non-relational constraints associated with them, as well as its algorithm for enforcing them, thus providing a significantly higher degree of data quality.
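The kind of mathematical property a dyadic relationship may exhibit can be sketched as follows. This is not the MatBase algorithm, which the abstract does not describe; the property list (reflexivity, irreflexivity, symmetry, asymmetry, transitivity) is an assumed set of examples, and the function name and sample data are hypothetical.

```python
def relation_properties(pairs, domain):
    """Check a few standard properties of a dyadic relation over a domain.

    pairs is an iterable of (x, y) tuples; domain is the set of all elements.
    """
    rel = set(pairs)
    return {
        "reflexive": all((x, x) in rel for x in domain),
        "irreflexive": all((x, x) not in rel for x in domain),
        "symmetric": all((y, x) in rel for (x, y) in rel),
        "asymmetric": all((y, x) not in rel for (x, y) in rel),
        "transitive": all(
            (x, w) in rel
            for (x, y) in rel
            for (z, w) in rel
            if y == z
        ),
    }


# Hypothetical "is parent of" instance data over three person ids.
parent_of = {(1, 2), (2, 3)}
props = relation_properties(parent_of, domain={1, 2, 3})
```

Properties that hold on the current data are only candidates for constraints; whether each should be enforced remains a decision for the database designer.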
Abstract: We present the results of an investigation of the relation between space-weather parameters and cosmic ray (CR) intensity modulation using algorithm-selected Forbush decreases (FDs) from the Moscow (MOSC) and Apatity (APTY) neutron monitor (NM) stations during solar cycle 23. Our FD location program detected 408 and 383 FDs from the MOSC and APTY NM stations, respectively. A coincidence computer code employed in this work detected 229 FDs that were observed at the same Universal Time (UT) at the two stations. Out of the 229 simultaneous FDs, we formed a subset of 139 large FDs ((%) ≤ -4) at the MOSC station. We performed a two-dimensional regression analysis between the FD magnitudes and the space-weather data on the two samples. We find that there were significant space-weather disturbances at the time of the CR flux depressions. The correlation between the space-weather parameters and decreases in galactic cosmic ray (GCR) intensity at the two NM stations is statistically significant. The implications of the present space-weather data on CR intensity depressions are highlighted.