In the realm of data privacy protection, federated learning aims to collaboratively train a global model. However, heterogeneous data across clients presents challenges, often resulting in slow convergence and inadequate accuracy of the global model. Using shared feature representations alongside customized classifiers for individual clients is a promising personalized solution. Nonetheless, previous research has frequently neglected the integration of global knowledge into local representation learning and the synergy between global and local classifiers, thereby limiting model performance. To tackle these issues, this study proposes a hierarchical optimization method for federated learning with feature alignment and the fusion of classification decisions (FedFCD). FedFCD regularizes the relationship between global and local feature representations to achieve alignment and incorporates decision information from the global classifier, facilitating the late fusion of decision outputs from both global and local classifiers. Additionally, FedFCD employs a hierarchical optimization strategy to flexibly optimize model parameters. Through experiments on the Fashion-MNIST, CIFAR-10 and CIFAR-100 datasets, we demonstrate the effectiveness of FedFCD. For instance, on the CIFAR-100 dataset, FedFCD improved average test accuracy by 6.83% compared to four personalized federated learning approaches. Furthermore, extended experiments confirm the robustness of FedFCD across various hyperparameter values.
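The two ingredients named in the FedFCD abstract, feature alignment and late fusion of classifier decisions, can be illustrated with a minimal sketch. The abstract does not give the actual loss or fusion rule, so everything here is an assumption for illustration: the squared-distance penalty, the convex combination of softmax outputs, and the names `alignment_penalty`, `fuse_decisions` and `weight` are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def alignment_penalty(local_feat, global_feat):
    """Assumed regularizer: squared L2 distance between the two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(local_feat, global_feat))

def fuse_decisions(local_logits, global_logits, weight=0.5):
    """Assumed late fusion: weighted average of the two probability vectors.

    `weight` is the share given to the global classifier (hypothetical parameter).
    """
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    return [weight * g + (1 - weight) * l for l, g in zip(p_local, p_global)]

# The local classifier favors class 0, the global classifier favors class 1.
fused = fuse_decisions([2.0, 0.5, 0.1], [0.2, 1.5, 0.1], weight=0.3)
predicted_class = fused.index(max(fused))
```

With a small global weight the local decision dominates; raising `weight` shifts the prediction toward the global classifier. In a training loop the penalty would typically be added to the local task loss with its own coefficient.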
Many approaches have been proposed to pre-compute data cubes in order to respond efficiently to OLAP queries in data warehouses. However, few have proposed solutions integrating all of the possible outcomes, and it is this idea that leads to the integration of hierarchical dimensions into these responses. To meet this need, we propose in this paper a complete redefinition of the framework and the formal definition of traditional database analysis through the prism of hierarchical dimensions. After characterizing the hierarchical data cube lattice, we introduce the hierarchical data cube and its most concise reduced representation, the closed hierarchical data cube. It offers compact replication so as to optimize storage space by removing redundancies in strongly correlated data. Such data are typical of data warehouses, and in particular of video games, our field of study and experimentation, where hierarchical dimension attributes are widely represented.
Real-time health data monitoring is pivotal for bolstering the safety, intelligence, and efficiency of road services within the Internet of Health Things (IoHT) framework. Yet delays in data retrieval can markedly hinder the efficacy of big data awareness detection systems. To combat this, we advocate a collaborative caching approach involving edge devices and cloud networks. This strategy is devised to streamline the data retrieval path and thereby reduce network strain. Crafting an effective cache processing scheme poses its own challenges, especially given the transient nature of monitoring data and the need for swift data transmission, intertwined with resource allocation tactics. This paper presents a novel mobile healthcare solution that uses our collaborative caching approach to enable fine-grained health monitoring via edge devices. The system relies on cloud computing for intricate health data analytics, especially for pinpointing health anomalies. Given dynamic location shifts and possible connection disruptions, particularly during crises, we have designed a hierarchical detection system. This system caches data efficiently and incorporates a detection utility to assess data freshness and potential lag in response times. Furthermore, we introduce the Cache-Assisted Real-Time Detection (CARD) model, crafted to optimize utility. Because the CARD model is NP-hard, we adopt a greedy algorithm as a solution. Simulations show that our collaborative caching technique markedly raises the Cache Hit Ratio (CHR) and data freshness, outperforming contemporary benchmark algorithms. The empirical results underscore the strength and efficiency of our IoHT-based health monitoring solution. In summary, this paper tackles real-time health data monitoring in the IoHT landscape, presenting a joint edge-cloud caching strategy paired with a hierarchical detection system. Our methodology yields enhanced cache efficiency and data freshness. The supporting numerical data underline the feasibility and relevance of our model for future real-time health data monitoring systems.
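The abstract above says only that a greedy algorithm is used for the NP-hard CARD model, without defining the utility function. As a rough sketch of what such a greedy heuristic can look like, the example below fills a cache of fixed capacity by utility per unit of size. The item tuples, the utility values and the function name `greedy_cache` are hypothetical and are not taken from the paper.

```python
def greedy_cache(items, capacity):
    """Pick items for a cache of limited size by utility density.

    items: list of (name, size, utility) tuples; utility is an assumed score
    combining, for example, data freshness and expected response-time savings.
    Returns the names of the cached items in the order they were chosen.
    """
    # Highest utility per unit of cache space first (ties keep input order).
    ranked = sorted(items, key=lambda it: it[2] / it[1], reverse=True)
    chosen, used = [], 0
    for name, size, _utility in ranked:
        if used + size <= capacity:  # skip anything that no longer fits
            chosen.append(name)
            used += size
    return chosen

# Hypothetical monitoring streams: (name, size, utility)
readings = [("ecg", 4, 8.0), ("spo2", 2, 5.0), ("temp", 1, 1.0), ("video", 6, 6.0)]
selected = greedy_cache(readings, capacity=7)
```

A density-ordered greedy pass like this runs in O(n log n) and gives no optimality guarantee in general, which is the usual trade-off accepted when the exact problem is NP-hard.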
The study of zero-failure data is a relatively new field, but it is urgently needed in practical projects, so the work has both theoretical and practical value. In this paper, for zero-failure data (t_i, n_i) at time t_i, if the prior distribution of the failure probability p_i = P{T < t_i} is a quasi-exponential distribution, the author gives the Bayesian estimate and the hierarchical Bayesian estimate of p_i, and also obtains the reliability under the zero-failure data condition.
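To make the Bayesian step in the zero-failure abstract concrete, here is a hedged worked form. It assumes that s_i units were observed up to time t_i with no failure (in much of the zero-failure literature s_i is the sum of n_j for j >= i, but that convention is an assumption here), squared-error loss, and a generic prior density \pi. The quasi-exponential prior used in the paper is not specified in the abstract, so the closed form below uses a uniform prior purely as an illustration.

```latex
% Likelihood of observing no failures among s_i units by time t_i:
L(p_i) = (1 - p_i)^{s_i}, \qquad 0 < p_i < 1 .

% Bayesian estimate (posterior mean) for a generic prior density \pi:
\hat{p}_i
  = \frac{\int_0^1 p \,(1-p)^{s_i}\, \pi(p)\, dp}
         {\int_0^1 (1-p)^{s_i}\, \pi(p)\, dp} .

% Illustration with the uniform prior \pi(p) = 1 on (0,1):
\hat{p}_i
  = \frac{B(2,\, s_i + 1)}{B(1,\, s_i + 1)}
  = \frac{1}{s_i + 2} .
```

A hierarchical Bayesian estimate would, in the same spirit, place a further prior on the hyperparameter of \pi and integrate it out before taking the posterior mean.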
Most semi-structured data exhibit some structural regularity. Once stored as structured data in a relational database (RDB), they can be managed effectively by a database management system (DBMS). Some semi-structured data are difficult to transform because of their irregular structures. We design an efficient algorithm and data structure to ensure lossless transformation. We put forward an approach to schema extraction through data mining, in which different kinds of elements are transformed separately so that a lossless mapping from semi-structured data to structured data can be achieved.
How to design a high-performance multicast key management system is currently a hot issue. This paper applies the idea of hierarchical data processing to construct a common analytic model based on a directed logical key tree and supplies two important metrics for this problem: re-keying cost and key storage cost. The paper gives the basic theory of hierarchical data processing and the analytic model for multicast key management based on the logical key tree. It is proved that the 4-ary tree has the best performance under these metrics. The key management problem is also investigated under a user probability model, which yields two evaluation parameters for re-keying cost and key storage cost.
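The claim that a 4-ary key tree performs best can be checked numerically under one commonly used cost model for logical key trees. The model below is an assumption, not taken from the paper: a join is charged about 2 encryptions per tree level, a leave about d encryptions per level for a tree of degree d, and joins and leaves are taken to be equally likely. The function names are hypothetical.

```python
import math

def rekey_costs(n_users, degree):
    """Approximate server encryptions for one join and one leave.

    Assumed model: tree height is log_degree(n_users); a join updates each key
    on the path for 2 parties, a leave updates each path key for `degree` subtrees.
    """
    height = math.log(n_users) / math.log(degree)
    return 2 * height, degree * height

def average_cost(n_users, degree):
    """Average re-keying cost when joins and leaves are equally likely (assumed)."""
    join, leave = rekey_costs(n_users, degree)
    return (join + leave) / 2

def total_keys(n_users, degree):
    """Keys stored by the server for a full tree; n_users must be a power of degree."""
    return (degree * n_users - 1) // (degree - 1)

# Compare tree degrees 2..8 for a group of 4096 users.
best_degree = min(range(2, 9), key=lambda d: average_cost(4096, d))
```

Under this model the average cost is proportional to (d + 2) / ln d, which is smallest at d = 4 among integer degrees, consistent with the result stated in the abstract. A different join/leave mix or the paper's user probability model could shift the optimum.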
Big data is becoming increasingly important because of the enormous volume of information generated and stored in recent years. It has become a challenge for data mining techniques and management. Based on the geometric explosion of information in the era of big data, this paper studies possible approaches to balancing the maximum value and the privacy of information, and sets out a Nine-Cells information matrix for hierarchical classification. Furthermore, the paper uses rough set theory, proceeding from the two dimensions of value and privacy, to establish an information classification method and put forward countermeasures for information security. Taking spam messages as an example, massive volumes of spam messages can be classified, and a targeted hierarchical management strategy is then put forward. This paper proposes a personal information index system, an information management platform and possible solutions to protect information security and make use of information value in the age of big data.
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
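The stopping principle in the abstract above, that cells must differ more between than within clusters, can be sketched as a statistical test applied to each candidate split. The abstract does not say which test the protocol uses, so the permutation test below is a stand-in chosen for illustration, and the names `separation` and `should_split`, the significance level and the sample points are all hypothetical.

```python
import itertools
import math
import random

def separation(a, b):
    """Mean between-cluster distance minus mean within-cluster distance."""
    between = [math.dist(p, q) for p in a for q in b]
    within = [math.dist(p, q)
              for group in (a, b)
              for p, q in itertools.combinations(group, 2)]
    return sum(between) / len(between) - sum(within) / len(within)

def should_split(a, b, n_perm=999, alpha=0.05, seed=0):
    """Accept the split only if the observed separation is unlikely by chance.

    Labels are shuffled n_perm times; the p-value is the share of shuffles whose
    separation is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = separation(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if separation(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value < alpha

# Two clearly separated groups of hypothetical 2-D cell measurements.
left = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
right = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.2, 10.8)]
split_ok = should_split(left, right)
```

In a full pipeline this check would run at each node of the dendrogram, and subdivision would stop along any branch where the test fails.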
In this paper, a multilevel secure relation hierarchical data model for multilevel secure databases is extended from the relation hierarchical data model of the single-level environment. Based on the model, an upper-lower layer relational integrity is presented after we analyze and eliminate the covert channels caused by database integrity. Two SQL statements are extended to process polyinstantiation in the multilevel secure environment. A system based on the multilevel secure relation hierarchical data model is capable of storing and manipulating, in an integrated way, complicated objects (e.g., multilevel spatial data) and conventional data (e.g., integers, real numbers and character strings) in a multilevel secure database.
This paper focuses on the methods and process of spatial aggregation based on the semantic and geometric characteristics of spatial objects and the relations among them, with the help of a spatial data structure (the Formal Data Structure), Local Constrained Delaunay Triangulations and a semantic hierarchy. The adjacency relation among connected and unconnected objects is studied using the constrained triangle as the elementary processing unit in the aggregation operation. A hierarchical semantic analytical matrix is given for analyzing the similarity between object types and between objects. Several different cases of aggregation are presented.
Decision rule mining is an important issue in machine learning and data mining. However, most proposed algorithms mine categorical data at a single level, and the resulting rules are not easily understandable or really useful for users. Thus, a new approach to hierarchical decision rule mining is provided in this paper, in which a similarity direction measure is introduced to deal with hybrid data. This approach can mine hierarchical decision rules by adjusting the similarity measure parameters and the level of the concept hierarchy trees.
A semi-structured data extraction method is proposed to obtain the useful information embedded in a group of relevant web pages and store it using OEM (Object Exchange Model). Then, a data mining method is adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge helps users understand the information structure of the web more deeply and thoroughly. At the same time, it can also provide an effective schema for querying web information.
The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of this original data on a centralized cloud server is not feasible because of data privacy and communication cost concerns, which hinders artificial intelligence from empowering mobile devices. Moreover, these data are not independently and identically distributed (Non-IID) because of their different contexts, which degrades model performance. To address these issues, we propose a novel Distributed Learning algorithm based on hierarchical clustering and Adaptive Dataset Condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients in terms of their data characteristics. Subsequently, synthetic dummy samples are generated based on the hierarchical structure using adaptive dataset condensation. The dataset condensation procedure can be adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication cost.
Data mining has been a popular research area for more than a decade. There are several problems associated with data mining, and clustering is one of the most interesting among them. However, this problem becomes more challenging when the dataset is distributed between different parties that do not want to share their data. In this paper we propose a privacy-preserving two-party hierarchical clustering algorithm for vertically partitioned data sets. Each site learns only the final cluster centers, but nothing about the individual's data.
In this article, we describe the characteristics of large-scale topic modeling of this site's text data and the important progress made in recent years. Topic modeling has attracted wide interest around the world and has promoted a number of important applications in data mining, computer vision and computational biology, including automatic text summarization, information retrieval, information recommendation, topic detection and tracking, natural scene understanding, human action recognition and gene expression analysis. The paper focuses on the main features of this site's text data and the corresponding topic models. The data have dynamic, high-end, multi-channel and distributed structure, and earlier topic models capture only part of this structure. Within a unified framework based on the three-dimensional Markov model, the paper discusses the modeling of these four structural features of this site's text data, and analyzes the possibility of applying distributed computing, word combination, three-dimensional Markov topic models and type fuzzy systems. In addition to structural modeling of this site's text data, we also discuss some machine learning algorithms for energy minimization in the three-dimensional Markov model.
To solve the problems of data sharing in social networks, such as overly loose management of private data, unclear access permissions, a single mode of data sharing and so on, we design a hierarchical access control scheme for private data based on attribute encryption. First, we construct a new algorithm based on attribute encryption that divides encryption into two phases, and we design two types of attribute encryption strategy to make sure that different users obtain decryption keys corresponding to their permissions. We encrypt the private data hierarchically with our algorithm to realize four management modes, "precise", "more accurate", "fuzzy" and "private", so that users with higher permissions can access the private data below their permission level. We also outsource some complex decryption operations to the DSP to ensure high efficiency while preserving privacy. Finally, we analyze the efficiency and security of our scheme.
The multi-hierarchical structure and jumps of temperature variation for the globe, China and Yunnan Province over the past 100 years are analyzed using the auto-adaptive, multi-resolution data filter set up in You, Lin and Deng (1997). The results fall into three aspects. (1) The variation of global temperature in this period is marked by large-scale warming and can be divided into three stages: cold (prior to 1919), warm (between 1920 and 1978) and warmer (since 1979). Well-defined jumps correspond to the hierarchical evolution on this scale, occurring in 1920 and in 1979, when there is the most substantial jump towards warming. For the evolution on smaller scales, however, the variation shows more alternation between cold and warm temperatures, and new hierarchical structures and warming jumps are added to the preceding ones. (2) The temperature trend is much the same for China and Yunnan Province, but it is not consistent with the global trend. The largest difference is that a weak cold period in 1955-1978 across the globe ended in 1979 with a jump to significant warming, while a very cold period in 1955-1986 in China and Yunnan was not followed by warming of a similar extent until 1987. (3) Although the hierarchical structure and jump features in Yunnan are consistent throughout the year, significant seasonal changes are also present. The most striking difference is that temperature tends to vary consistently with China in winter and spring but with the globe in summer and fall.
Hierarchical clustering analysis based on statistics is one of the most important mining algorithms, but the traditional hierarchical clustering method is based on global comparison and in practice performs only Q clustering while ignoring R clustering, so it has limitations, especially when the numbers of samples and indexes are very large. Furthermore, because it ignores the association between different indexes, the clustering result is neither good nor accurate. In this paper, we present the model and the algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Because two-level hierarchical clustering is based on the respective clustering result of each class, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the main and most difficult problem to handle in advance. Although some of the literature has also addressed the classification of indexes, those articles classify the indexes only according to their surface meaning, which is unscientific. The reasons are as follows. First, the surface meaning of some indexes can be read in different ways and is easily misunderstood by different people. Furthermore, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, the surface meaning of some indexes conveys nothing, so they cannot be assigned to particular classes on that basis alone. Third, this classification method requires users to have a high level of knowledge of the field, which is not always available; otherwise it is difficult for them to understand the meaning of some indexes. To address this question, we first use the R clustering method to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The classification result is more objective and accurate. Moreover, after the first step, we obtain the relations among the different indexes and their interactions. We also learn which samples can be clustered into a class under a given class of indexes. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithms, and the result of R clustering can easily be used in later practice.
An Information-Centric Network (ICN) provides a promising paradigm for the upcoming internet architecture, which will struggle with steady growth in data and changes in access models. Various ICN architectures have been designed, including Named Data Networking (NDN), which is designed around content delivery instead of hosts. Because data is the central part of the network, NDN was developed to remove the dependency on IP addresses and deliver content effectively. Mobility is one of the major research dimensions for this upcoming internet architecture. Some research has been carried out to solve the mobility issues, but problems remain, such as handover delay and packet loss during real-time video streaming under consumer and producer mobility. To solve this issue, an efficient hierarchical Cluster Base Proactive Caching for Device Mobility Management (CB-PC-DMM) scheme in NDN Vehicular Networks (NDN-VN) is proposed, through which the consumer receives content proactively after handover while moving. When a consumer moves to the next destination, a handover interest is sent to the connected router, and the router then multicasts the consumer's desired data packet to the next hop of neighboring routers. Thus, once the handover process is completed, the consumer can easily get the content from the newly connected router. The proposed CB-PC-DMM in NDN-VN improves the packet delivery ratio and reduces the handover delay as well as the cluster overhead. Moreover, the intra-domain and inter-domain handover handling procedures in CB-PC-DMM for NDN-VN are described. To validate the proposed scheme, MATLAB simulations are conducted. The simulation results show that the proposed scheme reduces the handover delay and increases the consumer's interest satisfaction ratio. Compared with existing state-of-the-art schemes, the total percentage of handover delay is decreased by up to 0.1632%, 0.3267%, 2.3437%, 2.3255%, and 3.7313% at mobility speeds of 5 m/s, 10 m/s, 15 m/s, 20 m/s, and 25 m/s, and the efficiency of the packet delivery ratio is improved by up to 1.2048%, 5.0632%, 6.4935%, 6.943%, and 8.4507%. Furthermore, the simulation results show better efficiency in terms of Packet Delivery Ratio (PDR), which rises from 0.071 to 0.077, and a decrease in handover delay from 0.1334 to 0.129.
Funding (FedFCD federated learning study): supported by the National Natural Science Foundation of China (Grant No. 62062001) and the Ningxia Youth Top Talent Project (2021).
Funding (IoHT collaborative caching study): supported by the National Natural Science Foundation of China (NSFC) under Grant Number T2350710232.
Funding (multicast key management study): supported by the National High-Technology Research and Development Program of China (2001AA115300), the National Natural Science Foundation of China (69874038), and the Natural Science Foundation of Liaoning Province (20031018).
Abstract: Designing a high-performance multicast key management system is currently a hot issue. This paper applies the idea of hierarchical data processing to construct a common analytic model based on a directed logical key tree and supplies two important metrics for this problem: re-keying cost and key storage cost. The paper gives the basic theory of hierarchical data processing and the analytic model for multicast key management based on the logical key tree. It is proved that the 4-ary tree has the best performance under these metrics. The key management problem is also investigated under a user probability model, which yields two evaluation parameters for re-keying cost and key storage cost.
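The abstract does not spell out its cost formulas, so the sketch below uses an assumed, simplified model of a full, balanced logical key tree to show how tree degree trades re-keying cost against key storage. In this assumed model a join costs 2h server encryptions and a leave costs d·h, where d is the degree and h = log_d(n); the function names are hypothetical.

```python
import math


def avg_rekey_cost(n_users, degree):
    """Average server encryptions per membership change (assumed model).

    Assumption: a join costs 2*h and a leave costs degree*h encryptions,
    with h = log_degree(n_users); joins and leaves are equally likely.
    """
    h = math.log(n_users) / math.log(degree)
    return (2 * h + degree * h) / 2


def tree_depth(n_users, degree):
    """Smallest depth at which a full tree of this degree holds n_users leaves."""
    depth, capacity = 0, 1
    while capacity < n_users:
        capacity *= degree
        depth += 1
    return depth


def keys_per_user(n_users, degree):
    """Keys stored by one user: its own key plus one key per ancestor node."""
    return tree_depth(n_users, degree) + 1


n = 4096
best_degree = min(range(2, 17), key=lambda d: avg_rekey_cost(n, d))
print(best_degree, keys_per_user(n, best_degree))
```

Under this assumed model the average cost is proportional to (d + 2) / ln d, which is smallest at d = 4 among integer degrees, in line with the abstract's statement about the 4-ary tree. The paper's own model may differ.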
Abstract: Big data is becoming increasingly important because of the enormous volume of information generated and stored in recent years. It has become a challenge for data mining techniques and data management. Based on the geometric explosion of information in the era of big data, this paper studies possible approaches to balancing the maximum value and the privacy of information, and lays out a Nine-Cells information matrix for hierarchical classification. Furthermore, the paper uses rough set theory, starting from the two dimensions of value and privacy, to establish an information classification method and to put forward countermeasures for information security. Taking spam messages as an example, massive numbers of spam messages can be classified, and a targeted hierarchical management strategy is then put forward. This paper proposes a personal information index system, an information management platform, and possible solutions for protecting information security and exploiting information value in the age of big data.
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between clusters than within them. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Abstract: In this paper, a multilevel secure relation hierarchical data model for multilevel secure databases is extended from the relation hierarchical data model of the single-level environment. Based on the model, an upper-lower layer relational integrity is presented after we analyze and eliminate the covert channels caused by database integrity. Two SQL statements are extended to process polyinstantiation in the multilevel secure environment. A system based on the multilevel secure relation hierarchical data model can store and manipulate, in an integrated way, complicated objects (e.g., multilevel spatial data) and conventional data (e.g., integers, real numbers, and character strings) in a multilevel secure database.
Abstract: This paper focuses on the methods and process of spatial aggregation based on the semantic and geometric characteristics of spatial objects and the relations among them, with the help of a spatial data structure (Formal Data Structure), Local Constrained Delaunay Triangulations, and a semantic hierarchy. The adjacency relation among connected and unconnected objects is studied using the constrained triangle as the elementary processing unit in the aggregation operation. A hierarchical semantic analytical matrix is given for analyzing the similarity between object types and between objects. Several different cases of aggregation are presented.
Funding: This research was supported by the National Natural Science Foundation of China under grant Nos. 60775036 and 60970061, and by the Higher Education Natural Science Research Fund Project of Jiangsu Province under grant No. 09KJD520004.
Abstract: Decision rule mining is an important issue in machine learning and data mining. However, most proposed algorithms mine categorical data at a single level, and the resulting rules are neither easy to understand nor really useful for users. This paper therefore provides a new approach to hierarchical decision rule mining, in which a similarity direction measure is introduced to deal with hybrid data. The approach can mine hierarchical decision rules by adjusting the similarity measure parameters and the level of the concept hierarchy trees.
Abstract: A semi-structured data extraction method is proposed to obtain the useful information embedded in a group of relevant web pages and store it using OEM (Object Exchange Model). A data mining method is then adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge helps users understand the information structure of the web more deeply and thoroughly. It can also provide an effective schema for querying web information.
Funding: Supported by the General Program of the National Natural Science Foundation of China (62072049).
Abstract: The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of this original data on a centralized cloud server is not feasible because of data privacy and communication cost concerns, which hinders artificial intelligence from empowering mobile devices. Moreover, these data are not independent and identically distributed (Non-IID) because of their different contexts, which degrades model performance. To address these issues, we propose a novel Distributed Learning algorithm based on hierarchical clustering and Adaptive Dataset Condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients according to their data characteristics. Synthetic dummy samples are then generated on the basis of the hierarchical structure using adaptive dataset condensation. The dataset condensation procedure can be adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication cost.
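The abstract names an entropy-TOPSIS tiering model without giving its criteria. As a rough illustration, the sketch below implements the standard entropy weight method followed by TOPSIS scoring in plain Python. The two client criteria (sample count and number of distinct labels) and all values are hypothetical, every criterion is treated as benefit-type, and the input is assumed to contain positive values with at least two distinct rows.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: criteria that vary more across rows get larger weights."""
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s == 0:  # every criterion is uniform across rows: fall back to equal weights
        return [1.0 / n] * n
    return [d / s for d in divergences]


def topsis_scores(matrix, weights):
    """Relative closeness of each row to the ideal point (benefit-type criteria only)."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(r[j] for r in weighted) for j in range(n)]
    worst = [min(r[j] for r in weighted) for j in range(n)]
    scores = []
    for r in weighted:
        d_best, d_worst = math.dist(r, best), math.dist(r, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical clients described by (sample count, number of distinct labels).
clients = [[100, 2], [200, 4], [300, 6]]
weights = entropy_weights(clients)
scores = topsis_scores(clients, weights)
ranking = sorted(range(len(clients)), key=lambda i: scores[i], reverse=True)
print(weights, scores, ranking)
```

Clients could then be grouped into tiers by thresholding or bucketing the scores; how ADC-DL actually forms its tiers is not stated in the abstract.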
Abstract: Data mining has been a popular research area for more than a decade, and clustering is among its most interesting problems. The problem becomes more challenging when the dataset is distributed among different parties that do not want to share their data. In this paper we therefore propose a privacy-preserving two-party hierarchical clustering algorithm for vertically partitioned data sets. Each site learns only the final cluster centers and nothing about any individual's data.
Abstract: In this article, we describe the characteristics of large-scale modeling of the theme text of this site data and important progress in recent years. The topic modeling approach has attracted wide interest around the world and has promoted a number of important applications in data mining, computer vision, and computational biology, including automatic text summarization, information retrieval, information recommendation, topic detection and tracking, natural scene understanding, human action recognition, and gene expression analysis. The main features of the model and the corresponding theme paper focuses on the text of this site data. Data with dynamic, high-end, multi-channel and distributed structure and the structure of the model is only part of the theme before modeling. The paper discussed in the framework of the unity of the three-dimensional Markov model of four structural features of the text of this site data modeling, and analysis of distributed computing and word combination of three-dimensional modeling topics Markov model and type fuzzy systems the possibility of applications. In addition to structural modeling for this site text data, we also discuss some of the three-dimensional Markov model energy minimization of machine learning algorithms.
Abstract: To solve the problems of data sharing in social networks, such as overly loose management of private data, unclear access permissions, and an overly limited data sharing mode, we design a hierarchical access control scheme for private data based on attribute encryption. First, we construct a new algorithm based on attribute encryption that divides encryption into two phases, and we design two types of attribute encryption strategy so that different users obtain decryption keys corresponding to their permissions. We encrypt the private data hierarchically with our algorithm to realize four management modes, "precise", "more accurate", "fuzzy", and "private", so that users with higher permissions can access the private data at or below their permission level. We also outsource some complex decryption operations to the DSP to ensure high efficiency while protecting privacy. Finally, we analyze the efficiency and the security of our scheme.
Abstract: The multi-hierarchical structure and jumps of temperature variation for the globe, China, and Yunnan Province over the past 100 years are analyzed using the auto-adaptive, multi-resolution data filter set up in You, Lin and Deng (1997). The results fall into three aspects. (1) The variation of global temperature in this period is marked by warming on a large scale and can be divided into three stages: cold (prior to 1919), warm (between 1920 and 1978), and warmer (since 1979). Well-defined jumps accompany the hierarchical evolution on this scale, occurring in 1920 and 1979, when the most substantial jumps towards warming take place. On smaller scales, however, the variation shows more alternation between cold and warm temperatures, and new structures and jumps are superimposed on the preceding hierarchical structure and warming jumps. (2) The temperature trend is much the same for China and Yunnan Province, but it is not consistent with the global trend. The largest difference is that a weak cold period across the globe in 1955-1978 ended in 1979 with a jump to significant warming, while a very cold period in China and Yunnan in 1955-1986 was not followed by warming of a similar extent until 1987. (3) Although the hierarchical structure and jump features in Yunnan are consistent throughout the year, significant seasonal changes are also present; the most striking difference is that temperature tends to vary consistently with China in winter and spring but with the globe in summer and fall.
Abstract: Hierarchical clustering analysis based on statistics is one of the most important mining algorithms, but the traditional hierarchical clustering method is based on global comparison and in practice performs only Q clustering while ignoring R clustering, so it has limitations, especially when the numbers of samples and indexes are very large. Furthermore, because it ignores the association between different indexes, the clustering result is neither good nor faithful. In this paper, we present the model and the algorithm of two-level hierarchical clustering, which integrates Q clustering with R clustering. Moreover, because two-level hierarchical clustering is based on the respective clustering result of each class, the classification of the indexes directly affects the accuracy of the final clustering result, so how to classify the indexes appropriately is the chief and most difficult problem to handle in advance. Although some of the literature has also addressed the classification of indexes, those articles classify the indexes only according to their superficial meaning, which is unscientific. The reasons are as follows. First, the superficial meaning of an index can often be read in different ways and is easily misunderstood by different people; moreover, this classification method seldom makes use of historical data, so the classification result is not very objective. Second, the superficial meaning of some indexes conveys nothing, so they cannot be assigned to a class on that basis alone. Third, this classification method requires users to have a high level of domain knowledge, which is sometimes not available; otherwise it is difficult for them to understand the meaning of some indexes.
To address this question, we first use R clustering to cluster the indexes, dividing the p-dimensional indexes into q classes, and then adopt the two-level clustering method to obtain the final result. The classification result is thus more objective and accurate. Moreover, after the first step, we obtain the relations among the different indexes and their interactions. We can also learn which samples cluster together under the indexes of a given class. (These intermediate results are sometimes very useful.) The experiments also indicate the effectiveness and accuracy of the algorithms, and the result of R clustering can easily be used in later practice.
Funding: This work was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2018-0-01431) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: An Information-Centric Network (ICN) provides a promising paradigm for the upcoming internet architecture, which will struggle with steady growth in data and changes in access models. Various ICN architectures have been designed, including Named Data Networking (NDN), which is designed around content delivery instead of hosts. Because data is the central part of the network, NDN was developed to remove the dependency on IP addresses and deliver content effectively. Mobility is one of the major research dimensions for this upcoming internet architecture. Some research has been carried out to solve mobility issues, but problems such as handover delay and packet loss during real-time video streaming remain in the case of consumer and producer mobility. To solve this issue, an efficient hierarchical Cluster Base Proactive Caching for Device Mobility Management (CB-PC-DMM) scheme in NDN Vehicular Networks (NDN-VN) is proposed, through which the consumer receives content proactively after handover while moving. When a consumer moves to the next destination, a handover interest is sent to the connected router, and the router then multicasts the consumer's desired data packet to the neighboring next-hop routers. Thus, once the handover process is completed, the consumer can easily get the content from the newly connected router. The proposed CB-PC-DMM in NDN-VN improves the packet delivery ratio and reduces the handover delay as well as the cluster overhead. Moreover, the intra-domain and inter-domain handover handling procedures in CB-PC-DMM for NDN-VN are described. MATLAB simulations are conducted to validate the proposed scheme. The simulation results show that the proposed scheme reduces the handover delay and increases the consumer's interest satisfaction ratio. Compared with existing state-of-the-art schemes, the total percentage of handover delays is decreased by up to 0.1632%, 0.3267%, 2.3437%, 2.3255%, and 3.7313% at mobility speeds of 5 m/s, 10 m/s, 15 m/s, 20 m/s, and 25 m/s, and the efficiency of the packet delivery ratio is improved by up to 1.2048%, 5.0632%, 6.4935%, 6.943%, and 8.4507%. Furthermore, the simulation results show better efficiency in terms of Packet Delivery Ratio (PDR), from 0.071 to 0.077, and a decrease in the handover delay from 0.1334 to 0.129.