Abstract: Clustering techniques have recently been used to automatically discover typical user profiles. In general, it is challenging to design an effective similarity measure between session vectors, which are usually high-dimensional and sparse. Two approaches for mining typical user profiles, based on matrix dimensionality reduction, are presented. In these approaches, non-negative matrix factorization is applied to reduce the dimensionality of the session-URL matrix, and the projected user-session vectors are clustered into typical user-session profiles using the spherical k-means algorithm. The results show that both algorithms succeed in mining many typical user profiles from the user sessions.
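The clustering step described above can be illustrated with a minimal pure-Python sketch of spherical k-means, which groups unit-length vectors by cosine similarity. It assumes the session vectors have already been projected to a low-dimensional space (for example by non-negative matrix factorization); the function names, the farthest-first seeding, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))

def spherical_kmeans(vectors, k, iters=20):
    """Cluster vectors by cosine similarity; returns (labels, unit-length centers)."""
    points = [normalize(v) for v in vectors]
    # Deterministic farthest-first seeding (an assumption made for reproducibility).
    centers = [points[0]]
    while len(centers) < k:
        far = min(points, key=lambda p: max(cosine(p, c) for c in centers))
        centers.append(far)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the center it is most similar to.
        labels = [max(range(k), key=lambda j: cosine(p, centers[j])) for p in points]
        # Update step: each center becomes the normalized sum of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centers

# Hypothetical projected session vectors (two latent dimensions).
projected = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels, centers = spherical_kmeans(projected, k=2)
print(labels)
```

Each returned center is a unit vector and can be read as a typical profile direction in the reduced space.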
Funding: Projects (60970036, 60873016, 61170045) supported by the National Natural Science Foundation of China; Projects (2009AA01Z102, 2009AA01Z124) supported by the National High Technology Development Program of China
Abstract: Integrated with an improved architectural vulnerability factor (AVF) computing model, a new architectural-level soft error reliability analysis framework, SS-SERA (soft error reliability analysis based on SimpleScalar), was developed. SS-SERA was used to accurately estimate the AVFs of various on-chip structures. Experimental results show that the AVFs of the issue queue (IQ), register update unit (RUU), load store queue (LSQ) and functional unit (FU) are 38.11%, 22.17%, 23.05% and 24.43%, respectively. For address-based structures, i.e., the level-1 data cache (L1D), DTLB, level-2 unified cache (L2U), level-1 instruction cache (L1I) and ITLB, the AVFs of the data arrays are 22.86%, 27.57%, 14.80%, 8.25% and 12.58%, lower than those of the tag arrays, which are 30.01%, 28.89%, 17.69%, 10.26% and 13.84%, respectively. Furthermore, using the AVF values obtained with SS-SERA, a qualitative and quantitative analysis of AVF variation and predictability was performed for the structures studied. Experimental results show that the AVF varies significantly across structures and workloads, and is influenced by multiple microarchitectural metrics and their interactions. In addition, the AVFs of SPEC2K floating-point programs are more predictable than those of SPEC2K integer programs.
Funding: Project (No. 20040248001) supported by the Ph.D. Programs Foundation of Ministry of Education of China
Abstract: Background knowledge is important for data mining, especially in complicated situations. Ontological engineering is the successor of knowledge engineering. Sharable knowledge bases built on ontologies can provide background knowledge to direct the data mining process. This paper gives a general introduction to the method and presents a practical analysis example using a support vector machine (SVM) as the classifier. Gene Ontology and its accompanying annotations form a large knowledge base on which much research has been carried out. A microarray dataset is the output of a DNA chip. With the help of Gene Ontology, we present a more elaborate analysis of microarray data than previous researchers. The method can also be used in other fields with similar scenarios.
Funding: This work is supported by the Prospective Research Project on Future Networks of Jiangsu Future Networks Innovation Institute under Grant No. BY2013095-1-05, the National Basic Research Program of China (973) under Grant No. 2012CB315805, and the National Natural Science Foundation of China under Grant No. 61173167.
Abstract: A traffic matrix is an abstract representation of the traffic volume flowing between sets of source and destination pairs. It is a key input parameter for network operations management, planning, provisioning and traffic engineering. Traffic matrices are also important in the context of OpenFlow-based networks. Because even good measurement systems can suffer from errors and data collection systems can fail, missing values are common. Existing matrix completion methods do not consider the characteristics that traffic exhibits and provide only limited precision. To address this problem, this paper proposes a novel approach based on compressive sensing and traffic self-similarity to reconstruct the missing traffic flow data. First, we analyze real-world traffic matrices, all of which exhibit low-rank structure, temporal smoothness and spatial self-similarity. Then, we propose the Self-Similarity and Temporal Compressive Sensing (SSTCS) algorithm to reconstruct the missing traffic data. Extensive experiments with real-world traffic matrices show that the proposed SSTCS significantly reduces data reconstruction errors and achieves satisfactory accuracy compared with existing solutions. Typically, SSTCS can successfully reconstruct the traffic matrix with less than 32% error when as much as 98% of the data is missing.
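To make the low-rank idea behind such reconstruction concrete, here is a minimal sketch of a generic rank-1 matrix completion baseline fitted by alternating least squares. It is not the SSTCS algorithm: it ignores temporal smoothness and self-similarity entirely, and the function name `complete_rank1`, the use of `None` for missing entries, and the toy matrix are all assumptions made for illustration.

```python
def complete_rank1(matrix, iters=100):
    """Fill None entries with a rank-1 fit u[i] * v[j] to the observed entries."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Least-squares update of each row factor using observed entries only.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Least-squares update of each column factor using observed entries only.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    # Keep observed values; replace missing ones with the fitted product.
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]

# Hypothetical traffic matrix (rows: origin-destination pairs, columns: time slots).
traffic = [[1.0, 2.0, 3.0, None],
           [2.0, 4.0, 6.0, 8.0],
           [None, 6.0, 9.0, 12.0]]
filled = complete_rank1(traffic)
print(filled[0][3], filled[2][0])
```

Because the toy matrix is exactly rank 1, the two missing entries are recovered almost exactly; real traffic data is only approximately low-rank, which is where additional structure such as temporal smoothness becomes useful.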
Funding: Supported in part by the Scientific and Technological Council of Turkey under Grant No. TUBITAK 105T092; Süleyman Demirel University under Grant No. SDUBAP 1075-m-05
Abstract: Shell model calculations in the sdgh major shell for the neutron-deficient ^{106,107,108,109}Sn isotopes have been carried out using the CD-Bonn and Nijmegen1 two-body effective nucleon-nucleon interactions. The single-shell states and the corresponding matrix elements needed to describe the Sn isotopes are reconstructed so that the coefficients of fractional parentage can be calculated with reduced computational requirements. This reconstruction allows the shell model calculations for the neutron-deficient Sn isotopes to be done in a very reasonable time. The results are compared with recent high-resolution experimental data and found to be in good agreement.
Funding: Supported by the Chinese Academy of Sciences (Grant No. KZCX2-YW-202); the 973 Program (Grant No. 2006CB403606); the 863 Program (Grant No. 2009AA12Z138); the National Natural Science Foundation of China (Grant Nos. 40606008, 40437017, and 40221503)
Abstract: An ocean reanalysis system for the joining area of Asia and Indian-Pacific Ocean (AIPO) has been developed and is currently delivering reanalysis datasets for studying air-sea interaction over the AIPO and its effect on climate variation over China at the inter-annual time scale. The system consists of a nested ocean model forced by atmospheric reanalysis, an ensemble-based multivariate ocean data assimilation system, and various ocean observations. This report describes the main components of the data assimilation system in detail. The system adopts an ensemble optimal interpolation scheme that uses a seasonal update from a free-running model to estimate the background error covariance matrix. In view of the systematic biases in some observation systems, the observations were pre-processed before assimilation. A coarse-resolution reanalysis dataset from the system is preliminarily evaluated for the period 1992 to 2006 by comparing it with other observations and reanalysis data, to demonstrate the performance of the system.
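For orientation, the textbook form of an ensemble optimal interpolation analysis step is written out below. This is the commonly used general formulation and is given only as a sketch; the abstract does not state the exact configuration of the AIPO system, and every symbol here is an assumption: x^b and x^a are the background and analysis states, y the observations, H the observation operator, R the observation error covariance, A' the matrix of N ensemble anomalies drawn from the free-running model, and α a scalar that tunes the covariance magnitude.

```latex
\begin{aligned}
\mathbf{x}^{a} &= \mathbf{x}^{b} + \mathbf{K}\,\bigl(\mathbf{y} - \mathbf{H}\mathbf{x}^{b}\bigr),\\
\mathbf{K} &= \alpha\,\mathbf{P}\mathbf{H}^{\mathsf{T}}\,\bigl(\alpha\,\mathbf{H}\mathbf{P}\mathbf{H}^{\mathsf{T}} + \mathbf{R}\bigr)^{-1},\\
\mathbf{P} &= \frac{1}{N-1}\,\mathbf{A}'\mathbf{A}'^{\mathsf{T}}.
\end{aligned}
```

In this form the background error covariance P is estimated from a stationary ensemble rather than evolved in time, which is consistent with the seasonal update from a free-running model mentioned in the abstract.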
Funding: Supported by the National Natural Science Foundation of China (No. 61300078); the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD201504039); Funding Project for Academic Human Resources Development in Beijing Union University (No. BPHR2014A03, Rk100201510); "New Start" Academic Research Projects of Beijing Union University (No. Hzk10201501)
Abstract: Problems exist in similarity measurement and index tree construction that affect the performance of nearest neighbor search on high-dimensional data. The equidistance problem is solved by using the NPsim function to calculate similarity, and a sequential NPsim matrix is built to improve indexing performance. Combining these innovations, a nearest neighbor search algorithm for high-dimensional data based on the sequential NPsim matrix is proposed and compared with nearest neighbor search algorithms based on the KD-tree and SR-tree on the Munsell spectral data set. Experimental results show that the proposed algorithm achieves better similarity than the other algorithms and searches thousands of times faster. In addition, the slow construction of the sequential NPsim matrix can be sped up with parallel computing.
Abstract: Sharing common borders, the member states of the Economic Cooperation Organization (ECO) have an estimated population, total area, and GDP (PPP-based) of 416 million persons, 7.9 million km², and US$2.7 trillion, respectively (2010 data). Although its extent is heterogeneous, there is overall economic development, with substantial energy and transport-transit relations among the countries, reflected in trade turnover that grows year by year. However, there are still largely unused resources and capacity in areas of cooperation such as the exchange of energy, transport services, agricultural and industrial goods, the use of opportunities for tourism, and the promotion of investment and innovation. Maximum and optimal use of these resources calls for analytical tools capable of accounting for relations both within member states and among them. Implementing computable general equilibrium (CGE) modeling in each member state would thus be of great significance in resolving these problems, both by accounting for input-output linkages within the countries and by accounting for the impact of main trading partners and of goods and services traded among countries. The analysis carried out indicates that there are a number of problems in applying CGE models in most of the member states. In some countries input-output tables are not compiled; in others the tables are compiled but no attempt has been made to build the model; in yet others, even where a CGE model is implemented, it is difficult to take real results into account because of serious problems with improving the national accounts database.
Summarizing these problems, it can be concluded that applying a CGE model requires procedures for compiling the social accounting matrix (SAM) that underlies the model, for which the relevant statistics of a member state must be improved. Considering the above, the presented research develops procedures and proposals for compiling a SAM, improves the statistical data used to study the extent of application of CGE models in ECO member states, and identifies the degree of availability and organization of the data needed to develop input-output tables and the respective SAM.
Funding: Project (61232001) supported by the National Natural Science Foundation of China; Project supported by the Construct Program of the Key Discipline in Hunan Province, China
Abstract: Researchers face many class prediction challenges stemming from a small amount of training data relative to a large number of unlabeled samples to be predicted. For this setting, transductive learning has been proposed, which uses information about the unlabeled data to estimate its labels. This work presents a new transductive learning method called the two-way Markov random walk (TMRW) algorithm. The algorithm uses information about labeled and unlabeled data to predict the labels of the unlabeled data by taking random walks between the labeled and unlabeled data, where data points are viewed as nodes of a graph. The labeled points are related to the unlabeled points, and vice versa, through a transition probability matrix. The predicted labels of the unlabeled samples are obtained by combining the results of the two-way walks. Finally, ensemble learning is combined with transductive learning, and AdaBoost.MH is taken as the framework to improve the performance of TMRW, which serves as the base learner. Experiments show that this algorithm predicts the labels of unlabeled data well.
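The idea of walking in both directions between labeled and unlabeled nodes can be sketched in a few lines of pure Python. This is a simplified one-step illustration and not the TMRW algorithm itself: the Gaussian similarity, the row normalization into transition probabilities, and the rule of simply adding the forward and backward scores are all assumptions chosen for clarity, as are the function names and the toy data.

```python
import math

def gaussian_sim(x, y, sigma=1.0):
    """Similarity between two points; larger means closer."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def row_normalize(rows):
    """Turn each row of similarities into transition probabilities."""
    return [[v / sum(r) for v in r] for r in rows]

def two_way_walk(labeled, labels, unlabeled, classes):
    """Predict a class for each unlabeled point from one-step walks in both directions."""
    # Similarities with unlabeled points as rows and labeled points as columns.
    sim = [[gaussian_sim(u, l) for l in labeled] for u in unlabeled]
    # Forward walk: probability of stepping from an unlabeled to a labeled node.
    p_ul = row_normalize(sim)
    # Backward walk: probability of stepping from a labeled to an unlabeled node.
    p_lu = row_normalize([list(col) for col in zip(*sim)])
    preds = []
    for i in range(len(unlabeled)):
        scores = {}
        for c in classes:
            fwd = sum(p_ul[i][j] for j, y in enumerate(labels) if y == c)
            bwd = sum(p_lu[j][i] for j, y in enumerate(labels) if y == c)
            scores[c] = fwd + bwd  # assumed combination rule
        preds.append(max(classes, key=lambda c: scores[c]))
    return preds

# Hypothetical data: two labeled points and three unlabeled points in the plane.
labeled = [[0.0, 0.0], [5.0, 5.0]]
labels = ["a", "b"]
unlabeled = [[0.5, 0.2], [4.6, 5.1], [1.0, 0.0]]
print(two_way_walk(labeled, labels, unlabeled, ["a", "b"]))
```

A predictor of this kind could in principle be wrapped as a base learner inside a boosting loop, which is the role the abstract assigns to TMRW within AdaBoost.MH.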
Funding: Project (61304046) supported by the National Natural Science Funds for Young Scholar of China; Project (F201242) supported by Natural Science Foundation of Heilongjiang Province, China
Abstract: The H_∞ performance analysis and controller design for linear networked control systems (NCSs) are presented. The NCS is modeled as a linear continuous-time system with a time-varying interval input delay, assuming that the sensor is time-driven and that the logic zero-order hold (ZOH) and the controller are event-driven. Based on this model, the delay interval is divided into two equal subintervals for the H_∞ performance analysis. An improved H_∞ stabilization condition is obtained in the linear matrix inequality (LMI) framework by fully exploiting the information about the bounds of the input delay to construct novel Lyapunov–Krasovskii functionals (LKFs). To reduce the conservatism of the proposed results, the bounds of the cross terms in the LKF derivatives are properly estimated without introducing any slack matrix variables. Moreover, the H_∞ controller is designed to guarantee robust asymptotic stability of the linear NCS with an H_∞ performance level γ. Numerical simulation examples are included to validate the reduced conservatism and the effectiveness of the proposed method.