Funding: Project (No. 20111081023) supported by the Tsinghua University Initiative Scientific Research Program, China
Abstract: Data sparseness, the evident characteristic of short text, has always been regarded as the main cause of low accuracy in the classification of short texts using statistical methods. Intensive research has been conducted in this area during the past decade. However, most researchers have overlooked that ignoring the semantic importance of certain feature terms might also contribute to low classification accuracy. In this paper we present a new method to tackle the problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. Giving larger weights to the feature terms in the SFT improves classification accuracy. In particular, the method appeared to be more effective with more detailed classification. Experiments on two short-text datasets demonstrate that our approach achieved an improvement over state-of-the-art methods, including support vector machine (SVM) and Naive Bayes Multinomial.
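As a rough illustration of the IG half of this idea, the sketch below scores a term by information gain and then boosts the counts of terms treated as strong features. It is a minimal sketch, not the paper's procedure: the abstract does not say how LDA and IG are combined, so the LDA step is omitted, and the function names, the toy corpus, and the `boost` factor are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs is a non-empty list of token lists, labels the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def weighted_counts(doc, strong_terms, boost=2.0):
    """Term counts with strong-feature terms scaled by a hypothetical boost."""
    counts = Counter(doc)
    return {t: c * (boost if t in strong_terms else 1.0)
            for t, c in counts.items()}


# Toy corpus, invented for illustration only.
docs = [["goal", "match"], ["match", "score"],
        ["stock", "price"], ["price", "market"]]
labels = ["sports", "sports", "finance", "finance"]

# Rank the vocabulary by IG and treat the top terms as "strong" features.
vocab = sorted({t for d in docs for t in d})
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                reverse=True)
strong_terms = set(ranked[:2])
print(weighted_counts(["match", "match", "goal"], strong_terms))
```

The boosted counts could then be fed to any bag-of-words classifier; how large the boost should be is a tuning question that the abstract does not address.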
Funding: Supported by the National Natural Science Foundation of China (No. 60673145), the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList), the Intel/University Sponsored Research, the National Key Basic Research and Development (973) Program of China (No. 2006CB303100), and the IBM China Research Laboratory
Abstract: The token protocol provides a new coherence framework for shared-memory multiprocessor systems. It avoids the indirection of directory protocols for common cache-to-cache transfer misses, and achieves higher interconnect bandwidth and lower interconnect latency than snooping protocols. However, its broadcasting increases network traffic, which limits the scalability of the token protocol. This paper describes an efficient technique, called the sharing relation cache, to reduce token protocol network traffic. This cache provides destination set information for cache-to-cache miss requests by caching directory information for recently shared data. The paper also describes how to implement the technique in a token protocol. Simulations using SPLASH-2 benchmarks show that in a 16-core chip multiprocessor system, the cache reduced network traffic by 15% on average.
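The destination-set lookup described above can be pictured with a small software model. This is a sketch under stated assumptions rather than the paper's hardware design: the class name `SharingRelationCache`, the LRU eviction policy, and the fall-back to a full broadcast when no entry exists are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class SharingRelationCache:
    """Toy model: maps a block address to the nodes that recently shared it."""

    def __init__(self, capacity, all_nodes):
        self.capacity = capacity
        self.all_nodes = frozenset(all_nodes)
        self.entries = OrderedDict()  # block address -> set of node ids

    def record_sharer(self, block, node):
        """Remember that `node` recently held a copy of `block`."""
        sharers = self.entries.pop(block, set())
        sharers.add(node)
        self.entries[block] = sharers  # re-insert as most recently used
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def destination_set(self, block, requester):
        """Nodes a miss request for `block` should be sent to."""
        sharers = self.entries.get(block)
        if sharers is None:
            # No cached sharing information: assume a full broadcast.
            return self.all_nodes - {requester}
        self.entries.move_to_end(block)
        return frozenset(sharers) - {requester}


cache = SharingRelationCache(capacity=2, all_nodes=range(4))
cache.record_sharer(0x40, 1)
cache.record_sharer(0x40, 2)
narrow = cache.destination_set(0x40, requester=0)     # only nodes 1 and 2
broadcast = cache.destination_set(0x80, requester=0)  # every other node
print(sorted(narrow), sorted(broadcast))
```

In this model the traffic saving comes from the hit case, where the request goes to two nodes instead of three; a real protocol would also need a way to recover when the predicted set turns out to be incomplete.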
Funding: Supported by the National Natural Science Foundation of China (No. 60673145), the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList), the Intel/University Sponsored Research, the National Key Basic Research and Development (973) Program of China (No. 2006CB303100), and the IBM China Research Laboratory
Abstract: Memory limitations are always a focus of computer architecture. The live range aware cache (LIRAC) offers a way to reduce memory accesses using live range information. In the LIRAC system, scratch data need not be written back if the data will no longer be used. Three kinds of software support were developed for the LIRAC architecture, based on compiler analysis, binary analysis, and trace analysis. Trace analysis results show that LIRAC can eliminate 29% of cache write-backs on average, and up to 83% in the best case, for the SPEC CPU 2000 benchmark. These software techniques demonstrate the feasibility and potential benefit of the LIRAC architecture.
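The write-back rule stated above (dirty data that will never be read again need not be written back) can be expressed as a tiny counting model. This is only an illustrative sketch: the `(dirty, dead)` record format, the function name, and the toy eviction list are assumptions, and the numbers it produces are unrelated to the SPEC CPU 2000 results quoted in the abstract.

```python
def count_writebacks(evictions, live_range_aware):
    """Count write-backs for a sequence of (dirty, dead) eviction records.

    dirty: the evicted line was modified in the cache.
    dead:  the data is past its live range and will not be used again.
    """
    writebacks = 0
    for dirty, dead in evictions:
        if not dirty:
            continue  # clean lines never need a write-back
        if live_range_aware and dead:
            continue  # dead scratch data: the write-back can be skipped
        writebacks += 1
    return writebacks


# Invented eviction records, for illustration only.
evictions = [(True, False), (True, True), (False, False), (True, True)]

baseline = count_writebacks(evictions, live_range_aware=False)
with_lirac = count_writebacks(evictions, live_range_aware=True)
print(baseline, with_lirac)
```

The `dead` flag is exactly the information the compiler, binary, or trace analyses would have to supply; the model takes it as given.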