3 articles found
Efficient query processing framework for big data warehouse: an almost join-free approach (cited by: 3)
1
Authors: Huiju WANG, Xiongpai QIN, Xuan ZHOU, Furong LI, Zuoyan QIN, Qing ZHU, Shan WANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2015, No. 2, pp. 224-236 (13 pages)
The rapidly increasing scale of data warehouses is challenging today's data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users' demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prohibits a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, while numerous join results can actually be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallelized filter-aggregation operations on the fact table. In contrast to the conventional query processing models, our approach is efficient, scalable and stable despite the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
Keywords: data warehouse, large scale, TAMP, join-free, multi-version schema
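The abstract's central idea, resolving joins ahead of time so that a query becomes a filter-aggregation scan over the fact table, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the tables, column names, and the helpers `prejoin` and `filter_aggregate` are hypothetical, and only the pre-processing half of the approach is shown (the post-processing joins mentioned in the abstract are omitted).

```python
from collections import defaultdict

# Hypothetical dimension table and fact table (illustrative data only).
customers = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}}
facts = [
    {"cust_id": 1, "year": 2014, "revenue": 100},
    {"cust_id": 2, "year": 2014, "revenue": 250},
    {"cust_id": 1, "year": 2015, "revenue": 40},
]

def prejoin(facts, customers):
    """Pre-processing: resolve the dimension lookup once and widen each fact row."""
    return [{**row, "region": customers[row["cust_id"]]["region"]} for row in facts]

def filter_aggregate(rows, predicate, group_key, measure):
    """Query time: a single scan over the widened fact table, with no join."""
    totals = defaultdict(int)
    for row in rows:
        if predicate(row):
            totals[row[group_key]] += row[measure]
    return dict(totals)

wide = prejoin(facts, customers)
# Total revenue per year for customers in the ASIA region.
result = filter_aggregate(wide, lambda r: r["region"] == "ASIA", "year", "revenue")
```

Because each row is filtered and aggregated independently, a scan of this shape can be split across workers and the partial totals merged afterwards, which is the property the abstract relies on for parallel execution.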
HC-Store: putting MapReduce's foot in two camps
2
Authors: Huiju WANG, Furong LI, Xuan ZHOU, Yu CAO, Xiongpai QIN, Jidong CHEN, Shan WANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 6, pp. 859-871 (13 pages)
MapReduce is a popular framework for large-scale data analysis. As data access is critical for MapReduce's performance, some recent work has applied different storage models, such as column-store or PAX-store, to MapReduce platforms. However, the data access patterns of different queries are very different. No storage model is able to achieve the optimal performance alone. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of the incoming MapReduce tasks, our storage model can determine whether to access the underlying pure column-store or PAX-store. We study the properties of the different storage models and create a cost model to decide the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store is able to outperform PAX-store and column-store, especially when confronted with diverse workloads.
Keywords: MapReduce, Hadoop, HC-store, cost model, column-store, PAX-store
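The abstract does not describe the cost model itself, so the following Python sketch is only a toy illustration of the general idea of choosing a storage layout per query. Every formula and parameter here is an assumption: it supposes a pure column-store pays one seek per referenced column but reads only those columns, while a PAX-style block pays one seek but reads every column in the block. The function names, `seek_cost`, and `bandwidth` are hypothetical.

```python
def column_store_cost(query_cols, col_bytes, seek_cost, bandwidth):
    # Assumption: one file per column, so one seek per referenced column,
    # and only the referenced columns are read.
    read = sum(col_bytes[c] for c in query_cols)
    return len(query_cols) * seek_cost + read / bandwidth

def pax_store_cost(col_bytes, seek_cost, bandwidth):
    # Assumption: all columns share one block, so a single seek,
    # but the whole block is read.
    return seek_cost + sum(col_bytes.values()) / bandwidth

def choose_store(query_cols, col_bytes, seek_cost=10.0, bandwidth=100.0):
    """Pick the layout with the lower estimated cost for this query."""
    col = column_store_cost(query_cols, col_bytes, seek_cost, bandwidth)
    pax = pax_store_cost(col_bytes, seek_cost, bandwidth)
    return "column-store" if col < pax else "PAX-store"

# Hypothetical table with four equally sized columns.
col_bytes = {"a": 1000, "b": 1000, "c": 1000, "d": 1000}
narrow = choose_store(["a"], col_bytes)                  # touches one column
wide = choose_store(["a", "b", "c", "d"], col_bytes)     # touches every column
```

Under these toy numbers the one-column query favors the pure column-store and the all-column query favors the PAX-style layout, which mirrors the abstract's point that no single storage model is best for every access pattern.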
Gigantic Genomes Provide Empirical Tests of Transposable Element Dynamics Models
3
Authors: Jie Wang, Michael W. Itgen, Huiju Wang, Yuzhou Gong, Jianping Jiang, Jiatang Li, Cheng Sun, Stanley K. Sessions, Rachel Lockridge Mueller. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2021, No. 1, pp. 123-139 (17 pages)
Transposable elements (TEs) are a major determinant of eukaryotic genome size. The collective properties of a genomic TE community reveal the history of TE/host evolutionary dynamics and impact present-day host structure and function, from genome to organism levels. In rare cases, TE community/genome size has greatly expanded in animals, associated with increased cell size and changes to anatomy and physiology. Here, we characterize the TE landscape of the genome and transcriptome in an amphibian with a giant genome: the caecilian Ichthyophis bannanicus, which we show has a genome size of 12.2 Gb. Amphibians are an important model system because the clade includes independent cases of genomic gigantism. The I. bannanicus genome differs compositionally from other giant amphibian genomes, but shares a low rate of ectopic recombination-mediated deletion. We examine TE activity using expression and divergence plots; TEs account for 15% of somatic transcription, and most superfamilies appear active. We quantify TE diversity in the caecilian, as well as other vertebrates with a range of genome sizes, using diversity indices commonly applied in community ecology. We synthesize previous models that integrate TE abundance, diversity, and activity, and test whether the caecilian meets model predictions for genomes with high TE abundance. We propose thorough, consistent characterization of TEs to strengthen future comparative analyses. Such analyses will ultimately be required to reveal whether the divergent TE assemblages found across convergent gigantic genomes reflect fundamental shared features of TE/host genome evolutionary dynamics.
Keywords: TE expression, TE diversity index, genome size evolution, caecilian, transposon ecology
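The abstract mentions diversity indices from community ecology without naming them. As an illustration only, the Python sketch below computes two indices that are widely used in that field, the Shannon index and the Gini-Simpson index, over per-superfamily TE counts. Treating TE superfamilies as "species" is an assumption made for this example, and the counts are invented.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over categories with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2): the chance two random draws differ."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical copy counts for four TE superfamilies in two genomes.
even_genome = [25, 25, 25, 25]    # superfamilies equally abundant
skewed_genome = [97, 1, 1, 1]     # one superfamily dominates

h_even = shannon_index(even_genome)
h_skewed = shannon_index(skewed_genome)
d_even = simpson_index(even_genome)
```

Both indices are highest when abundance is spread evenly across categories and fall as a single category comes to dominate, which is what makes them usable for comparing TE communities across genomes of different sizes.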