Slow speed of the Next-Generation sequencing data analysis, compared to the latest high throughput sequencers such as HiSeq X system, using the current industry standard genome analysis pipeline, has been the major fa...Slow speed of the Next-Generation sequencing data analysis, compared to the latest high throughput sequencers such as HiSeq X system, using the current industry standard genome analysis pipeline, has been the major factor of data backlog which limits the real-time use of genomic data for precision medicine. This study demonstrates the DRAGEN Bio-IT Processor as a potential candidate to remove the “Big Data Bottleneck”. DRAGENTM accomplished the variant calling, for ~40× coverage WGS data in as low as ~30 minutes using a single command, achieving the over 50-fold data analysis speed while maintaining the similar or better variant calling accuracy than the standard GATK Best Practices workflow. This systematic comparison provides the faster and efficient NGS data analysis alternative to NGS-based healthcare industries and research institutes to meet the requirement for precision medicine based healthcare.展开更多
This paper investigates the Web data aggregation issues in multidimensional on-line analytical processing (MOLAP) and presents a rule-driven aggregation approach. The core of the approach is defining aggregate rules...This paper investigates the Web data aggregation issues in multidimensional on-line analytical processing (MOLAP) and presents a rule-driven aggregation approach. The core of the approach is defining aggregate rules. To define the rules for reading warehouse data and computing aggregates, a rule definition language - array aggregation language (AAL) is developed. This language treats an array as a function from indexes to values and provides syntax and semantics based on monads. External functions can be called in aggregation rules to specify array reading, writing, and aggregating. Based on the features of AAL, array operations are unified as function operations, which can be easily expressed and automatically evaluated. To implement the aggregation approach, a processor for computing aggregates over the base cube and for materializing them in the data warehouse is built, and the component structure and working principle of the aggregation processor are introduced.展开更多
This report presents the design and implementation of a Distributed Data Acquisition、 Monitoring and Processing System (DDAMAP)。It is assumed that operations of a factory are organized into two-levels: client machin...This report presents the design and implementation of a Distributed Data Acquisition、 Monitoring and Processing System (DDAMAP)。It is assumed that operations of a factory are organized into two-levels: client machines at plant-level collect real-time raw data from sensors and measurement instrumentations and transfer them to a central processor over the Ethernets, and the central processor handles tasks of real-time data processing and monitoring. This system utilizes the computation power of Intel T2300 dual-core processor and parallel computations supported by multi-threading techniques. Our experiments show that these techniques can significantly improve the system performance and are viable solutions to real-time high-speed data processing.展开更多
文摘Slow speed of the Next-Generation sequencing data analysis, compared to the latest high throughput sequencers such as HiSeq X system, using the current industry standard genome analysis pipeline, has been the major factor of data backlog which limits the real-time use of genomic data for precision medicine. This study demonstrates the DRAGEN Bio-IT Processor as a potential candidate to remove the “Big Data Bottleneck”. DRAGENTM accomplished the variant calling, for ~40× coverage WGS data in as low as ~30 minutes using a single command, achieving the over 50-fold data analysis speed while maintaining the similar or better variant calling accuracy than the standard GATK Best Practices workflow. This systematic comparison provides the faster and efficient NGS data analysis alternative to NGS-based healthcare industries and research institutes to meet the requirement for precision medicine based healthcare.
基金The National Natural Science Foundationof China (No60573165)
文摘This paper investigates the Web data aggregation issues in multidimensional on-line analytical processing (MOLAP) and presents a rule-driven aggregation approach. The core of the approach is defining aggregate rules. To define the rules for reading warehouse data and computing aggregates, a rule definition language - array aggregation language (AAL) is developed. This language treats an array as a function from indexes to values and provides syntax and semantics based on monads. External functions can be called in aggregation rules to specify array reading, writing, and aggregating. Based on the features of AAL, array operations are unified as function operations, which can be easily expressed and automatically evaluated. To implement the aggregation approach, a processor for computing aggregates over the base cube and for materializing them in the data warehouse is built, and the component structure and working principle of the aggregation processor are introduced.
文摘This report presents the design and implementation of a Distributed Data Acquisition、 Monitoring and Processing System (DDAMAP)。It is assumed that operations of a factory are organized into two-levels: client machines at plant-level collect real-time raw data from sensors and measurement instrumentations and transfer them to a central processor over the Ethernets, and the central processor handles tasks of real-time data processing and monitoring. This system utilizes the computation power of Intel T2300 dual-core processor and parallel computations supported by multi-threading techniques. Our experiments show that these techniques can significantly improve the system performance and are viable solutions to real-time high-speed data processing.