The slow speed of next-generation sequencing (NGS) data analysis with the current industry-standard genome analysis pipeline, relative to the output of the latest high-throughput sequencers such as the HiSeq X system, has been the major cause of the data backlog that limits the real-time use of genomic data for precision medicine. This study demonstrates the DRAGEN Bio-IT Processor as a potential candidate for removing the “Big Data Bottleneck”. DRAGEN™ completed variant calling for ~40× coverage WGS data in as little as ~30 minutes using a single command, analyzing the data over 50-fold faster while maintaining variant calling accuracy similar to or better than that of the standard GATK Best Practices workflow. This systematic comparison offers NGS-based healthcare industries and research institutes a faster and more efficient NGS data analysis alternative to meet the requirements of precision-medicine-based healthcare.
To develop a practical postprocessor for 5-axis machine tools, the general equations of numerically controlled (NC) data for 5-axis configurations with non-orthogonal rotary axes were expressed exactly using inverse kinematics, and a Windows-based postprocessor written in Visual Basic was developed according to the proposed algorithm. The developed postprocessor is a general system suitable for all kinds of 5-axis machines with orthogonal and non-orthogonal rotary axes. The effectiveness of the proposed algorithm was confirmed by implementing the postprocessor and verifying it through a cutting simulation and a machining experiment. Compatibility is improved by allowing the exchange of data formats such as rotational total center position (RTCP) controlled NC data, vector post NC data, and program object file (POF) cutter location (CL) data, and convenience is increased by adding a workpiece origin offset function. Consequently, a practical postprocessor for 5-axis machining is developed.
With its high computational capacity, e.g. many cores and wide floating-point SIMD units, Intel Xeon Phi shows promise for accelerating high-performance computing (HPC) applications. However, its suitability for data analytics workloads in data centers is still an open question. PhiBench 2.0 is built for the latest generation of Intel Xeon Phi (KNL, Knights Landing), based on the prior work PhiBench (also named BigDataBench-Phi), which was designed for the former generation of Intel Xeon Phi (KNC, Knights Corner). The workloads of PhiBench 2.0 are carefully chosen based on BigDataBench 4.0 and PhiBench 1.0. In addition, these workloads are well optimized on KNL and run on real-world datasets to evaluate their performance and scalability. Further, microarchitecture-level characteristics including CPI, cache behavior, vectorization intensity, and branch prediction efficiency are analyzed, and the impact of affinity and scheduling policy on performance is investigated. It is believed that the observations will help other researchers working on Intel Xeon Phi and data analytics workloads.
This paper investigates Web data aggregation issues in multidimensional on-line analytical processing (MOLAP) and presents a rule-driven aggregation approach. The core of the approach is the definition of aggregation rules. To define the rules for reading warehouse data and computing aggregates, a rule definition language, the array aggregation language (AAL), is developed. This language treats an array as a function from indexes to values and provides syntax and semantics based on monads. External functions can be called in aggregation rules to specify array reading, writing, and aggregating. Based on the features of AAL, array operations are unified as function operations, which can be easily expressed and automatically evaluated. To implement the aggregation approach, a processor is built that computes aggregates over the base cube and materializes them in the data warehouse, and the component structure and working principle of the aggregation processor are introduced.
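The idea of treating an array as a function from indexes to values can be illustrated with a minimal Python sketch. This is not AAL syntax and does not reproduce its monad-based semantics; the names `from_cells` and `aggregate` and the sample cube are hypothetical, chosen only to show how an aggregation becomes an ordinary function operation.

```python
from typing import Callable, Dict, Tuple

Index = Tuple[int, ...]
Array = Callable[[Index], float]  # an array viewed as a function: index -> value


def from_cells(cells: Dict[Index, float]) -> Array:
    """Wrap stored cells as an index -> value function (missing cells read as 0.0)."""
    return lambda idx: cells.get(idx, 0.0)


def aggregate(arr: Array, shape: Index, axis: int, combine=sum) -> Array:
    """Return a new array function in which `axis` has been aggregated away."""
    def result(idx: Index) -> float:
        # Re-insert each position k along the aggregated axis, then combine the values.
        return combine(arr(idx[:axis] + (k,) + idx[axis:]) for k in range(shape[axis]))
    return result


# Hypothetical base cube indexed by (product, month) holding sales figures.
base = from_cells({(0, 0): 10.0, (0, 1): 5.0, (1, 0): 7.0})
sales_by_product = aggregate(base, shape=(2, 2), axis=1)
```

Here `sales_by_product((0,))` sums product 0 over both months. Because the result is itself a function, aggregates compose and are evaluated only when a cell is requested; materializing them in a warehouse would mean evaluating the function over all indexes and storing the results.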
Architectures based on the data flow computing model provide an alternative to the conventional von Neumann architecture that is widely used for general-purpose computing. Processors based on the data flow architecture employ fine-grain data-driven parallelism. These architectures have the potential to exploit the inherent parallelism in compute-intensive applications such as signal processing and image and video processing, and can thus achieve higher throughput and higher power efficiency. In this paper, several data flow computing architectures are explored and their main architectural features are studied. Furthermore, a classification of the processors is presented based on whether they employ the data flow execution model exclusively or in combination with the control flow model; they are accordingly grouped as exclusive data flow or hybrid architectures. The hybrid category is further subdivided into conjoint and accelerator-style architectures, depending on how they deploy and separate the data flow and control flow execution models within their execution blocks. Lastly, a brief comparison and discussion of their advantages and drawbacks is presented. From this study we conclude that although data flow architectures have matured significantly, issues such as data-structure handling and the lack of efficient placement and scheduling algorithms have prevented them from becoming commercially viable.
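Data-driven execution can be sketched in a few lines of Python: an operation fires as soon as all of its operands are available, regardless of the order in which the program lists it. This is a toy software model under simplifying assumptions (a static, acyclic graph and single-use values), not a description of any processor surveyed in the paper; `run_dataflow` and the node names are hypothetical.

```python
import operator


def run_dataflow(nodes, inputs):
    """Evaluate a dataflow graph.

    nodes:  name -> (operation, [operand names])
    inputs: name -> initial value
    Returns the final values and the order in which nodes fired.
    """
    values = dict(inputs)
    pending = dict(nodes)
    fired_order = []
    progress = True
    while pending and progress:
        progress = False
        for name, (op, operands) in list(pending.items()):
            # Firing rule: a node executes once every operand has a value.
            if all(o in values for o in operands):
                values[name] = op(*(values[o] for o in operands))
                fired_order.append(name)
                del pending[name]
                progress = True
    if pending:
        raise ValueError(f"nodes never became ready: {sorted(pending)}")
    return values, fired_order


# (a + b) * (c - d), with the multiply deliberately listed first.
graph = {
    "prod": (operator.mul, ["sum", "diff"]),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["c", "d"]),
}
values, fired_order = run_dataflow(graph, {"a": 1, "b": 2, "c": 7, "d": 3})
```

`sum` and `diff` have no dependency on each other, so hardware following this model could execute them in parallel; `prod` fires only after both results exist. No program counter dictates the order, which is the contrast with control flow execution.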
This report presents the design and implementation of a Distributed Data Acquisition, Monitoring and Processing System (DDAMAP). It is assumed that the operations of a factory are organized into two levels: client machines at the plant level collect real-time raw data from sensors and measurement instruments and transfer them to a central processor over Ethernet, and the central processor handles real-time data processing and monitoring. The system utilizes the computational power of the Intel T2300 dual-core processor and the parallel computation supported by multi-threading techniques. Our experiments show that these techniques can significantly improve system performance and are viable solutions for real-time high-speed data processing.
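The two-level organization described above can be sketched as a producer-consumer arrangement in Python. This is an illustration of the structure only, not the DDAMAP implementation: an in-process `queue.Queue` stands in for the Ethernet link, per-machine running totals stand in for the real processing, and the names `client`, `central_worker`, and `run` are hypothetical.

```python
import queue
import threading


def client(machine_id, readings, channel):
    """Plant-level client: forwards raw sensor readings to the central processor."""
    for value in readings:
        channel.put((machine_id, value))


def central_worker(channel, totals, lock):
    """Central-processor thread: consumes readings until it receives a stop marker."""
    while True:
        item = channel.get()
        if item is None:  # stop marker
            break
        machine_id, value = item
        with lock:  # totals is shared between worker threads
            totals[machine_id] = totals.get(machine_id, 0.0) + value


def run(readings_by_machine, n_workers=2):
    channel = queue.Queue()
    totals, lock = {}, threading.Lock()
    workers = [threading.Thread(target=central_worker, args=(channel, totals, lock))
               for _ in range(n_workers)]
    clients = [threading.Thread(target=client, args=(m, r, channel))
               for m, r in readings_by_machine.items()]
    for t in workers + clients:
        t.start()
    for t in clients:
        t.join()
    for _ in workers:  # one stop marker per worker, queued after all readings
        channel.put(None)
    for t in workers:
        t.join()
    return totals
```

Calling `run({"press": [1.0, 2.0, 3.0], "lathe": [4.0, 5.0]})` returns the per-machine totals. Note that CPython threads share one interpreter lock, so this sketch shows how the work is divided rather than the dual-core speedup the report measures; two worker threads are used by default only to mirror the two cores mentioned in the text.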
Funding (5-axis machine tool postprocessor study): Supported by the Second Stage of the Brain Korea 21 Project.
Funding (PhiBench 2.0 study): Supported by the National High Technology Research and Development Program of China (No. 2015AA015308), the National Key Research and Development Plan of China (No. 2016YFB1000600, 2016YFB1000601), and the Major Program of the National Natural Science Foundation of China (No. 61432006).
Funding (MOLAP rule-driven aggregation study): Supported by the National Natural Science Foundation of China (No. 60573165).