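Most existing data deduplication techniques are based on content-defined chunking (CDC) algorithms and do not take the content characteristics of different file types into account. CDC determines chunk boundaries in a random manner and is applied uniformly to all file types. It has been shown to work very well for text and simple content, but not for compound files composed of unstructured data. This paper analyzes the properties of compound files under the OpenXML standard, presents a basic method for object extraction, and proposes an algorithm that determines deduplication granularity from object distribution and object structure. The goal, for compound files composed of unstructured data, is to effectively detect identical objects both across different files and at different positions within the same file, so that deduplication remains effective even when the physical layout of a file changes. Simulation experiments on typical unstructured data sets show that, in the combined case, object-based deduplication improves the deduplication ratio for unstructured data by about 10% over the CDC method.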
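The object-level idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only assumes that an OpenXML document is a ZIP package whose parts can be treated as objects, and it fingerprints each whole part with SHA-256. The granularity-selection step based on object distribution and structure is not modeled. The function names `extract_objects`, `deduplicate`, and `make_package` are hypothetical and chosen for illustration.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def extract_objects(package_bytes):
    """Yield (part_name, content) for every part of a ZIP-based package.

    Assumption: each ZIP member is treated as one "object".
    """
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        for info in pkg.infolist():
            if not info.is_dir():
                yield info.filename, pkg.read(info)


def deduplicate(packages):
    """Store each distinct object once and record where it occurs.

    packages: dict mapping a package name to its raw bytes.
    Returns (store, index): store maps digest -> content,
    index maps digest -> list of (package_name, part_name).
    """
    store = {}
    index = defaultdict(list)
    for pkg_name, data in packages.items():
        for part_name, content in extract_objects(data):
            digest = hashlib.sha256(content).hexdigest()
            store.setdefault(digest, content)  # keep the first copy only
            index[digest].append((pkg_name, part_name))
    return store, index


def make_package(parts):
    """Build an in-memory ZIP package from {part_name: bytes} (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        for name, content in parts.items():
            pkg.writestr(name, content)
    return buf.getvalue()
```

Because each part is hashed after decompression, the same embedded image is recognized even when it sits under a different part name or at a different byte offset in another package. That is the layout-independence property the abstract describes.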
Assays that measure steroid hormones in patient care, public health, and research need to be both accurate and precise, as these criteria help to ensure comparability across all clinical and research applications. This review addresses major issues relevant to assay variability and describes recent activities by the US Centers for Disease Control and Prevention (CDC) to improve assay performance. Currently, high degrees of accuracy and precision are not always met for testosterone and estradiol measurements; although technologies for steroid hormone measurement have advanced significantly, measurement variability within and across laboratories has not improved accordingly. Differences in calibration and specificity are discussed as sources of variability in measurement accuracy. Ultimately, a combination of factors appears to cause inaccuracy of steroid hormone measurements, with nonuniform assay calibration and lack of specificity being two major contributors to assay variability. Within-assay variability for current assays is generally high, especially at low analyte concentrations. The CDC Hormone Standardization (HoSt) Program is improving clinical assays, as evidenced by a 50% decline in mean absolute bias between mass spectrometry assays and the CDC reference method from 2007 to 2011. This program provides measurement traceability to CDC reference methods and helps to minimize factors affecting measurement variability.