Abstract: BACKGROUND Missed occult cancer lesions account for the most diagnostic errors in retrospective radiology reviews, as early cancers can be small or subtle, making the lesions difficult to detect. A second observer is the most effective technique for reducing these errors and can be implemented economically with the advent of artificial intelligence (AI). AIM Appropriate AI model training requires a large annotated dataset. Our goal in this research is to compare two methods for decreasing the annotation time needed to establish ground truth: skip-slice annotation and AI-initiated annotation. METHODS We developed a 2D U-Net as an AI second observer for detecting colorectal cancer (CRC), along with an ensemble of five differently initialized 2D U-Nets for the ensemble technique. Each model was trained with 51 cases of annotated CRC computed tomography of the abdomen and pelvis, tested with 7 cases, and validated with 20 cases from The Cancer Imaging Archive. The sensitivity, false positives per case, and estimated Dice coefficient were obtained for each training method. We compared the two annotation methods and the time reduction associated with each technique. The time differences were tested using Friedman's two-way analysis of variance. RESULTS Sparse annotation significantly reduces annotation time, particularly when skipping 2 slices at a time (P < 0.001). Reducing the annotation workload by up to 2/3 does not reduce AI model sensitivity or increase false positives per case. Although initializing human annotation with AI reduces annotation time, the reduction is minimal, even when using an ensemble AI to decrease false positives. CONCLUSION Our data support sparse annotation as an efficient technique for reducing the time needed to establish ground truth.
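The abstract leaves the implementation unspecified; the minimal Python sketch below shows what its two ingredients, skip-slice training data and an ensemble of independently initialized U-Nets, might look like. The array shapes, the Keras-style `predict` method, and both function names are assumptions for illustration, not details from the paper.

```python
import numpy as np

def skip_slice_training_set(volume, masks, skip=2):
    # Skip-slice annotation: label one slice, skip `skip` slices, repeat.
    # With skip=2 only ~1/3 of slices are annotated, matching the 2/3
    # reduction in annotation effort the abstract reports as safe.
    # Assumed shapes: volume and masks are (n_slices, H, W).
    idx = np.arange(0, volume.shape[0], skip + 1)
    return volume[idx], masks[idx]

def ensemble_detect(models, ct_slice, threshold=0.5):
    # Average the probability maps of differently initialized U-Nets and
    # threshold; the abstract uses such an ensemble to cut false positives.
    probs = np.mean([m.predict(ct_slice[None])[0] for m in models], axis=0)
    return probs > threshold
```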
Funding: Funded in part by the U.S. National Science Foundation under Grant Nos. CCF-0811670 and CNS-0720456, a gift from Cisco Systems, Inc., and partially through the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Abstract: The poor performance of random writes has been a cause of major concern that needs to be addressed to better utilize the potential of flash in enterprise-scale environments. We examine one of the important causes of this poor performance: the design of the flash translation layer (FTL), which performs the virtual-to-physical address translations and hides the erase-before-write characteristics of flash. We propose a complete paradigm shift in the design of the core FTL engine from existing techniques with our Demand-Based Flash Translation Layer (DFTL), which selectively caches page-level address mappings. Our experimental evaluation using FlashSim with realistic enterprise-scale workloads endorses the utility of DFTL in enterprise-scale storage systems by demonstrating: 1) improved performance, 2) reduced garbage collection overhead, and 3) better overload behavior compared with hybrid FTL schemes, which are the most popular implementation methods. For example, a predominantly random-write I/O trace from an OLTP application running at a large financial institution shows a 78% improvement in average response time (due to a 3-fold reduction in garbage collector operations) compared with the hybrid FTL scheme. Even for the well-known read-dominant TPC-H benchmark, for which DFTL introduces additional overheads, we improve system response time by 56%. Moreover, when the write-back cache on a DFTL-based SSD is enabled, DFTL even outperforms the page-based FTL scheme, improving response time by 72% on the Financial trace.
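DFTL's central data structure, a small cached mapping table (CMT) in SRAM backed by the complete page-level map stored in translation pages on flash, can be sketched as an LRU map. The class name, capacity parameter, and `read_mapping_from_flash` callback below are illustrative assumptions; the real DFTL also fetches mappings at translation-page granularity and writes dirty evicted entries back to flash, which this sketch omits.

```python
from collections import OrderedDict

class DFTLMappingCache:
    # Cached Mapping Table (CMT): a small LRU map of hot logical->physical
    # page translations kept in SRAM; the complete page-level map stays in
    # translation pages on flash and is consulted only on a miss.
    def __init__(self, capacity, read_mapping_from_flash):
        self.cmt = OrderedDict()
        self.capacity = capacity
        self.read_mapping_from_flash = read_mapping_from_flash

    def translate(self, lpn):
        if lpn in self.cmt:                      # hit: refresh LRU recency
            self.cmt.move_to_end(lpn)
            return self.cmt[lpn]
        ppn = self.read_mapping_from_flash(lpn)  # miss: extra flash read
        self.cmt[lpn] = ppn
        if len(self.cmt) > self.capacity:        # evict the coldest mapping
            self.cmt.popitem(last=False)         # (DFTL writes back if dirty)
        return ppn
```

A caller would construct one cache per device, e.g. `DFTLMappingCache(4096, flash_map.__getitem__)`, and route every logical page number through `translate`; hot workloads then pay the extra flash read only on cold mappings.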
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 91024028 and 91024031.
Abstract: Consider the problem of pricing n items under unlimited supply with m single-minded buyers, each of whom is interested in at most k of the items. The goal is to set profit margins p1, p2, ..., pn on the items so as to maximize the overall profit. There is an O(k)-approximation algorithm by Balcan and Blum when the price of each item must be above its margin cost, i.e., each pi > 0. We investigate the above problem when the seller is allowed to price some of the items below their margin cost. It was shown by Balcan et al. that by pricing some of the items below cost, the seller could possibly increase the maximum profit by a factor of Ω(log n). Items sold at low prices to stimulate other profitable sales are usually called "loss leaders". It is unclear what kind of approximation guarantees are achievable when some of the items can be priced below cost; understanding this question was posed as an open problem by Balcan and Blum. In this paper, we give a strong negative result for the problem of pricing loss leaders. We prove that, assuming the Unique Games Conjecture (UGC), there is no constant-factor approximation algorithm for item pricing with below-cost prices allowed, even when each customer is interested in at most three items. Conceptually, our result indicates that although it is possible to make more money by selling some items below their margin cost, it can be computationally intractable to do so.
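A tiny brute-force instance makes the loss-leader effect concrete: one buyer wants item x at value 10, and four buyers each want the bundle {x, y} at value 2. Restricted to positive prices the seller earns at most 10, while pricing y below cost earns 18. The instance and price grid in this Python sketch are illustrative, not taken from the paper.

```python
from itertools import product

# One buyer wants {x} (value 10); four buyers each want {x, y} (value 2).
# Prices are profit margins, so a negative price means selling below cost.
buyers = [({"x"}, 10)] + [({"x", "y"}, 2)] * 4

def profit(prices):
    # A single-minded buyer purchases their whole bundle iff its total
    # price is at most their valuation; profit is the sum of margins paid.
    return sum(sum(prices[i] for i in bundle)
               for bundle, value in buyers
               if sum(prices[i] for i in bundle) <= value)

grid = [g / 2 for g in range(-24, 25)]  # candidate prices -12.0 .. 12.0
best_positive = max(profit({"x": px, "y": py})
                    for px, py in product(grid, repeat=2)
                    if px > 0 and py > 0)
best_any = max(profit({"x": px, "y": py})
               for px, py in product(grid, repeat=2))
print(best_positive, best_any)  # 10.0 18.0: x at 10 with y at -8 wins
```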
Abstract: A linear (q, δ, ε, m(n))-locally decodable code (LDC) C : F^n → F^m(n) is a linear transformation from the vector space F^n to the space F^m(n) for which each message symbol xi can be recovered with probability at least 1/|F| + ε from C(x) by a randomized algorithm that queries only q positions of C(x), even if up to δm(n) positions of C(x) are corrupted. In a recent work, Dvir shows that lower bounds for linear LDCs can imply lower bounds for arithmetic circuits, and suggests that proving lower bounds for LDCs over the complex or real field is a good starting point for approaching one of his conjectures. Our main result is an m(n) = Ω(n^2) lower bound for linear 3-query LDCs over any, possibly infinite, field. The constant in the Ω(·) depends only on ε and δ. This is the first lower bound better than the trivial m(n) = Ω(n) for arbitrary fields and more than two queries.
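For intuition about the definition, the classic Hadamard code is a linear LDC over F_2 with q = 2: each message bit is recovered from two queried positions, and both queries land on uncorrupted positions with probability at least 1 − 2δ. The Python sketch below is a toy 2-query illustration of local decoding, not the 3-query setting that the lower bound addresses.

```python
import random

def hadamard_encode(x):
    # Hadamard code over F_2: one codeword bit <x, a> mod 2 for every
    # n-bit vector a, so the length is 2^n (m(n) grows exponentially).
    n = len(x)
    return [sum(xi & (a >> i) & 1 for i, xi in enumerate(x)) % 2
            for a in range(2 ** n)]

def local_decode_bit(codeword, n, i, rng=random):
    # Recover x_i with q = 2 queries at positions a and a XOR e_i, since
    # <x, a> + <x, a + e_i> = x_i over F_2. If at most a delta fraction
    # of positions is corrupted, both queries are clean with probability
    # at least 1 - 2*delta.
    a = rng.randrange(2 ** n)
    return codeword[a] ^ codeword[a ^ (1 << i)]

x = [1, 0, 1, 1]
cw = hadamard_encode(x)
cw[3] ^= 1  # corrupt one of the 16 positions
print([local_decode_bit(cw, 4, i) for i in range(4)])  # usually [1, 0, 1, 1]
```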