19 articles found
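Leaching performance of Al-bearing spent LiFePO_(4) cathode powder in H_(2)SO_(4) solution Cited by: 4
1
Authors: 娄文博, 张洋, 张盈, 郑诗礼, 孙沛, 王晓健, 李建中, 乔珊, 张懿, Marco WENZEL, Jan J. WEIGAND 《Transactions of Nonferrous Metals Society of China》 SCIE EI CAS CSCD 2021, No. 3, pp. 817-831 (15 pages)
The leaching behavior and leaching kinetics of LFP and Al in Al-bearing spent LiFePO_(4) (LFP) cathode powder were studied. The effects of temperature (273-368 K), stirring rate (200-950 r/min), reaction time (0-240 min), acid-to-material ratio (0.1:1-1:1 mL/g) and liquid-to-solid ratio (3:1-9:1 mL/g) on the leaching process were examined. The results show that reactant concentration and temperature strongly affect Al leaching. Under the optimized leaching conditions, the leaching efficiencies of LFP and Al were 91.53% and 15.98%, respectively. The kinetic study shows that LFP leaching is under mixed control by surface chemical reaction and diffusion, with an activation energy of 22.990 kJ/mol, whereas Al leaching is controlled by surface chemical reaction alone, with an activation energy of 46.581 kJ/mol. Keeping the leaching system at low temperature during acid leaching of spent LFP cathode material effectively suppresses aluminum dissolution.
Keywords: LiFePO_(4); leaching performance; leaching kinetics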
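The two activation energies reported above suggest why low temperature favors selective leaching. The following is a minimal sketch, assuming simple Arrhenius behavior with a temperature-independent pre-exponential factor (an assumption of this example, not a claim from the abstract; mixed-control kinetics are more complex). The function name `rate_ratio` is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def rate_ratio(ea_kj_per_mol, t_low, t_high):
    """Return k(t_high) / k(t_low) under the Arrhenius equation,
    assuming the pre-exponential factor does not change with temperature."""
    ea = ea_kj_per_mol * 1000.0  # convert kJ/mol to J/mol
    return math.exp(-ea / R * (1.0 / t_high - 1.0 / t_low))

# Activation energies from the abstract; temperature range studied: 273-368 K.
lfp_ratio = rate_ratio(22.990, 273.0, 368.0)
al_ratio = rate_ratio(46.581, 273.0, 368.0)

print(f"LFP rate constant grows by a factor of about {lfp_ratio:.1f}")
print(f"Al rate constant grows by a factor of about {al_ratio:.1f}")
```

Under these assumptions the Al rate constant is far more temperature-sensitive than the LFP one, which is consistent with the abstract's conclusion that a cold leaching system holds back Al dissolution.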
A new heuristic for task scheduling in heterogeneous computing environment
2
Authors: Ehsan Ullah MUNIR, Jian-zhong LI, Sheng-fei SHI, Zhao-nian ZOU, Qaisar RASOOL 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 SCIE EI CAS CSCD 2008, No. 12, pp. 1715-1723 (9 pages)
A heterogeneous computing (HC) environment uses diverse resources with different computational capabilities to solve computing-intensive applications that have diverse computational requirements and constraints. The task assignment problem in an HC environment can be formally defined as follows: given a set of tasks and machines, assign tasks to machines so as to achieve the minimum makespan. In this paper we propose a new task scheduling heuristic, high standard deviation first (HSTDF), which uses the standard deviation of the expected execution time of a task as a selection criterion. The standard deviation of the expected execution time of a task represents the amount of variation in its execution time across different machines. Our conclusion is that tasks with a high standard deviation must be scheduled first. A large number of experiments were carried out to check the effectiveness of the proposed heuristic in different scenarios, and the comparison with existing heuristics (Max-min, Sufferage, Segmented Min-average, Segmented Min-min, and Segmented Max-min) shows that the proposed heuristic outperforms them in terms of average makespan.
Keywords: heterogeneous computing; task scheduling; greedy heuristics; high standard deviation
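The selection rule described in the abstract can be sketched as follows. This is an illustrative reading only: the abstract states that tasks with a high standard deviation are scheduled first, but it does not say how a machine is chosen, so this sketch assumes the task goes to the machine with the earliest completion time. The name `hstdf_schedule` and the sample matrix are hypothetical.

```python
from statistics import pstdev

def hstdf_schedule(etc):
    """etc[i][j] is the expected execution time of task i on machine j.
    Returns (assignment, makespan), where assignment maps task -> machine."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    assignment = {}
    # Tasks whose execution time varies most across machines go first.
    order = sorted(range(len(etc)), key=lambda i: pstdev(etc[i]), reverse=True)
    for task in order:
        # Assumed rule: pick the machine with the minimum completion time.
        machine = min(range(n_machines), key=lambda j: ready[j] + etc[task][j])
        ready[machine] += etc[task][machine]
        assignment[task] = machine
    return assignment, max(ready)

etc = [[3, 3], [1, 9], [5, 7]]
assignment, makespan = hstdf_schedule(etc)
print(assignment, makespan)
```

Task 1 has the largest spread (1 vs. 9), so it is placed first and gets its fast machine before other tasks can occupy it.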
Research and Application of Code Automatic Generation Algorithm Based on Structured Flowchart
3
Authors: Xiang-Hu Wu, Ming-Cheng Qu, Zhi-Qiang Liu, Jian-Zhong Li 《Journal of Software Engineering and Applications》 2011, No. 9, pp. 534-545 (12 pages)
Automatically generating code from structured flowcharts is of great significance. Existing research has some deficiencies: its key algorithms and technologies are not elaborated, and very few full-featured integrated development platforms can generate code automatically from structured flowcharts. By analyzing the characteristics of structured flowcharts, a structure identification algorithm is put forward, and its correctness is verified by enumeration iteration. Then, taking the identified flowchart as input, an automatic code generation algorithm is proposed; its correctness is also verified by enumeration iteration. Finally, an integrated development platform is developed using these algorithms, including flowchart modeling, automatic code generation, CDT/GCC/GDB, etc. The correctness and effectiveness of the proposed algorithms are verified through practical operation.
Keywords: Automatic Generation of Codes; Structured Flowchart; Identification of Structure; Integrated Development Platform
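Relationship between body mass index and peritoneal dialysis adequacy in peritoneal dialysis patients Cited by: 3
4
Authors: 陈铭聿, 周丽君, 李建中, 肖卓韬, 李琳, 徐德宇 《中国现代医学杂志》 CAS PKU Core 2022, No. 12, pp. 90-95 (6 pages)
Objective: To investigate the relationship between body mass index (BMI) and peritoneal dialysis adequacy in peritoneal dialysis patients. Methods: A retrospective analysis was performed on 282 patients who underwent peritoneal dialysis with regular follow-up at the Department of Nephrology, The First Affiliated Hospital of Soochow University, from January 2016 to May 2020. Patients were divided by BMI into an underweight group (BMI < 18.5 kg/m^(2)), a normal-weight group (18.5-23.9 kg/m^(2)), an overweight group (24-27.9 kg/m^(2)) and an obese group (BMI ≥ 28 kg/m^(2)) and were observed continuously for 12 months; clinical data were collected and analyzed statistically. Results: BMI, diabetes, increase in dialysis dose, increase in dialysate concentration, and incidence of peritonitis differed significantly among the four groups (P < 0.05). Kt/V and D/Pcr at 1, 6 and 12 months after the start of peritoneal dialysis were compared among the underweight, normal-weight, overweight and obese groups by repeated-measures analysis of variance, with the following results: (1) Kt/V and D/Pcr differed across time points (F = 2.115 and 2.384, P = 0.144 and 0.589); (2) Kt/V and D/Pcr differed among the four groups (F = 4.151 and 0.286, P = 0.033 and 0.834); (3) the trends of Kt/V and D/Pcr did not differ among the four groups (F = 0.545 and 1.346, P = 0.476 and 0.644). Multiple linear regression showed that sex, residual renal function and serum albumin were factors influencing total weekly Kt/V after 1 month of peritoneal dialysis (b = 0.314, 0.061 and 0.016; P = 0.001, 0.001 and 0.020); sex and BMI were factors influencing total weekly Kt/V after 12 months of peritoneal dialysis (b = 0.386 and -0.029, both P = 0.001); sex, BMI and diabetes were factors influencing D/Pcr after 12 months of peritoneal dialysis (b = -0.047, 0.081 and -0.005; P = 0.006, 0.003 and 0.047). Multivariate logistic regression showed that high BMI [OR = 1.110 (95% CI: 1.006, 1.225)], advanced age [OR = 1.049 (95% CI: 1.022, 1.077)] and low albumin [OR = 0.911 (95% CI: 0.852, 0.975)] were risk factors for peritonitis after 12 months of peritoneal dialysis (P < 0.05). Conclusion: Peritoneal dialysis patients with high BMI may have lower peritoneal dialysis adequacy and are more prone to peritoneal dialysis-related peritonitis; BMI may serve as an indicator for long-term assessment of peritoneal function.
Keywords: peritoneal dialysis; peritoneal dialysis adequacy; body mass index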
Paroxysmal drastic abdominal pain with tardive cutaneous lesions presenting in Henoch-Schönlein purpura Cited by: 8
5
Authors: Xiao-Liang Chen, Hong Tian, Jian-Zhong Li, Jin Tao, Hua Tang, Yang Li, Bin Wu 《World Journal of Gastroenterology》 SCIE CAS CSCD 2012, No. 16, pp. 1991-1995 (5 pages)
Henoch-Schönlein purpura (HSP) is a small-vessel vasculitis mediated by IgA-immune complex deposition. It is characterized by the clinical tetrad of non-thrombocytopenic palpable purpura, abdominal pain, arthritis and renal involvement. The diagnosis of HSP is difficult, especially when abdominal symptoms precede cutaneous lesions. We report a rare case of paroxysmal drastic abdominal pain with gastrointestinal bleeding presenting in HSP. The diagnosis was verified by renal damage and the occurrence of purpura.
Keywords: skin allergy; paroxysmal; SCH; purpura; abdominal pain; tardive; thrombocytopenia; lesion
Gastric myeloid sarcoma without acute myeloblastic leukemia Cited by: 4
6
Authors: Xiao-Li Huang, Jin Tao, Jian-Zhong Li, Xiao-Liang Chen, Jian-Ning Chen, Chun-Kui Shao, Bin Wu 《World Journal of Gastroenterology》 SCIE CAS 2015, No. 7, pp. 2242-2248 (7 pages)
Myeloid sarcomas (MS) involve extramedullary blast proliferation from one or more myeloid lineages that replace the original tissue architecture, and these neoplasias are called granulocytic sarcomas, chloromas or extramedullary myeloid tumors. Such tumors develop in lymphoid organs, bones (e.g., skulls and orbits), skin, soft tissue, various mucosae, organs, and the central nervous system. Gastrointestinal (GI) involvement is rare, and the occurrence of myeloid sarcomas in patients without leukemia is even rarer. Here, we report a case of a 38-year-old man who presented with epigastric pain and progressive jaundice. An upper GI endoscopy showed extensive multifocal hyperemic fold thickening and the spread of nodular lesions in the body of the stomach. Biopsies from the gastric lesions indicated myeloid sarcoma of the stomach. However, concurrent peripheral blood and bone marrow examinations showed no evidence of acute myeloid leukemia. For diagnosis, the immunohistochemical markers must be checked when evaluating a suspected myeloid sarcoma case. Accurate MS diagnosis determines the appropriate therapy and prognosis.
Keywords: Myeloid sarcoma; Stomach; Acute myeloblastic leukemia
Primary duodenal NK/T-cell lymphoma with massive bleeding: A case report Cited by: 1
7
Authors: Jian-Zhong Li, Jin Tao, Dan-Yun Ruan, Yi-Dong Yang, Ya-Shi Zhan, Xing Wang, Yu Chen, Si-Chi Kuang, Chun-Kui Shao, Bin Wu 《World Journal of Clinical Oncology》 CAS 2012, No. 6, pp. 92-97 (6 pages)
Primary natural killer/T-cell (NK/T-cell) lymphoma of the gastrointestinal tract is a very rare disease with a poor prognosis, and the duodenum is a highly unusual primary lesion site. Here, we describe a unique case of a primary duodenal NK/T-cell lymphoma in a 26-year-old man who presented with abdominal pain and weight loss. An abdominal computed tomography scan demonstrated a hypodense tumor in the duodenum. Because of massive upper gastrointestinal tract bleeding during hospitalization, the patient was examined by emergency upper gastrointestinal endoscopy. Under endoscopy, an irregular ulcer with mucosal edema, destruction, necrosis, a hyperplastic nodule and active bleeding was observed on the duodenal posterior wall. Following endoscopic hemostasis, a biopsy was obtained for pathological evaluation. The lesion was subsequently confirmed to be a duodenal NK/T-cell lymphoma. The presenting symptoms of primary duodenal NK/T-cell lymphoma in this patient were abdominal pain and gastrointestinal bleeding, and endoscopy was important for diagnosis. Despite aggressive treatments, the prognosis was very poor.
Keywords: Bleeding; Duodenum; Natural killer/T-cell lymphoma
Angioimmunoblastic T-cell lymphoma-associated pure red cell aplasia with abdominal pain
8
Authors: Jin Tao, Feng-Ping Zheng, Hong Tian, Ying Lin, Jian-Zhong Li, Xiao-Liang Chen, Jian-Ning Chen, Chun-Kui Shao, Bin Wu 《World Journal of Clinical Oncology》 CAS 2013, No. 3, pp. 75-81 (7 pages)
Angioimmunoblastic T-cell lymphoma (AITL) is a unique type of peripheral T-cell lymphoma with a constellation of clinical symptoms and signs, including weight loss, fever, chills, anemia, skin rash, hepatosplenomegaly, lymphadenopathy, thrombocytopenia and polyclonal hypergammaglobulinemia. The histological features of AITL are also distinctive. Pure red cell aplasia is a bone marrow failure characterized by progressive normocytic anemia and reticulocytopenia without leucopenia or thrombocytopenia. However, AITL with abdominal pain and pure red cell aplasia has rarely been reported. Here, we report a rare case of AITL-associated pure red cell aplasia with abdominal pain. The diagnosis was verified by a biopsy of the enlarged abdominal lymph nodes with immunohistochemical staining.
Keywords: Angioimmunoblastic T-cell lymphoma; Anemia; Pure red cell aplasia; Abdominal pain
The relationship between neutrophil-to-lymphocyte ratio and major cardiovascular events in elderly patients with chronic heart failure
9
Authors: Wei YAN, Jian-Zhong LI, Kun-Lun HE 《Journal of Geriatric Cardiology》 SCIE CAS CSCD 2017, No. 12, p. 780 (1 page)
Keywords: cardiovascular; events; ratio
CrowdOLA: Online Aggregation on Duplicate Data Powered by Crowdsourcing Cited by: 3
10
Authors: An-Zhen Zhang, Jian-Zhong Li, Hong Gao, Yu-Biao Chen, Heng-Zhao Ma, Mohamed Jaward Bah 《Journal of Computer Science & Technology》 SCIE EI CSCD 2018, No. 2, pp. 366-379 (14 pages)
Recently there has been an increasing need for interactive human-driven analysis on large volumes of data. Online aggregation (OLA), which provides a quick sketch of massive data before a long wait for the final accurate query result, has drawn significant research attention. However, directly processing OLA on duplicate data leads to incorrect query answers, since sampling from duplicate records leads to an overrepresentation of the duplicate data in the sample. This violates the prerequisite of uniform distributions in most statistical theories. In this paper, we propose CrowdOLA, a novel framework for integrating online aggregation processing with deduplication. Instead of cleaning the whole dataset, CrowdOLA retrieves block-level samples continuously from the dataset, and employs a crowd-based entity resolution approach to detect duplicates in the sample in a pay-as-you-go fashion. After cleaning the sample, an unbiased estimator is provided to address the error bias that is introduced by the duplication. We evaluate CrowdOLA on both real-world and synthetic workloads. Experimental results show that CrowdOLA provides a good balance between efficiency and accuracy.
Keywords: online aggregation; entity resolution; crowdsourcing; cloud computing
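The overrepresentation effect mentioned in the abstract can be seen on a toy table. The sketch below is not CrowdOLA's estimator, which the abstract does not specify. It only assumes that each record carries a known entity label (in practice the output of entity resolution) and shows how weighting each record by the inverse of its duplicate count removes the bias for an average. All names and data are hypothetical.

```python
def naive_mean(records):
    """Average over raw records, duplicates included."""
    return sum(value for _, value in records) / len(records)

def reweighted_mean(records):
    """Average over entities: each record is weighted by 1 / (number of
    records that refer to the same entity)."""
    counts = {}
    for entity, _ in records:
        counts[entity] = counts.get(entity, 0) + 1
    numerator = sum(value / counts[entity] for entity, value in records)
    denominator = sum(1 / counts[entity] for entity, _ in records)
    return numerator / denominator

# Entity "c" appears four times, so it dominates the naive average.
records = [("a", 10), ("b", 20), ("c", 30), ("c", 30), ("c", 30), ("c", 30)]
print(naive_mean(records))       # biased toward entity "c"
print(reweighted_mean(records))  # each entity counted once
```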
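Formation process of oil and gas reservoirs in the lower accumulation assemblage in the western section of the southern margin of the Junggar Basin: A case study of Well Dushan 1 on the Dushanzi anticline Cited by: 3
11
Authors: 刘刚, 李建忠, 齐雪峰, 朱明, 袁波, 庞志超 《天然气地球科学》 CAS CSCD PKU Core 2021, No. 7, pp. 1009-1021 (13 pages)
The lower accumulation assemblage in the western section of the southern margin of the Junggar Basin is rich in oil and gas resources but poorly explored, so clarifying its accumulation process is important guidance for oil and gas exploration in the area. Using experimental methods including petrographic observation of fluid inclusions, homogenization temperature measurement, quantitative fluorescence techniques for reservoirs, and total scanning fluorescence analysis, combined with the physical and geochemical characteristics of crude oil and natural gas and with an analysis of how the accumulation elements are configured (single-well burial history modeling, reconstruction of hydrocarbon generation and expulsion history, and tectonic evolution analysis), the formation process of the lower accumulation assemblage of the Dushanzi anticline on the southern margin of the basin was studied systematically. The results show the following. (1) Two stages of hydrocarbon inclusions exist in the Toutunhe Formation reservoir of the Dushanzi anticline. The first stage consists of liquid hydrocarbon inclusions with yellow fluorescence, in which the hydrocarbons are mainly low-maturity to mature crude oil; the abundance of these inclusions is low and the charging intensity was weak, so no large-scale reservoir formed. The second stage consists of mature to highly mature light-oil inclusions with blue fluorescence and relatively high abundance. (2) In some intervals of the Toutunhe Formation in Well Dushan 1 (6416 m, 6493 m), the concentration of hydrocarbons adsorbed on reservoir grain surfaces is relatively high, indicating that oil layers exist in the Toutunhe Formation; the adsorbed hydrocarbons are mainly low-density light oil. Bitumen and heavy oil with a high asphaltene content are also visible in the reservoir, indicating that the reservoir underwent some degree of destruction and adjustment after it formed. (3) Hydrocarbon charging of the Toutunhe Formation in the Dushanzi anticline began as early as the early Paleogene, but the charged volume was limited and no large-scale reservoir formed. Not until the Miocene (5-3 Ma) did the hydrocarbon generation and expulsion intensity of the source rocks increase markedly, with oil and gas charging rapidly and forming reservoirs. Afterwards, under the influence of the Himalayan tectonic movement, the reservoirs were destroyed and modified to varying degrees, oil and gas escaped along faults to shallow layers, and the early paleo-oil reservoir now appears as residual oil layers with varying hydrocarbon saturation. Some intervals have relatively high saturation (6416.9-6417.4 m, 6493-6493.5 m), and it is inferred that oil testing could yield commercial oil flow.
Keywords: accumulation stages; accumulation process; fluid inclusions; quantitative grain fluorescence; lower accumulation assemblage; southern margin of the Junggar Basin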
EntityManager: Managing Dirty Data Based on Entity Resolution Cited by: 2
12
Authors: Xue-Li Liu, Hong-Zhi Wang, Jian-Zhong Li, Hong Gao 《Journal of Computer Science & Technology》 SCIE EI CSCD 2017, No. 3, pp. 644-662 (19 pages)
Data quality is important in many data-driven applications, such as decision making, data analysis, and data mining. Recent studies focus on data cleaning techniques that delete or repair the dirty data, which may cause information loss and introduce new inconsistencies. To avoid these problems, we propose EntityManager, a general system to manage dirty data without data cleaning. This system takes the real-world entity as the basic storage unit and retrieves query results according to the quality requirements of users. The system is able to handle all kinds of inconsistencies recognized by entity resolution. We elaborate on the EntityManager system, covering its architecture, data model, and query processing techniques. To process queries efficiently, our system adopts novel indices, a similarity operator and query optimization techniques. Finally, we verify the efficiency and effectiveness of this system and present future research challenges.
Keywords: dirty data; entity resolution; uncertain attribute; query processing; query optimization
Interval Estimation for Aggregate Queries on Incomplete Data Cited by: 1
13
Authors: An-Zhen Zhang, Jian-Zhong Li, Hong Gao 《Journal of Computer Science & Technology》 SCIE EI CSCD 2019, No. 6, pp. 1203-1216 (14 pages)
Incomplete data has been a longstanding issue in the database community, and the subject is still poorly handled in both theory and practice. One common way to cope with missing values is to impute them (fill them in) as a preprocessing step before analysis. Unfortunately, no single imputation method can impute all missing values correctly in all cases. Users can hardly trust a query result on such completed data without any confidence guarantee. In this paper, we propose to directly estimate the aggregate query result on incomplete data, rather than to impute the missing values. An interval estimation, composed of the upper and the lower bound of aggregate query results among all possible interpretations of missing values, is presented to the end users. The ground-truth aggregate result is guaranteed to lie within the interval. We believe that decision support applications could benefit significantly from the estimation, since they can tolerate inexact answers, as long as there are clearly defined semantics and guarantees associated with the results. Our main techniques are parameter-free and do not assume prior knowledge about the distribution and missingness mechanisms. Experimental results are consistent with the theoretical results and suggest that the estimation is invaluable for better assessing the results of aggregate queries on incomplete data.
Keywords: incomplete data; aggregate query; interval estimation; data quality
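The idea of bounding an aggregate over all possible interpretations of missing values can be illustrated for SUM. This is a minimal sketch, not the paper's technique: it assumes every missing value lies in a known closed range `[lo, hi]` (an assumption introduced here), and the function name `sum_interval` is hypothetical.

```python
def sum_interval(values, lo, hi):
    """Bound SUM over a column in which None marks a missing value.
    Assumes every missing value lies in the closed range [lo, hi]."""
    known = sum(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    # The extreme interpretations set every missing cell to lo or to hi.
    return known + missing * lo, known + missing * hi

column = [3, None, 5, None]
lower, upper = sum_interval(column, lo=0, hi=10)
print(lower, upper)
```

Whatever the two missing cells actually hold, the true sum cannot fall outside the returned pair as long as the range assumption holds.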
O2iJoin: An Efficient Index-Based Algorithm for Overlap Interval Join Cited by: 1
14
Authors: Ji-Zhou Luo, Sheng-Fei Shi, Guang Yang, Hong-Zhi Wang, Jian-Zhong Li 《Journal of Computer Science & Technology》 SCIE EI CSCD 2018, No. 5, pp. 1023-1038 (16 pages)
Time intervals are often associated with tuples to represent their valid time in temporal relations, where overlap join is crucial for various kinds of queries. Many existing overlap join algorithms use indices based on tree structures such as the quad-tree, B+-tree and interval tree. These algorithms usually have a high CPU cost since deep path traversals are unavoidable, which makes them less competitive than data-partition or plane-sweep based algorithms. This paper proposes an efficient overlap join algorithm based on a new two-layer flat index named the Overlap Interval Inverted Index (i.e., O2i Index). In the first layer, it uses an array to record the end points of intervals and approximates the nesting structures of intervals via two functions; the second layer uses inverted lists to trace all intervals satisfying the approximated nesting structures. With the help of the new index, the join algorithm visits only the must-be-scanned lists and skips all others. Analyses and experiments on both real and synthetic datasets show that the proposed algorithm is as competitive as the state-of-the-art algorithms.
Keywords: overlap interval join; temporal relation; overlap inverted index; join algorithm
Synthesis of Novel Scolopendra-type Polydodecyloxybenzoyl[1,5]-diazocine as New Material for Optical Sensor Cited by: 1
15
Authors: Ya-Nan Liu, Jian-Zhong Li, Xiao-Bo Wan 《Chinese Journal of Polymer Science》 SCIE CAS CSCD 2018, No. 6, pp. 736-741 (6 pages)
A new scolopendra-type polymer, polydodecyloxybenzoyl[1,5]-diazocine (PDBD), was designed and prepared from 2,5-bis(4-(dodecyloxy)-benzoyl)terephthaloyl azide with trifluoroacetic acid (TFA) via a one-pot reaction in good yields. The structure of the polymer was characterized using ^(1)H-NMR, ^(13)C-NMR and MALDI-TOF spectra. The polymer PDBD exhibits good thermal stability as measured by TGA and DSC, and dissolves well in common organic solvents such as chloroform and tetrahydrofuran. In addition, UV-Vis spectral studies indicate that the polymer PDBD shows unique optical property changes (protonation/deprotonation) in different trifluoroacetic acid environments. The new polymer is expected to be utilized as an optical functional material for fabricating optical sensors in environmental and biological fields.
Keywords: Diazocine; Scolopendra-type polymer; Trifluoroacetic acid; UV-Vis
COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base
16
Authors: Hong-Zhi Wang, Zhi-Xin Qi, Ruo-Xi Shi, Jian-Zhong Li, Hong Gao 《Journal of Computer Science & Technology》 SCIE EI CSCD 2017, No. 5, pp. 845-857 (13 pages)
Missing value imputation with crowdsourcing is a novel method in data cleaning to capture missing values that could hardly be filled with automatic approaches. However, the time cost and overhead in crowdsourcing are high. Therefore, we have to reduce the cost while guaranteeing the accuracy of crowdsourced imputation. To achieve this optimization goal, we present COSSET+, a crowdsourced framework optimized by a knowledge base. We combine the advantages of both a knowledge-based filter and a crowdsourcing platform to capture missing values. Since the number of crowd values affects the cost of COSSET+, we aim to select only part of the missing values to be crowdsourced. We prove that the crowd value selection problem is NP-hard and develop an approximation algorithm for it. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
Keywords: crowdsourcing; missing value imputation; knowledge base; optimization
Determining the Real Data Completeness of a Relational Dataset
17
Authors: Yong-Nan Liu, Jian-Zhong Li, Zhao-Nian Zou 《Journal of Computer Science & Technology》 SCIE EI CSCD 2016, No. 4, pp. 720-740 (21 pages)
Low data quality is a serious problem in the new era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis and mining, and lead to huge losses. Incomplete data is common in low-quality data, and it is necessary to determine the data completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and such work views all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and capture the really missing attribute cells. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. Two optimal algorithms to determine the data completeness of a dataset for different cases are proposed. We empirically show the effectiveness and the scalability of our algorithms on both real-world data and synthetic data.
Keywords: data quality; data completeness; functional dependency; data completeness model; optimal algorithm
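The distinction between cells that merely look missing and cells that are really missing can be sketched with a single functional dependency. This is an illustrative toy, not one of the paper's algorithms: it assumes dependencies are given as `(lhs_attributes, rhs_attribute)` pairs and that `None` marks a missing cell; the function name and sample rows are hypothetical.

```python
def really_missing(rows, fds):
    """Fill cells that a functional dependency determines from other tuples,
    then count the cells that remain missing (None).
    fds is a list of (lhs_attribute_tuple, rhs_attribute) pairs."""
    rows = [dict(r) for r in rows]  # work on a copy
    changed = True
    while changed:  # repeat until no dependency can fill another cell
        changed = False
        for lhs, rhs in fds:
            known = {}
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if None not in key and r[rhs] is not None:
                    known.setdefault(key, r[rhs])
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and key in known:
                    r[rhs] = known[key]
                    changed = True
    missing = sum(v is None for r in rows for v in r.values())
    return rows, missing

rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": None},   # determined by zip -> city
    {"zip": "94105", "city": None},   # no other tuple helps: really missing
]
filled, missing = really_missing(rows, [(("zip",), "city")])
print(filled, missing)
```

Two cells are empty in the input, but only one of them counts against the real completeness of the table.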
FrepJoin: an efficient partition-based algorithm for edit similarity join
18
Authors: Ji-zhou LUO, Sheng-fei SHI, Hong-zhi WANG, Jian-zhong LI 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2017, No. 10, pp. 1499-1510 (12 pages)
String similarity join (SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. The existing algorithms usually adopt the filter-and-refine framework. They cannot catch the dissimilarity between string subsets, and do not fully exploit statistics such as the frequencies of characters. We develop a partition-based algorithm that uses such statistics. The frequency vectors are used to partition datasets into data chunks such that the dissimilarity between chunks is caught easily. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed to leverage the statistics to avoid computing edit distances for a noticeable proportion of the candidate pairs that survive the existing filters. Our algorithm notably outperforms alternative methods on real datasets.
Keywords: String similarity join; Edit distance; Filter and refine; Data partition; Combined frequency vectors
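How character frequencies can prune candidate pairs before the expensive edit distance computation is sketched below. This is the standard frequency-distance lower bound, shown for illustration; it is not claimed to be FrepJoin's own filter, whose details the abstract does not give. All function names are hypothetical.

```python
from collections import Counter

def freq_lower_bound(s, t):
    """Lower bound on edit distance from character frequency vectors.
    One edit removes at most one surplus character and adds at most one
    lacking character, so at least max(surplus, deficit) edits are needed."""
    cs, ct = Counter(s), Counter(t)
    surplus = sum((cs - ct).values())  # characters s has too many of
    deficit = sum((ct - cs).values())  # characters s has too few of
    return max(surplus, deficit)

def edit_distance(s, t):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def similar(s, t, tau):
    """Filter-and-refine check: run the cheap filter first, then verify."""
    if freq_lower_bound(s, t) > tau:
        return False  # pruned without computing the edit distance
    return edit_distance(s, t) <= tau

print(similar("kitten", "sitting", 2), similar("kitten", "sitting", 3))
```

The filter costs time linear in the string lengths, while verification is quadratic, so every pair the filter rejects saves a full dynamic-programming pass.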
Minimum Epsilon-Kernel Computation for Large-Scale Data Processing
19
Authors: 郭鸿杰, 李建中, 高宏 《Journal of Computer Science & Technology》 SCIE EI CSCD 2022, No. 6, pp. 1398-1411 (14 pages)
A kernel is a kind of data summary that is carefully extracted from a large dataset. Given a problem, the solution obtained from the kernel is an approximate version of the solution obtained from the whole dataset, with a provable approximation ratio. Kernels are widely used in geometric optimization, clustering, approximate query processing, etc., to scale these tasks up to massive data. In this paper, we focus on the minimum ε-kernel (MK) computation, which asks for a kernel of the smallest size for large-scale data processing. To address the open problem posed by Wang et al. of whether the minimum ε-coreset (MC) problem and the MK problem can be reduced to each other, we first formalize the MK problem and analyze its complexity. Due to the NP-hardness of the MK problem in three or higher dimensions, an approximate algorithm, namely the Set Cover-Based Minimum ε-Kernel algorithm (SCMK), is developed to solve it. We prove that the MC problem and the MK problem can be Turing-reduced to each other. Then, we discuss the update of MK under insertion and deletion operations, respectively. Finally, a randomized algorithm, called the Randomized Algorithm of Set Cover-Based Minimum ε-Kernel algorithm (RA-SCMK), is used to further reduce the complexity of SCMK. The efficiency and effectiveness of SCMK and RA-SCMK are verified by experimental results on real-world and synthetic datasets. Experiments show that the kernel sizes of SCMK are 2x and 17.6x smaller than those of an ANN-based method on real-world and synthetic datasets, respectively. The speedup ratio of SCMK over the ANN-based method is 5.67 on synthetic datasets. RA-SCMK runs up to three times faster than SCMK on synthetic datasets.
Keywords: approximate query processing; kernel; large-scale dataset; NP-hard
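The abstract names a set-cover-based algorithm but does not give its steps. As background only, the classic greedy approximation for set cover is sketched below; how SCMK maps the ε-kernel problem onto set cover is not described in the abstract, so this is not a reconstruction of SCMK. The function name and the sample instance are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset that covers the most
    still-uncovered elements. subsets maps a name to a set of elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen

subsets = {"a": {1, 2, 3}, "b": {2, 4}, "c": {3, 4}, "d": {4, 5}}
cover = greedy_set_cover({1, 2, 3, 4, 5}, subsets)
print(cover)
```

The greedy rule is a common choice for NP-hard covering problems because it is simple and has a known logarithmic approximation guarantee.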