Abstract: This paper improves an existing method for measuring technical progress in the calculation of the Malmquist productivity index. The method enumerates all possible movements of decision making units (DMUs) and analyzes the case in which several DMUs lie on the productivity frontier. The dynamic efficiencies of Chinese listed power companies from 1997 to 2006 are evaluated. The empirical results indicate that the improved method is effective.
Abstract: SCIENCE CITATION INDEX EXPANDED - NEUROSCIENCES - JOURNAL LIST. Total journals: 245. 1. ACS CHEMICAL NEUROSCIENCE. Monthly. ISSN: 1948-7193. AMER CHEMICAL SOC, 1155 16TH ST, NW, WASHINGTON, USA, DC, 20036. Indexed in: Science Citation Index Expanded; BIOSIS Previews.
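Abstract: Frequent itemset mining on massive data is time-consuming, and the recently proposed N-List structure can effectively improve mining efficiency. Based on N-List, a new frequent itemset mining algorithm, HNSFI (Hash table and subsume frequent itemsets mining based on N-List), is proposed. The algorithm uses a PPC-tree to generate N-Lists and introduces a hash table to store the itemsets represented by N-Lists, which shortens the running time of N-List intersection. It also introduces the concept of a subsume factor, whose properties allow some frequent itemsets to be generated directly by combination, further improving the time performance of the algorithm. The algorithm was tested and analyzed on three different datasets, and the experimental results show that its time performance is the best on dense datasets.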
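The PPC-tree and N-List intersection mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual N-List convention from the literature (each tree node carries a pre-order rank, a post-order rank and a count, and an ancestor has a smaller pre-order and a larger post-order than its descendants). The names `Node`, `build_nlists`, `intersect` and `support` are hypothetical, and the hash table and subsume-factor steps of HNSFI are not reproduced here.

```python
from collections import Counter

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}
        self.pre = self.post = -1

def build_nlists(transactions, min_count):
    """Build a PPC-tree; return ({item: [(pre, post, count), ...]}, item ranks)."""
    freq = Counter(i for t in transactions for i in set(t))
    # Frequent items, most frequent first (ties broken by name).
    order = sorted((i for i in freq if freq[i] >= min_count),
                   key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    visited, clock = [], [0, 0]  # clock = [next pre-order, next post-order]

    def walk(node):
        node.pre = clock[0]
        clock[0] += 1
        visited.append(node)
        for child in node.children.values():
            walk(child)
        node.post = clock[1]
        clock[1] += 1

    walk(root)
    nlists = {item: [] for item in order}
    for node in visited:  # pre-order, so each N-List is sorted by pre
        if node.item is not None:
            nlists[node.item].append((node.pre, node.post, node.count))
    return nlists, rank

def intersect(nl_anc, nl_desc):
    """Linear merge of two N-Lists; nl_anc belongs to the item nearer the root."""
    result, i, j = [], 0, 0
    while i < len(nl_anc) and j < len(nl_desc):
        pre_a, post_a, _ = nl_anc[i]
        pre_d, post_d, count_d = nl_desc[j]
        if pre_a < pre_d:
            if post_a > post_d:  # ancestor found: keep its code, add descendant count
                if result and result[-1][:2] == (pre_a, post_a):
                    result[-1] = (pre_a, post_a, result[-1][2] + count_d)
                else:
                    result.append((pre_a, post_a, count_d))
                j += 1
            else:
                i += 1
        else:
            j += 1
    return result

def support(nlist):
    return sum(count for _, _, count in nlist)

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
nlists, rank = build_nlists(transactions, min_count=2)
print(support(intersect(nlists["a"], nlists["b"])))
```

The item with the lower rank (the more frequent one) must be passed first, because it sits nearer the root on every path of the tree.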
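Abstract: Existing cross-level high utility itemset mining (HUIM) algorithms are very time-consuming and occupy a large amount of memory. To address this, a cross-level high utility itemset mining algorithm based on a data index structure (DISCH) is proposed. First, to store all itemsets in the search space efficiently and retrieve them quickly, the utility list carrying taxonomy and index information is extended into a data index structure (DIS). Then, to improve memory utilization, the memory occupied by utility lists that do not satisfy the conditions is reclaimed and reallocated. Finally, an early termination strategy is applied while constructing utility lists, to reduce the number of utility lists generated. Experiments on real retail datasets and synthetic datasets show that, compared with the CLH-Miner (Cross-Level High utility itemsets Miner) algorithm, DISCH reduces running time by 77.6% on average and memory consumption by 73.3% on average. The algorithm can therefore search for cross-level high utility itemsets efficiently while lowering memory consumption.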
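As a rough illustration of utility lists and early termination, the sketch below uses the conventional utility-list layout from the HUIM literature, where each entry is (transaction id, utility of the itemset, remaining utility of the items after it). The abandonment rule shown is a well-known upper-bound check and is only assumed to resemble the strategy in the abstract. The function names and the toy data are hypothetical, and the taxonomy, the index structure (DIS) and the memory reuse of DISCH are not modeled.

```python
def build_utility_lists(transactions, order):
    """transactions: list of {item: utility}; order: total order on items."""
    pos = {item: i for i, item in enumerate(order)}
    lists = {item: [] for item in order}
    for tid, t in enumerate(transactions):
        remaining = sum(t.values())
        for item in sorted(t, key=pos.get):
            remaining -= t[item]  # utility of the items that come after `item`
            lists[item].append((tid, t[item], remaining))
    return lists

def join(ul_x, ul_y, min_util):
    """Join the utility lists of single items x and y (x before y in the order).

    Returns None as soon as the upper bound on the utility of {x, y} and all
    of its extensions falls below min_util (early termination).
    """
    by_tid = {tid: (iu, ru) for tid, iu, ru in ul_y}
    bound = sum(iu + ru for _, iu, ru in ul_x)
    joined = []
    for tid, iu, ru in ul_x:
        if tid in by_tid:
            iu_y, ru_y = by_tid[tid]
            joined.append((tid, iu + iu_y, ru_y))
        else:
            bound -= iu + ru  # this transaction cannot contribute to {x, y}
            if bound < min_util:
                return None
    return joined

# Hypothetical toy data: each value is the utility of the item in that transaction.
transactions = [{"a": 5, "b": 2, "c": 1}, {"a": 10, "c": 6}, {"b": 4, "c": 3}]
lists = build_utility_lists(transactions, order=["a", "b", "c"])
print(join(lists["a"], lists["c"], min_util=10))
print(join(lists["a"], lists["b"], min_util=10))
```

Stopping the join early avoids allocating a list that could never yield a high utility itemset, which is the effect the abstract attributes to its early termination strategy.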
Funding: Supported by the projects of Shaanxi Normal University (Grant No. 999521) and Xianyang Normal University (Grant Nos. 11XSYK316, 201002001).
Abstract: This study evaluates the comparative productivity of 32 listed tourism companies, the main suppliers of tourism in China, using data envelopment analysis (DEA). It analyzes the productivity of these companies by business type and by region, based on the calculation of the Malmquist index. The results show that (1) overall productivity is inefficient (0.954); (2) productivity is highest in accommodation and catering, which suggests that tourism develops quickly with support from technology; (3) productivity is highest in western China, where the economy and tourism attractions are better than in other regions; and (4) the efficiency differences among the listed tourism companies are not significant and are attributable to scale efficiency, that is, the input of finance, resources, talent and policy.
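For readers unfamiliar with the index, the following is a minimal sketch of the standard Malmquist decomposition into efficiency change and technical change. It assumes the four distance-function values have already been obtained (for example from DEA), and the abstract does not state which variant the study used. The function name, argument names and the sample values are hypothetical.

```python
import math

def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Return (index, efficiency_change, technical_change).

    d_a_b is the distance (efficiency score) of the period-b observation
    measured against the period-a frontier; "t1" stands for period t + 1.
    """
    efficiency_change = d_t1_t1 / d_t_t
    technical_change = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    return efficiency_change * technical_change, efficiency_change, technical_change

# Hypothetical distance values for one company, for illustration only.
index, ec, tc = malmquist(d_t_t=0.8, d_t_t1=1.0, d_t1_t=0.7, d_t1_t1=0.9)
print(index, ec, tc)
```

Under the common output-oriented convention, an index below 1 indicates a productivity decline between the two periods. Splitting efficiency change further into pure and scale efficiency requires additional variable-returns-to-scale scores, which this sketch leaves out.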
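Abstract: In the big data era, data access speed is an important measure of the performance of large-scale storage systems, and indexing is one of the main techniques for improving data access performance in database systems. In recent years, the learned index (LI) has been proposed: it replaces traditional indexes such as the B+ tree with machine learning models that fit the data distribution, turning indirect search into direct function computation. LI speeds up queries and reduces index space overhead, but its fitting error is large and it does not support modifying operations such as insertion. This paper proposes GDLIN (A Learned Index By Gradient Descent), a learned index model that fits the data with gradient descent. GDLIN uses gradient descent to fit the data more closely, which reduces the fitting error and shortens local search time. It also calls the data-fitting algorithm recursively, making full use of the key distribution to build the upper-level structure and preventing the index structure from growing with the data volume. In addition, GDLIN uses linked lists to solve LI's lack of support for insertion. Experimental results show that, with no new data inserted, the throughput of GDLIN is 2.1 times that of the B+ tree, and when insertions account for 50% of operations, it is 1.08 times that of LI.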
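The core idea of a learned index (predict a position with a fitted model, then search locally within the model's error bound) can be sketched as follows. This is a single-level illustration that assumes sorted, distinct, non-empty numeric keys and a plain linear model trained by batch gradient descent. The names `fit_learned_index`, `_predict` and `lookup` and the hyperparameters are hypothetical, and GDLIN's recursive upper-level structure and linked-list insertion are not reproduced.

```python
import bisect

def _predict(model, key):
    """Predicted array position of `key` under the fitted linear model."""
    x = (key - model["lo"]) / model["span"]
    return round((model["w"] * x + model["b"]) * (len(model["keys"]) - 1))

def fit_learned_index(keys, epochs=2000, lr=0.1):
    """Fit position ~ w * key + b on sorted, distinct keys by gradient descent."""
    n = len(keys)
    lo = keys[0]
    span = (keys[-1] - lo) or 1
    xs = [(k - lo) / span for k in keys]        # keys normalised to [0, 1]
    ys = [i / max(n - 1, 1) for i in range(n)]  # positions normalised to [0, 1]
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = w * x + b - y                 # gradient of the mean squared error
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    model = {"keys": keys, "lo": lo, "span": span, "w": w, "b": b, "max_err": 0}
    # Largest prediction error over the stored keys bounds the local search.
    model["max_err"] = max(abs(_predict(model, k) - i) for i, k in enumerate(keys))
    return model

def lookup(model, key):
    """Return the position of `key`, or None if it is not stored."""
    keys = model["keys"]
    n = len(keys)
    pred, err = _predict(model, key), model["max_err"]
    start = min(max(0, pred - err), n)
    stop = max(start, min(n, pred + err + 1))
    i = bisect.bisect_left(keys, key, start, stop)  # local search in the error window
    return i if i < n and keys[i] == key else None

keys = [3, 8, 15, 16, 23, 42, 57, 91, 120, 200]
model = fit_learned_index(keys)
print(lookup(model, 42), lookup(model, 4))
```

Because the error bound is measured after training, lookups of stored keys stay correct even when the fit is poor. A better fit only narrows the search window, which is the effect the abstract attributes to reducing the fitting error.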