To reduce the time required to complete the regeneration process of erasure codes, we propose a Tree-structured Parallel Regeneration (TPR) scheme for multiple data losses in distributed storage systems. Under the scheme, two algorithms are proposed for constructing multiple regeneration trees: the edge-disjoint algorithm and the edge-sharing algorithm. The edge-disjoint algorithm constructs multiple independent trees; it is simple and suits environments where newcomers and their providers are distributed over a large area and have few intersections. The edge-sharing algorithm constructs multiple trees that compete for bandwidth and makes better use of it, but it must measure the available bandwidth and handle bandwidth changes, so it is difficult to implement in practical systems. The parallel regeneration for multiple data losses in TPR includes two main optimizations: first, transferring the data along bandwidth-optimized paths in a pipelined manner; second, executing data regeneration over multiple trees in parallel. To evaluate the proposal, we implement an event-based simulator and make a detailed comparison with some popular regeneration methods. The quantitative comparison shows that TPR, with either the edge-disjoint or the edge-sharing algorithm, reduces the regeneration time significantly.
A binary tree can be represented by a code reflecting the traversal of the corresponding regular binary tree in a given monotonic order. A different coding scheme, based on the branches of a regular binary tree with n nodes, is proposed. It differs from the coding scheme generally used and makes no distinction between internal nodes and terminal nodes. A code of a regular binary tree with n nodes is formed by labeling the left branches with 0's and the right branches with 1's and then traversing these branches in pre-order. The root is always assumed to be on a left branch.
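One reading of this branch-labeling scheme can be sketched in Python. The names `Node` and `branch_code` are hypothetical, introduced only for illustration, and the sketch assumes that a regular binary tree is one in which every node has either zero or two children. It is not the author's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a regular binary tree: zero or two children (assumption)."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def branch_code(root: Node) -> str:
    """Return the pre-order sequence of branch labels.

    A node reached through a left branch contributes '0', a node reached
    through a right branch contributes '1'. The root is treated as if it
    hung on a left branch, so the code always starts with '0'.
    """
    bits = []

    def visit(node: Node, label: str) -> None:
        bits.append(label)
        if (node.left is None) != (node.right is None):
            raise ValueError("not a regular binary tree")
        if node.left is not None:
            visit(node.left, "0")
            visit(node.right, "1")

    visit(root, "0")
    return "".join(bits)
```

Under these assumptions, a root with two leaf children yields `001`. Internal and terminal nodes are handled identically: each node contributes exactly one symbol, the label of the branch leading to it.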
Classification in handwritten Chinese character recognition can be viewed as a transformation in a discrete vector space. From this viewpoint, this paper presents a new 4-corner code classifier for handwritten Chinese characters based on the decision tree inductive learning algorithm ID3. With a feature extraction controller, the classifier can reduce the number of extracted features and speed up classification. Experimental results show that the 4-corner code classifier performs well in both recognition accuracy and speed.
Software projects are becoming larger and more complicated. Managing those projects relies on several software development methodologies. One of them is software version control, which is used in the majority of software projects worldwide. Although existing version control systems provide sufficient functionality in many situations, they lack awareness of the semantics and structure of source code. It is commonly believed that improving software version control can contribute substantially to software development. We present a solution that considers a structural model for matching source code that can be used in version control.
This article proposes a high-speed, high-accuracy code clone detection method that combines tree-based and token-based methods. Duplicated program code, called a code clone, is one of the main factors that reduce the quality and maintainability of software. If a code fragment contains faults (bugs) and is copied and modified elsewhere, every copy must be corrected, but it is not easy to find all code clones in large and complex software. Much research effort has gone into code clone detection, and there are two main approaches: token-based and tree-based. The token-based method is fast and requires fewer resources, but it cannot detect all kinds of code clones. The tree-based method can detect all kinds of code clones, but it is slow and requires substantial computing resources. This paper proposes combining the two methods to improve the efficiency and accuracy of code clone detection. First, candidate code clones are extracted by the fast, lightweight token-based method. The selected candidates are then checked more precisely with the tree-based method, which can find all kinds of code clones. A prototype system was developed. It accepts source code and tokenizes it in the first step, then applies the token-based method to the token sequence to find candidate code clones. After the candidates are extracted, the selected source code is converted into an abstract syntax tree (AST) so that the tree-based method can be applied. Sample source code was used to evaluate the proposed method. The evaluation showed improved efficiency and precision in code clone detection.
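The two-stage idea described above (a cheap token filter followed by a tree check on the surviving candidates) can be sketched as follows. This is a minimal illustration for Python snippets using only the standard library. The function names, the 5-token shingles, the Jaccard threshold of 0.6, and the identifier normalization are all assumptions made for this sketch and are not taken from the prototype described in the abstract.

```python
import ast
import io
import keyword
import tokenize
from itertools import combinations

_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_shingles(src, k=5):
    """Stage 1 helper: set of k-grams over a normalized token stream."""
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            toks.append("ID")       # identifiers are made anonymous
        elif tok.type == tokenize.NUMBER:
            toks.append("NUM")
        elif tok.type == tokenize.STRING:
            toks.append("STR")
        else:
            toks.append(tok.string)  # keywords, operators, punctuation
    return {tuple(toks[i:i + k]) for i in range(max(1, len(toks) - k + 1))}


class _Anonymize(ast.NodeTransformer):
    """Replace variable, argument, and function names with '_'."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)
        return node


def tree_fingerprint(src):
    """Stage 2 helper: textual dump of the name-anonymized AST."""
    return ast.dump(_Anonymize().visit(ast.parse(src)))


def detect_clones(snippets, k=5, threshold=0.6):
    """Return name pairs that pass the token filter and the AST check."""
    shingles = {name: token_shingles(src, k) for name, src in snippets.items()}
    clones = []
    for a, b in combinations(sorted(snippets), 2):
        sa, sb = shingles[a], shingles[b]
        if len(sa & sb) / len(sa | sb) < threshold:
            continue  # rejected cheaply; the AST is never built for this pair
        if tree_fingerprint(snippets[a]) == tree_fingerprint(snippets[b]):
            clones.append((a, b))
    return clones
```

The point of the design is that parsing and tree comparison, the expensive steps, run only on pairs that already look similar at the token level. A real detector would compare subtrees rather than whole snippets and would tolerate small edits, which this sketch does not attempt.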
This paper studies algorithms for coding and decoding Prüfer codes of a labeled tree. The algorithms in the literature for coding and decoding Prüfer codes of a labeled tree usually require time. Although linear-time algorithms exist for Prüfer-like codes [1,2,3], they rely on integer sorting: the special range of the integers to be sorted is exploited to obtain a linear-time integer sorting algorithm, and the Prüfer code problem is reduced to integer sorting. In this paper we consider the Prüfer code problem from a different angle and in a more direct manner. We start from a naïve algorithm, improve it gradually, and finally obtain a very practical linear-time algorithm. The techniques used in this paper are of interest in their own right.
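For orientation, a linear-time encoder and decoder can be written without any sorting by using a well-known pointer-scan technique. The sketch below shows that technique and is not necessarily the algorithm the paper derives. It assumes vertices are labeled 0 to n-1 with n at least 2, and the function names are hypothetical.

```python
def prufer_encode(n, edges):
    """Prüfer code of a labeled tree on vertices 0..n-1, in O(n) time."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Root the tree at n-1. That vertex is never the smallest leaf, so the
    # surviving neighbor of any removed leaf is its parent in this rooting.
    parent = [-1] * n
    seen = [False] * n
    seen[n - 1] = True
    stack = [n - 1]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)

    degree = [len(neighbors) for neighbors in adj]
    ptr = 0
    while degree[ptr] != 1:
        ptr += 1
    leaf = ptr
    code = []
    for _ in range(n - 2):
        nxt = parent[leaf]
        code.append(nxt)
        degree[nxt] -= 1
        if degree[nxt] == 1 and nxt < ptr:
            leaf = nxt            # new leaf is smaller than the scan position
        else:
            ptr += 1              # the scan pointer only ever moves forward
            while degree[ptr] != 1:
                ptr += 1
            leaf = ptr
    return code


def prufer_decode(code):
    """Rebuild the edge list of the tree from its Prüfer code, in O(n) time."""
    n = len(code) + 2
    degree = [1] * n
    for v in code:
        degree[v] += 1
    ptr = 0
    while degree[ptr] != 1:
        ptr += 1
    leaf = ptr
    edges = []
    for v in code:
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1 and v < ptr:
            leaf = v
        else:
            ptr += 1
            while degree[ptr] != 1:
                ptr += 1
            leaf = ptr
    edges.append((leaf, n - 1))
    return edges
```

Because `ptr` never moves backward, the total scanning work over the whole run is bounded by n, which is what makes both routines linear. For the tree with edges (0,3), (1,3), (2,3), (3,4), (4,5), `prufer_encode` returns `[3, 3, 3, 4]`, and decoding that list reproduces the same edges.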
Existing software bug localization models treat the source file as natural language, which loses the syntactic and structural information of the source file. A bug localization model based on the syntactic and semantic information of source code is proposed. First, the abstract syntax tree (AST) is divided by node category to obtain a statement sequence, and each statement tree is encoded into a vector to capture lexical and syntactic knowledge at the statement level. Second, the source code is transformed into a vector representation using the sequential naturalness of the statements. This avoids the vanishing and exploding gradient problems caused by a large AST when the AST is used to represent the source code. Finally, the correlation between bug reports and source files is analyzed comprehensively from three aspects (syntax, semantics, and text) to locate the buggy code. Experiments show that, compared with other standard models, the proposed model improves bug localization performance and performs well in mean reciprocal rank (MRR), mean average precision (MAP), and Top N Rank.
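Attackers flexibly use domain names to carry out various network attacks. Many researchers have proposed malicious domain detection methods based on statistical features and on association relationships, but both kinds of methods fall short in representing high-order relations among domain attributes and cannot accurately capture the global high-order relations between domains. To address this problem, a malicious domain detection method based on embedded-feature hypergraph learning is proposed. First, a domain hypergraph structure is built with a decision tree from the spatial statistical features of domain names; the outputs of the nodes in the second-to-last layer of the decision tree serve as prior conditions to form hyperedges, which quickly and clearly represents the multi-order associations among domain traffic. Second, the character embedding features are enhanced and encoded using the structural features of the hypergraph, and the hidden high-order relations among characters are mined from the domain data using the spatial statistical features and the character embedding encoding features of domain names. Finally, effectiveness and feasibility are analyzed and evaluated on real domain name system (DNS) traffic from the China Science and Technology Network; the method can detect covert malicious domains quickly and efficiently.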
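Compared with the High Efficiency Video Coding (HEVC) standard, the new-generation standard Versatile Video Coding (VVC) introduces many new techniques, including quadtree (QT) and multi-type tree (MTT) partitioning; MTT partitioning extends the QT partitioning of HEVC. The new partitioning methods improve compression efficiency but sharply increase encoding time. To reduce coding complexity, a fast intra coding algorithm is proposed that combines a deep learning method with early decision of the MTT direction. First, a lightweight convolutional neural network (CNN) predicts the partitioning for QT and part of MTT; the remaining MTT partitions are further optimized by predicting the MTT split direction in advance. Experimental results show that the proposed method greatly reduces coding complexity: encoding time is reduced by 74.3% relative to the original encoder with only a 3.3% bit-rate loss, outperforming the compared methods.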
Funding (TPR regeneration study): supported by the National Grand Fundamental Research of China (973 Program) under Grant No. 2011CB302601; the National High Technology Research and Development of China (863 Program) under Grant No. 2013AA01A213; the National Natural Science Foundation of China under Grant No. 60873215; the Natural Science Foundation for Distinguished Young Scholars of Hunan Province under Grant No. S2010J5050; and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20124307110015.
Funding (bug localization study): supported by the National Key R&D Program of China (2018YFB1702700).