The batch splitting scheduling problem has recently become a major target in manufacturing systems, and the researchers have obtained great achievements, whereas most of existing related researches focus on equal-size...The batch splitting scheduling problem has recently become a major target in manufacturing systems, and the researchers have obtained great achievements, whereas most of existing related researches focus on equal-sized and consistent-sized batch splitting scheduling problem, and solve the problem by fixing the number of sub-batches, or the sub-batch sizes, or both. Under such circumstance and to provide a practical method for production scheduling in batch production mode, a study was made on the batch splitting scheduling problem on alternative machines, based on the objective to minimize the makespan. A scheduling approach was presented to address the variable-sized batch splitting scheduling problem in job shops trying to optimize both the number of sub-bathes and the sub-batch sizes, based on differential evolution(DE), making full use of the finding that the sum of values of genes in one chromosome remains the same before and after mutation in DE. Considering before-arrival set-up time and processing time separately, a variable-sized batch splitting scheduling model was established and a new hybrid algorithm was brought forward to solve both the batch splitting problem and the batch scheduling problem. A new parallel chromosome representation was adopted, and the batch scheduling chromosome and the batch splitting chromosome were treated separately during the global search procedure, based on self-adaptive DE and genetic crossover operator, respectively. A new local search method was further designed to gain a better performance. A solution consists of the optimum number of sub-bathes for each operation per job, the optimum batch size for each sub-batch and the optimum sequence of sub-batches. Computational experiments of four test instances and a realistic problem in a speaker workshop were performed to testify the effectiveness of the proposed scheduling method. The study takes advantage of DE's distinctive feature, and employs the algorithm as a solution approach, and thereby deepens and enriches the content of batch splitting scheduling.展开更多
This paper proposes a non-segmented document clustering method using self-organizing map (SOM) and frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely used for docu...This paper proposes a non-segmented document clustering method using self-organizing map (SOM) and frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely used for document clustering and is successful in many applications. However, when applying to non-segmented document, the challenge is to identify any interesting pattern efficiently. There are two main phases in the propose method: preprocessing phase and clustering phase. In the preprocessing phase, the frequent max substring technique is first applied to discover the patterns of interest called Frequent Max substrings that are long and frequent substrings, rather than individual words from the non-segmented texts. These discovered patterns are then used as indexing terms. The indexing terms together with their number of occurrences form a document vector. In the clustering phase, SOM is used to generate the document cluster map by using the feature vector of Frequent Max substrings. To demonstrate the proposed technique, experimental studies and comparison results on clustering the Thai text documents, which consist of non-segmented texts, are presented in this paper. The results show that the proposed technique can be used for Thai texts. The document cluster map generated with the method can be used to find the relevant documents more efficiently.展开更多
Given a list of items and a sequence of variable-sized bins arriving one by one, it is NP-hard to pack the items into the bin list with a goal to minimize the total size of bins from the earliest one to the last used....Given a list of items and a sequence of variable-sized bins arriving one by one, it is NP-hard to pack the items into the bin list with a goal to minimize the total size of bins from the earliest one to the last used. In this paper a set of approximation algorithms is presented for cases in which the ability to preview at most k(〉=2) arriving bins is given. With the essential assumption that all bin sizes are not less than the largest item size, analytical results show the asymptotic worst case ratios of all k-bounded space and offiine algorithms are 2. Based on experiments by applying algorithms to instances in which item sizes and bin sizes are drawn independently from the continuous uniform distribution respectively in the interval [0,u] and [u,l ], averagecase experimental results show that, with fixed k, algorithms with the Best Fit packing(closing) rule are statistically better than those with the First Fit packing(closing) rule.展开更多
Ensuring the correctness of answers to substring queries has not been a concern for consumers working within the traditional confines of their own organisational infrastructure. This is due to the fact that organisati...Ensuring the correctness of answers to substring queries has not been a concern for consumers working within the traditional confines of their own organisational infrastructure. This is due to the fact that organisations generally trust their handling of their own data hosted on their own servers and networks. With cloud computing however, where both data and processing are delegated to unknown servers, guarantees of the correctness of queries need to be available. The verification of the results of substring searches has not been given much focus to date within the wider scope of data and query, verification. We present a verification scheme for existential substring searc, hes on text files, which is the first of its kind to satisfy the desired properties of authenticity, completeness, and freshness. The scheme is based on suffix arrays, Merkle hash trees and cryptographic hashes to provide strong guarantees of correctness for the consumer, even in fully untrusted environments. We provide a description of our scheme, along with the results of experiments conducted on a fully-working prototype.展开更多
高效求解2个字符串的最长公共子串(Longest Common Substring)是实现很多字符串算法的关键。文中首先给出了求解LCP问题的动态规划算法,广义后缀树算法,研究并分析了这两种算法,得出动态规划算法易于理解,但时间复杂度较高;广义后缀树...高效求解2个字符串的最长公共子串(Longest Common Substring)是实现很多字符串算法的关键。文中首先给出了求解LCP问题的动态规划算法,广义后缀树算法,研究并分析了这两种算法,得出动态规划算法易于理解,但时间复杂度较高;广义后缀树算法的时间复杂度较低,但实现较为复杂并且广义后缀树占用的空间也较多。最后提出了一个新算法,该算法使用2个字符串的广义后缀数组,在保持和广义后缀树时间复杂度相等的基础上,可以简单地实现并且占用较少的空间。展开更多
本文介绍了一种基于最大公共子串(Longest Common Substring,LCS)算法的术语抽取方法:按标点符号对领域文档进行切分;抽取切分后的语句片断的所有最大公共子串作为候选术语集;通过停用词过滤、对照领域词筛选和术语嵌套子串筛选等规...本文介绍了一种基于最大公共子串(Longest Common Substring,LCS)算法的术语抽取方法:按标点符号对领域文档进行切分;抽取切分后的语句片断的所有最大公共子串作为候选术语集;通过停用词过滤、对照领域词筛选和术语嵌套子串筛选等规则进行判别,得到最终的术语集。通过学前教育领域术语抽取的实验,验证了该算法可以有效地抽取中文领域术语:术语抽取平均准确率达84.2%;4~6字符双词术语抽取的效果尤佳,准确率接近100%。展开更多
基金supported by National Hi-tech Research and Development Program of China (863 Program, Grant No. 2007AA04Z155)National Natural Science Foundation of China (Grant No. 60970021)Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1090592)
文摘The batch splitting scheduling problem has recently become a major target in manufacturing systems, and the researchers have obtained great achievements, whereas most of existing related researches focus on equal-sized and consistent-sized batch splitting scheduling problem, and solve the problem by fixing the number of sub-batches, or the sub-batch sizes, or both. Under such circumstance and to provide a practical method for production scheduling in batch production mode, a study was made on the batch splitting scheduling problem on alternative machines, based on the objective to minimize the makespan. A scheduling approach was presented to address the variable-sized batch splitting scheduling problem in job shops trying to optimize both the number of sub-bathes and the sub-batch sizes, based on differential evolution(DE), making full use of the finding that the sum of values of genes in one chromosome remains the same before and after mutation in DE. Considering before-arrival set-up time and processing time separately, a variable-sized batch splitting scheduling model was established and a new hybrid algorithm was brought forward to solve both the batch splitting problem and the batch scheduling problem. A new parallel chromosome representation was adopted, and the batch scheduling chromosome and the batch splitting chromosome were treated separately during the global search procedure, based on self-adaptive DE and genetic crossover operator, respectively. A new local search method was further designed to gain a better performance. A solution consists of the optimum number of sub-bathes for each operation per job, the optimum batch size for each sub-batch and the optimum sequence of sub-batches. Computational experiments of four test instances and a realistic problem in a speaker workshop were performed to testify the effectiveness of the proposed scheduling method. The study takes advantage of DE's distinctive feature, and employs the algorithm as a solution approach, and thereby deepens and enriches the content of batch splitting scheduling.
文摘This paper proposes a non-segmented document clustering method using self-organizing map (SOM) and frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely used for document clustering and is successful in many applications. However, when applying to non-segmented document, the challenge is to identify any interesting pattern efficiently. There are two main phases in the propose method: preprocessing phase and clustering phase. In the preprocessing phase, the frequent max substring technique is first applied to discover the patterns of interest called Frequent Max substrings that are long and frequent substrings, rather than individual words from the non-segmented texts. These discovered patterns are then used as indexing terms. The indexing terms together with their number of occurrences form a document vector. In the clustering phase, SOM is used to generate the document cluster map by using the feature vector of Frequent Max substrings. To demonstrate the proposed technique, experimental studies and comparison results on clustering the Thai text documents, which consist of non-segmented texts, are presented in this paper. The results show that the proposed technique can be used for Thai texts. The document cluster map generated with the method can be used to find the relevant documents more efficiently.
文摘Given a list of items and a sequence of variable-sized bins arriving one by one, it is NP-hard to pack the items into the bin list with a goal to minimize the total size of bins from the earliest one to the last used. In this paper a set of approximation algorithms is presented for cases in which the ability to preview at most k(〉=2) arriving bins is given. With the essential assumption that all bin sizes are not less than the largest item size, analytical results show the asymptotic worst case ratios of all k-bounded space and offiine algorithms are 2. Based on experiments by applying algorithms to instances in which item sizes and bin sizes are drawn independently from the continuous uniform distribution respectively in the interval [0,u] and [u,l ], averagecase experimental results show that, with fixed k, algorithms with the Best Fit packing(closing) rule are statistically better than those with the First Fit packing(closing) rule.
文摘Ensuring the correctness of answers to substring queries has not been a concern for consumers working within the traditional confines of their own organisational infrastructure. This is due to the fact that organisations generally trust their handling of their own data hosted on their own servers and networks. With cloud computing however, where both data and processing are delegated to unknown servers, guarantees of the correctness of queries need to be available. The verification of the results of substring searches has not been given much focus to date within the wider scope of data and query, verification. We present a verification scheme for existential substring searc, hes on text files, which is the first of its kind to satisfy the desired properties of authenticity, completeness, and freshness. The scheme is based on suffix arrays, Merkle hash trees and cryptographic hashes to provide strong guarantees of correctness for the consumer, even in fully untrusted environments. We provide a description of our scheme, along with the results of experiments conducted on a fully-working prototype.
文摘高效求解2个字符串的最长公共子串(Longest Common Substring)是实现很多字符串算法的关键。文中首先给出了求解LCP问题的动态规划算法,广义后缀树算法,研究并分析了这两种算法,得出动态规划算法易于理解,但时间复杂度较高;广义后缀树算法的时间复杂度较低,但实现较为复杂并且广义后缀树占用的空间也较多。最后提出了一个新算法,该算法使用2个字符串的广义后缀数组,在保持和广义后缀树时间复杂度相等的基础上,可以简单地实现并且占用较少的空间。
文摘本文介绍了一种基于最大公共子串(Longest Common Substring,LCS)算法的术语抽取方法:按标点符号对领域文档进行切分;抽取切分后的语句片断的所有最大公共子串作为候选术语集;通过停用词过滤、对照领域词筛选和术语嵌套子串筛选等规则进行判别,得到最终的术语集。通过学前教育领域术语抽取的实验,验证了该算法可以有效地抽取中文领域术语:术语抽取平均准确率达84.2%;4~6字符双词术语抽取的效果尤佳,准确率接近100%。