Abstract: A partition-of-intervals method is adopted in current classification based on associations (CBA), but this method cannot reflect the actual distribution of the data and suffers from the sharp boundary problem. The classification system based on the longest association rules with linguistic terms is discussed, and its shortcoming is analyzed. Then, a classification system based on short association rules with linguistic terms is presented. An example shows that the accuracy of the classification system based on association rules with linguistic terms is higher than that of two popular classification methods: C4.5 and CBA.
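The sharp boundary problem mentioned above can be illustrated with a small sketch: a crisp interval partition flips its label abruptly at a cut point, while overlapping triangular linguistic terms assign graded memberships that change smoothly. The cut points, term names, and membership parameters below are hypothetical illustrations, not values from the paper.

```python
# Crisp interval partition vs. fuzzy linguistic terms (hypothetical parameters).

def crisp_label(x, cuts=(30.0, 60.0)):
    """Classic interval partition: hard assignment to low/medium/high."""
    if x < cuts[0]:
        return "low"
    if x < cuts[1]:
        return "medium"
    return "high"

def triangular(x, a, b, c):
    """Triangular membership function peaking at b on support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_memberships(x):
    """Graded membership of x in three overlapping linguistic terms."""
    return {
        "low": triangular(x, -30.0, 0.0, 45.0),
        "medium": triangular(x, 15.0, 45.0, 75.0),
        "high": triangular(x, 45.0, 90.0, 120.0),
    }

# A value just below the crisp cut at 30 gets a different label than one
# just above it, while its fuzzy memberships vary smoothly across the cut.
print(crisp_label(29.9), crisp_label(30.1))
print(fuzzy_memberships(29.9))
```

The linguistic-term representation lets a rule near a boundary contribute partially to both neighboring terms instead of being forced into one interval.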
Funding: Project (70572090) supported by the National Natural Science Foundation of China
Abstract: In order to construct a data mining framework for generic project risk research, the basic definitions of the generic project risk element were given, and a new model of the generic project risk element was then built from these definitions. Based on the model, a data mining method was used to acquire the risk transmission matrix from analysis of historical databases, which solves the problem of quantitative calculation among generic project risk elements. The method handles risk element transmission problems with a limited number of states well; in order to obtain these limited states, fuzzy theory was used to discretize the historical data in the historical databases. In an example, the controlling risk degree was chosen as P(Rs ≥ 2) ≤ 0.1, meaning that the probability that the project risk state is not less than 2 must not exceed 0.1, and the risk element R3 was chosen to control the project. The result shows that three risk element transmission matrices can be acquired from four risk elements, and the frequency histogram and cumulative frequency histogram of each risk element are also given.
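The controlling criterion P(Rs ≥ 2) ≤ 0.1 can be illustrated with a toy propagation step, assuming risk states are discretized to {0, 1, 2} and the transmission matrix gives state-to-state conditional probabilities between two risk elements. The matrix and distribution values below are invented for illustration and are not the paper's data.

```python
# Hypothetical risk transmission check for P(Rs >= 2) <= 0.1.
# Row i, column j: probability that a source risk element in state i drives
# the dependent risk element into state j (each row sums to 1).
T = [
    [0.90, 0.08, 0.02],   # source in state 0
    [0.30, 0.55, 0.15],   # source in state 1
    [0.05, 0.35, 0.60],   # source in state 2
]

# Current state distribution of the source risk element (hypothetical).
p_source = [0.7, 0.25, 0.05]

def propagate(p, matrix):
    """Distribution of the dependent element: the vector-matrix product p @ matrix."""
    n = len(matrix[0])
    return [sum(p[i] * matrix[i][j] for i in range(len(p))) for j in range(n)]

p_dep = propagate(p_source, T)
p_at_least_2 = sum(p_dep[2:])          # P(Rs >= 2)
print(p_dep, p_at_least_2, p_at_least_2 <= 0.1)
```

With these invented numbers the criterion holds (P(Rs ≥ 2) ≈ 0.0815 ≤ 0.1); in the paper's setting the matrix itself would be mined from the historical databases.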
Abstract: Current technology for frequent itemset mining mostly applies to data stored in a single transaction database. This paper presents MultiClose, a novel algorithm for frequent itemset mining in data warehouses. MultiClose computes results in the individual dimension tables and then merges them with a very efficient approach. The closed itemsets technique is used to improve the performance of the algorithm. The authors propose an efficient implementation for star schemas in which their algorithm outperforms state-of-the-art single-table algorithms.
Abstract: HA (hashing array), a new algorithm for mining frequent itemsets in large databases, is proposed. It employs a hash array structure, ItemArray(), to store the information of the database and then uses the array instead of the database in later iterations. With this improvement, the whole database needs to be scanned only twice, so the computational cost can be reduced significantly. To overcome the performance bottleneck of frequent 2-itemset mining, a modified algorithm, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The hybrid algorithm DHA not only overcomes this bottleneck but also inherits the advantages of HA. Extensive simulations are conducted to evaluate the performance of the proposed algorithms, and the results show that the new algorithm is more efficient.
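The direct-addressing idea behind the 2-itemset bottleneck can be sketched as follows: when items are coded as integers 0..n-1, every candidate pair (i, j) with i < j maps to a unique slot in a flat counter array, so counting needs neither collision handling nor candidate generation. The transactions, item coding, and support threshold below are hypothetical, and the slot formula is one standard way to index pairs without collisions, not necessarily the paper's exact layout.

```python
# Direct-addressing count of 2-itemsets (sketch of the DHA idea).
from itertools import combinations

transactions = [
    {0, 1, 2},
    {0, 2},
    {1, 2, 3},
    {0, 1, 2, 3},
    {2, 3},
]
n_items = 4
min_support = 3

def pair_slot(i, j, n):
    """Unique index for pair (i, j), i < j, in an array of size n*(n-1)/2."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)

counts = [0] * (n_items * (n_items - 1) // 2)
for t in transactions:
    for i, j in combinations(sorted(t), 2):
        counts[pair_slot(i, j, n_items)] += 1

frequent_pairs = [
    (i, j)
    for i in range(n_items)
    for j in range(i + 1, n_items)
    if counts[pair_slot(i, j, n_items)] >= min_support
]
print(frequent_pairs)
```

One pass over the transactions fills the counter array, and a second linear scan of the array reads off the frequent 2-itemsets.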
Abstract: This paper describes the architecture of a tool called DiagData, which aims to use the large amount of data and information in the field of plant disease diagnosis to generate a disease prediction system. In this approach, data mining techniques are used to extract knowledge from existing data. The knowledge is extracted in the form of rules that are used in the development of a predictive intelligent system. Currently, the specification of these rules is built by an expert or by data mining; when data mining is used on a large database, the number of generated rules becomes very large. The main goal of this work is to minimize the rule generation time. DiagData extracts knowledge automatically or semi-automatically from a database and uses it to build an intelligent system for disease prediction. In this work, a decision tree learning algorithm was used to generate the rules, and a toolbox called Fuzzygen was used to generate a prediction system from the rules produced by the decision tree algorithm. The software was implemented in Java. DiagData has been used in disease prediction and diagnosis systems and in the validation of economic and environmental indicators in agricultural production systems. The validation process compared the time an expert spent entering the rules by hand with the time needed to insert the same rules with the proposed tool; the tool was successfully validated, providing a reduction in time.
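The step such a tool automates, turning a learned decision tree into if-then rules, can be sketched with a hand-built toy tree: every root-to-leaf path becomes one rule. The attributes, thresholds, and class labels below are hypothetical; a real tree would come from the decision tree learner on the disease database.

```python
# Extracting if-then rules from a decision tree (hypothetical toy tree).
# A node is either a leaf (a class-label string) or a tuple
# (attribute, threshold, left_subtree, right_subtree) splitting on attr <= threshold.
tree = (
    "leaf_wetness", 6.0,
    ("temperature", 20.0, "healthy", "mildew"),
    "blight",
)

def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path; each path becomes one if-then rule."""
    if isinstance(node, str):
        return [(list(conditions), node)]
    attr, thr, left, right = node
    rules = []
    rules += extract_rules(left, conditions + ((attr, "<=", thr),))
    rules += extract_rules(right, conditions + ((attr, ">", thr),))
    return rules

for conds, label in extract_rules(tree):
    clause = " AND ".join(f"{a} {op} {t}" for a, op, t in conds)
    print(f"IF {clause} THEN {label}")
```

A tree with L leaves yields exactly L rules, which is why rule counts explode on large databases and why automating their insertion saves time.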
Abstract: In this research article, we analyze multimedia data mining and classification algorithms based on database optimization techniques. The constant emergence of high-performance application requirements has made parallel computer architectures increasingly common, but the development of the corresponding software systems lags far behind that of the hardware, which is especially evident in applications of database technology. Multimedia mining differs from low-level computer multimedia processing technology: the former focuses on extracting patterns from huge multimedia collections, while the latter focuses on understanding or extracting specific features from a single multimedia object. Our research provides a new paradigm for this methodology, which will be meaningful and necessary.
Abstract: This paper describes web data mining technology in detail and analyzes the relationships among the data of a tourism electronic commerce web site (including the server log, the tourism commodity database, the user database, and the shopping cart) to obtain user preference information for tourism commodities. Based on these models, the paper presents recommendation strategies for the site's registered users, gives the formulas for calculating the current user's recommendation values for certain items together with the corresponding recommendation algorithm, and shows that the system can produce recommendations for the user.
Funding: Project (51275362) supported by the National Natural Science Foundation of China; Project (2013M542055) supported by the China Postdoctoral Science Foundation
Abstract: Modular technology can effectively support the rapid design of products, and it is one of the key technologies for realizing mass customization design. With the application of product lifecycle management (PLM) systems in enterprises, product lifecycle data have been effectively managed. However, these data have not been fully utilized in module division, especially for complex machinery products. To solve this problem, a product module mining method for the PLM database is proposed to improve the effect of module division. Firstly, product data are extracted from the PLM database by a data extraction algorithm. Then, data normalization and structural logic inspection are used to preprocess the extracted defective data. The preprocessed product data are analyzed and expressed in a matrix for module mining. Finally, the fuzzy c-means (FCM) clustering algorithm is used to generate product modules, which are stored in a product module library after module marking and post-processing. The feasibility and effectiveness of the proposed method are verified by a case study of a high-pressure valve.
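The FCM step can be sketched compactly, assuming each component has been reduced to a 2-D feature vector (e.g. two correlation scores from the product-data matrix); a real PLM matrix would be higher-dimensional, and the deterministic initialization here is a simplification rather than the paper's procedure.

```python
# Minimal fuzzy c-means (FCM) sketch for grouping components into modules.

def fcm(points, c, m=2.0, iters=50):
    """Return (membership matrix, centers) for c fuzzy clusters."""
    n, d = len(points), len(points[0])
    # Deterministic init: spread initial centers across the input order.
    centers = [list(points[i * (n - 1) // max(c - 1, 1)]) for i in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Update memberships from inverse relative distances.
        for i in range(n):
            dists = [
                max(sum((points[i][j] - centers[k][j]) ** 2
                        for j in range(d)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dists[k] / dists[l]) ** (2.0 / (m - 1.0)) for l in range(c)
                )
        # Update centers as membership-weighted means.
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers[k] = [
                sum(w[i] * points[i][j] for i in range(n)) / total
                for j in range(d)
            ]
    return u, centers

# Two hypothetical groups of components; FCM should separate them.
components = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
              (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
memberships, centers = fcm(components, c=2)
modules = [max(range(2), key=lambda k: row[k]) for row in memberships]
print(modules)
```

Hardening a component's fuzzy memberships to its highest-membership cluster corresponds to the "module marking" step before storing modules in the library.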
Funding: Supported by the National Natural Science Foundation of China: No. 81072883
Abstract: Objective To establish a data warehouse on acupuncture-moxibustion (acup-mox) methods, to explore valuable laws about the research and clinical application of acup-mox in a large body of literature by use of data mining techniques, and to promote acup-mox research and the effective treatment of diseases. Methods According to the different types of acup-mox literature information, different subjects of the acup-mox literature are determined and the relevant databases are established. From the continuously enriched subject databases, a data warehouse catering to multiple subjects and multiple dimensions is set up so as to provide a platform for wider application of acup-mox literature information. Results Based on the characteristics of the acup-mox literature, many subject databases, such as needling with filiform needles, moxibustion, etc., are established, and clinical treatment laws of acup-mox are revealed by applying data mining methods to the established databases. Conclusion Establishment of the acup-mox literature warehouse provides a standard data expression model, rich attributes, and relations between different pieces of literature information for the study of acup-mox literature with more effective techniques, and a rich, standard data basis for acup-mox research.
Funding: Supported by the National Natural Science Foundation of China: 81072883, 81173342; University Student Innovation Instruction Project of Hebei Medical University: 2010
Abstract: Objective To explore the clinically indicated diseases of acupoint catgut-embedding therapy, and to summarize and analyze its disease spectrum. Methods By literature research and data mining techniques, clinical study papers on acupoint catgut-embedding therapy published from 1971 to June 2011 were selected, entered, and verified; the effective information was then extracted, and finally the disease spectrum was summarized. Results Acupoint catgut-embedding therapy is indicated for 103 diseases across 6 departments: internal diseases are the most numerous at 50 (accounting for 48.54%), followed by 15 surgical diseases, 12 ENT diseases, 11 gynecological diseases, and 11 dermatological diseases, with pediatric diseases the fewest at 4. Meanwhile, according to the rule of the "Efficacy acupuncture grading disease spectrum", the diseases treated with this therapy were preliminarily graded into grade Ⅰ with 26 diseases, grade Ⅱ with 30 diseases, and grade Ⅲ with 8 diseases. Conclusion Acupoint catgut-embedding therapy can be widely used in clinical treatment with a broad disease spectrum, and it is worthy of being spread and applied.
Abstract: Keyword search is an alternative to structured languages for querying graph-structured data. A result for a keyword query is a connected structure covering all or part of the queried keywords. Textual coverage and structural compactness are known as the two main properties of a relevant result for a keyword query. Many previous works examined these properties comparatively, with a ranking function, after retrieving all of the candidate results. However, this requires a time-consuming search process, which is not appropriate for an interactive system in which the user expects results in the least possible time. Recent works have addressed this problem by confining the shape of results so that their coverage and compactness can be examined during the search. However, these methods still suffer from the existence of redundant nodes in the retrieved results. In this paper, we introduce the semantics of the minimal covered r-clique (MCCr) for the results of a keyword query as an extension of existing definitions. We propose efficient algorithms to detect the MCCrs of a given query. These algorithms can retrieve a comprehensive set of non-duplicate MCCrs in response to a keyword query. In addition, these algorithms can be executed in a distributed manner, which makes them outstanding in the field of keyword search. We also propose approximate versions of these algorithms to retrieve the top-k approximate MCCrs with polynomial delay. It is proved that the approximate algorithms retrieve results within a factor of two of the optimum. Extensive experiments on two real-world datasets confirm the efficiency and effectiveness of the proposed algorithms.
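The covered-clique semantics can be illustrated with a brute-force sketch: find node sets that cover every query keyword, are pairwise connected, and are minimal (no node can be dropped without losing coverage). This uses plain cliques for brevity, whereas the paper's r-cliques relax adjacency to distance at most r; the toy graph and keyword assignments are hypothetical.

```python
# Brute-force minimal covered cliques on a toy keyword-labeled graph.
from itertools import combinations

edges = {(1, 2), (1, 3), (2, 3), (3, 4), (2, 4)}
node_keywords = {1: {"a"}, 2: {"b"}, 3: {"a", "b"}, 4: {"c"}}
query = {"a", "b", "c"}

def adjacent(u, v):
    return (u, v) in edges or (v, u) in edges

def covered_cliques(node_kw, query):
    """All node subsets that form a clique and cover every query keyword."""
    results = []
    for size in range(1, len(node_kw) + 1):
        for subset in combinations(sorted(node_kw), size):
            covered = set().union(*(node_kw[n] for n in subset))
            is_clique = all(adjacent(u, v) for u, v in combinations(subset, 2))
            if query <= covered and is_clique:
                results.append(set(subset))
    return results

def minimal_only(candidates):
    """Drop any candidate that strictly contains another candidate."""
    return [s for s in candidates if not any(t < s for t in candidates)]

print(minimal_only(covered_cliques(node_keywords, query)))
```

Here {2, 3, 4} covers the query and is a clique but contains the smaller covered clique {3, 4}, so only the latter survives; this redundant-node pruning is what the minimality condition formalizes. Real algorithms avoid this exponential enumeration.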