An association rule mining method based on semantic relativity is proposed to address the large number of candidate item sets and the high time complexity of traditional association rule mining. The method uses the semantic relativity of ontology concepts to describe complicated domain relationships. Candidate item sets with low semantic relativity are filtered out to reduce the number of candidate item sets in association rule mining. In the semantic relativity computation, the ontology hierarchy is treated as a directed acyclic graph rather than a hierarchy tree. Not only direct hierarchy relationships but also indirect hierarchy relationships and other typical semantic relationships are taken into account. Experimental results show that the proposed method effectively reduces the number of candidate item sets and improves the efficiency of association rule mining.
In this paper, we propose an efficient algorithm, called FFP-Growth (short for fast FP-Growth), to mine frequent itemsets. Like FP-Growth, FFP-Growth searches the FP-tree in bottom-up order, but it need not construct conditional pattern bases and sub-FP-trees, which saves a substantial amount of time and space. The FP-tree it creates is also much smaller than that created by TD-FP-Growth, which further improves efficiency. In addition, FFP-Growth can easily be extended to reduce the search space, as TD-FP-Growth (M) and TD-FP-Growth (C) do. Experimental results show that the proposed algorithm is effective and efficient.
Current classification based on associations (CBA) adopts an interval partition method, but this method cannot reflect the actual distribution of the data and suffers from the sharp boundary problem. A classification system based on the longest association rules with linguistic terms is discussed, and its shortcomings are analyzed. A classification system based on short association rules with linguistic terms is then presented. An example shows that the classification system based on association rules with linguistic terms is more accurate than two popular classification methods, C4.5 and CBA.
Objective To analyze the basic characteristics, drug features, prescription rules, and drug-symptom relationships of patients in the splenic deficiency and impairment stage by data mining of medical records under the New Theory on Spleen Dampness Syndrome (Pi Dan Xin Lun, 《脾瘅新论》). Methods Medical records listed in the “New Theory on Spleen Dampness Syndrome - Understanding and Treatment of Metabolic Syndrome from the Perspective of Traditional Chinese Medicine” that were diagnosed with spleen dampness syndrome at the splenic deficiency and impairment stage between January 2004 and December 2016 were selected. The patients’ data, including basic information, clinical symptoms, laboratory examination results, traditional Chinese medicine (TCM) and western medicine diagnoses, treatment methods, and prescriptions, were collected. The collected data were compiled into a medical record database using the Epidata 3.1 data management software, and the Apriori algorithm provided in the SPSS Modeler 14.2 statistical software was then used to investigate the drug-drug, drug-symptom, and drug-western medicine index association rules. Results (i) A total of 51 medical records were included, involving 17 types of syndromes. Among them, the top three with frequency ≥ 3 were “Phlegm and blood stasis, and thoracic obstruction”, “Deficiency-weakness of the spleen Qi, and static blood blocking collaterals”, and “Deficiency-weakness of the spleen Qi, and static blood blocking collaterals”. Of the 14 treatment methods, the top three with frequency ≥ 3 were “Activating Yang and eliminating turbidity, and removing phlegm and dredging channel blockage”, “Strengthening the spleen and benefiting Qi, and eliminating phlegm to activate the channels”, and “Warming Yang and benefiting Qi, and expelling cold to remove obstructions”. Among the 15 prescriptions, the top three with frequency ≥ 3 were Huangqi Guizhi Wuwu Tang (黄芪桂枝五物汤), Gualou Xiebai Banxia Tang (瓜蒌薤白半夏汤), and Ganjiang Huangqin Huanglian Renshen Tang (干姜黄芩黄连人参汤). Lastly, of the 83 drugs used a total of 476 times, those with frequency ≥ 15 included Huanglian (Coptidis Rhizoma), Huangqi (Astragali Radix), Jiudahuang (Wine-processed Rhei Radix et Rhizoma), Jixueteng (Spatholobi Caulis), Shengjiang (Zingiberis Rhizoma Recens), Huangqin (Scutellariae Radix), and Guizhi (Cinnamomi Ramulus). (ii) For the drug-drug associations, under the criteria of support ≥ 15% and confidence = 100%, seven second-order association rules, seven third-order rules, and six fourth-order rules were identified. The top-ranking rule of each order was “Huangqin (Scutellariae Radix) → Huanglian (Coptidis Rhizoma)”, “Ganjiang (Zingiberis Rhizoma) + Huangqin (Scutellariae Radix) → Huanglian (Coptidis Rhizoma)”, and “Baishao (Paeoniae Radix Alba) + Guizhi (Cinnamomi Ramulus) + Jixueteng (Spatholobi Caulis) → Huangqin (Scutellariae Radix)”, respectively. The drug-symptom associations were analyzed under the criteria of support ≥ 5% and confidence = 100%, which yielded eight second-order association rules, 31 third-order rules, and 30 fourth-order rules. The top-ranking association rule of each order was “Huangqi (Astragali Radix) → Limb edema”, “Guizhi (Cinnamomi Ramulus) + Jixueteng (Spatholobi Caulis) → Limb numbness and pain”, and “Guizhi (Cinnamomi Ramulus) + Jixueteng (Spatholobi Caulis) + Huangqi (Astragali Radix) → Limb numbness and pain”, respectively. Similarly, the drug-western medicine index associations were investigated under the criteria of support ≥ 5% and confidence = 100%, and five second-order association rules, 16 third-order rules, and 16 fourth-order rules were identified. In this category, the top-ranking association rule of each order was “Qinpi (Fraxini Cortex) → Uric acid”, “Huanglian (Coptidis Rhizoma) + Ganjiang (Zingiberis Rhizoma) → Glycated hemoglobin”, and “Huanglian (Coptidis Rhizoma) + Ganjiang (Zingiberis Rhizoma) + Huangqin (Scutellariae Radix) → Glycated hemoglobin”, respectively. Conclusion Through association rule mining, this study objectively and quantitatively demonstrated the drug-drug, drug-symptom, and drug-physicochemical index associations of patients with spleen dampness syndrome at the splenic deficiency and impairment stage treated by Academician TONG Xiaolin. The results indicated that treatment for these patients adopted the “state-target” syndrome differentiation method. The drug combinations were characterized by “small prescriptions” targeting both the patient’s symptoms and signs (syndrome target) and western medicine indices (treatment target). This study could provide references for future research on the academic thoughts and medical experience of Academician TONG Xiaolin.
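The support and confidence thresholds quoted in the abstract above can be illustrated with a minimal sketch. It assumes the common textbook definitions (support as the fraction of records containing all items of a rule, confidence as that fraction divided by the antecedent's support); SPSS Modeler may report support differently, and the toy records below are invented for illustration and are not data from the study. The function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base > 0 else 0.0


# Invented toy prescriptions (NOT records from the study).
toy = [
    {"Huangqin", "Huanglian", "Ganjiang"},
    {"Huangqin", "Huanglian"},
    {"Huangqi", "Guizhi"},
    {"Huanglian", "Huangqi"},
]

# Rule under test: Huangqin -> Huanglian
rule_support = support(toy, {"Huangqin", "Huanglian"})
rule_confidence = confidence(toy, {"Huangqin"}, {"Huanglian"})
print(rule_support, rule_confidence)
```

In this toy set, every record containing Huangqin also contains Huanglian, which is what a confidence of 100% means for a rule of that form.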
Sensors are ubiquitous in the Internet of Things for measuring and collecting data. Analyzing the data derived from sensors is an essential task and can reveal useful latent information beyond the raw data. Since the Internet of Things contains many sorts of sensors, the measurement data collected by these sensors are multi-type data, sometimes containing temporal series information. If different sorts of data are dealt with separately, useful information is missed. This paper proposes a method to discover the correlation in multi-faceted data, which contains many types of data with temporal information; the method can deal with multi-faceted data simultaneously. We transform high-dimensional multi-faceted data into lower-dimensional data modeled as multivariate Gaussian Graphical Models, then mine the correlation in the multi-faceted data by discovering the structure of the multivariate Gaussian Graphical Models. We verify our method with a real data set, and the experiment demonstrates that the proposed method can correctly find the correlation among multi-faceted measurement data.
Association rule mining methods, as a set of important data mining tools, can be used for mining spatial association rules from spatial data. However, applications of these methods are limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids with different sizes are used to detect the influence of multiple scales on GFARM. Results show that the proposed method can filter out 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
HA (hashing array), a new algorithm for mining frequent itemsets in large databases, is proposed. It employs a hash array structure, ItemArray( ), to store the information of the database and then uses it instead of the database in later iterations. With this improvement, only two scans of the whole database are necessary, so the computational cost can be reduced significantly. To overcome the performance bottleneck of mining frequent 2-itemsets, a modified version of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted to evaluate the performance of the proposed algorithm, and the results show that it is more efficient and reasonable.
In this paper, a novel data mining method is introduced to solve the multi-objective optimization problems of process industry. A hyperrectangle association rule mining (HARM) algorithm based on support vector machines (SVMs) is proposed. Hyperrectangle rules are constructed on the basis of prototypes and support vectors (SVs) under some heuristic limitations. The proposed algorithm is applied to a simulated moving bed (SMB) paraxylene (PX) adsorption process. The relationships between the key process variables and objective variables such as the purity and recovery rate of PX are obtained. Most of the obtained association rules can be explained using existing domain knowledge about the PX adsorption process.
The rapid development of the Internet of Things imposes new requirements on data mining systems, because traditional distributed network data mining has weak capability. To meet the needs of the Internet of Things, this paper proposes a novel distributed data mining model to realize seamless access between cloud computing and distributed data mining. The model is based on the cloud computing architecture, which belongs to the type of incredible nodes.
To improve the efficiency of learning the triangular membership functions (TMFs) for mining fuzzy association rules (FARs) in dynamic databases, a single-pass fuzzy c-means (SPFCM) algorithm is combined with the real-coded CHC genetic model to learn the TMFs incrementally. The cluster centers resulting from SPFCM are regarded as the midpoints of the TMFs. The population of CHC is generated randomly according to the cluster centers and the constraint conditions among TMFs. A new population for incremental learning is then composed of the excellent chromosomes stored in the first genetic process and the chromosomes generated from the cluster centers adjusted by SPFCM. Experiments on real datasets show that the proposed approach needs fewer generations to converge to the solution than the existing batch learning approach. The quality of the TMFs generated by the approach is comparable to that of the batch learning approach. Compared with the existing incremental learning strategy, the proposed approach is superior in terms of both the quality of the TMFs and time cost.
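For readers unfamiliar with triangular membership functions, the following is a minimal sketch of the standard textbook form, with the peak placed at a cluster center as the abstract above describes. The parameter names `left`, `center`, and `right` are assumptions for illustration; the abstract does not give the authors' parameterization, and nothing here reproduces their SPFCM or CHC steps.

```python
def triangular_membership(x, left, center, right):
    """Degree (0.0 to 1.0) to which x belongs to a triangular fuzzy set.

    The membership is 1.0 at `center` (e.g. a cluster center) and falls
    linearly to 0.0 at `left` and `right`.
    """
    if not (left < center < right):
        raise ValueError("expected left < center < right")
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


# Hypothetical fuzzy set "medium" on a 0..10 scale, peaking at 5.
print(triangular_membership(2.5, 0, 5, 10))
```

A learning procedure such as the one summarized above would adjust the three points of each function; this sketch only evaluates a function whose points are already fixed.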
Similarity search is one of the fundamental components of time series data mining tasks such as clustering, classification, and association rule mining. Many methods have been proposed to measure the similarity between time series, including Euclidean distance, Manhattan distance, and dynamic time warping (DTW). Among these, DTW has been suggested to provide a more robust similarity measure and to be able to find the optimal alignment between time series. However, due to its quadratic time and space complexity, DTW is not suitable for large time series datasets. Many improved algorithms have been proposed for DTW search in large databases, such as approximate search and exact indexed search. Unlike these earlier modified algorithms, this paper presents a novel parallel scheme for fast similarity search based on DTW, called MRDTW (MapReduce-based DTW). The experimental results show that the approach not only retains the accuracy of the original DTW but also greatly improves the efficiency of similarity measurement in large time series.
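The quadratic time and space cost mentioned above is visible in the classic dynamic-programming form of DTW, sketched below. This is the plain sequential textbook algorithm with an absolute-difference local cost (an assumption), not the MRDTW scheme, whose details the abstract does not give.

```python
def dtw_distance(a, b):
    """Classic DTW between two numeric sequences.

    Fills an (len(a)+1) x (len(b)+1) table, hence O(n*m) time and space.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched to an already used b element
                cost[i][j - 1],      # b[j-1] matched to an already used a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# The second series repeats one value; DTW absorbs the stretch.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because every table cell depends on three neighbors, the table for two long series grows with the product of their lengths, which is the cost that motivates approximate, indexed, or parallel variants.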
Mining association rules from large databases is very costly. We develop a parallel algorithm for this task on a shared-memory multiprocessor (SMP). Most proposed parallel algorithms for association rule mining have to scan the database at least twice. In this article, a parallel algorithm for SMP, Scan Once (SO), is proposed, which scans the database only once. The algorithm is fundamentally different from the known parallel algorithm Count Distribution (CD). It adopts a bit matrix to store the database information and obtains the support of frequent itemsets through a vector AND operation, which greatly improves the efficiency of generating all frequent itemsets. Empirical evaluation shows that the algorithm outperforms the known CD algorithm.
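The bit matrix and vector AND idea described above can be sketched in a few lines. This is a sequential illustration only, using Python integers as bit vectors (one bit per transaction); it is not the authors' SO algorithm, it omits the parallel SMP part entirely, and the helper names are hypothetical.

```python
def build_bit_columns(transactions):
    """One pass over the data: map each item to an integer whose bit r is 1
    when transaction r contains the item."""
    columns = {}
    for row, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def support_count(columns, itemset):
    """Number of transactions containing all items, via bitwise AND."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    mask = columns.get(items[0], 0)
    for item in items[1:]:
        mask &= columns.get(item, 0)
    # Count the 1 bits that survived every AND.
    return bin(mask).count("1")


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
columns = build_bit_columns(transactions)
print(support_count(columns, ["a", "b"]))
```

Once the columns are built, the support of any candidate itemset comes from AND operations alone, so the original records do not have to be read again.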
This paper applies the extension theory of knowledge to the employment problems of students in a college of computer science, puts forward a method for solving them, and provides corresponding strategies. It also appraises the proposed strategies and puts forward the optimal ones. Extension-based data mining and association rule mining are used to find the meaningful relations that exist in enterprise recruitment,
Objective Based on intra-set correlation analysis, this paper deconstructs the clinical medical records of traditional Chinese medicine (TCM) Master ZHOU Zhongying in treating thyroid cancer and analyzes his experience in “mechanism-syndrome-medicine-prescription” for thyroid cancer. Methods Using the Medcase data processing platform and a Frequent Pattern (FP)-Growth enhanced correlation analysis algorithm, the medical records of Professor ZHOU Zhongying for the treatment of thyroid cancer from June 1, 2001 to February 28, 2015 were analyzed within the set. Results This study involved 43 medical records, 43 patients, and 167 visits. After processing intra-set correlations, 28 groups of highly correlated symptoms, 21 groups of highly correlated tongue images, 10 groups of highly correlated pulse conditions, 28 groups of highly correlated pathogenesis, 34 groups of highly correlated herbs, and 26 groups of highly correlated western medicine diagnoses were selected. Professor ZHOU Zhongying treats thyroid cancer according to syndrome differentiation. Symptoms with more association rules included neck swelling, neck pain, cough, and dry mouth; tongue images with more association rules included dark purple tongue, dark red tongue, and fissured tongue; pulse conditions with more association rules were wiry pulse, thready pulse, small pulse, and slippery pulse; the pathogenesis with more association rules was phlegm and blood stasis, damp-heat accumulation, and impairment of both Qi and Yin; herbs with more association rules were Chaihu (Bupleuri Radix), Zeqi (Sun Euphorbiae Herb), and Tiandong (Asparagi Radix); western medicine diagnoses with more association rules included thyroid cancer, insomnia, and chronic gastritis. Conclusion Thyroid cancer mostly presents as deficiency in origin and excess in manifestations. The basic pathogenesis is phlegm and blood stasis, damp-heat accumulation, and impairment of both Qi and Yin, which are closely related to the liver, kidney, and spleen. Professor ZHOU Zhongying adopts both attack and supplement approaches as the general treatment principle, with a strong emphasis on regulating Qi and relieving depression, eliminating phlegm and resolving stagnation, eliminating dampness and turbidity, clearing fire and destroying poison, moistening dryness and softening hard mass, and invigorating Qi and nourishing Yin, while paying attention to nourishing the liver and kidney, invigorating the spleen and stomach, and protecting the heart and lungs.
Owing to its rapid development, the Internet has become the main field for brand building. Under this circumstance, the image of a brand is always consistent with consumers' perception. This study therefore uses text mining of search engine results to explore the categories of brand archetypes based on Brand Personality Theory from the perspective of the Internet. The results show that 12 brand archetypes, namely caregiver, sage, hero, innocent, dominator, creator, vitality, explorer, stylish woman, lover, cooperator, and vogue gentleman, have a high degree of explanatory power. A further case study verifies the reasonableness and effectiveness of the classification standard.
This paper describes the influence of joint spacing and joint orientation on the penetration rate of a Tunnel Boring Machine (TBM) disc cutter as modeled by the Discrete Element Method (DEM). The input data for the simulations were obtained from the sandstone along the Alborz tunnel, which is currently being excavated in Iran using a 5.2 m diameter open TBM. Three joint spacings, 150, 200, and 300 mm, were modeled together with seven values of joint orientation: 0°, 15°, 30°, 45°, 60°, 75°, and 90°. The results show that the penetration increases when joint orientation increases from 0° to 75°, but decreases as the joint orientation increases further from 75° to 90°. This is true for each joint spacing. In addition, for a given joint orientation, increasing the joint spacing causes the TBM penetration to decrease. The optimum joint orientation, from the viewpoint of TBM penetration, is about 60-75°.
Objective: To analyze, by data mining, the rules of drug use in treating lung cancer under the theory of fuzheng and dispelling evil in traditional Chinese medicine. Methods: Prescriptions for the treatment of lung cancer were collected by following Dr. Qianjinghua's outpatient department and were analyzed using frequency analysis, association rule analysis, and cluster analysis. Results: A total of 746 prescriptions were included in this study, involving 170 commonly used drugs. Traditional Chinese medicines with a high frequency of use were Baihuasheshecao (Hedyotis diffusa), Maorenshen (Actinidia valvata Dunn), Shishangbai (Selaginella doederleinii Hieron), Shijianchuan (Salvia chinensis Benth), Huangqi (Astragali Radix), Sanyeqing (Tetrastigma hemsleyanum), Baimaoteng (Herba Solani), Zhuling (Polyporus), Nvzhenzi (Ligustrum lucidum Ait), etc. Through association rule analysis, the core prescriptions were identified according to the support degree (10%, 20%, 30%) and the confidence degree, with the correlation degree set to 8 and the punishment degree set to 2. Finally, the efficacy of the core prescription drug groups was analyzed and the theoretical connotation of preventive treatment of disease was obtained. Conclusion: The core prescription for preventive treatment of disease clears heat and detoxifies on the basis of strengthening the spleen, replenishing qi, and tonifying the kidney. The use of blood supplements and yin tonics is reduced throughout the course of treatment. The treatments also pay attention to the spleen and stomach function of lung cancer patients, adjust the strength of heat clearing and detoxification according to the patients' conditions, and attend to the relief of patients’ emotions.
Objective To establish a data warehouse on acupuncture-moxibustion (acup-mox) methods, to explore valuable laws about the research and clinical application of acup-mox in a large body of literature by means of data mining techniques, and to promote acup-mox research and the effective treatment of diseases. Methods According to the different types of acup-mox literature information, different subjects of the acup-mox literature are determined and the relevant databases are established. On the continuously enriched subject databases, a data warehouse catering to multiple subjects and multiple dimensions is set up to provide a platform for wider application of acup-mox literature information. Results Based on the characteristics of the acup-mox literature, many subject databases, such as needling with filiform needles and moxibustion, are established, and clinical treatment laws of acup-mox are revealed by applying data mining methods to the established databases. Conclusion The acup-mox literature warehouse provides a standard data expression model, rich attributes, and relations between different literature information, so that the acup-mox literature can be studied with more effective techniques, and it provides a rich and standard data basis for acup-mox research.
Funding (semantic relativity association rule mining abstract): The National Natural Science Foundation of China (No. 50674086); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Science and Technology Fund of China University of Mining and Technology (No. 2007B016).
Funding (spleen dampness syndrome abstract): The Construction of First-class Integrated Traditional Chinese and Western Medicine Disciplines in Guangxi (Scientific Research Project No. 12 of Guangxi Ministry of Education [2018]); Qihuang High-level Talent Team Training Projects of Guangxi University of Chinese Medicine: Application of Systems Biology in Chinese Medicine Research (2021005).
Abstract: Objective To analyze the basic characteristics, drug features, prescription rules, and drug-symptom relationships of patients in the splenic deficiency and impairment stage, by data mining of medical records under the New Theory on Spleen Dampness Syndrome (Pi Dan Xin Lun, 《脾瘅新论》). Methods Medical records listed in the “New Theory on Spleen Dampness Syndrome-Understanding and Treatment of Metabolic Syndrome from the Perspective of Traditional Chinese Medicine” that were diagnosed with spleen dampness syndrome at the splenic deficiency and impairment stage between January 2004 and December 2016 were selected. These patients’ data, including basic information, clinical symptoms, laboratory examination results, traditional Chinese medicine (TCM) and western medicine diagnoses, treatment methods, prescriptions, etc., were collected. The collected data were compiled into a medical record database using the Epidata 3.1 data management software, and the Apriori algorithm provided in the SPSS Modeler 14.2 statistical software was then used to investigate the association rules for drug-drug, drug-symptom, and drug-western medicine index pairs. Results (i) A total of 51 medical records were included, involving 17 types of syndromes. Among them, the top three with frequency ≥ 3 included “Phlegm and blood stasis, and thoracic obstruction”, “Deficiency-weakness of the spleen Qi, and static blood blocking collaterals”, and “Deficiency-weakness of the spleen Qi, and static blood blocking collaterals”. Of the 14 treatment methods, the top three with frequency ≥ 3 included “Activating Yang and eliminating turbidity, and removing phlegm and dredging channel blockage”, “Strengthening the spleen and benefiting Qi, and eliminating phlegm to activate the channels”, and “Warming Yang and benefiting Qi, and expelling cold to remove obstructions”. Among the 15 prescriptions, the top three with frequency ≥ 3 included Huangqi Guizhi Wuwu Tang (黄芪桂枝五物汤), Gualou Xiebai Banxia Tang (瓜蒌薤白半夏汤), and Ganjiang Huangqin Huanglian Renshen Tang (干姜黄芩黄连人参汤). Lastly, of the 83 drugs used a total of 476 times, those with frequency ≥ 15 included Huanglian (Coptidis Rhizoma), Huangqi (Astragali Radix), Jiudahuang (Wine-processed Rhei Radix et Rhizoma), Jixueteng (Spatholobi Caulis), Shengjiang (Zingiberis Rhizoma Recens), Huangqin (Scutellariae Radix), and Guizhi (Cinnamomi Ramulus). (ii) For the drug-drug associations, under the criteria of support ≥ 15% and confidence = 100%, seven second-order association rules, seven third-order rules, and six fourth-order rules were identified. The top-ranking rules of each order were “Huangqin (Scutellariae Radix) → Huanglian (Coptidis Rhizoma)”, “Ganjiang (Zingiberis Rhizoma) + Huangqin (Scutellariae Radix) → Huanglian (Coptidis Rhizoma)”, and “Baishao (Paeoniae Radix Alba) + Guizhi (Cinnamomi Ramulus) + Jixueteng (Spatholobi Caulis) → Huangqin (Scutellariae Radix)”, respectively. The drug-symptom associations were analyzed under the criteria of support ≥ 5% and confidence = 100%, which yielded eight second-order association rules, 31 third-order rules, and 30 fourth-order rules. The top-ranking association rules of each order were “Huangqi (Astragali Radix) → Limb edema”, “Guizhi (Cinnamomi Ramulus) + Jixueteng (Spatholobi Caulis) → Limb numbness and pain”, and “Guizhi (Cinnamomi Ramulus) + Jixueteng (Spatholobi Caulis) + Huangqi (Astragali Radix) → Limb numbness and pain”, respectively. Similarly, the drug-western medicine index associations were investigated under the criteria of support ≥ 5% and confidence = 100%, and five second-order association rules, 16 third-order rules, and 16 fourth-order rules were identified. In this category, the top-ranking association rules of each order were “Qinpi (Fraxini Cortex) → Uric acid”, “Huanglian (Coptidis Rhizoma) + Ganjiang (Zingiberis Rhizoma) → Glycated hemoglobin”, and “Huanglian (Coptidis Rhizoma) + Ganjiang (Zingiberis Rhizoma) + Huangqin (Scutellariae Radix) → Glycated hemoglobin”, respectively. Conclusion Through association rule mining, this study objectively and quantitatively demonstrated the drug-drug, drug-symptom, and drug-physicochemical index associations of patients with spleen dampness syndrome at the splenic deficiency and impairment stage treated by Academician TONG Xiaolin. The results indicated that treatment for these patients adopted the “state-target” syndrome differentiation method. The drug combinations were characterized by “small prescriptions”, targeting both the patient’s symptoms and signs (syndrome target) and western medicine indices (treatment target). This study could provide references for future research on the academic thoughts and medical experience of Academician TONG Xiaolin.
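The support and confidence thresholds quoted in this abstract can be illustrated with a minimal Python sketch. The toy prescriptions and the function names `support` and `confidence` are assumptions made for illustration only; they are neither the study's data nor the SPSS Modeler implementation.

```python
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Support of (antecedent + consequent) divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Hypothetical toy prescriptions, NOT the study's data.
prescriptions = [
    frozenset({"Huangqin", "Huanglian", "Ganjiang"}),
    frozenset({"Huangqin", "Huanglian"}),
    frozenset({"Huangqi", "Guizhi", "Jixueteng"}),
    frozenset({"Huanglian", "Huangqi"}),
]

rule_support = support(frozenset({"Huangqin", "Huanglian"}), prescriptions)
rule_confidence = confidence(frozenset({"Huangqin"}), frozenset({"Huanglian"}),
                             prescriptions)
print(rule_support, rule_confidence)
```

In this toy set, every prescription containing Huangqin also contains Huanglian, so the rule "Huangqin → Huanglian" has confidence 100%, which is the kind of rule the abstract's confidence criterion retains.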
Funding: the project "The Basic Research on Internet of Things Architecture", supported by the National Key Basic Research Program of China (No. 2011CB302704); the National Natural Science Foundation of China (No. 60802034); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070013026); the Beijing Nova Program (No. 2008B50); and the "New generation broadband wireless mobile communication network" Key Projects for Science and Technology Development (No. 2011ZX03002-002-01).
Abstract: Sensors are ubiquitous in the Internet of Things for measuring and collecting data. Analyzing the data derived from these sensors is an essential task and can reveal useful latent information beyond the data themselves. Since the Internet of Things contains many sorts of sensors, the measurement data collected by these sensors are multi-type data, sometimes containing temporal series information. If we deal with different sorts of data separately, we will miss useful information. This paper proposes a method to discover the correlation in multi-faceted data, which contains many types of data with temporal information; the method can deal with multi-faceted data simultaneously. We transform high-dimensional multi-faceted data into lower-dimensional data, which is modeled as multivariate Gaussian Graphical Models, and then mine the correlation in the multi-faceted data by discovering the structure of the multivariate Gaussian Graphical Models. We verify our method with a real data set, and the experiment demonstrates that the proposed method can correctly find the correlation among multi-faceted measurement data.
Funding: Under the auspices of the Special Fund of the Ministry of Land and Resources of China in Public Interest (No. 201511001).
Abstract: Association rule mining methods, as a set of important data mining tools, can be used for mining spatial association rules from spatial data. However, applications of these methods are limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids with different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering the spatial association pattern between building land distribution and driving factors.
Abstract: HA (hashing array), a new algorithm for mining frequent itemsets of large databases, is proposed. It employs a hash array structure, ItemArray( ), to store the information of the database and then uses it instead of the database in later iterations. With this improvement, only two scans of the whole database are necessary, so the computational cost can be reduced significantly. To overcome the performance bottleneck of frequent 2-itemset mining, a modified version of HA, DHA (direct-addressing hashing and array), is proposed, which combines HA with the direct-addressing hashing technique. The new hybrid algorithm, DHA, not only overcomes the performance bottleneck but also inherits the advantages of HA. Extensive simulations are conducted in this paper to evaluate the performance of the proposed algorithm, and the results show that the new algorithm is more efficient and reasonable.
Funding: Supported by the National Natural Science Foundation of China (No. 60421002), the National Outstanding Youth Science Foundation of China (No. 60025308), and the New Century 151 Talent Project of Zhejiang Province.
Abstract: In this paper, a novel data mining method is introduced to solve the multi-objective optimization problems of the process industry. A hyperrectangle association rule mining (HARM) algorithm based on support vector machines (SVMs) is proposed. Hyperrectangle rules are constructed on the basis of prototypes and support vectors (SVs) under some heuristic limitations. The proposed algorithm is applied to a simulated moving bed (SMB) paraxylene (PX) adsorption process. The relationships between the key process variables and objective variables such as the purity and recovery rate of PX are obtained. Using existing domain knowledge about the PX adsorption process, most of the obtained association rules can be explained.
Abstract: The rapid development of the Internet of Things imposes new requirements on data mining systems, owing to the weak capability of traditional distributed networking data mining. To meet the needs of the Internet of Things, this paper proposes a novel distributed data-mining model to realize seamless access between cloud computing and distributed data mining. The model is based on the cloud computing architecture, which belongs to the type of incredible nodes.
Funding: Supported by the National Natural Science Foundation of China (No. 61301245, U1533104).
Abstract: To improve the efficiency of learning the triangular membership functions (TMFs) for mining fuzzy association rules (FARs) in dynamic databases, a single-pass fuzzy c-means (SPFCM) algorithm is combined with the real-coded CHC genetic model to learn the TMFs incrementally. The cluster centers resulting from SPFCM are regarded as the midpoints of the TMFs. The population of CHC is generated randomly according to the cluster centers and the constraint conditions among TMFs. A new population for incremental learning is then composed of the excellent chromosomes stored in the first genetic process and the chromosomes generated from the cluster centers adjusted by SPFCM. Experiments on real datasets show that the proposed approach needs fewer generations to converge to a solution than the existing batch learning approach. The quality of the TMFs generated by the approach is comparable to that of the batch learning approach. Compared with the existing incremental learning strategy, the proposed approach is superior in terms of TMF quality and time cost.
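For readers unfamiliar with triangular membership functions, the following minimal Python sketch shows how a single TMF maps a crisp value to a membership degree. The parameter names `left`, `mid`, and `right` and the sample values are assumptions for illustration; the abstract only states that the SPFCM cluster centers serve as the midpoints, and this sketch does not implement SPFCM or the CHC genetic model.

```python
def triangular_membership(x: float, left: float, mid: float, right: float) -> float:
    """Membership degree of x in a triangle with feet at left/right and peak at mid.

    Assumes left < mid < right.
    """
    if not (left < mid < right):
        raise ValueError("expected left < mid < right")
    if x <= left or x >= right:
        return 0.0  # outside the triangle's base
    if x <= mid:
        return (x - left) / (mid - left)  # rising edge
    return (right - x) / (right - mid)    # falling edge


# Hypothetical TMF for a linguistic term such as "medium" on a 0..10 scale.
print(triangular_membership(5.0, 0.0, 5.0, 10.0))
print(triangular_membership(2.5, 0.0, 5.0, 10.0))
```

Learning the TMFs, in this picture, amounts to choosing the three parameters of each triangle; the peak is the only one the abstract ties directly to the clustering result.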
Funding: Supported in part by the National High-tech R&D Program of China under Grants No. 2012AA012600, 2011AA010702, 2012AA01A401, and 2012AA01A402; the National Natural Science Foundation of China under Grant No. 60933005; the National Science and Technology Ministry of China under Grant No. 2012BAH38B04; and National 242 Information Security of China under Grant No. 2011A010.
Abstract: Similarity search is one of the fundamental components of time series data mining tasks such as clustering, classification, and association rules mining. Many methods have been proposed to measure the similarity between time series, including Euclidean distance, Manhattan distance, and dynamic time warping (DTW). Among these, DTW has been suggested to allow a more robust similarity measure and to be able to find the optimal alignment between time series. However, due to its quadratic time and space complexity, DTW is not suitable for large time series datasets. Many improved algorithms have been proposed for DTW search in large databases, such as approximate search or exact indexed search. Unlike these previous modifications, this paper presents a novel parallel scheme for fast similarity search based on DTW, called MRDTW (MapReduce-based DTW). The experimental results show that the approach not only retains the accuracy of the original DTW but also greatly improves the efficiency of similarity measurement in large time series.
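The quadratic cost mentioned in this abstract comes from the dynamic-programming table that classic DTW fills in. The Python sketch below shows that sequential baseline only, assuming an absolute-difference local cost; it is not the MRDTW scheme, whose details the abstract does not give, and the function name `dtw_distance` is chosen here for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance using an (n+1) x (m+1) cost table."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                     cost[i - 1][j - 1])  # both series advance
    return cost[n][m]


# A series and a time-stretched copy align perfectly under DTW.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Every cell of the table is visited once, which is where the O(nm) time and space come from and why pairwise DTW over a large collection motivates parallel schemes.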
Abstract: Mining association rules from large databases is very costly. We develop a parallel algorithm for this task on shared-memory multiprocessors (SMP). Most proposed parallel algorithms for association rules mining have to scan the database at least twice. In this article, a parallel algorithm, Scan Once (SO), is proposed for SMP, which scans the database only once. This algorithm is fundamentally different from the known parallel algorithm Count Distribution (CD). It adopts a bit matrix to store the database information and obtains the support of the frequent itemsets through a Vector-And-Operation, which greatly improves the efficiency of generating all frequent itemsets. Empirical evaluation shows that the algorithm outperforms the known CD algorithm.
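The bit-matrix idea can be sketched as follows: one pass over the database builds a bit vector per item, and the support of any itemset is then the population count of the bitwise AND of its items' vectors. This single-threaded Python sketch is an assumption-laden illustration of that general technique; the function names are hypothetical and it does not reproduce the SO algorithm's parallel design.

```python
from typing import Dict, Iterable, List, Set


def build_bit_vectors(transactions: List[Set[str]]) -> Dict[str, int]:
    """One scan of the database: bit `row` of vectors[item] is set if row contains item."""
    vectors: Dict[str, int] = {}
    for row, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << row)
    return vectors


def support_count(itemset: Iterable[str], vectors: Dict[str, int], n_rows: int) -> int:
    """Number of rows containing all items, via bitwise AND of the item vectors."""
    mask = (1 << n_rows) - 1  # start with every row selected
    for item in itemset:
        mask &= vectors.get(item, 0)
    return bin(mask).count("1")


# Hypothetical toy database.
db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
vectors = build_bit_vectors(db)
print(support_count({"a", "b"}, vectors, len(db)))
```

After the vectors are built, no further database scan is needed; candidate itemsets of any size are counted from the in-memory vectors alone.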
Abstract: This paper applies the extension theory of knowledge to examine the employment problems of students at a college of computer science, proposes a method for solving them, and provides corresponding strategies. It also evaluates the proposed strategies and identifies the optimal ones. Using extension data mining and association rule mining, it finds the meaningful relations existing in enterprise recruitment,
Funding: Six Talent Peak Projects in Jiangsu Province (RJFW-40); Jiangsu Province “333 High-level Talent Training Project” (2018Ⅲ-0121); Technology Innovation Fund of Science and Technology Enterprises in Jiangsu Province (BC2015022); Representative Project of Intangible Cultural Heritage in Pukou District, Nanjing (PKIX-4); The Construction and Application of Thyroid Disease Differentiation and Treatment Rule Mining and Clinical Decision Support System by Traditional Chinese Medicine Master ZHOU Zhongying (012071003583).
Abstract: Objective Based on intra-set correlation analysis, this paper deconstructs the clinical medical records of traditional Chinese medicine (TCM) Master ZHOU Zhongying in treating thyroid cancer, and analyzes the experience in “mechanism-syndrome-medicine-prescription” for thyroid cancer. Methods Through the Medcase data processing platform, based on a Frequent Pattern (FP)-Growth enhanced correlation analysis algorithm, the medical records of Professor ZHOU Zhongying for the treatment of thyroid cancer from June 1, 2001 to February 28, 2015 were analyzed within the set. Results This study involved 43 medical records, 43 patients, and 167 visits. After processing intra-set correlations, 28 groups of highly correlated symptoms, 21 groups of highly correlated tongue images, 10 groups of highly correlated pulse conditions, 28 groups of highly correlated pathogenesis, 34 groups of highly correlated herbs, and 26 groups of highly correlated western medicine diagnoses were selected. Professor ZHOU Zhongying treats thyroid cancer according to syndrome differentiation. Symptoms with more association rules included neck swelling, neck pain, cough, and dry mouth; tongue images with more association rules included dark purple tongue, dark red tongue, and fissured tongue; pulse conditions with more association rules were wiry pulse, thready pulse, small pulse, and slippery pulse; the pathogenesis with more association rules was phlegm and blood stasis, damp-heat accumulation, and impairment of both Qi and Yin; herbs with more association rules were Chaihu (Bupleuri Radix), Zeqi (Sun Euphoribiae Herb), and Tiandong (Asparagi Radix); western medicine diagnoses with more association rules included thyroid cancer, insomnia, and chronic gastritis. Conclusion Thyroid cancer mostly presents as deficiency in origin and excess in manifestations. The basic pathogenesis is phlegm and blood stasis, damp-heat accumulation, and impairment of both Qi and Yin, which are closely related to the liver, kidney, and spleen. Professor ZHOU Zhongying adopts both attack and supplement approaches as the general treatment principle, with a strong emphasis on regulating Qi and relieving depression, eliminating phlegm and resolving stagnation, eliminating dampness and turbidity, clearing fire and destroying poison, moistening dryness and softening hard mass, invigorating Qi and nourishing Yin, and paying attention to nourishing the liver and kidney and invigorating the spleen and stomach, while protecting the heart and lungs.
Funding: Supported by Project 71202155 of the National Science Funds for Distinguished Young Scientists of China.
Abstract: Owing to its rapid development, the Internet has become the main field for brand building. Under this circumstance, the image of the brand is always consistent with the consumers' perception. Therefore, this study uses text mining of search engine results to explore the categories of brand archetype based on Brand Personality Theory from the perspective of the Internet. The results show that 12 brand archetypes, including caregiver, sage, hero, innocent, dominator, creator, vitality, explorer, stylish woman, lover, cooperator, and vogue gentleman, have a high degree of explanatory power. A further case study verifies the reasonability and effectiveness of the classification standard.
Abstract: This paper describes the influence of joint spacing and joint orientation on the penetration rate of a Tunnel Boring Machine (TBM) disc cutter as modeled by the Discrete Element Method (DEM). The input data for the simulations were obtained from the sandstone along the Alborz tunnel that is currently being excavated in Iran using a 5.2 m diameter open TBM. Three joint spacings, 150, 200, and 300 mm, were modeled together with seven values of joint orientation: 0°, 15°, 30°, 45°, 60°, 75°, and 90°. The results show that the penetration increases when joint orientation increases from 0° to 75°, but it decreases as the joint orientation increases further from 75° to 90°. This is true for each joint spacing. In addition, for a given joint orientation, increasing the joint spacing causes the TBM penetration to decrease. The optimum joint orientation, from the viewpoint of TBM penetration, is about 60-75°.
Abstract: Objective: To analyze, by data mining, the rules of drug use in treating lung cancer under the theory of fuzheng and dispelling evil in traditional Chinese medicine. Methods: By following Dr. Qianjinghua's outpatient department, we collected prescriptions for the treatment of lung cancer and analyzed them using frequency analysis, association rule analysis, and cluster analysis. Results: 746 prescriptions were included in this study, covering 170 commonly used drugs. Traditional Chinese medicines with a high frequency of use were Baihuasheshecao (Hedyotis diffusa), Maorenshen (Actinidia valvata Dunn), Shishangbai (Selaginella doederleinii Hieron), Shijianchuan (Salvia chinensis Benth), Huangqi (Astragali Radix), Sanyeqing (Tetrastigma hemsleyanum), Baimaoteng (Herba Solani), Zhuling (Polyporus), Nvzhenzi (Ligustrum lucidum Ait), etc. Through association rule analysis, we identified the core prescriptions according to the support degree (10%, 20%, 30%) and the confidence degree, with the correlation degree set to 8 and the punishment degree set to 2. Finally, the efficacy of the core prescription drug groups was analyzed and the theoretical connotation of preventive treatment of disease was obtained. Conclusion: The core prescription of preventive treatment of disease is to clear heat and detoxify on the basis of strengthening the spleen, replenishing qi, and tonifying the kidney. The use of blood supplements and yin tonics is reduced throughout the course of treatment. The treatments also pay attention to the spleen and stomach function of lung cancer patients, adjust the heat-clearing and detoxifying force according to the different conditions of patients, and pay attention to relieving patients’ emotions.
Funding: Supported by the National Natural Science Foundation of China (No. 81072883).
Abstract: Objective To establish a data warehouse on acupuncture-moxibustion (acup-mox) methods, to explore valuable laws about research and clinical application of acup-mox in a large body of literature by use of data mining techniques, and to promote acup-mox research and effective treatment of diseases. Methods According to the different types of acup-mox literature information, different subjects of the acup-mox literature are determined and the relevant databases are established. On top of the continuously enriched subject databases, a data warehouse catering to multiple subjects and multiple dimensions is set up so as to provide a platform for wider application of acup-mox literature information. Results Based on the characteristics of the acup-mox literature, many subject databases, such as needling with filiform needle, moxibustion, etc., are established, and clinical treatment laws of acup-mox are revealed by applying data mining methods to the established databases. Conclusion Establishment of the acup-mox literature warehouse provides a standard data expression model, rich attributes, and relations between different literature information for the study of acup-mox literature by more effective techniques, and a rich and standard data basis for acup-mox research.