This paper presents a novel consensus clustering (CC) approach for a document repository concerning power substations (PSD) and contributes to the intangible asset management of power systems. A domain ontology model, i.e., substation ontology (SONT), is applied to modify the traditional vector space model (VSM) for document representation so that the representation accounts for the semantic relationships between terms. A new document representation is generated using a term mutual information matrix with the aid of SONT. The weighted partition via kernel-based CC algorithm (WPK-CC) is utilised to solve the CC problem for PSD and is compared with two other novel CC algorithms, i.e., non-negative matrix factorisation-based CC (NNMF-CC) and information theory-based CC (INT-CC). Because the original WPK-CC has limitations for document clustering, genetic algorithms (GA) are applied to WPK-CC for PSD. Selected mechanisms in the GA procedure are then compared and improved, resulting in comprehensive parameter settings for the PSD CC. Four simulation studies are designed and their results are evaluated by the purity validation method. The results show that the SONT-based document representation and the improved WPK-CC, via the modified GA, significantly improve the performance of the PSD CC.
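The purity validation method mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of purity (each cluster is credited with its most frequent reference class, and the credited documents are divided by the total number of documents); the function name `purity` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Return clustering purity in [0, 1] under the standard definition.

    cluster_labels: cluster assignment of each document (hypothetical input).
    true_labels: reference class of each document (hypothetical input).
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    if len(cluster_labels) == 0:
        raise ValueError("at least one document is required")

    # Count how often each reference class occurs inside each cluster.
    class_counts = {}
    for cluster, label in zip(cluster_labels, true_labels):
        class_counts.setdefault(cluster, Counter())[label] += 1

    # Credit every cluster with the size of its majority class.
    majority_total = sum(max(counts.values()) for counts in class_counts.values())
    return majority_total / len(cluster_labels)


if __name__ == "__main__":
    # Toy example: six documents, three clusters, three reference classes.
    clusters = [0, 0, 1, 1, 1, 2]
    classes = ["a", "a", "b", "b", "a", "c"]
    print(purity(clusters, classes))
```

A purity of 1 means every cluster contains documents of a single reference class. Purity tends to increase with the number of clusters, so it is most informative when the compared partitions have the same number of clusters.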
Funding: Supported by the National Natural Science Foundation of China (No. 51477054) and the Guangdong Innovative Research Team Program (No. 201001N0104744201).