
P-ROCK: A Sustainable Clustering Algorithm for Large Categorical Datasets

Abstract: Data clustering is crucial to data processing and analytics. Clustering helps overcome the challenge of evaluating and extracting information from big data. Either numerical or categorical data can be grouped, but existing clustering methods favor numerical data and largely ignore categorical data. Until recently, the only way to cluster categorical data was to convert it to a numeric representation and then apply existing numeric clustering methods. However, these algorithms could not exploit the categorical nature of the data. Subsequently, extensions of traditional methods to categorical data were suggested, and several new clustering methods and extensions have been proposed in recent years. ROCK is an adaptable and straightforward algorithm that calculates the similarity between data sets in order to cluster them. This paper aims to modify the algorithm by creating a parameterized version that takes specific algorithm parameters as input and outputs satisfactory cluster structures. The modified algorithm is named the parameterized ROCK algorithm (P-ROCK). The proposed modification makes the original algorithm more flexible through user-defined parameters. A detailed hypothesis was developed and later validated with experimental results on real-world datasets using the proposed P-ROCK algorithm. A comparison with the original ROCK algorithm is also provided. Experimental results show that the proposed algorithm is on par with the original ROCK algorithm, with an accuracy of 97.9%. The proposed P-ROCK algorithm improves the runtime and is more flexible and scalable.
Source: Intelligent Automation & Soft Computing (SCIE), 2023, Issue 1, pp. 553-566 (14 pages)
Funding: Supporting project number (RSP2022R498), King Saud University, Riyadh, Saudi Arabia.
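
The abstract describes ROCK as clustering records by their similarity but does not spell out the mechanics or list the P-ROCK parameters. As a rough illustration only, the sketch below follows the original ROCK algorithm as it is commonly described: Jaccard similarity between categorical records, a similarity threshold `theta` that defines neighbors, link counts as the number of shared neighbors, and a goodness measure that decides which clusters to merge. The function names, the toy records, and the choice not to count a record as its own neighbor are assumptions made for this example; none of it is taken from the P-ROCK paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two categorical records (as sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(records, theta):
    """For each record, the indices of records with similarity >= theta.

    Assumption: a record is not counted as its own neighbor.
    """
    nbrs = [set() for _ in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(nbrs):
    """link(i, j) = number of neighbors that records i and j share."""
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(nbrs)), 2)
    }


def goodness(cross_links, n_i, n_j, theta):
    """Goodness of merging two clusters of sizes n_i and n_j.

    Uses f(theta) = (1 - theta) / (1 + theta), the form usually quoted
    for ROCK; cross_links is the total link count between the clusters.
    """
    exponent = 1 + 2 * (1 - theta) / (1 + theta)
    expected = (n_i + n_j) ** exponent - n_i ** exponent - n_j ** exponent
    return cross_links / expected


# Toy categorical records (hypothetical data, for illustration only).
records = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y"}]
theta = 0.5
nbrs = neighbors(records, theta)
link_counts = links(nbrs)
merge_score = goodness(link_counts[(0, 1)], 1, 1, theta)
```

In this reading, `theta` is the kind of user-supplied input a parameterized variant could expose, since raising it makes the neighbor relation stricter and lowers link counts. Whether P-ROCK parameterizes `theta`, the exponent function, or something else is not stated in the abstract.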