Abstract: As software bots become widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading researchers' empirical studies of software engineering. Researchers have proposed several bot detection techniques, but most are limited to identifying bots that perform specific activities, and they do not distinguish between GitHub Apps and OAuth Apps. In this paper, we propose a bot detection technique for OAuth Apps, named BDGOA. BDGOA uses 24 features, which fall into three dimensions: account information, account activity, and text similarity. To better explore the behavioral features, we define a fine-grained classification of behavioral events and introduce self-similarity to quantify the repeatability of a behavioral sequence. We apply five machine learning classifiers to the benchmark dataset and select random forest as the final classifier, since it achieves the highest F1-score, 95.83%. Experimental comparisons with state-of-the-art approaches also demonstrate the superiority of BDGOA.
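The abstract does not define how self-similarity is computed, so the following is only an illustrative sketch of one way the repeatability of a behavioral sequence could be quantified, not the measure used in BDGOA. It assumes an account's activity is available as an ordered list of event-type labels; the function name `ngram_repetition`, the choice of n-grams, and the event labels are all hypothetical.

```python
def ngram_repetition(events, n=2):
    """Share of n-grams in `events` that duplicate another n-gram.

    Returns a value in [0.0, 1.0): 0.0 means every n-gram is unique,
    values near 1.0 mean the sequence keeps repeating the same pattern.
    This is an assumed stand-in for "self-similarity", not the paper's
    definition.
    """
    grams = [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]
    if not grams:
        # Sequence shorter than n: no evidence of repetition.
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


# Hypothetical event sequences for two accounts.
bot_like = ["IssueComment"] * 6
human_like = ["Push", "IssueComment", "PullRequest", "Fork", "Watch", "Push"]

bot_score = ngram_repetition(bot_like)
human_score = ngram_repetition(human_like)
```

Under this assumed measure, an account that repeats one action scores higher than an account with varied activity, so the score could serve as one numeric feature among the account-activity features fed to a classifier.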