Funding: National Natural Science Foundation of China, Grant/Award Number: 61673084; Fundamental Research Foundation for Universities of Heilongjiang Province, Grant/Award Number: LGYC2018JC017.
Abstract: Stock trend forecasting is a complex and heavily studied problem in finance. It draws on large volumes of data and many related indicators, so empirical analysis alone rarely yields sustained, effective results. Machine learning researchers have shown that random forests can form better judgements on this kind of problem and can play an auxiliary role in stock trend prediction. This study uses historical trading data from four companies listed on the US stock market and aims to improve the performance of the random forest model in medium- and long-term stock trend prediction. It applies exponential smoothing to the initial data, calculates relevant technical indicators as candidate features, and proposes the D-RF-RS method to optimize the random forest. Because the random forest is an ensemble learning model closely related to the decision tree, D-RF-RS uses a decision tree to screen features by importance and obtains an effective strong feature set as the model input. The model's parameter combination is then optimized through random parameter search. The experimental results show that this optimization increases the average accuracy of the random forest by 0.17, and that the optimized model is 0.18 higher in average accuracy than the light gradient boosting machine model. Together with the ROC and Precision–Recall curves, which indicate that the model is also stable, this further demonstrates the advantages of random forest in medium- and long-term stock market trend prediction.
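The exponential smoothing step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the recurrence is standard simple exponential smoothing, while the function names, the smoothing factor of 0.5, the toy closing prices, and the horizon-based up/down labeling rule are all assumptions made for the example.

```python
def exponential_smoothing(prices, alpha):
    """Simple exponential smoothing.

    s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in prices:
        if not smoothed:
            # The first smoothed value is the first observation.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def trend_labels(smoothed, horizon):
    """Hypothetical labeling rule: 1 if the smoothed price is higher
    `horizon` steps later, otherwise 0. The last `horizon` points get no label."""
    return [
        1 if smoothed[t + horizon] > smoothed[t] else 0
        for t in range(len(smoothed) - horizon)
    ]


# Toy closing prices (made up for illustration).
closes = [10.0, 12.0, 8.0, 8.0, 12.0, 12.0]
smoothed = exponential_smoothing(closes, alpha=0.5)
labels = trend_labels(smoothed, horizon=2)
print(smoothed)  # [10.0, 11.0, 9.5, 8.75, 10.375, 11.1875]
print(labels)    # [0, 0, 1, 1]
```

A larger horizon corresponds to the medium- and long-term setting the abstract targets. The smoothed series and labels would then feed the technical-indicator and feature-screening stages, which are not shown here.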
Funding: Fundamental Research Foundation for Universities of Heilongjiang Province, Grant/Award Number: LGYC2018JQ003.
Abstract: With the continuous development of machine learning and the increasing complexity of financial data analysis, machine learning models are increasingly used to tackle difficult, high-profile problems in the financial industry. To improve the effectiveness of stock trend prediction and address problems in time series data processing, this paper analyses the S&P 500 index and combines a fuzzy membership function with stock-related technical indicators to obtain nominal data that broadly reflect the constituent stocks as the time series changes. Meanwhile, because the setting and adjustment of hyperparameters in current machine learning algorithms rely too heavily on empirical knowledge, this paper uses the deep forest model to train on the stock data separately. The experimental results show that (1) the extreme random forest and the multi-grain cascade forest are both more accurate than the gated recurrent unit (GRU) model when the dataset without fuzzy index adjustment is used as input features; (2) the accuracy of both the extreme random forest and the multi-grain cascade forest improves when the fuzzy index-adjusted dataset is used as input features; (3) with the fuzzy index-adjusted dataset as input, the accuracy of the extreme random forest improves by 18.89% relative to the unadjusted dataset; and (4) with the fuzzy index-adjusted dataset as input, the average accuracy of the multi-grain cascade forest increases by 5.67%.
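To make the idea of turning a technical indicator into nominal data via fuzzy membership more concrete, here is a minimal sketch. It is not the paper's procedure: the choice of RSI as the indicator, the three category names, the breakpoints at 30, 50 and 70, and the rule of picking the category with the highest membership are all assumptions made for illustration.

```python
def falling_shoulder(x, full, zero):
    """Membership 1 at or below `full`, 0 at or above `zero`, linear in between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising_shoulder(x, zero, full):
    """Membership 0 at or below `zero`, 1 at or above `full`, linear in between."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def triangular(x, left, peak, right):
    """Membership 0 outside (left, right), 1 at `peak`, linear on both sides."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify_rsi(rsi):
    """Map an RSI value (0 to 100) to a nominal label and its membership degrees.

    Breakpoints 30/50/70 are illustrative assumptions, not values from the paper.
    """
    memberships = {
        "oversold": falling_shoulder(rsi, 30.0, 50.0),
        "neutral": triangular(rsi, 30.0, 50.0, 70.0),
        "overbought": rising_shoulder(rsi, 50.0, 70.0),
    }
    # The nominal value is the category with the largest membership degree.
    label = max(memberships, key=memberships.get)
    return label, memberships


label, degrees = fuzzify_rsi(65.0)
print(label, degrees)
```

With these breakpoints, an RSI of 65 has membership 0.25 in "neutral" and 0.75 in "overbought", so it is labeled "overbought". The resulting labels could then be encoded as categorical input features for a forest-based model.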
Funding: Sponsored by the Major Program of the National Natural Science Foundation of China (Grant No. 13&ZD162) and the Applied Basic Research Programs of China National Textile and Apparel Council (Grant No. J201509).
Abstract: Unsupervised feature selection has become an important and challenging problem in machine learning, given the vast amounts of unlabeled, high-dimensional data. We propose a novel unsupervised feature selection method using Structured Self-Representation (SSR) that simultaneously takes into account the self-representation property and the local geometrical structure of features. Concretely, the inherent self-representation property of features allows the most representative features to be selected. Meanwhile, to obtain more accurate results, we exploit local geometrical structure to constrain the representation coefficients of features to be close to each other if the features themselves are close to each other. Furthermore, an efficient algorithm is presented for optimizing the objective function. Finally, experiments on a synthetic dataset and six real-world benchmark datasets, including biomedical data, letter recognition digit data and face image data, demonstrate the encouraging performance of the proposed algorithm compared with state-of-the-art algorithms.
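The abstract does not state the objective function, so the following is only a generic form that self-representation methods with a local-structure constraint commonly take, not the SSR objective itself. All symbols are assumptions: X is the n-by-d data matrix, W is the d-by-d coefficient matrix with i-th row w^i, S_ij is a similarity weight between features i and j, and alpha and beta are trade-off parameters.

```latex
\min_{W}\;
  \lVert X - XW \rVert_F^2
  \;+\; \alpha \sum_{i,j=1}^{d} S_{ij}\, \lVert w^{i} - w^{j} \rVert_2^2
  \;+\; \beta\, \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \lVert w^{i} \rVert_2
```

In a formulation of this kind, the first term asks each feature to be reconstructed from the other features, the second pulls the coefficient rows of similar features together, and the row-sparsity term drives the rows of unimportant features toward zero, so features can be ranked by the norm of their row w^i.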