Abstract: Detecting naturally arising structures in data is central to knowledge extraction. In most applications, the main challenge lies in choosing an appropriate model for exploring the data features. This choice is generally poorly understood, and any tentative choice may be too restrictive. Growing volumes of data, disparate data sources and modelling techniques entail the need for model optimization via adaptability rather than comparability. We propose a novel two-stage algorithm for modelling continuous data, consisting of an unsupervised stage, in which the algorithm searches the data for optimal parameter values, and a supervised stage, which adapts those parameters for predictive modelling. The method is applied to the sunspots data, which have inherently Gaussian distributional properties and assumed bimodality. Optimal values separating high cycles from low cycles are obtained via multiple simulations. Early patterns for each recorded cycle reveal that the first 3 years provide a sufficient basis for predicting the peak. Multiple Support Vector Machine runs using repeatedly improved data parameters show that the approach yields greater accuracy and reliability than conventional approaches and provides a good basis for model selection. Model reliability is established via multiple simulations of this type.
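The two stages described in the abstract above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the cycle values are synthetic rather than real sunspot numbers, the helper names (`best_split`, `train_hinge`, `predict`) are hypothetical, the unsupervised stage is reduced to a one-dimensional search for the split that minimises the within-group sum of squares, and the supervised stage uses an unregularised hinge-loss linear classifier as a simplified stand-in for a Support Vector Machine. It is not the authors' implementation.

```python
# Illustrative sketch only. The numbers below are SYNTHETIC, not real sunspot data.
# Each tuple: (year 1, year 2, year 3, cycle peak) for one hypothetical cycle.
cycles = [
    (5, 15, 30, 62), (8, 12, 28, 70), (4, 18, 35, 58), (6, 14, 32, 75),
    (10, 40, 90, 140), (12, 45, 85, 155), (9, 38, 95, 132), (11, 42, 88, 160),
]


def best_split(values):
    """Unsupervised stage (assumed form): threshold that minimises the
    within-group sum of squares when values are cut into low and high."""
    ordered = sorted(values)

    def sum_sq(group):
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(ordered)):
        cost = sum_sq(ordered[:i]) + sum_sq(ordered[i:])
        if cost < best_cost:
            best_cost = cost
            # Place the threshold halfway between the two neighbouring values.
            best_threshold = (ordered[i - 1] + ordered[i]) / 2
    return best_threshold


def train_hinge(features, labels, lr=0.1, epochs=1000):
    """Supervised stage (simplified stand-in for an SVM): subgradient descent
    on the hinge loss of a linear classifier, with no regularisation term."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score < 1:  # point is inside the margin or misclassified
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (high cycle) or -1 (low cycle) for one feature vector."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Stage 1: find the value separating high from low cycle peaks.
threshold = best_split([c[3] for c in cycles])
labels = [1 if c[3] > threshold else -1 for c in cycles]

# Stage 2: predict the label from the first three years only (scaled by 100).
features = [[v / 100 for v in c[:3]] for c in cycles]
weights, bias = train_hinge(features, labels)
predictions = [predict(weights, bias, x) for x in features]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print("threshold:", threshold)
print("training accuracy:", accuracy)
```

The synthetic groups here are deliberately well separated, so the split and the classifier are easy to find. Real cycles overlap far more, which is presumably why the abstract relies on repeated simulations and repeated Support Vector Machine runs rather than a single pass.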
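Abstract: Annual sunspot numbers for 1700-2013 were obtained from the website of the Sunspot Index Data Center at the Royal Observatory of Belgium. The observations were analysed and modelled in R using time series modelling methods, and the resulting model was used to forecast future sunspot numbers.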