Spatial heterogeneity refers to the variation or differences in characteristics or features across different locations or areas in space. Spatial data refers to information that explicitly or indirectly belongs to a p...Spatial heterogeneity refers to the variation or differences in characteristics or features across different locations or areas in space. Spatial data refers to information that explicitly or indirectly belongs to a particular geographic region or location, also known as geo-spatial data or geographic information. Focusing on spatial heterogeneity, we present a hybrid machine learning model combining two competitive algorithms: the Random Forest Regressor and CNN. The model is fine-tuned using cross validation for hyper-parameter adjustment and performance evaluation, ensuring robustness and generalization. Our approach integrates Global Moran’s I for examining global autocorrelation, and local Moran’s I for assessing local spatial autocorrelation in the residuals. To validate our approach, we implemented the hybrid model on a real-world dataset and compared its performance with that of the traditional machine learning models. Results indicate superior performance with an R-squared of 0.90, outperforming RF 0.84 and CNN 0.74. This study contributed to a detailed understanding of spatial variations in data considering the geographical information (Longitude & Latitude) present in the dataset. Our results, also assessed using the Root Mean Squared Error (RMSE), indicated that the hybrid yielded lower errors, showing a deviation of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran’s I index was observed to be 0.10. This study underscores that the hybrid was able to predict correctly the house prices both in clusters and in dispersed areas.展开更多
In a competitive digital age where data volumes are increasing with time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques and making d...In a competitive digital age where data volumes are increasing with time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques and making decisions based on the extracted knowledge is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to its high computational cost and storage requirements. The 2016 Demographic and Health Survey of Ethiopia (EDHS 2016) used as the data source for this study which is publicly available contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling for both model performance evaluation and feature selection to overcome the feature selection challenges and select the best model among the available models in DM and ML. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model. The higher score of HMM (m, r) = 0.47 illustrates the overall significant model that encompasses almost all the user’s requirements, unlike the classical metrics that use a criterion to select the most appropriate model. On the other hand, the ANNs were found to be the most computationally intensive for our prediction task. Moreover, the type of data and the class size of the dataset (unbalanced data) have a significant impact on the efficiency of the model, especially on the computational cost, and the interpretability of the parameters of the model would be hampered. And the efficiency of the predictive model could be improved with other feature selection algorithms (especially hybrid metrics) considering the experts of the knowledge domain, as the understanding of the business domain has a significant impact.展开更多
Viral infectious diseases significantly threaten the sustainability of freshwater fish aquaculture.The lack of studies on epidemic transmission patterns and mechanisms inhibits the development of containment strategie...Viral infectious diseases significantly threaten the sustainability of freshwater fish aquaculture.The lack of studies on epidemic transmission patterns and mechanisms inhibits the development of containment strategies from the viewpoint of veterinary public health.This study raises an epidemic mathematical model considering water transmission with the aim of analyzing the transmission process more accurately.The basic reproduction number R0 was derived by the model parameter including the water transmission coefficient and was used for the analysis of the virus transmission.Spring viremia of carp virus(SVCV)and zebrafish were used as model viruses and animals,respectively,to conduct the transmission experiment.Transmission through water was achieved by connecting two aquarium tanks with a water channel but blocking the fish movement between the tanks.With the collected experimental data,we determined the optimal hybrid machine learning algorithm to analyze the transmission process using an established mathematical model.In addition,future transmission was predicted and validated using the epidemic model and an optimal algorithm.Finally,the sensitivity of model parameters and the simulations of R0 variation were performed based on the modified complex epidemic model.This study is of significance in providing theoretical guidance for minimizing R0 by manipulating model parameters with containment measures.More importantly,since the modified model and algorithm demonstrated better performance in handling freshwater fish transmission problems,this study advances the future application of transmissible disease modeling with larger datasets in freshwater fish aquaculture.展开更多
In order to understand and organize the document in an efficient way,the multidocument summarization becomes the prominent technique in the Internet world.As the information available is in a large amount,it is necess...In order to understand and organize the document in an efficient way,the multidocument summarization becomes the prominent technique in the Internet world.As the information available is in a large amount,it is necessary to summarize the document for obtaining the condensed information.To perform the multi-document summarization,a new Bayesian theory-based Hybrid Learning Model(BHLM)is proposed in this paper.Initially,the input documents are preprocessed,where the stop words are removed from the document.Then,the feature of the sentence is extracted to determine the sentence score for summarizing the document.The extracted feature is then fed into the hybrid learning model for learning.Subsequently,learning feature,training error and correlation coefficient are integrated with the Bayesian model to develop BHLM.Also,the proposed method is used to assign the class label assisted by the mean,variance and probability measures.Finally,based on the class label,the sentences are sorted out to generate the final summary of the multi-document.The experimental results are validated in MATLAB,and the performance is analyzed using the metrics,precision,recall,F-measure and rouge-1.The proposed model attains 99.6%precision and 75%rouge-1 measure,which shows that the model can provide the final summary efficiently.展开更多
文摘Spatial heterogeneity refers to the variation or differences in characteristics or features across different locations or areas in space. Spatial data refers to information that explicitly or indirectly belongs to a particular geographic region or location, also known as geo-spatial data or geographic information. Focusing on spatial heterogeneity, we present a hybrid machine learning model combining two competitive algorithms: the Random Forest Regressor and CNN. The model is fine-tuned using cross validation for hyper-parameter adjustment and performance evaluation, ensuring robustness and generalization. Our approach integrates Global Moran’s I for examining global autocorrelation, and local Moran’s I for assessing local spatial autocorrelation in the residuals. To validate our approach, we implemented the hybrid model on a real-world dataset and compared its performance with that of the traditional machine learning models. Results indicate superior performance with an R-squared of 0.90, outperforming RF 0.84 and CNN 0.74. This study contributed to a detailed understanding of spatial variations in data considering the geographical information (Longitude & Latitude) present in the dataset. Our results, also assessed using the Root Mean Squared Error (RMSE), indicated that the hybrid yielded lower errors, showing a deviation of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran’s I index was observed to be 0.10. This study underscores that the hybrid was able to predict correctly the house prices both in clusters and in dispersed areas.
文摘In a competitive digital age where data volumes are increasing with time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques and making decisions based on the extracted knowledge is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to its high computational cost and storage requirements. The 2016 Demographic and Health Survey of Ethiopia (EDHS 2016) used as the data source for this study which is publicly available contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling for both model performance evaluation and feature selection to overcome the feature selection challenges and select the best model among the available models in DM and ML. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model. The higher score of HMM (m, r) = 0.47 illustrates the overall significant model that encompasses almost all the user’s requirements, unlike the classical metrics that use a criterion to select the most appropriate model. On the other hand, the ANNs were found to be the most computationally intensive for our prediction task. Moreover, the type of data and the class size of the dataset (unbalanced data) have a significant impact on the efficiency of the model, especially on the computational cost, and the interpretability of the parameters of the model would be hampered. And the efficiency of the predictive model could be improved with other feature selection algorithms (especially hybrid metrics) considering the experts of the knowledge domain, as the understanding of the business domain has a significant impact.
基金the National Natural Science Foundation of China(U21A20268,31920103016,32173010)the fellowship of China Postdoctoral Science Foundation(No.2022M711128)+2 种基金Hunan Provincial Science and Technology Department(2021RC2076,2021NK2025,2022JJ40276,2022JJ30383)College Student Innovation and Entrepreneurship Training Program(2022123,2023227)the Modern Agricultural Industry Program of Hunan Province,and the Fish Disease and Vaccine Research and Development Platform for Postgraduates in Hunan Province.
文摘Viral infectious diseases significantly threaten the sustainability of freshwater fish aquaculture.The lack of studies on epidemic transmission patterns and mechanisms inhibits the development of containment strategies from the viewpoint of veterinary public health.This study raises an epidemic mathematical model considering water transmission with the aim of analyzing the transmission process more accurately.The basic reproduction number R0 was derived by the model parameter including the water transmission coefficient and was used for the analysis of the virus transmission.Spring viremia of carp virus(SVCV)and zebrafish were used as model viruses and animals,respectively,to conduct the transmission experiment.Transmission through water was achieved by connecting two aquarium tanks with a water channel but blocking the fish movement between the tanks.With the collected experimental data,we determined the optimal hybrid machine learning algorithm to analyze the transmission process using an established mathematical model.In addition,future transmission was predicted and validated using the epidemic model and an optimal algorithm.Finally,the sensitivity of model parameters and the simulations of R0 variation were performed based on the modified complex epidemic model.This study is of significance in providing theoretical guidance for minimizing R0 by manipulating model parameters with containment measures.More importantly,since the modified model and algorithm demonstrated better performance in handling freshwater fish transmission problems,this study advances the future application of transmissible disease modeling with larger datasets in freshwater fish aquaculture.
文摘In order to understand and organize the document in an efficient way,the multidocument summarization becomes the prominent technique in the Internet world.As the information available is in a large amount,it is necessary to summarize the document for obtaining the condensed information.To perform the multi-document summarization,a new Bayesian theory-based Hybrid Learning Model(BHLM)is proposed in this paper.Initially,the input documents are preprocessed,where the stop words are removed from the document.Then,the feature of the sentence is extracted to determine the sentence score for summarizing the document.The extracted feature is then fed into the hybrid learning model for learning.Subsequently,learning feature,training error and correlation coefficient are integrated with the Bayesian model to develop BHLM.Also,the proposed method is used to assign the class label assisted by the mean,variance and probability measures.Finally,based on the class label,the sentences are sorted out to generate the final summary of the multi-document.The experimental results are validated in MATLAB,and the performance is analyzed using the metrics,precision,recall,F-measure and rouge-1.The proposed model attains 99.6%precision and 75%rouge-1 measure,which shows that the model can provide the final summary efficiently.