We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters such as the fraction of users dropped (FUD) and the fraction of items dropped (FID). We investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include the density of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical SSAL models, symmetrical in FID and FUD, whose coefficients depend only on the data characteristics. The SSAL models turned out to be multi-linear in the odds of dropping a user (or an item) versus not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds, with a zero coefficient on their interaction term. Most importantly, the models are constant in the sense that they can be written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460,000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
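The reported model form, accuracy loss linear in the user-drop and item-drop odds with a zero interaction coefficient, can be sketched as follows. The coefficients `a` and `b` and the function names are hypothetical placeholders for illustration, not values or identifiers from the paper:

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def mse_loss_model(fud, fid, a=0.8, b=0.6):
    """Hypothetical SSAL sketch: MSE deterioration linear in the
    user-drop odds and item-drop odds, with no interaction term
    (the paper reports a zero interaction coefficient).
    fud: fraction of users dropped; fid: fraction of items dropped."""
    return a * odds(fud) + b * odds(fid)

# Dropping 20% of users and 10% of items:
loss = mse_loss_model(0.20, 0.10)
```

In the paper the coefficients are further expressed in closed form from the data characteristics (matrix density, numbers of users and items); here they are fixed constants purely for illustration.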
Rasterization is a conversion process accompanied by information loss, including the loss of features' shape, structure, position, and attributes. Two chief factors affect the estimation of attribute accuracy loss in rasterization: grid cell size and the evaluating method. That is, attribute accuracy loss in rasterization is closely related to grid cell size and is also influenced by the evaluating method, so it is worthwhile to analyze these two factors together. Taking the 1:250,000 land cover data of Sichuan in 2005 as a case, and in view of the data volume and processing time for the study region, this study selects 16 spatial scales from 600 m to 30 km, applies the rasterizing method based on the Rule of Maximum Area (RMA) in ArcGIS together with two evaluating methods of attribute accuracy loss, the Normal Analysis Method (NAM) and a new Method Based on Grid Cell (MBGC), and comparatively analyzes the scale effect of attribute (here, area) accuracy loss at the 16 scales with the two methods. The results show that: (1) At the same scale, the average area accuracy loss of the entire study region evaluated by MBGC is significantly larger than that estimated by NAM, and the discrepancy between the two is pronounced in the range of 1 km to 10 km. When the grid cell is larger than 10 km, the average area accuracy losses calculated by the two methods are stable and even tend to run parallel. (2) MBGC can not only estimate RMA rasterization attribute accuracy loss accurately but also express the spatial distribution of the loss objectively. (3) The suitable scale for RMA rasterization of the 1:250,000 land cover data of Sichuan in 2005 is at most 800 m, at which the data volume is favorable, the processing time is not too long, and the area accuracy loss is less than 2.5%.
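The core of the Rule of Maximum Area can be illustrated with a minimal per-cell sketch: RMA assigns each grid cell entirely to the class occupying the largest area within it, so the area of every other class in that cell is lost. The function below is a hypothetical illustration of that idea, not the paper's MBGC implementation:

```python
def rma_cell_loss(class_areas):
    """Per-cell area accuracy loss under the Rule of Maximum Area (RMA).

    class_areas maps each land-cover class to its true area within one
    grid cell. RMA assigns the whole cell to the class with the largest
    area, so the area of every other class in the cell is lost.
    Returns the lost fraction of the cell's total area."""
    total = sum(class_areas.values())
    kept = max(class_areas.values())
    return (total - kept) / total

# A cell that is 60% forest, 30% cropland, and 10% water loses 40% of
# its area information when the whole cell is rasterized as "forest":
loss = rma_cell_loss({"forest": 0.6, "cropland": 0.3, "water": 0.1})
```

This also suggests why loss grows with grid cell size: larger cells mix more classes, so the non-maximum share of each cell tends to increase.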
The necessity of recognizing handwritten characters is increasing day by day because of its many applications. The objective of this paper is to provide a sophisticated, effective, and efficient way to recognize and classify Bangla handwritten characters. An extended convolutional neural network (CNN) model is proposed to recognize Bangla handwritten characters. Our CNN model is tested on the "BanglaLekha-Isolated" dataset, which has 10 classes for digits, 11 classes for vowels, and 39 classes for consonants. Our model achieves recognition accuracies of 99.50% for Bangla digits, 93.18% for vowels, 90.00% for consonants, and 92.25% for the combined classes.
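The building blocks of any such CNN, convolution, nonlinearity, and pooling, can be sketched in plain NumPy. This is a generic illustration of the operations, under the assumption of a single filter and 2x2 pooling; it is not the paper's architecture:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: the core CNN feature-extraction op."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise rectified linear unit."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (trims odd edges)."""
    h, w = x.shape
    trimmed = x[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

A real classifier stacks several such conv/pool stages and ends with fully connected layers and a softmax over the 60 combined character classes; frameworks such as TensorFlow or PyTorch implement these primitives efficiently.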
Funding: The Independent Research of the State Key Laboratory of Resource and Environmental Information System, No. O88RA100SA; The Third Innovative and Cutting-edge Projects of the Institute of Geographic Sciences and Natural Resources Research, CAS, No. O66U0309SZ.