Let {Xn; n ≥ 1} be a sequence of i.i.d. random variables with finite variance, and let Q(n) be the associated R/S statistic. It is proved that lim_{ε↓0} ε^2 Σ_{n=1}^∞ [1/(n log n)] P{Q(n) ≥ ε√(2n log log n)} = (1/2) E Y^2, where Y = sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t) and B(t) is a Brownian bridge.
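As a quick empirical companion, here is a minimal pure-Python sketch of the R/S statistic Q(n) itself (the limit theorem above concerns its tail behaviour); the sample data are simulated, not from the paper:

```python
import math
import random

def rs_statistic(x):
    """Rescaled-range (R/S) statistic Q(n): the range of the centred
    partial sums of the sample, divided by its standard deviation."""
    n = len(x)
    m = sum(x) / n
    partial, s = [0.0], 0.0
    for xi in x:
        s += xi - m
        partial.append(s)
    r = max(partial) - min(partial)
    sd = math.sqrt(sum((xi - m) ** 2 for xi in x) / n)
    return r / sd

# Simulated i.i.d. sample (illustrative only, not from the paper).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
q = rs_statistic(sample)
ratio = q / math.sqrt(len(sample))  # for i.i.d. data, Q(n) = O(sqrt(n))
```

The normalisation by √(2n log log n) in the theorem is the same law-of-the-iterated-logarithm scale at which `ratio` stays bounded.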
In basketball, each player’s skill level is key to a team’s success or failure, and that skill level is affected by many personal and environmental factors, so physics-informed AI statistics has become extremely important. In this article, a complex non-linear process is considered by taking into account each player’s average points per game, playing time, shooting percentage, and other factors. The physics-informed approach constructs a multiple linear regression model with physics-informed neural networks. Based on official data provided by the American Basketball League, combined with specific methods of R program analysis, the regression model for a player’s average points per game is verified, and the key factors affecting average points per game are elucidated. The paper provides a novel window for coaches to make meaningful in-game adjustments to team members.
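The regression core of such a pipeline can be sketched without any neural-network component: a plain multiple linear regression fit by the normal equations. The player data below are invented for illustration, not taken from any league's records:

```python
# Hypothetical mini-dataset: points per game vs. minutes played and
# field-goal percentage. Fits y = b0 + b1*minutes + b2*fg_pct by
# ordinary least squares via the normal equations (X'X) b = X'y.
minutes = [30, 35, 20, 38, 25, 32]
fg_pct  = [0.45, 0.50, 0.40, 0.52, 0.43, 0.48]
points  = [15, 22, 8, 26, 11, 18]

X = [[1.0, m, f] for m, f in zip(minutes, fg_pct)]
y = points

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

XtX = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(3)] for a in range(3)]
Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(3)]
beta = solve(XtX, Xty)
pred = [sum(b * xi for b, xi in zip(beta, row)) for row in X]
```

With an intercept in the model, the fitted residuals sum to zero, which is a handy sanity check on the solver.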
This paper establishes the phase space in the light of spatial series data, discusses the fractal structure of geological data in terms of correlation functions, and studies the chaos of these data. In addition, it introduces R/S analysis, originally developed for time series, into spatial series to calculate the structural fractal dimensions of ranges and standard deviations for spatial series data, to establish the fractal dimension matrix, and to give procedures for plotting the fractal-dimension anomaly diagram with vector distances of fractal dimension. Finally, examples of its application are given.
The performance of six statistical approaches, which can be used for selection of the best model to describe the growth of individual fish, was analyzed using simulated and real length-at-age data. The six approaches include the coefficient of determination (R2), adjusted coefficient of determination (adj.-R2), root mean squared error (RMSE), Akaike's information criterion (AIC), the bias-corrected AIC (AICc), and the Bayesian information criterion (BIC). The simulated data were generated by five growth models with different numbers of parameters. Four sets of real data were taken from the literature. The parameters in each of the five growth models were estimated using the maximum likelihood method under the assumption of an additive error structure for the data. The model best supported by the data was identified using each of the six approaches. The results show that R2 and RMSE have the same properties and perform worst. The sample size affects the performance of adj.-R2, AIC, AICc, and BIC. Adj.-R2 does better in small samples than in large samples. AIC is not suitable for small samples and tends to select a more complex model as the sample size becomes large. AICc and BIC perform best in the small- and large-sample cases, respectively. Use of AICc or BIC is recommended for selection of a fish growth model according to the size of the length-at-age data.
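Under a Gaussian additive-error assumption like the one used here, the three information criteria reduce to simple functions of the residual sum of squares. The two candidate "growth models" below are hypothetical — only their RSS values and parameter counts matter for the sketch:

```python
import math

def aic(n, rss, k):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def aicc(n, rss, k):
    """Small-sample bias-corrected AIC."""
    return aic(n, rss, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(n, rss, k):
    """Bayesian information criterion: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical fits of two growth models to n = 20 length-at-age points:
# a 3-parameter model with RSS 40.0 and a 5-parameter model with RSS 36.0.
n = 20
scores = {
    "3-param": (aic(n, 40.0, 3), aicc(n, 40.0, 3), bic(n, 40.0, 3)),
    "5-param": (aic(n, 36.0, 5), aicc(n, 36.0, 5), bic(n, 36.0, 5)),
}
```

At this small sample size, the modest RSS improvement does not pay for two extra parameters, so both AICc and BIC prefer the simpler model — exactly the small-sample behaviour the abstract describes.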
In the past, victims of electrical and lightning injuries have been assessed in a manner lacking a systematic formulation, and against ad hoc criteria, particularly in the area of neuropsychological disability. In this manner patients have, for example, been only partially treated, been poorly or incorrectly diagnosed, and been denied the full benefit of compensation for their injuries. This paper contains a proposal for diagnostic criteria, particularly for the neuropsychological aspects of the post-injury syndrome. It pays attention to widely published, consistent descriptions of the syndrome, and to a new cluster analysis of post-electrical-injury patients. It formulates a proposal which could be incorporated into future editions of the American Psychiatric Association's Diagnostic and Statistical Manual (DSM). The major neuropsychological consequences include neurocognitive dysfunction and memory-subgroup dysfunction, with ongoing consequences, sometimes including progressive or delayed psychiatric, cognitive, and/or neurological symptoms. The proposed diagnostic criteria insist on a demonstrated context for the injury, specifying both the shock circumstance and the physical consequences. They allow for a certain delay in the onset of symptoms and recognize exclusionary conditions. The outcome is a proposal for a DSM classification for the post-electrical or lightning injury syndrome. This proposal is considered important for grounding patient treatment and for further treatment trials. Options for treatment in electrical or lightning injury are summarised, and future trials are foreshadowed.
Various networks exist in the world today, including biological, social, information, and communication networks, with the Internet as the largest network of all. One salient structural feature of these networks is the formation of groups or communities of vertices that tend to be more connected to each other within the same group than to those outside. The detection of these communities is therefore a topic of great interest and importance in many applications, and different algorithms, including label propagation, have been developed for this purpose. The speaker-listener label propagation algorithm (SLPA) enjoys almost linear time complexity, which is desirable for large networks. As an extension of SLPA, this study presents a novel weighted label propagation algorithm (WLPA), which was tested on four real-world social networks with known community structures, including the famous Zachary's karate club network. Wilcoxon tests on the communities found in the karate club network by WLPA demonstrated an improved statistical significance over SLPA. With the help of Wilcoxon tests again, we were able to determine the best possible formation of two communities in this network relative to the ground-truth partition, which could be used as a new benchmark for assessing community detection algorithms. Finally, WLPA predicted better communities than SLPA in two of the three additional real social networks, when compared to the ground truth.
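SLPA and WLPA are elaborations of plain label propagation. A minimal synchronous variant with a deterministic tie-break (a sketch of the family these algorithms belong to, not the speaker-listener mechanism itself) already separates two tightly-knit groups joined by a bridge:

```python
def label_propagation_sync(adj, max_iter=100):
    """Synchronous label propagation: every node adopts the most
    frequent label among its neighbours; on a tie it keeps its
    current label if possible, otherwise takes the smallest."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        new = {}
        for v, nbrs in adj.items():
            counts = {}
            for u in nbrs:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            best = max(counts.values())
            ties = [l for l, c in counts.items() if c == best]
            new[v] = labels[v] if labels[v] in ties else min(ties)
        if new == labels:  # converged
            break
        labels = new
    return labels

def clique(vs):
    """Adjacency lists of a complete graph on the given vertices."""
    return {v: [u for u in vs if u != v] for v in vs}

# Two 4-cliques joined by a single bridge edge (3-4): a toy network
# with an obvious two-community ground truth.
adj = {**clique([0, 1, 2, 3]), **clique([4, 5, 6, 7])}
adj[3].append(4)
adj[4].append(3)
labels = label_propagation_sync(adj)
```

On this toy graph the propagation settles into exactly the two cliques; the weighted variant (WLPA) would replace the plain neighbour counts with weighted counts.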
Spatial autocorrelation is a measure of the correlation of an observation with other observations through space. Most statistical analyses are based on the assumption that the values of observations are independent of one another. Spatial autocorrelation violates this assumption, because observations at nearby locations are related to each other. The consideration of spatial autocorrelation has therefore been gaining attention in crash data modeling in recent years, and research has shown that ignoring this factor may lead to biased estimation of the model parameters. This paper examines two spatial autocorrelation indices, Moran's Index and the Getis-Ord Gi* statistic, to measure the spatial autocorrelation of vehicle crashes that occurred on Boone County roads in the state of Missouri, USA, for the years 2013-2015. Since each index identifies different clustering patterns of crashes, this paper introduces a new hybrid method to identify crash clustering patterns by combining both Moran's Index and the Gi* statistic. Results show that the new method can effectively improve the number, extent, and type of crash clustering along roadways.
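Global Moran's I is straightforward to compute by hand. The sketch below uses binary contiguity weights on six hypothetical road segments with clustered crash counts (the values are illustrative, not the Boone County data):

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij (x_i - m)(x_j - m)
    divided by sum_i (x_i - m)^2, with W the sum of all weights."""
    n = len(values)
    m = sum(values) / n
    d = [x - m for x in values]
    num = sum(weights[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    W = sum(sum(row) for row in weights)
    return (n / W) * num / sum(di * di for di in d)

# Six road segments on a line; adjacent segments are neighbours
# (binary contiguity weights). Crash counts are spatially clustered.
crashes = [1, 1, 1, 5, 5, 5]
n = len(crashes)
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
I = morans_i(crashes, w)  # clustered data gives I = 0.6 here
```

A clearly positive I flags clustering; a value near the expectation −1/(n−1) indicates spatial randomness, which is the null hypothesis the Gi* statistic also tests, but locally.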
As the world's authoritative organization on energy information, the International Energy Agency (IEA), founded in 1974, has released Key World Energy Statistics every year since 1997 (hereinafter referred to as the "Key Data"). The "Key Data" released in 2007 announced the 2005 statistics and also provided the 1973 statistics for comparison. From the published data, we can clearly see the development path and trend of the world energy and power industry. China's strong development momentum, high-speed growth of energy consumption, and the enormous challenges in sustainable energy supply are especially noticeable. This paper reviews the "Key Data" to perceive China's energy development. The analysis and interpretation of these data are purely from the author's point of view.
A new robust proportional-integral-derivative (PID) tracking control framework is considered for stochastic systems with non-Gaussian variables, based on B-spline neural network approximation and T-S fuzzy model identification. The tracked object is the statistical information of a given target probability density function (PDF), rather than a deterministic signal. Following B-spline approximation of the integrated performance function, the problem is transformed into the tracking of given weights. Unlike previous related work, time-delay T-S fuzzy models with exogenous disturbances are applied to identify the nonlinear weighting dynamics. Meanwhile, the generalized PID controller structure and improved convex linear matrix inequality (LMI) algorithms are proposed to fulfil the tracking problem. Furthermore, in order to enhance the robust performance, the peak-to-peak measure index is applied to optimize the tracking performance. Simulations are given to demonstrate the efficiency of the proposed approach.
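Leaving aside the B-spline and T-S fuzzy machinery, the PID structure at the core can be illustrated on a scalar first-order plant; the gains and the plant below are invented for the sketch, not taken from the paper:

```python
def pid_step(err, state, kp, ki, kd, dt):
    """One update of a discrete PID controller.
    state = (integral_of_error, previous_error)."""
    integral, prev = state
    integral += err * dt
    deriv = (err - prev) / dt
    u = kp * err + ki * integral + kd * deriv
    return u, (integral, err)

# Hypothetical first-order plant x' = -x + u tracking a constant
# setpoint; forward-Euler simulation for 20 seconds.
setpoint, x, dt = 1.0, 0.0, 0.01
state = (0.0, 0.0)
for _ in range(2000):
    u, state = pid_step(setpoint - x, state, kp=2.0, ki=1.0, kd=0.05, dt=dt)
    x += (-x + u) * dt
```

The integral term drives the steady-state error to zero, which is the property the robust design in the paper preserves while also bounding the peak-to-peak response.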
In the theory of random fractals, there are two important classes of random sets: the class of fractals generated by the paths of stochastic processes, and the class of fractals generated by statistical contraction operators. We introduce the probability basis and fractal properties of fractals in the latter class. The probability basis contains (1) the convergence and measurability of a random recursive set K(ω) as a random element, and (2) the martingale property. The fractal properties include (3) the character of various similarities, (4) the separability property, (5) the support and zero-one law of the distribution P_K = P·K^{-1}, and (6) the Hausdorff dimension and Hausdorff exact measure function.
In 1998, facing the complicated and severe domestic and international economic environment, people of all nationalities, under the correct leadership of the Central Party Committee and the State Council, implemented a series of policies aiming at increasing input and expanding domestic demand. Difficulties brought about by the Asian financial crisis and devastating flooding were overcome, various reforms were further deepened, and economic growth was promoted, resulting in great achievements attracting worldwide attention.
Forecasting the movement of the stock market is a long-standing attractive topic. This paper implements different statistical learning models to predict the movement of the S&P 500 index. The S&P 500 index is influenced by other important financial indexes across the world, such as commodity prices and financial technical indicators. This paper systematically investigated four supervised learning models, including Logistic Regression, Gaussian Discriminant Analysis (GDA), Naive Bayes, and Support Vector Machine (SVM), in forecasting the S&P 500 index. After several experiments optimizing features and models, especially the SVM kernel selection and feature selection for the different models, this paper concludes that an SVM model with a Radial Basis Function (RBF) kernel can achieve an accuracy rate of 62.51% for the future market trend of the S&P 500 index.
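Of the four models compared, logistic regression is the easiest to sketch from scratch. The toy features below (hypothetical index and commodity returns) stand in for the real financial indicators; the data are invented and linearly separable, so this is a mechanics demo, not a market claim:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch-gradient logistic regression; returns weights [b, w1, w2]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(a * b for a, b in zip(w[1:], xi))
            p = 1 / (1 + math.exp(-z))   # sigmoid
            err = p - yi
            grad[0] += err
            for j, a in enumerate(xi):
                grad[j + 1] += err * a
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    z = w[0] + sum(a * b for a, b in zip(w[1:], xi))
    return 1 if z >= 0 else 0

# Hypothetical features: yesterday's index return and an oil-price
# return; label 1 = index up today. Toy data, not real market data.
X = [[0.5, 0.2], [0.8, 0.1], [0.6, 0.4],
     [-0.5, -0.3], [-0.7, -0.1], [-0.4, -0.6]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
acc = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

The SVM with an RBF kernel that the paper selects replaces this linear decision boundary with a kernelized one, which is what lets it capture non-linear feature interactions.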
After 30 years of economic development, the high-tech industry has played an important role in China’s national economy. The development of the high-technology industry plays a leading role in guiding the transformation of China’s economy from “investment-driven” to “technology-driven”. The high-tech industry represents the future direction of industrial development and plays a positive role in promoting the transformation of traditional industries. The rapid development of the high-tech industry is key to social progress. In this paper, the traditional analytical model of statistics is combined with principal component analysis and spatial analysis, and the R language is used to express the analytical results intuitively on a map. Finally, a comprehensive evaluation is established.
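For two indicators, principal component analysis reduces to the closed-form eigendecomposition of a 2x2 covariance matrix. The data below are hypothetical, not the paper's high-tech industry statistics:

```python
import math

def pca_2d(data):
    """First principal component of 2-D data via the closed-form
    eigendecomposition of the 2x2 sample covariance matrix."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from trace and determinant.
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam1 = tr / 2 + math.sqrt(tr * tr / 4 - det)
    lam2 = tr - lam1
    # Eigenvector for lam1: (a - lam) v1 + b v2 = 0  =>  v = (b, lam - a).
    v = (sxy, lam1 - sxx) if abs(sxy) > 1e-12 else (1.0, 0.0)
    norm = math.hypot(*v)
    return lam1, lam2, (v[0] / norm, v[1] / norm)

# Hypothetical two-indicator data (e.g. R&D spend vs. patent counts),
# strongly correlated, so one component carries almost all variance.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8)]
lam1, lam2, pc1 = pca_2d(data)
explained = lam1 / (lam1 + lam2)  # variance share of the first component
```

The explained-variance ratio is the quantity used to decide how many components a comprehensive evaluation should retain.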
XLR is an Excel add-in that unifies the user-friendly, widely popular interface of Excel with the powerful and robust computational capability of the GNU statistical and graphical language R. The add-in attempts to address the American Statistical Association’s comment that “Generic packages such as Excel are not sufficient even for the teaching of statistics, let alone for research and consulting.” R is the program of choice for researchers in statistical methodology and is freely available under the Free Software Foundation’s GNU General Public License (GPL). By wedding the interactive mode of Excel with the statistical computing power of R, XLR provides a solution to the numerical inaccuracy of Excel and its various internal statistical functions and procedures. XLR will be distributed under the GNU GPL. The GPL puts students, instructors, and researchers in control of their usage of the software by providing them with the freedom to run, copy, distribute, study, change, and improve it, thus freeing them from the bondage of proprietary software. The creation of XLR will not only have a significant impact on the teaching of an Introductory Business Statistics course by providing a free alternative to commercial proprietary software, but will also, once the current set of available commands is expanded to include more advanced procedures, provide researchers in all disciplines who require sophisticated and cutting-edge statistical and graphical procedures with a user-friendly interactive data analysis tool.
The fundamental problem of similarity studies, in the frame of data mining, is to examine and detect similar items in articles, papers, and books of huge sizes. In this paper, we are interested in the probabilistic, statistical, and algorithmic aspects of text studies. We use the approach of k-shingling, where a k-shingle is defined as a sequence of k consecutive characters extracted from a text (k ≥ 1). The main stake in this field is to find accurate and quick algorithms to compute similarities in short times. This is achieved using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third concerns a modification of the algorithm proposed by Rajaraman et al. ([1]), denoted here as (RUM). The Jaccard index is the one used in this paper. We finally illustrate the results on the four Gospels. The results are very conclusive.
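k-shingling and the Jaccard index are easy to state concretely. The sketch below computes them exactly, without the Glivenko-Cantelli or banding approximations; the sample sentences are made up for illustration:

```python
def shingles(text, k):
    """Set of all k-character shingles of a text (k >= 1)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two shingle sets."""
    return len(a & b) / len(a | b)

# Toy texts (not the Gospels): two near-duplicates and one unrelated.
s1 = shingles("in the beginning was the word", 4)
s2 = shingles("in the beginning was the light", 4)
s3 = shingles("completely unrelated sentence", 4)
sim_close = jaccard(s1, s2)  # high: texts share most 4-shingles
sim_far = jaccard(s1, s3)    # low: almost no shared 4-shingles
```

The approximation methods in the paper exist precisely because computing these exact set intersections for every pair of large documents is too slow at scale.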
Outlier detection is an important type of data screening. RIM is a mechanism of outlier detection that identifies the contribution of data points in a regression model. A BIC-based RIM is a technique developed in this work to simultaneously detect influential data points and select optimal predictor variables. It adds to the existing literature by providing an alternative to the AIC- and Mallows' Cp-based RIM, together with proposed conditions for no influence, some influence, and a single perfectly outlying data point in an entire data set. The method is implemented in R by an algorithm that iterates over all data points, deleting data points one at a time while computing BICs and selecting optimal predictors alongside RIMs. From analyses using evaporation data to compare the proposed method with the existing methods, the results show that the same data cases selected as having high influence by the two existing methods are also selected by the proposed method. The three methods show the same performance; hence the relevance of the BIC-based RIM cannot be undermined.
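The case-deletion idea behind a BIC-based RIM can be sketched with simple linear regression: refit the model with each point left out and see whose removal improves (lowers) the BIC the most. The data, including the gross outlier, are invented for the sketch:

```python
import math

def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def bic(xs, ys, k=2):
    """Gaussian BIC (up to a constant) of the simple linear fit."""
    n = len(xs)
    a, b = fit_ols(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return n * math.log(rss / n) + k * math.log(n)

# Near-linear data (y ≈ 2x) with one gross outlier at x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 20.0, 9.9, 12.1]
# Case deletion: BIC of the model refit with each point left out.
drops = [bic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(len(xs))]
most_influential = min(range(len(xs)), key=lambda i: drops[i])
```

Deleting the outlier collapses the residual sum of squares and hence the BIC, so that index is flagged as the influential case; the full method does this jointly with predictor selection.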
In the literature, features based on first- and second-order statistics that characterize textures are used for classification of images. Features based on texture statistics provide far fewer, yet relevant and distinguishable, features in comparison to existing methods based on wavelet transformation. In this paper, we investigated the performance of texture-based features in comparison to wavelet-based features, with commonly used classifiers, for the classification of Alzheimer’s disease based on T2-weighted MRI brain images. The performance is evaluated in terms of sensitivity, specificity, accuracy, and training and testing time. Experiments are performed on publicly available medical brain images. Experimental results show that the performance with first- and second-order statistics based features is significantly better than that of existing methods based on wavelet transformation in terms of all performance measures for all classifiers.
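First- and second-order statistics of an image patch can be computed directly. The sketch below contrasts a blocky and a checkerboard 4x4 "image" (toy data with four grey levels, not T2-weighted MRI):

```python
def first_order(img):
    """First-order statistics (mean and variance of grey levels)."""
    px = [v for row in img for v in row]
    n = len(px)
    mean = sum(px) / n
    var = sum((v - mean) ** 2 for v in px) / n
    return mean, var

def glcm_features(img, dx=1, dy=0, levels=4):
    """Second-order statistics from a grey-level co-occurrence matrix
    for the pixel offset (dx, dy): contrast and energy."""
    h, w = len(img), len(img[0])
    glcm = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[img[y][x]][img[y + dy][x + dx]] += 1
            total += 1
    p = [[c / total for c in row] for row in glcm]
    contrast = sum(p[i][j] * (i - j) ** 2
                   for i in range(levels) for j in range(levels))
    energy = sum(v * v for row in p for v in row)
    return contrast, energy

# Toy 4x4 patches: smooth blocks vs. a high-frequency checkerboard.
smooth = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
noisy  = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]
m_s, v_s = first_order(smooth)
c_smooth, _ = glcm_features(smooth)
c_noisy, _ = glcm_features(noisy)
```

GLCM contrast is much higher for the checkerboard than for the blocky patch, which is the kind of compact, discriminative feature the paper favours over large wavelet feature vectors.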
Funding: Project supported by NSFC (10131040) and SRFDP (2002335090).
Abstract: A law of the iterated logarithm for R/S statistics is shown, with the help of strong approximations of R/S statistics by functions of a Wiener process.
Funding: Supported by the High Technology Research and Development Program of China (863 Program, No. 2006AA100301).
Funding: Supported by the National Natural Science Foundation of China (No. 60472065, No. 60774013).
文摘A new robust proportional-integral-derivative (PID) tracking control framework is considered for stochastic systems with non-Gaussian variable based on B-spline neural network approximation and T-S fuzzy model identification. The tracked object is the statistical information of a given target probability density function (PDF), rather than a deterministic signal. Following B-spline approximation to the integrated performance function, the concerned problem is transferred into the tracking of given weights. Different from the previous related works, the time delay T-S fuzzy models with the exogenous disturbances are applied to identify the nonlinear weighting dynamics. Meanwhile, the generalized PID controller structure and the improved convex linear matrix inequalities (LMI) algorithms are proposed to fulfil the tracking problem. Furthermore, in order to enhance the robust performance, the peak-to-peak measure index is applied to optimize the tracking performance. Simulations are given to demonstrate the efficiency of the proposed approach.
文摘In the theory of random fractal, there are two important classes of random sets, one is the class of fractals generated by the paths of stochastic processes and another one is the class of factals generated by statistical contraction operators. Now we will introduce some things about the probability basis and fractal properties of fractals in the last class. The probability basis contains (1) the convergence and measurability of a random recursive setK(ω) as a random element, (2) martingals property. The fractal properties include (3) the character of various similarity, (4) the separability property, (5) the support and zero-one law of distributionP k =P·K ?1, (6) the Hausdorff dimension and Hausdorff exact measure function.
文摘In 1998, facing the complicated and severe domestic and internationaleconomic environment, people of all nationalities, under the correct lead-ership of the Central Party Committee and the State Council, implement-ed a series of policies aiming at increasing input and expanding domesticdemand. Difficulties brought about by the Asian financial crisis and dev-astating flooding were overcome, various reforms were further deepened,and economic growth was promoted, resulting in great achievements at-tracting worldwide attention.
文摘Forecasting the movement of stock market is a long-time attractive topic. This paper implements different statistical learning models to predict the movement of S&P 500 index. The S&P 500 index is influenced by other important financial indexes across the world such as commodity price and financial technical indicators. This paper systematically investigated four supervised learning models, including Logistic Regression, Gaussian Discriminant Analysis (GDA), Naive Bayes and Support Vector Machine (SVM) in the forecast of S&P 500 index. After several experiments of optimization in features and models, especially the SVM kernel selection and feature selection for different models, this paper concludes that a SVM model with a Radial Basis Function (RBF) kernel can achieve an accuracy rate of 62.51% for the future market trend of the S&P 500 index.
After 30 years of economic development, the high-tech industry has come to play an important role in China's national economy. The development of the high-tech industry plays a leading role in guiding the transformation of China's economy from "investment-driven" to "technology-driven". The high-tech industry represents the future direction of industrial development and plays a positive role in promoting the transformation of traditional industries; its rapid development is the key to social progress. In this paper, traditional statistical analysis is combined with principal component analysis and spatial analysis, and the R language is used to present the analytical results intuitively on a map. Finally, a comprehensive evaluation is established.
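The abstract's principal component analysis is done in R; as a language-agnostic sketch of the core computation, here is a minimal two-variable PCA in plain Python, using the closed-form eigenvalues of the 2×2 sample covariance matrix. The two indicator series below are made up.

```python
import math

def pca_2d(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix and the first component's share."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [v - mx for v in xs]
    cy = [v - my for v in ys]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(v * v for v in cx) / (n - 1)
    syy = sum(v * v for v in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    root = math.sqrt(tr * tr / 4 - det)
    lam1, lam2 = tr / 2 + root, tr / 2 - root
    explained = lam1 / (lam1 + lam2)  # variance share of the first component
    return lam1, lam2, explained

# Two strongly correlated indicator series (made-up values)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
lam1, lam2, share = pca_2d(xs, ys)
print(round(share, 3))  # the first component explains almost all the variance
```

For more than two variables the same decomposition is what R's `prcomp` computes, via singular value decomposition.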
XLR is an Excel add-in that unifies the user-friendly, widely popular interface of Excel with the powerful and robust computational capability of the GNU statistical and graphical language R. The add-in attempts to address the American Statistical Association's comment that "Generic packages such as Excel are not sufficient even for the teaching of statistics, let alone for research and consulting." R is the program of choice for researchers in statistical methodology and is freely available under the Free Software Foundation's GNU General Public License (GPL). By wedding the interactive mode of Excel with the statistical computing power of R, XLR solves the problem of the numerical inaccuracy of Excel and its various internal statistical functions and procedures. XLR will be distributed under the GNU GPL. The GPL puts students, instructors, and researchers in control of their usage of the software by giving them the freedom to run, copy, distribute, study, change, and improve it, thus freeing them from the bondage of proprietary software. The creation of XLR will not only have a significant impact on the teaching of an introductory business statistics course, by providing a free alternative to commercial proprietary software, but will also give researchers in all disciplines who require sophisticated, cutting-edge statistical and graphical procedures a user-friendly interactive data analysis tool, once the current set of available commands is expanded to include more advanced procedures.
The fundamental problem of similarity studies, in the framework of data mining, is to examine and detect similar items in articles, papers, and books of huge size. In this paper, we are interested in the probabilistic, statistical, and algorithmic aspects of the study of texts. We use the approach of k-shingling, a k-shingle being defined as a sequence of k consecutive characters extracted from a text (k ≥ 1). The main challenge in this field is to find accurate algorithms that compute similarity quickly. This is achieved using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third is a modification of the algorithm proposed by Rajaraman et al. ([1]), denoted here (RUM). The Jaccard index is the similarity measure used in this paper. We illustrate the results on the four Gospels; the results are very conclusive.
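To make the two central notions concrete, here is a minimal plain-Python sketch of character k-shingling and the Jaccard index. The sample strings are made up (echoing the paper's Gospel corpora); the paper's own contribution lies in the approximation methods layered on top of this exact computation.

```python
def shingles(text, k):
    """Set of all k-shingles: sequences of k consecutive characters (k >= 1)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Jaccard index |A intersect B| / |A union B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

s1 = shingles("in the beginning was the word", 4)
s2 = shingles("in the beginning god created", 4)
print(round(jaccard(s1, s2), 3))  # strictly between 0 and 1: partial overlap
```

Identical texts give a Jaccard index of exactly 1.0, and disjoint shingle sets give 0, which is what makes the index usable as a similarity score.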
Outlier detection is an important type of data screening. RIM is an outlier-detection mechanism that identifies the contribution of data points in a regression model. The BIC-based RIM developed in this work simultaneously detects influential data points and selects optimal predictor variables. It adds to the existing literature in this area both an alternative to the AIC- and Mallow's Cp statistic-based RIMs and proposed conditions for no influence, some influence, and a perfectly single outlying data point in an entire data set. The method is implemented in R by an algorithm that iterates over all data points, deleting them one at a time while computing BICs and selecting optimal predictors alongside RIMs. Analyses on evaporation data comparing the proposed method with the existing methods show that the same data cases identified as highly influential by the two existing methods are also identified by the proposed method. The three methods show the same performance; hence the relevance of the BIC-based RIM cannot be undermined.
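The paper's implementation is in R and also performs predictor selection; as a much-reduced plain-Python sketch of the deletion loop alone, the following fits a one-predictor least-squares line, recomputes the BIC with each point deleted in turn, and flags the point whose removal lowers the BIC the most. The data are made up, with one gross outlier injected at index 3.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def bic(xs, ys):
    """BIC of the fitted line: n*ln(RSS/n) + k*ln(n), with k = 2 parameters."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return n * math.log(rss / n) + 2 * math.log(n)

def most_influential(xs, ys):
    """Index whose deletion yields the largest drop in BIC."""
    full = bic(xs, ys)
    drops = [full - bic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
             for i in range(len(xs))]
    return max(range(len(drops)), key=drops.__getitem__)

# y is roughly 2x with noise, plus one gross outlier at index 3
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 20.0, 10.1, 11.8]
print(most_influential(xs, ys))  # -> 3, the injected outlier
```

Removing the outlier shrinks the residual sum of squares by orders of magnitude, which dominates the BIC's sample-size penalty; that asymmetry is what the deletion criterion exploits.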
In the literature, features based on first- and second-order statistics that characterize textures are used for the classification of images. Texture-statistics features provide far fewer, yet relevant and distinguishable, features than existing methods based on wavelet transformation. In this paper, we compare the performance of texture-based features against wavelet-based features with commonly used classifiers for the classification of Alzheimer's disease from T2-weighted MRI brain images. Performance is evaluated in terms of sensitivity, specificity, accuracy, and training and testing time. Experiments are performed on publicly available medical brain images. The results show that the performance of features based on first- and second-order statistics is significantly better than that of existing wavelet-transform methods on all performance measures for all classifiers.
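First-order statistics are computed from the image's grey-level histogram alone, ignoring spatial relationships (second-order statistics, such as co-occurrence features, add those). As a sketch, the following plain-Python function computes four commonly used first-order features on a made-up 8-level 4×4 patch; the paper's actual feature set and level count are not specified in the abstract.

```python
import math

def first_order_stats(pixels, levels=8):
    """Mean, variance, skewness, and entropy of a grey-level histogram."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    prob = [h / n for h in hist]
    mean = sum(g * pg for g, pg in enumerate(prob))
    var = sum((g - mean) ** 2 * pg for g, pg in enumerate(prob))
    skew = (sum((g - mean) ** 3 * pg for g, pg in enumerate(prob)) / var ** 1.5
            if var > 0 else 0.0)
    entropy = -sum(pg * math.log2(pg) for pg in prob if pg > 0)
    return mean, var, skew, entropy

# Flattened 4x4 patch with grey levels in 0..7 (made-up values)
patch = [0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 7]
mean, var, skew, entropy = first_order_stats(patch)
print(round(mean, 2), round(entropy, 2))
```

For 8 grey levels the entropy is bounded by log2(8) = 3 bits, attained only by a uniform histogram; the small feature count per patch is exactly the compactness advantage the abstract claims over wavelet features.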