This paper outlines a theory of estimation in which optimality is defined for all data sizes, not only asymptotically. Moreover, a single principle should cover the estimation of both real-valued parameters and their number. To achieve this, we have to abandon the traditional assumption that the observed data have been generated by a “true” distribution and that the objective of estimation is to recover this distribution from the data. Instead, the objective in this theory is to fit ‘models’, as distributions, to the data in order to find their regular statistical features. The performance of a fitted model is measured by the probability it assigns to the data: a large probability means a good fit, and a small probability a bad fit. Equivalently, the negative logarithm of the probability should be minimized, which has the interpretation of a code length. There are three equivalent characterizations of optimal estimators: the first is defined by estimation capacity, the second by satisfying necessary conditions for optimality for all data, and the third by the complete Minimum Description Length (MDL) principle.
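To make the code-length interpretation concrete, here is a minimal sketch that assumes an i.i.d. Bernoulli model class for binary data. This model class is an illustrative choice and is not taken from the paper. The sketch computes the code length -log2 P(x^n; θ) for a fixed θ and for the maximum-likelihood fit θ̂ = k/n. It also computes the normalized maximum likelihood (NML) code length, which adds log2 C_n, where C_n is the sum of the maximized probabilities over all sequences of length n. NML is commonly associated with Rissanen's MDL work. However, all function names here are hypothetical, and the sketch does not reproduce the paper's estimation-capacity construction.

```python
import math


def code_length_bits(k: int, n: int, theta: float) -> float:
    """-log2 P(x^n; theta) for a binary sequence of length n with k ones,
    under an i.i.d. Bernoulli(theta) model (the order of the bits is irrelevant)."""
    bits = 0.0
    if k > 0:
        bits -= k * math.log2(theta)
    if n - k > 0:
        bits -= (n - k) * math.log2(1.0 - theta)
    return bits


def ml_code_length_bits(k: int, n: int) -> float:
    """Code length under the maximum-likelihood fit theta_hat = k / n,
    the shortest code length any Bernoulli(theta) assigns to this data."""
    return code_length_bits(k, n, k / n)


def nml_normalizer(n: int) -> float:
    """C_n = sum over all 2^n binary sequences y^n of P(y^n; theta_hat(y^n)).
    Sequences with the same count j of ones share one maximized probability,
    so the sum is grouped by j."""
    return sum(
        math.comb(n, j) * 2.0 ** (-ml_code_length_bits(j, n))
        for j in range(n + 1)
    )


def nml_code_length_bits(k: int, n: int) -> float:
    """-log2 of the NML probability P(x^n; theta_hat(x^n)) / C_n."""
    return ml_code_length_bits(k, n) + math.log2(nml_normalizer(n))


if __name__ == "__main__":
    n, k = 10, 8  # hypothetical data: 8 ones in 10 binary observations
    print(f"fair coin : {code_length_bits(k, n, 0.5):.2f} bits")
    print(f"ML fit    : {ml_code_length_bits(k, n):.2f} bits")
    print(f"NML       : {nml_code_length_bits(k, n):.2f} bits")
```

Consider a sequence of length 10 that contains 8 ones. It costs 10 bits under the fixed fair-coin model θ = 0.5 and about 7.2 bits under the maximum-likelihood fit. Under NML it costs about 9.4 bits. The roughly 2.2 extra bits over the maximum-likelihood fit are the price of having fitted θ to the same data. Unlike the maximum-likelihood "code", the NML code lengths define a proper distribution over all sequences of length n.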