A comparison of automatic histogram constructions

Published online by Cambridge University Press:  11 June 2009

Laurie Davies
Affiliation:
Department of Mathematics, University Duisburg-Essen; Department of Mathematics, Technical University Eindhoven, Germany.
Ursula Gather
Affiliation:
Department of Statistics, Technische Universität Dortmund, Germany; gather@statistik.uni.dortmund.de
Dan Nordman
Affiliation:
Department of Statistics, Iowa State University, USA.
Henrike Weinert
Affiliation:
Department of Statistics, Technische Universität Dortmund, Germany.

Abstract

Even for a well-trained statistician the construction of a histogram for a given real-valued data set is a difficult problem. It is even more difficult to construct a fully automatic procedure which specifies the number and widths of the bins in a satisfactory manner for a wide range of data sets. In this paper we compare several histogram construction procedures by means of a simulation study. The study includes plug-in methods, cross-validation, penalized maximum likelihood and the taut string procedure. Their performance on different test beds is measured by their ability to identify the peaks of an underlying density as well as by Hellinger distance.
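As a rough illustration of the kinds of procedures and criteria named above, the Python sketch below implements simple textbook representatives on regular (equal-width) bins. These are not the implementations used in the paper. It covers two plug-in rules (Sturges' rule and a Scott-type normal-reference rule), least-squares cross-validation, and penalized maximum likelihood with an AIC-type penalty of k - 1 free parameters. It also computes the two performance measures: a Hellinger distance to a known density and a count of histogram peaks. The taut string procedure, which produces irregular bins, is not sketched. The function names, the two-component normal mixture used as a test bed, the search range kmax = 50 and the Hellinger normalization (factor 1/2) are all assumptions made for illustration.

```python
import math
import random
import statistics


def histogram(data, k):
    """Regular k-bin histogram on [min(data), max(data)], on the density scale."""
    lo, hi = min(data), max(data)
    if hi <= lo:
        raise ValueError("data must contain at least two distinct values")
    h = (hi - lo) / k
    counts = [0] * k
    for x in data:
        counts[min(int((x - lo) / h), k - 1)] += 1  # x == hi goes into the last bin
    heights = [c / (len(data) * h) for c in counts]  # heights integrate to 1
    edges = [lo + j * h for j in range(k + 1)]
    return edges, counts, heights


# Plug-in rules: closed-form bin numbers derived from a reference model.
def sturges_bins(n):
    return math.ceil(math.log2(n)) + 1


def scott_bins(data):
    # Scott-type normal-reference width h = 3.49 * s * n^(-1/3), turned into a bin count.
    h = 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)
    return max(1, math.ceil((max(data) - min(data)) / h))


def cv_score(data, k):
    """Least-squares cross-validation criterion for k regular bins (smaller is better)."""
    n = len(data)
    edges, counts, _ = histogram(data, k)
    h = edges[1] - edges[0]
    s = sum((c / n) ** 2 for c in counts)
    return 2 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * s


def aic_score(data, k):
    """Histogram log-likelihood minus an AIC penalty of k - 1 (larger is better)."""
    _, counts, heights = histogram(data, k)
    loglik = sum(c * math.log(f) for c, f in zip(counts, heights) if c > 0)
    return loglik - (k - 1)


def hist_value(x, edges, heights):
    if x < edges[0] or x > edges[-1]:
        return 0.0
    h = edges[1] - edges[0]
    return heights[min(int((x - edges[0]) / h), len(heights) - 1)]


def hellinger(edges, heights, density, a, b, m=4000):
    """H = sqrt(0.5 * integral over [a, b] of (sqrt(f_hat) - sqrt(f))^2), midpoint rule."""
    step = (b - a) / m
    total = 0.0
    for i in range(m):
        x = a + (i + 0.5) * step
        total += (math.sqrt(hist_value(x, edges, heights)) - math.sqrt(density(x))) ** 2
    return math.sqrt(0.5 * total * step)


def count_peaks(heights):
    """Number of local maxima after merging runs of equal adjacent heights."""
    runs = [heights[0]]
    for v in heights[1:]:
        if v != runs[-1]:
            runs.append(v)
    peaks = 0
    for i, v in enumerate(runs):
        left = runs[i - 1] if i > 0 else -math.inf
        right = runs[i + 1] if i + 1 < len(runs) else -math.inf
        if v > left and v > right:
            peaks += 1
    return peaks


def mixture_density(x):
    # Hypothetical test bed: equal mixture of N(-2, 1) and N(2, 1), which has two peaks.
    return 0.5 * statistics.NormalDist(-2, 1).pdf(x) + 0.5 * statistics.NormalDist(2, 1).pdf(x)


if __name__ == "__main__":
    random.seed(1)
    data = [random.gauss(-2 if random.random() < 0.5 else 2, 1) for _ in range(500)]
    kmax = 50  # assumed search range for the data-driven rules
    choices = {
        "Sturges": sturges_bins(len(data)),
        "Scott": scott_bins(data),
        "CV": min(range(1, kmax + 1), key=lambda k: cv_score(data, k)),
        "AIC": max(range(1, kmax + 1), key=lambda k: aic_score(data, k)),
    }
    for name, k in choices.items():
        edges, _, heights = histogram(data, k)
        dist = hellinger(edges, heights, mixture_density, -8, 8)
        print(name, k, count_peaks(heights), round(dist, 3))
```

Running the script prints, for each rule, the chosen number of bins, the number of peaks in the resulting histogram and its Hellinger distance to the mixture density. The values depend on the random seed and are not results from the paper. Cross-validation and the AIC criterion are optimized by brute force over k = 1, ..., 50, which is adequate for a sketch but says nothing about how the compared procedures search over bin choices.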

Type
Research Article
Copyright
© EDP Sciences, SMAI, 2009
