Archive for the ‘Stat’ Category.

Why Gaussianity?

Physicists believe that the Gaussian law has been proved in mathematics while mathematicians think that it was experimentally established in physics — Henri Poincare

Continue reading ‘Why Gaussianity?’ »

NR, the 3rd edition

Having talked about the limits of Numerical Recipes in my PyIMSL post, I couldn’t resist checking the material, particularly the updates, in the new edition of Numerical Recipes by Press et al. (2007). Continue reading ‘NR, the 3rd edition’ »

A lecture note of great utility

I didn’t realize this post had been sitting for a month, during which I almost neglected the slog. Just as there are great books about probability and information theory for statisticians and engineers, I believe there are great statistical physics books for physicists. On the other hand, relatively few introduce one subject to the other audience. In this regard, I thought this lecture note could be useful.

[arxiv:physics.data-an:0808.0012]
Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract:

Continue reading ‘A lecture note of great utility’ »

Background Subtraction, the Sequel [Eqn]

As mentioned before, background subtraction plays a big role in astrophysical analyses. For a variety of reasons, it is not a good idea to subtract out background counts from source counts, especially in the low-counts Poisson regime. What Bayesians recommend instead is to set up a model for the intensity of the source and the background and to infer these intensities given the data. Continue reading ‘Background Subtraction, the Sequel [Eqn]’ »
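
As a rough illustration of the recipe above (a minimal sketch, not the method from the full post): assume the counts in the source region are Poisson with mean s + b, the counts in a separate background region are Poisson with mean r·b, where r is the ratio of background to source collecting area, and both intensities get flat priors on a grid. The function names, the grids, and the example counts are all made up for illustration.

```python
import math

def log_poisson(k, mu):
    """Log of the Poisson probability of k counts given a mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def source_posterior(n_src, n_bkg, area_ratio, s_grid, b_grid):
    """Posterior of the source intensity s on s_grid, with the background
    intensity b summed out over b_grid (flat priors on both: an assumption)."""
    post = []
    for s in s_grid:
        total = 0.0
        for b in b_grid:
            total += math.exp(log_poisson(n_src, s + b)
                              + log_poisson(n_bkg, area_ratio * b))
        post.append(total)
    norm = sum(post)
    return [p / norm for p in post]

# Hypothetical low-counts example: 1 count on source, 10 counts in a
# background region with 5 times the collecting area.
n_src, n_bkg, area_ratio = 1, 10, 5.0
s_grid = [0.1 * i for i in range(201)]          # s from 0 to 20
b_grid = [0.05 * (j + 1) for j in range(200)]   # b from 0.05 to 10
post = source_posterior(n_src, n_bkg, area_ratio, s_grid, b_grid)

naive = n_src - n_bkg / area_ratio              # direct subtraction: -1.0
mode = s_grid[post.index(max(post))]            # most probable s on the grid
print(naive, mode)
```

Here direct subtraction gives 1 − 10/5 = −1 counts, a negative intensity, whereas the posterior is defined only for s ≥ 0 and still yields a usable distribution for the source intensity.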

loess and lowess and locfit, oh my

Diab Jerius follows up on LOESS techniques with a very nice summary update and finds LOCFIT to be very useful, but there are still questions about how it deals with measurement errors and combining observations from different experiments:

Continue reading ‘loess and lowess and locfit, oh my’ »

The Banff Challenge [Eqn]

With the LHC coming on line anon, it is appropriate to highlight the Banff Challenge, which was designed as a way to figure out how to place bounds on the mass of the Higgs boson. The equations that were to be solved are quite general, and are in fact the first attempt that I know of where calibration data are directly and explicitly included in the analysis. Continue reading ‘The Banff Challenge [Eqn]’ »

chi-square distribution [Eqn]

The χ^2 distribution plays an incredibly important role in astronomical data analysis, but it is pretty much a black box to most astronomers. How many people know, for instance, that its form is exactly the same as the γ distribution? A χ^2 distribution with ν degrees of freedom is

p(z|ν) = (1/Γ(ν/2)) (1/2)^{ν/2} z^{ν/2-1} e^{-z/2} ≡ γ(z; ν/2, 1/2), where z = χ^2.

Continue reading ‘chi-square distribution [Eqn]’ »
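
The identity above is easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names are made up here, and the γ distribution is assumed to be parameterized by shape and rate, so that γ(z; ν/2, 1/2) has shape ν/2 and rate 1/2.

```python
import math

def chi2_pdf(z, nu):
    """Chi-square density with nu degrees of freedom, as written above."""
    half = nu / 2.0
    return (0.5 ** half) * z ** (half - 1.0) * math.exp(-z / 2.0) / math.gamma(half)

def gamma_pdf(z, shape, rate):
    """Gamma density in the shape/rate parameterization (assumed here)."""
    return (rate ** shape) * z ** (shape - 1.0) * math.exp(-rate * z) / math.gamma(shape)

nu = 4
# The two densities agree point by point.
max_diff = max(abs(chi2_pdf(z, nu) - gamma_pdf(z, nu / 2.0, 0.5))
               for z in (0.5, 1.0, 2.0, 5.0, 10.0))

# A gamma distribution has mean shape/rate, so the chi-square mean should be nu.
# Midpoint-rule integration on (0, 80); the tail beyond 80 is negligible for nu = 4.
dz = 0.01
mean = sum((i + 0.5) * dz * chi2_pdf((i + 0.5) * dz, nu) * dz for i in range(8000))
print(max_diff, mean)
```

The numerical mean comes out close to ν, which is just shape/rate = (ν/2)/(1/2) for the equivalent γ distribution.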

Kaplan-Meier Estimator (Equation of the Week)

The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover because this method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so much of our data consist of limits (censored measurements), and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-1980s by Jurgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ, 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ, 306, 490). [See also Hyunsook's primer.] It has been coded up and is available for use as part of the ASURV package. Continue reading ‘Kaplan-Meier Estimator (Equation of the Week)’ »
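
For readers who want to see the mechanics, here is a minimal sketch of the product-limit form of the estimator, S(t) = Π (1 − d_i/n_i) over distinct event values t_i ≤ t, where d_i is the number of detections at t_i and n_i the number of items still "at risk". This is not the ASURV code; the function name and the toy data are made up, and only right-censored data are handled.

```python
from collections import Counter

def kaplan_meier(values, observed):
    """Return [(t, S(t))] at each distinct value t with at least one detection.

    values   : measured value or censoring limit for each item
    observed : True for a detection, False for a right-censored limit
    """
    events = Counter(v for v, o in zip(values, observed) if o)
    leaving = Counter(values)      # every item leaves the risk set at its value
    n_at_risk = len(values)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]    # detections and censored items both drop out
    return curve

# Toy data: six items, two of them censored (observed=False).
values = [1, 2, 2, 3, 4, 5]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(values, observed)
for t, s in curve:
    print(t, round(s, 4))
```

Censored items never trigger a step in the curve, but they do count in n_i until their limit is passed, which is how the limits inform the estimate.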

Survival Analysis: A Primer

Astronomers confront various kinds of censored and truncated data. Often these types of data are named after the famous scientists who generalized them, as with Eddington bias. When censored or truncated data become the subject of study in statistics, statisticians do not name them but try to model them so that the uncertainty can be quantified. This area is called survival analysis. If your library subscribes to The American Statistician and you are an astronomer who handles censored or truncated data sets, this primer would be useful for briefly conceptualizing the statistics jargon in survival analysis and for characterizing the uncertainties residing in your data. Continue reading ‘Survival Analysis: A Primer’ »

Poisson Likelihood [Equation of the Week]

Astrophysics, especially high-energy astrophysics, is all about counting photons. And this, it is said, naturally leads to all our data being generated by a Poisson process. True enough, but most astronomers don’t know exactly how it works out, so this derivation is for them. Continue reading ‘Poisson Likelihood [Equation of the Week]’ »

A test for global maximum

If getting the first derivative (score function) and the second derivative (empirical Fisher information) of a (pseudo) likelihood function is feasible and checking regularity conditions is viable, a test for global maximum (Li and Jiang, JASA, 1999, Vol. 94, pp. 847-854) seems to be a useful reference for verifying the best fit solution. Continue reading ‘A test for global maximum’ »

On the history and use of some standard statistical models

What if R. A. Fisher had been hired by the Royal Observatory, even though his interests were biology and agriculture, or W. S. Gosset[1] had been, instead of by a brewery? An article by E. L. Lehmann made me ponder this what-if. If so, astronomers might have handled errors better than they do now. Continue reading ‘On the history and use of some standard statistical models’ »

  1. Gosset’s pen name was Student, from which the names Student’s t-distribution and Student’s t-test derive.

[ArXiv] 3rd week, June 2008

Likelihood Ratio Test Statistic [Equation of the Week]

From Protassov et al. (2002, ApJ, 571, 545), here is a formal expression for the Likelihood Ratio Test Statistic,

T_{LRT} = -2 ln R(D, Θ_0, Θ)

R(D, Θ_0, Θ) = [ sup_{θ∈Θ_0} p(D|Θ_0) ] / [ sup_{θ∈Θ} p(D|Θ) ]

where D is an independent data sample, Θ are the model parameters {θ_i, i=1,..M,M+1,..N}, and Θ_0 forms a subset of the model in which θ_i = θ_{i0}, i=1..M, are held fixed at their nominal values. That is, Θ represents the full model and Θ_0 represents the simpler model, which is a subset of Θ. R(D, Θ_0, Θ) is the ratio of the maximal (technically, supremal) likelihood of the simpler model to that of the full model.
Continue reading ‘Likelihood Ratio Test Statistic [Equation of the Week]’ »
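
To make the definition concrete, here is a minimal sketch for a toy problem that is not taken from Protassov et al.: the data are independent Poisson counts, the simpler model fixes the rate at a nominal value (one fixed parameter, so M = N = 1), and the full model leaves the rate free. The function names and the counts are made up for illustration.

```python
import math

def poisson_loglike(counts, rate):
    """Log-likelihood of independent Poisson counts with a common rate > 0."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)

def lrt_statistic(counts, rate0):
    """-2 ln R, with the rate fixed at rate0 (simpler model) versus free (full model)."""
    rate_hat = sum(counts) / len(counts)   # maximum likelihood rate of the full model
    log_r = poisson_loglike(counts, rate0) - poisson_loglike(counts, rate_hat)
    return -2.0 * log_r

counts = [3, 5, 4, 6, 2]                   # toy data with sample mean 4
t_null = lrt_statistic(counts, 4.0)        # nominal rate equals the best-fit rate
t_alt = lrt_statistic(counts, 2.0)         # nominal rate far from the best fit
print(t_null, t_alt)
```

Because the simpler model is a subset of the full model, R ≤ 1 and the statistic is never negative. When the usual regularity conditions hold, it is asymptotically χ^2 distributed with as many degrees of freedom as there are fixed parameters (one in this toy case).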

[ArXiv] 2nd week, June 2008

As Prof. Speed said, PCA is prevalent in astronomy, particularly this week. Furthermore, a paper explicitly discusses R, a popular statistics package. Continue reading ‘[ArXiv] 2nd week, June 2008’ »