Archive for the ‘Fitting’ Category.

A lecture note of great utility

I didn’t realize this post had been sitting for a month, during which I almost neglected the slog. Just as there are great books about probability and information theory for statisticians and engineers, I believe there are great statistical physics books for physicists. On the other hand, relatively few introduce one subject to the other kind of audience. In this regard, I thought this lecture note could be useful.

[arxiv:physics.data-an:0808.0012]
Lectures on Probability, Entropy, and Statistical Physics by Ariel Caticha
Abstract:

Continue reading ‘A lecture note of great utility’ »

loess and lowess and locfit, oh my

Diab Jerius follows up on LOESS techniques with a very nice summary update and finds LOCFIT to be very useful, but there are still questions about how it deals with measurement errors and combining observations from different experiments:

Continue reading ‘loess and lowess and locfit, oh my’ »

Survival Analysis: A Primer

Astronomers confront various kinds of censored and truncated data. Often these types of data are named after the famous scientists who generalized them, as with Eddington bias. When censored or truncated data become the subject of study in statistics, statisticians, instead of naming them, try to model them so that the uncertainty can be quantified. This area is called survival analysis. If your library has a subscription to The American Statistician and you are an astronomer who handles censored or truncated data sets, this primer would be useful for getting a quick grasp of the statistical jargon in survival analysis and for characterizing the uncertainties residing in your data. Continue reading ‘Survival Analysis: A Primer’ »

A test for global maximum

If getting the first derivative (score function) and the second derivative (empirical Fisher information) of a (pseudo) likelihood function is feasible and checking regularity conditions is viable, a test for global maximum (Li and Jiang, JASA, 1999, Vol. 94, pp. 847-854) seems to be a useful reference for verifying the best fit solution. Continue reading ‘A test for global maximum’ »

Likelihood Ratio Test Statistic [Equation of the Week]

From Protassov et al. (2002, ApJ, 571, 545), here is a formal expression for the Likelihood Ratio Test Statistic,

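$$T_{\mathrm{LRT}} = -2 \ln R(D, \Theta_0, \Theta)$$

$$R(D, \Theta_0, \Theta) = \frac{\sup_{\theta \in \Theta_0} p(D \mid \Theta_0)}{\sup_{\theta \in \Theta} p(D \mid \Theta)}$$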

where D is an independent data sample, Θ are the model parameters {θi, i=1,..M,M+1,..N}, and Θ0 forms a subset of the model in which θi = θi0, i=1..M, are held fixed at their nominal values. That is, Θ represents the full model and Θ0 represents the simpler model, which is a subset of Θ. R(D,Θ0,Θ) is the ratio of the maximal (technically, supremal) likelihood of the simpler model to that of the full model.
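
As a minimal sketch of how the statistic is evaluated, consider a toy case that is not taken from Protassov et al.: independent Gaussian data with known sigma, where the simpler model fixes the mean at a nominal value mu0 and the full model leaves the mean free. The function names and the data values below are made up for illustration.

```python
import math

def gaussian_loglike(data, mu, sigma):
    """Log-likelihood of independent Gaussian data with mean mu and known sigma."""
    n = len(data)
    rss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * rss / sigma ** 2

def lrt_statistic(data, mu0, sigma):
    """T_LRT = -2 ln R for the simpler model (mu = mu0) against the full model (mu free)."""
    # The sample mean maximizes the Gaussian likelihood over the full model.
    mu_hat = sum(data) / len(data)
    # The simpler model has no free parameter, so its supremum is the value at mu0.
    loglike_simple = gaussian_loglike(data, mu0, sigma)
    loglike_full = gaussian_loglike(data, mu_hat, sigma)
    return -2.0 * (loglike_simple - loglike_full)

# Hypothetical data, chosen only to exercise the function.
data = [0.3, -0.1, 0.8, 0.5, 0.2]
t_lrt = lrt_statistic(data, mu0=0.0, sigma=1.0)
# For this toy model the statistic reduces to n * (mean - mu0)**2 / sigma**2.
print(t_lrt)
```

Because the simpler model is nested in the full one, the statistic can never be negative. Whether it can then be compared with a chi-square distribution depends on regularity conditions, which is the concern of the Protassov et al. paper.
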
Continue reading ‘Likelihood Ratio Test Statistic [Equation of the Week]’ »

Q: Lowess error bars?

It is somewhat surprising that astronomers haven’t cottoned on to Lowess curves yet. That’s probably a good thing because I think people already indulge in smoothing far too much for their own good, and Lowess makes for a very powerful hammer. But the fact that it is semi-parametric and is based on polynomial least-squares fitting does make it rather attractive.

And, of course, sometimes it is unavoidable, or so I told Brad W. When one has too many points for a regular polynomial fit, and they are too scattered for a spline, and too few to try a wavelet “denoising”, and no real theoretical expectation of any particular model function, and all one wants is “a smooth curve, damnit”, then Lowess is just the ticket.

Well, almost.

There is one major problem — how does one figure what the error bounds are on the “best-fit” Lowess curve? Clearly, each fit at each point can produce an estimate of the error, but simply collecting the separate errors is not the right thing to do because they would all be correlated. I know how to propagate Gaussian errors in boxcar smoothing a histogram, but this is a whole new level of complexity. Does anyone know if there is software that can calculate reliable error bands on the smooth curve? We will take any kind of error model — Gaussian, Poisson, even the (local) variances in the data themselves.
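
One possible way to get a rough band, offered as a sketch and not as an answer to the question above, is to resample the data. The code below assumes a single tricube-weighted local linear pass (the core of Lowess, without its robustness iterations) and swaps in a pairs bootstrap, which is not something from this post. The resulting percentile band is pointwise only: it uses the scatter in the data themselves, it ignores any quoted measurement errors, and it does not describe the point-to-point correlations mentioned above. All function names and the test data are made up for illustration.

```python
import math
import random

def local_linear_fit(xs, ys, x0, frac=0.5):
    """Tricube-weighted straight-line fit evaluated at x0 (one Lowess-like pass)."""
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))
    # Bandwidth: distance from x0 to its k-th nearest data point.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u >= 1.0:
            continue
        w = (1.0 - u ** 3) ** 3
        dx = x - x0
        sw += w
        swx += w * dx
        swy += w * y
        swxx += w * dx * dx
        swxy += w * dx * y
    if sw == 0.0:
        return sum(ys) / n          # degenerate neighbourhood: fall back to the mean
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        return swy / sw             # no spread in x: fall back to the weighted mean
    # Intercept of the weighted least-squares line at dx = 0, i.e. the fit at x0.
    return (swxx * swy - swx * swxy) / det

def bootstrap_band(xs, ys, grid, n_boot=200, frac=0.5, level=0.68, seed=1):
    """Pointwise percentile band from resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([local_linear_fit(bx, by, g, frac) for g in grid])
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for j in range(len(grid)):
        vals = sorted(curve[j] for curve in curves)
        lower.append(vals[int(tail * (n_boot - 1))])
        upper.append(vals[int((1.0 - tail) * (n_boot - 1))])
    return lower, upper

# Hypothetical noisy data: a sine curve with Gaussian scatter.
rng = random.Random(0)
xs = [10.0 * i / 39 for i in range(40)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
grid = [float(g) for g in range(11)]
smooth = [local_linear_fit(xs, ys, g) for g in grid]
lower, upper = bootstrap_band(xs, ys, grid)
```

Replacing the pairs resampling with draws from a Gaussian or Poisson error model would turn this into a parametric bootstrap, at the cost of trusting that error model.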

[ArXiv] 3rd week, May 2008

Not many this week, but there’s a great read. Continue reading ‘[ArXiv] 3rd week, May 2008’ »

[ArXiv] Pareto Distribution

Astronomy is ruled by the Gaussian distribution, with a Poisson distribution duchy. From time to time, ranks are awarded to other distributions that have no territories of their own to govern independently. Among these distributions, Pareto deserves a high rank. There is a preprint this week on the Pareto distribution: Continue reading ‘[ArXiv] Pareto Distribution’ »

Astrometry.net

Astrometry.net, a cool website I heard about in Harvard Astronomy Professor Doug Finkbeiner’s class (Principles of Astronomical Measurements), does the complex job of matching your images of unknown locations or coordinates to sources in catalogs. When you provide your images in one of various formats, it returns astrometric calibration meta-data and lists of known objects falling inside the field of view. Continue reading ‘Astrometry.net’ »

[ArXiv] 1st week, Mar. 2008

Irrelevant to astrostatistics but interesting for baseball lovers.
    [stat.AP:0802.4317] Jensen, Shirley, & Wyner
    Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball

With the 5th-year WMAP data release, there were many WMAP-related papers; among them, most of the statistical papers are listed. Continue reading ‘[ArXiv] 1st week, Mar. 2008’ »

[ArXiv] A fast Bayesian object detection

This is quite a long paper that I separated from [ArXiv] 4th week, Feb. 2008:
      [astro-ph:0802.3916] P. Carvalho, G. Rocha, & M. P. Hobson
      A fast Bayesian approach to discrete object detection in astronomical datasets - PowellSnakes I
As the title suggests, it describes Bayesian source detection and gives me a chance to learn the foundations of source detection in astronomy. Continue reading ‘[ArXiv] A fast Bayesian object detection’ »

Everybody needs crampons

Sherpa is a fitting environment in which Chandra data (and really, X-ray data from any observatory) can be analyzed. It has just undergone a major update and now runs on Python. Or allows Python to run. Something like that. It is a very powerful tool, but I can never remember how to use it, and I have an amazing knack for not finding what I need in the documentation. So here is a little cheat sheet (which I will keep updating as and when I learn more): Continue reading ‘Everybody needs crampons’ »

Signal Processing and Bootstrap

Astronomers have developed their own ways of processing signals, almost independently of engineers though sometimes in collaboration with them, even though the fundamental goal of signal processing is the same: extracting information. Doubtless, the two parallel roads of astronomers and engineers have been pointing in opposite directions: one toward the sky and the other toward the earth. Nevertheless, without much argument, we could say that statistics has served as the medium of signal processing for both scientists and engineers. This particular issue of IEEE Signal Processing Magazine may shed light for astronomers interested in signal processing and statistics outside the astronomical community.

IEEE Signal Processing Magazine Jul. 2007 Vol 24 Issue 4: Bootstrap methods in signal processing

This link will show the table of contents and provide links to articles; however, access to the papers requires an IEEE Xplore subscription (via libraries or individual IEEE memberships). Here, I’d like to attempt to introduce some articles and tutorials.
Continue reading ‘Signal Processing and Bootstrap’ »

An example of chi2 bias in fitting the X-ray spectra.

The chi2 bias can affect the results of the X-ray spectral fitting and it
can be demonstrated in a simple way. The described simulations can be done
in Sherpa or XSPEC, the two software packages that allow for simulating the X-ray
spectra using a function called “fakeit”.

Here I assume an absorbed power law model with a set of 3 parameters
(absorption column, photon index, and normalization) to simulate a Chandra X-ray
spectrum given the instrument calibration files (RMF/ARF) and the Poisson noise.
The resulting simulated X-ray spectrum contains the model predicted counts with
the Poisson noise. This spectrum is then fit with the absorbed power law model to get
the best fit parameter values for NH, photon index and normalization.

I simulate 1000 spectra and fit each of them using different statistics: chi2 data variance,
chi2 model variance and Cash/C-statistics.

The next step is to plot the simulated distributions of the parameters and compare them
to the assumed values for the simulations. The figure shows the distribution of the photon
index parameter obtained from the fits of the spectra generated for the assumed simulated value
of 1.267. The chi2 bias is evident in this analysis, while the
CSTAT and Cash statistics based on the likelihood behave well. The chi2 model variance
underestimates the simulated value, while the chi2 data variance overestimates this parameter.

 

Distributions of parameter values based on fitting the simulated X-ray data.

The plot shows the distribution of photon index parameters obtained by
fitting the simulated X-ray spectra with about 60000 counts and using the
three different statistics: chi2 with the model variance, chi2 with
data variance and C-statistics (Cash). The value assumed in the
simulations, 1.267, is marked with the solid line.
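
The same kind of bias can be seen without Sherpa or XSPEC in a much smaller toy problem. The sketch below is not the simulation described above: it assumes a flat model with a single amplitude parameter, no instrument response, and made-up values for the true mean, the number of bins, and the random seed. For a constant model each statistic has a closed-form best fit, so no optimizer is needed.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_constant(counts):
    """Closed-form best-fit amplitude of a flat model under three fit statistics."""
    n_bins = len(counts)
    # Cash / C-statistic (Poisson likelihood): the arithmetic mean of the counts.
    cash = sum(counts) / n_bins
    # chi2 with data variance: weights 1/max(n_i, 1), a common guard for empty bins.
    weights = [1.0 / max(c, 1) for c in counts]
    data_var = sum(w * c for w, c in zip(weights, counts)) / sum(weights)
    # chi2 with model variance: minimizing sum((n_i - m)**2 / m) gives m**2 = mean(n_i**2).
    model_var = math.sqrt(sum(c * c for c in counts) / n_bins)
    return data_var, model_var, cash

rng = random.Random(42)
true_mean, n_bins, n_sims = 10.0, 50, 1000   # illustrative values only
totals = [0.0, 0.0, 0.0]
for _ in range(n_sims):
    counts = [poisson_draw(rng, true_mean) for _ in range(n_bins)]
    for j, estimate in enumerate(fit_constant(counts)):
        totals[j] += estimate
mean_data_var, mean_model_var, mean_cash = (t / n_sims for t in totals)
print("chi2 data variance :", mean_data_var)
print("chi2 model variance:", mean_model_var)
print("Cash               :", mean_cash)
```

In this toy the biased quantity is the amplitude, which comes out low with the data variance and high with the model variance, while Cash recovers the input. The direction of the bias in a slope-like parameter such as the photon index need not match the direction for an amplitude, so the toy illustrates the mechanism and not the specific numbers in the figure.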

[ArXiv] Bimodal Color Distribution in GCS, Sept. 7, 2007

From arxiv/astro-ph:0709.1073v1
On the Metallicity-Color Relations and Bimodal Color Distributions in Extragalactic Globular Cluster Systems by M. Cantiello and J. P. Blakeslee

Many observations of globular cluster systems (GCS) show bimodal distributions in color and metallicity space. The authors discussed the complication of non-linear metallicity-color relations and presented their careful study to suggest the optimal color(s) for revealing the presence of real bimodal GC metallicity distributions. Based on their simulation study, (V-H) and (V-K) are confirmed to be good colors for revealing unbiased bimodal metallicity distributions in GCS.
Continue reading ‘[ArXiv] Bimodal Color Distribution in GCS, Sept. 7, 2007’ »