This is an unusual Quote-of-the-week, in that I point you to [ABSTRACT] and a [VIDEO] of the recent talk at the Institute for Innovative Computing. See what you think!
This is a long comment on October 3, 2007 Quote of the Week, by Andrew Gelman. His “folk theorem” ascribes computational difficulties to problems with one’s model.
“Model”, for statisticians, has two meanings. A physicist or astronomer would automatically read the word as pertaining to a model of the source, or the physics, or the sky. It has taken me a long time to be able to see it a little more from a statistics perspective, where it pertains to the full statistical model.
For example, in low-count high-energy physics, there had been a great deal of heated discussion over how to handle “negative confidence intervals”. (See for example PhyStat2003). That is, when using the statistical tools traditional to that community, one had such a large number of trials and such a low expected count rate that a significant number of “confidence intervals” for source intensity were wholly below zero. Further, there were more of these than expected (based on the assumptions in those traditional statistical tools). Statisticians such as David van Dyk pointed out that this was a sign of “model mis-match”. But (in my view) this was not understood at first — it was taken as a description of physics model mismatch. Of course what he (and others) meant was statistical model mismatch. That is, somewhere along the data-processing path, some Gauss-Normal assumptions had been made that were inaccurate for (essentially) low-count Poisson. If one took that into account, the whole “negative confidence interval” problem went away. In recent history, there has been a great deal of coordinated work to correct this and do all intervals properly.
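The statistical-model mismatch described above is easy to see in a toy simulation. The sketch below is my own illustration, not from the PhyStat discussions, and the source rate `s_true` and background rate `b` are made-up numbers. It draws low counts from a Poisson, then builds the traditional Gauss-Normal 95% interval on the background-subtracted estimate; a noticeable fraction of intervals land wholly below zero, exactly the symptom the physicists were debating.

```python
import math
import random

random.seed(42)

def poisson(lam):
    # Knuth's method: multiply uniforms until the product drops below exp(-lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

s_true, b = 0.5, 3.0    # weak source on top of a known background (made-up rates)
trials = 10000
wholly_negative = 0
for _ in range(trials):
    n = poisson(s_true + b)
    s_hat = n - b                     # background-subtracted intensity estimate
    half_width = 1.96 * math.sqrt(n)  # Gauss-Normal 95% interval, invalid at low counts
    if s_hat + half_width < 0:        # the entire interval sits below zero
        wholly_negative += 1

print(f"fraction of intervals wholly below zero: {wholly_negative / trials:.3f}")
```

With these rates, roughly one interval in seven is entirely negative; treating the counts as Poisson from the start (rather than Gauss-Normal) makes the problem go away, as described above.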
This brings me to my second point. I want to raise a provocative corollary to Gelman’s folk theorem:
When the “error bars” or “uncertainties” are very hard to calculate, it is usually because of a problem with the model, statistical or otherwise.
One can see this (I claim) in any method that allows one to get a nice “best estimate” or a nice “visualization”, but for which there is no clear procedure (or only an UNUSUALLY long one based on some kind of semi-parametric bootstrapping) for uncertainty estimates. This can be (not always!) a particular pitfall of “ad-hoc” methods, which may at first appear very speedy and/or visually compelling, but then may not have a statistics/probability structure through which to synthesize the significance of the results in an efficient way.
From the ever-quotable Andrew Gelman comes this gem, which he calls a Folk Theorem :
When things are hard to compute, often the model doesn’t fit the data. Difficulties in computation are therefore often model problems… [When the computation isn't working] we have the duty and freedom to think about models.
Once again, from the middle of a recent (Aug 30-31, 2007) argument within CHASC: why do physicists and astronomers view “3 sigma” results with suspicion and expect (roughly) > 5 sigma, while statisticians and biologists typically assume 95% is OK?
David van Dyk (representing statistics culture):
Can’t you look at it again? Collect more data?
Vinay Kashyap (representing astronomy and physics culture):
…I can confidently answer this question: no, alas, we usually cannot look at it again!!
Ah. Hmm. To rephrase [the question]: if you have a “7.5 sigma” feature, with a day-long [imaging Markov Chain Monte Carlo] run you can only show that it is “>3 sigma”; but is it possible, even with that day-long run, to tell that the feature is really at 7.5 sigma — is that the question? Well, that would be nice, but I don’t understand how observing again will help?
No one believes any realistic test is properly calibrated that far into the tail. Using 5-sigma is really just a high bar, but the precise calibration will never be done. (This is a reason not to sweat the computation TOO much.)
Most other scientific areas set the bar lower (2 or 3 sigma) BUT don’t really believe the results unless they are replicated.
My assertion is that I find replicated results more convincing than extreme p-values. And the controversial part: Astronomers should aim for replication rather than worry about 5-sigma.
These are from two lively CHASC discussions on classification, or cluster analysis. The first was on Feb 7, 2006; the continuation on Dec 12, 2006, at the Harvard Statistics Department, as part of Stat 310.
Don’t demand too much of the classes. You’re not going to say that all events can be well-classified…. It’s more descriptive. It gives you places to look. Then you look at your classes.
Then you’re saying the cluster analysis is more like -
It’s really like you have a proposal for classes. You then investigate the physical processes more thoroughly. You may have classes that divide it [up]
But it can make a difference, where you see the clusters, depending on your [parameter] transformation. You can squish the white spaces, and stretch out the crowded spaces; so it can change where you think the clusters are.
But that is interesting.
Yes, that is very interesting.
These are particularly in honor of Hyunsook Lee’s recent posting of Chattopadhyay et al.’s new work about possible intrinsic classes of gamma-ray bursts. Are they really physical classes — or do they only appear to be distinct clusters because we view them through the “squished” lens (parameter spaces) of our imperfect instruments?
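The “squish the white spaces, stretch the crowded spaces” point from the discussion above can be illustrated with a toy simulation of my own (the numbers are invented, not from the GRB data). Points drawn uniformly in log-space have no structure on a log axis, yet on a linear axis they pile up near zero, looking like a dense “cluster” with a sparse tail:

```python
import math
import random

random.seed(7)

# draw points uniform in log10-space: perfectly "clusterless" on a log axis,
# spanning three decades (1 to 1000 in linear units)
xs = [10 ** random.uniform(0, 3) for _ in range(2000)]

def frac_in_lower_half(vals):
    # what fraction of points crowd into the lower half of the plotted range?
    lo, hi = min(vals), max(vals)
    mid = (lo + hi) / 2
    return sum(v < mid for v in vals) / len(vals)

linear_frac = frac_in_lower_half(xs)                        # crowding near zero
log_frac = frac_in_lower_half([math.log10(x) for x in xs])  # evenly spread

print(f"lower-half fraction, linear axis: {linear_frac:.2f}")
print(f"lower-half fraction, log axis:    {log_frac:.2f}")
```

On the linear axis about 90% of the points sit in the lower half of the range; on the log axis it is the expected 50%. Whether one “sees” a cluster here depends entirely on the parameter transformation, which is exactly the worry raised above.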
Some of the lively discussion at the end of the first “Statistical Challenges in Modern Astronomy” conference, at Penn State in 1991, was captured in the proceedings (“General Discussion: Working on the Interface Between Statistics and Astronomy, Terry Speed (Moderator)”, in
Joseph Horowitz (Statistician):
…there should be serious collaboration between astronomers and statisticians. Statisticians should be involved from the beginning as real collaborators, not mere number crunchers. When I collaborate with anybody, astronomer or otherwise, I expect to be a full scientific equal and to get something out of it of value to statistics or mathematics, in addition to making a contribution to the collaborator’s field…
Jasper Wall (Astrophysicist):
…I feel strongly that the knowledge of statistics needs to come very early in the process. It is no good downstream when the paper is written. It is not even much good when you have built the instrument, because we should disabuse statisticians of any impression that the data coming from astronomical instruments are nice, pure, and clean. Each instrument has its very own particular filter, each person using that instrument puts another filter on it, and each method of data acquisition does something else yet again. I get more and more concerned, particularly at the present time of data explosion (the observatory I work with is getting 700 MBy per night!). There is discussion of data compression, cleaning on-line, and other treatments even before the observing astronomer gets the data. The knowledge of statistics and the knowledge of what happens to the data need to come extremely early in the process.
“Bayesian” methods have, I think, rightly gained favor in astronomy
as they have in other fields of statistical application. I put “Bayesian” in quotation marks because I do not believe this marks a revival in the sciences in the belief in personal probability. To me it rather means that all information on hand should be used
in model construction, coupled with the view of Box [1979, etc.], who considers himself a Bayesian:
Models, of course, are never true but fortunately it is only necessary that they be useful.
The Bayesian paradigm permits one to construct models and hence statistical methods which reflect such information in an, at least in principle, marvellously simple way. A frequentist such as myself feels as at home with these uses of Bayes principle
as any Bayesian.
From Bickel, P. J. “An Overview of SCMA II”, in
[Box 1979] Box, G. E. P. , 1979, “Some Problems of statistics and everyday life”.
Peter Bickel had so many interesting perspectives in his comments at these SCMA conferences that it was hard to choose just one set.
Ten years ago, Astrophysicist John Nousek had this answer to Hyunsook Lee’s question “What is so special about chi square in astronomy?”:
The astronomer must also confront the problem that results need to be published and defended. If a statistical technique has not been widely applied in astronomy before, then there are additional burdens of convincing the journal referees and the community at large that the statistical methods are valid.
Certain techniques which are widespread in astronomy and seem to be accepted without any special justification are: linear and non-linear regression (Chi-Square analysis in general), Kolmogorov-Smirnov tests, and bootstraps. It also appears that if you find it in Numerical Recipes (Press et al. 1992) it will be more likely to be accepted without comment.
…Note an insidious effect of this bias: astronomers will often choose to utilize a widely accepted statistical tool, even in regimes where the tool is known to be invalid, just to avoid the problem of developing or researching appropriate tools.
From pg 205, in “Discussion by John Nousek” (of Edward J. Wegman et al., “Statistical Software, Siftware, and Astronomy”), in
This is from the very interesting Ingrid Daubechies interview by Dorian Devins,
www.nasonline.org/interviews_daubechies, National Academy of Sciences, U.S.A., 2004. It is from part 6, where Ingrid Daubechies speaks of her early mathematics paper on wavelets. She tries to put the impact into context:
I really explained in the paper where things came from. Because, well, the mathematicians wouldn’t have known. I mean, to them this would have been a question that really came out of nowhere. So, I had to explain it …
I was very happy with [the paper]; I had no inkling that it would take off like that… [Of course] the wavelets themselves are used. I mean, more than even that. I explained in the paper how I came to that. I explained both [a] mathematician’s way of looking at it and then to some extent the applications way of looking at it. And I think engineers who read that had been emphasizing a lot the use of Fourier transforms. And I had been looking at the spatial domain. It generated a different way of considering this type of construction. I think, that was the major impact. Because then other constructions were made as well. But I looked at it differently. A change of paradigm. Well, paradigm, I never know what that means. A change of … a way of seeing it. A way of paying attention.
Jeff Scargle (in person [top] and in wavelet transform [bottom], left) weighs in on our continuing discussion on how well “automated fitting”/”Machine Learning” can really work (private communication, June 28, 2007):
It is clearly wrong to say that automated fitting of models to data is impossible. Such a view ignores progress made in the area of machine learning and data mining. Of course there can be problems, I believe mostly connected with two related issues:
* Models that are too fragile (that is, easily broken by unusual data)
* Unusual data (that is, data that lie in some sense outside the arena that one expects)
The antidotes are:
(1) careful study of model sensitivity
(2) if the context warrants, preprocessing to remove “bad” points
(3) lots and lots of trial and error experiments, with both data sets that are as realistic as possible and ones that have extremes (outliers, large errors, errors with unusual properties, etc.)
Trial … error … fix error … retry …
You can quote me on that.
This illustration is from Jeff Scargle’s First GLAST Symposium (June 2007) talk, pg 14, demonstrating the use of the inverse area of Voronoi tessellations, weighted by the PSF density, as an automated measure of the density of Poisson Gamma-Ray counts on the sky.
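The inverse-Voronoi-area idea is easiest to see in one dimension, where each event’s Voronoi cell is just half the span between its two neighbors. The sketch below is a simplified 1D analogue of my own (no PSF weighting, and the event rates are invented), showing that the inverse cell size automatically tracks the local event density:

```python
import random

random.seed(3)

# simulated event positions: a dense "source" near x = 5 on top of a
# sparse uniform background (made-up rates, purely for illustration)
events = sorted([random.gauss(5.0, 0.2) for _ in range(200)] +
                [random.uniform(0.0, 10.0) for _ in range(100)])

# 1D Voronoi cell of each interior event: half the span between its
# neighbors; the inverse cell length is the local density estimate
densities = []
for i in range(1, len(events) - 1):
    cell = (events[i + 1] - events[i - 1]) / 2.0
    densities.append((events[i], 1.0 / cell))

in_source = [d for x, d in densities if 4.5 < x < 5.5]
outside = [d for x, d in densities if not 4.5 < x < 5.5]
print(f"mean density estimate in source region: {sum(in_source) / len(in_source):.1f}")
print(f"mean density estimate outside:          {sum(outside) / len(outside):.1f}")
```

No bin sizes or kernel widths are chosen anywhere; the tessellation adapts itself to the counts, which is what makes it attractive as an automated density measure.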
I want to use this short quote by Andrew Gelman to highlight many interesting topics at the recent Third Workshop on Monte Carlo Methods. This is part of Andrew Gelman’s emphasis on the fundamental importance of thinking through priors. He argues that “non-informative” priors (explicit, as in Bayes, or implicit, as in some other methods) can in fact be highly constraining, and that weakly informative priors are more honest. At his talk on Monday, May 14, 2007 Andrew Gelman explained:
You want to supply enough structure to let the data speak,
but that’s a tricky thing.
These quotes are in the opposite spirit of the last two Bayesian quotes.
They are from the excellent “R”-based Tutorial on Non-Parametrics given by Chad Schafer and Larry Wasserman at the 2006 SAMSI Special Semester on AstroStatistics (or here).
Chad and Larry were explaining trees:
For more sophisticated tree-searches, you might try Robert Nowak [and his former student, Becca Willett --- especially her "software" pages]. There is even Bayesian CART — Classification And Regression Trees. These can take 8 or 9 hours to “do it right”, via MCMC. BUT [these results] tend to be very close to [less rigorous] methods that take only minutes.
Trees are used primarily by doctors, for patients: it is much easier to follow a tree than a kernel estimator, in person.
Trees are much more ad-hoc than other methods we talked about, BUT they are very user friendly, very flexible.
In machine learning, which is only statistics done by computer scientists, they love trees.
This is the second in a series of quotes by Xiao Li Meng, from an introduction to Markov Chain Monte Carlo (MCMC), given to a room full of astronomers, as part of the April 25, 2006 joint meeting of Harvard’s “Stat 310″ and the California-Harvard Astrostatistics Collaboration. This one has a long summary as the lead-in, but hang in there!
Summary first (from earlier in Xiao Li Meng’s presentation):
Let us tackle a harder problem, with the Metropolis Hastings Algorithm.
An example: a tougher distribution, not Normal in [at least one of the dimensions], and multi-modal… FIRST I propose a draw, from an approximate distribution. THEN I compare it to the true distribution, using the ratio of proposal to target distribution. The next draw tells whether to accept the new draw or stay with the old draw.
1/ For original Metropolis algorithm, it looks “geometric” (In the example, we are sampling “x,z”; if the point falls under our xz curve, accept it.)
2/ The speed of algorithm depends on how close you are with the approximation. There is a trade-off with “stickiness”.
How large should, say, N be? This is NOT AN EASY PROBLEM! The KEY difficulty: multiple modes in unknown area. We want to know all (major) modes first, as well as estimates of the surrounding areas… [To handle this,] don’t run a single chain; run multiple chains.
Look at between-chain variance; and within-chain variance. BUT there is no “foolproof” here… The starting point should be as broad as possible. Go somewhere crazy. Then combine, either simply as these are independent; or [in a more complicated way, as in Meng and Gelman].
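The recipe in the summary above (propose from an approximation, accept via the ratio to the target, run multiple chains from “crazy” over-dispersed starting points, then compare between-chain and within-chain variance) can be sketched in a few lines. This is my own minimal illustration, using a made-up bimodal target and a Gelman-Rubin-style diagnostic, not anything from the talk itself:

```python
import math
import random

random.seed(1)

def log_target(x):
    # unnormalized bimodal target: equal mixture of N(-3,1) and N(+3,1)
    # (the tiny constant guards against log(0) underflow far in the tails)
    return math.log(math.exp(-0.5 * (x - 3) ** 2) +
                    math.exp(-0.5 * (x + 3) ** 2) + 1e-300)

def metropolis(x0, n_steps, step=4.0):
    x, out = x0, []
    for _ in range(n_steps):
        prop = x + random.gauss(0, step)  # symmetric proposal draw
        # accept with probability min(1, target(prop) / target(x))
        if math.log(random.random()) < log_target(prop) - log_target(x):
            x = prop
        out.append(x)
    return out

# multiple chains from deliberately over-dispersed ("crazy") starting points
starts = [-10.0, -3.0, 3.0, 10.0]
n = 20000
chains = [metropolis(x0, n)[n // 2:] for x0 in starts]  # discard burn-in

# Gelman-Rubin style diagnostic: between- vs within-chain variance
means = [sum(c) / len(c) for c in chains]
grand = sum(means) / len(means)
m = len(chains[0])
B = m * sum((mu - grand) ** 2 for mu in means) / (len(chains) - 1)
W = sum(sum((x - mu) ** 2 for x in c) / (m - 1)
        for c, mu in zip(chains, means)) / len(chains)
rhat = math.sqrt(((m - 1) / m * W + B / m) / W)
print(f"R-hat = {rhat:.3f}")
```

When the chains have all found both modes, the between-chain variance shrinks toward the within-chain variance and R-hat approaches 1; a chain stuck in one mode (the “stickiness” trade-off mentioned above) would leave R-hat well above 1, though as the quote warns, there is no “foolproof” here.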
And here’s the Actual Quote of the Week:
[Astrophysicist] Aneta Siemiginowska: How do you make these proposals?
[Statistician] Xiao Li Meng: Call a professional statistician like me.
But seriously – it can be hard. But really you don’t need something perfect. You just need something decent.
This is one in a series of quotes by Xiao Li Meng, from an introduction to Markov Chain Monte Carlo (MCMC), given to a room full of astronomers, as part of the April 25, 2006 joint meeting of Harvard’s “Stat 310″ and the California-Harvard Astrostatistics Collaboration:
These MCMC [Markov Chain Monte Carlo] methods are very general.
BUT anytime it is incredibly general, there is something to worry about.
The same is true for bootstrap – it is very general; and easy to misuse.
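One classic misuse of the bootstrap, in the spirit of this warning, is bootstrapping the sample maximum (my own textbook illustration, not an example from the talk). The resampled maximum can only ever equal one of the observed values, so the bootstrap distribution piles up on the observed maximum instead of exploring the tail:

```python
import random

random.seed(0)

n = 100
data = [random.uniform(0, 1) for _ in range(n)]
obs_max = max(data)

# bootstrap the sample maximum: resample with replacement, recompute max
reps = 5000
hits = 0
for _ in range(reps):
    boot = [random.choice(data) for _ in range(n)]
    if max(boot) == obs_max:
        hits += 1

print(f"fraction of bootstrap maxima equal to the observed maximum: {hits / reps:.3f}")
```

Roughly 63% of the bootstrap replicates (about 1 - 1/e, since each resample misses any given point with probability (1 - 1/n)^n) simply reproduce the observed maximum, so the bootstrap badly misjudges the uncertainty of this extreme statistic even though the same recipe works fine for, say, the mean.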
You can’t think about source detection and feature detection
without thinking of what you are going to use them for. The
ultimate inference problem and source/feature detection need to