This was written more than a year ago, and I forgot to post it.
Continue reading ‘[Book] The Elements of Statistical Learning, 2nd Ed.’ »
Archive for the ‘Uncertainty’ Category.
This question came to the CfA Public Affairs office, and I am sharing it with y’all because I think the solution is instructive.
A student had to figure out the name of a stellar object as part of an assignment. He was given the following information about it:
- apparent [V] magnitude = 5.76
- B-V = 0.02
- E(B-V) = 0.00
- parallax = 0.0478 arcsec
- radial velocity = -18 km/s
- redshift = 0 km/s
He looked in all the stellar databases but was unable to locate it, so he asked the CfA for help.
Just to help you out, here are a couple of places where you can find comprehensive online catalogs:
See if you can find it!
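Before searching a catalog, it can help to turn the listed numbers into quantities that catalogs are usually indexed by. The following is only a rough sketch of those first steps, not the solution to the puzzle: it converts the parallax to a distance, applies the distance modulus, and corrects the color for reddening. The function names are made up for illustration, and the ratio A_V = 3.1 E(B-V) is an assumption (a commonly quoted Galactic value) that does not matter here because E(B-V) = 0.

```python
import math

# Values quoted in the assignment above.
apparent_v_mag = 5.76
b_minus_v = 0.02
e_b_minus_v = 0.00
parallax_arcsec = 0.0478


def distance_pc(parallax):
    """Distance in parsecs from a parallax in arcseconds (d = 1 / p)."""
    return 1.0 / parallax


def absolute_magnitude(m, d_pc, extinction=0.0):
    """Distance modulus: M = m - 5 log10(d / 10 pc) - A."""
    return m - 5.0 * math.log10(d_pc / 10.0) - extinction


d = distance_pc(parallax_arcsec)
# Assumption: A_V = 3.1 * E(B-V); irrelevant here since E(B-V) = 0.
a_v = 3.1 * e_b_minus_v
abs_v_mag = absolute_magnitude(apparent_v_mag, d, a_v)
intrinsic_color = b_minus_v - e_b_minus_v

print(f"distance ~ {d:.1f} pc")
print(f"M_V ~ {abs_v_mag:.2f}")
print(f"(B-V)_0 = {intrinsic_color:.2f}")
```

Whether the resulting absolute magnitude and color make sense together is part of what makes the question instructive.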
I often feel irked whenever I see a function normalized over a feasible parameter space and then used as a probability density function (pdf) for further statistical inference. To be a suitable pdf, the normalization has to be done over a measurable space, not over a feasible space. Such practice often yields biased best fits (biased estimators) and improper error bars. On the other hand, validating a measurable space under physics seems complicated. To be precise, we often get lost in translation. Continue reading ‘A short note on Probability for astronomers’ »
by Emanuel Parzen in Statistical Science 2004, Vol 19(4), pp.652-662 JSTOR
I teach that statistics (done the quantile way) can be simultaneously frequentist and Bayesian, confidence intervals and credible intervals, parametric and nonparametric, continuous and discrete data. My first step in data modeling is identification of parametric models; if they do not fit, we provide nonparametric models for fitting and simulating the data. The practice of statistics, and the modeling (mining) of data, can be elegant and provide intellectual and sensual pleasure. Fitting distributions to data is an important industry in which statisticians are not yet vendors. We believe that unifications of statistical methods can enable us to advertise, “What is your question? Statisticians have answers!”
I couldn’t help liking this paragraph because of its bitter-sweetness. I hope you appreciate it as much as I did.
I watched a movie in which one of the characters said, “country A has nukes with 80% chance” (perhaps not 80%, but it was a high percentage). One of the arguments in that episode is that people will stop eating lettuce if even a 1% chance of E. coli contamination, or lower, is reported. Therefore, with such a high chance of A having nukes, it is right to send troops to A. This episode immediately made me think of astronomers’ null hypothesis probability and the ways they draw conclusions from chi-square goodness-of-fit tests, likelihood ratio tests, or F-tests.
First of all, I’d like to ask how you would estimate the chance that a country has nukes. What does this 80% imply here? But before getting to that question, I’d like to discuss computing the chance of E. coli infection first. Continue reading ‘The chance that A has nukes is p%’ »
Astronomers rely on scatter plots to illustrate correlations and trends among many pairs of variables more than any other scientists. Pages of scatter plots with regression lines are common, and the slopes of the regression lines and the error bars serve as indicators of the degree of correlation. Sometimes, seeing too many such scatter plots makes me think that, overall, the resources for drawing nice scatter plots and the paper on which they are printed are wasted. Why not just compute correlation coefficients and their errors, and publish the processed data used for computing the correlations, not the full data, so that others can verify the results? A couple of scatter plots are fine, but when I see dozens of them, I lose my focus. This is another cultural difference. Continue reading ‘Scatter plots and ANCOVA’ »
- This is not an absolute statement but a personal impression from reading articles in various fields in addition to astronomy. My reading in other fields suggests that many rely on correlation statistics, and fewer on scatter plots with straight lines drawn through the data to impose relationships on pairs of variables.
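As one possible reading of “compute correlation coefficients and their errors”, here is a minimal sketch that reports a Pearson correlation with an approximate 95% interval from the Fisher z-transformation. The function name `pearson_with_interval` and the data are made up for illustration, and the interval assumes roughly bivariate normal data, which real astronomical samples may not satisfy.

```python
import numpy as np


def pearson_with_interval(x, y, z_crit=1.96):
    """Pearson r with an approximate interval via the Fisher z-transform."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)            # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)    # approximate standard error of z
    lo = np.tanh(z - z_crit * se)
    hi = np.tanh(z + z_crit * se)
    return r, lo, hi


# Made-up data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.2, 1.9, 3.4, 3.8, 5.1, 5.7, 7.4, 7.9])

r, lo, hi = pearson_with_interval(x, y)
print(f"r = {r:.3f}, approx. 95% interval = ({lo:.3f}, {hi:.3f})")
```

Three numbers per variable pair take far less space than a plot, although they cannot reveal nonlinearity or outliers the way a scatter plot can.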
ARCH (autoregressive conditional heteroscedasticity) is a statistical model that considers the variance of the current error term to be a function of the squares of the previous time periods’ error terms. I heard that this model made Prof. Engle a Nobel prize recipient. Continue reading ‘[MADS] ARCH’ »
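To make the definition concrete, here is a minimal sketch of the simplest case, ARCH(1), where the conditional variance at time t is alpha0 + alpha1 times the square of the previous error term. The function name `simulate_arch1`, the parameter values, and the Gaussian innovations are all illustrative assumptions.

```python
import numpy as np


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate eps_t = sigma_t * z_t with sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)   # assumed standard normal innovations
    eps = np.zeros(n)
    sigma2 = np.zeros(n)
    # Start from the unconditional variance (requires 0 <= alpha1 < 1).
    sigma2[0] = alpha0 / (1.0 - alpha1)
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2


eps, sigma2 = simulate_arch1(n=1000, alpha0=0.2, alpha1=0.5)
print("sample variance of eps:", eps.var())
print("largest conditional variance:", sigma2.max())
```

Plotting `eps` against time typically shows the volatility clustering that motivates the model: large errors tend to be followed by large errors of either sign.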
I was at the SUSY 09 public lecture given by a Nobel laureate, Frank Wilczek of QCD (quantum chromodynamics). As far as I know, SUSY is the abbreviation of SUperSYmmetry in particle physics. Finding such antimatter(? I’m afraid I read “Angels and Demons” too quickly) will explain the unification theory among electromagnetic, weak, and strong forces and even gravitation, according to the speaker’s graph. I’ll not go into the details of particle physics and the standard model. The reason is too obvious. Instead, I’d like to show this image from Wikipedia and to discuss my related questions.
Continue reading ‘how to trace?’ »
Even though I traced astronomers’ casual usage of the null hypothesis probability to their habit of reporting outputs from the data analysis packages of their choice, there were still some curious cases of the null hypothesis probability that I couldn’t solve. They are quite mysterious to me. Sometimes too much creativity harms the original intention. Here are some examples. Continue reading ‘Curious Cases of the Null Hypothesis Probability’ »
This simple law did not show up in ADS, despite my attempts at a full-text search. As discussed in systematic errors, astronomers, like physicists, show their error components in two additive terms: statistical error + systematic error. To explain such a decomposition and to make error analysis statistically rigorous, the law of total variance (LTV) seems indispensable. Continue reading ‘[MADS] Law of Total Variance’ »
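The law states that Var(Y) = E[Var(Y|X)] + Var(E[Y|X]). A small numerical check, with made-up grouped data where the group label plays the role of X, might look like the sketch below. All variances are population variances (NumPy's default `ddof=0`), for which the identity holds exactly on the empirical distribution.

```python
import numpy as np

# Made-up observations of Y, grouped by a discrete variable X.
groups = {
    "a": np.array([1.0, 2.0, 3.0]),
    "b": np.array([4.0, 6.0]),
    "c": np.array([10.0, 11.0, 12.0, 13.0]),
}

y = np.concatenate(list(groups.values()))
n = y.size

weights = np.array([g.size / n for g in groups.values()])   # P(X = x)
group_means = np.array([g.mean() for g in groups.values()]) # E[Y | X = x]
group_vars = np.array([g.var() for g in groups.values()])   # Var(Y | X = x)

# E[Var(Y|X)]: average within-group variance.
expected_cond_var = np.sum(weights * group_vars)
# Var(E[Y|X]): variance of the group means around the grand mean.
grand_mean = np.sum(weights * group_means)
var_of_cond_mean = np.sum(weights * (group_means - grand_mean) ** 2)

total_var = y.var()
print(total_var, expected_cond_var + var_of_cond_mean)
```

The two printed numbers should agree up to floating-point rounding: one additive term is the scatter within groups, the other the scatter between them.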
My understanding of “robustness” from my education in statistics and the one I get from communicating with astronomers can hardly find common ground. Can anyone help me to build a robust bridge to get over this abyss? Continue reading ‘Robust Statistics’ »
Almost 100 years ago, A.S. Eddington stated in his book Stellar Movements (1914) that
…in calculating the mean error of a series of observations it is preferable to use the simple mean residual irrespective of sign rather than the mean square residual
So an eminent astronomer already preferred least absolute deviation over chi-square, if I match the simple mean residual and the mean square residual to the relevant methodologies, in that order. Continue reading ‘a century ago’ »
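To see the two summaries from the quote side by side, here is a small sketch with made-up residuals. It computes the mean residual irrespective of sign and the root mean square residual, then repeats the calculation after adding a single large residual. This only illustrates how differently the two respond to one outlying point; it is not a reconstruction of Eddington's own argument.

```python
import numpy as np


def mean_abs_residual(res):
    """Simple mean residual irrespective of sign."""
    return np.mean(np.abs(res))


def rms_residual(res):
    """Square root of the mean square residual."""
    return np.sqrt(np.mean(np.square(res)))


# Made-up residuals, for illustration only.
clean = np.array([0.5, -1.0, 1.5, -0.5, 1.0, -1.5])
contaminated = np.append(clean, 10.0)   # one outlying observation

mad_ratio = mean_abs_residual(contaminated) / mean_abs_residual(clean)
rms_ratio = rms_residual(contaminated) / rms_residual(clean)

print(f"mean |residual| inflated by a factor of {mad_ratio:.2f}")
print(f"rms residual inflated by a factor of {rms_ratio:.2f}")
```

Because squaring amplifies large residuals, the root mean square grows by a larger factor than the mean absolute residual when the outlier is added.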
I was reading Lehmann’s memoir on his friends and colleagues who greatly influenced the course of his career. I’m happy to know that meeting Landau, Courant, and Evans led him to become a statistician; otherwise, we, including astronomers, would have had very different textbooks, and statistical thinking would have been different. On the other hand, I was surprised to learn that he chose statistics over physics because of his experience at Cambridge (UK). I thought that becoming a physicist was preferred over becoming a statistician during the first half of the 20th century. At least I felt that way, probably because general science books on physics and physics-related historic events were so widely available that I came to think physicists are cooler than other types of scientists. Continue reading ‘[Book] The Physicists’ »