Posts tagged ‘nonparametric’

[ArXiv] Voronoi Tessellations

As a part of exploring spatial distribution of particles/objects, not to approximate via Poisson process or Gaussian process (parametric), nor to impose hypotheses such as homogenous, isotropic, or uniform, various nonparametric methods somewhat dragged my attention for data exploration and preliminary analysis. Among various nonparametric methods, the one that I fell in love with is tessellation (state space approaches are excluded here). Computational speed wise, I believe tessellation is faster than kernel density estimation to estimate level sets for multivariate data. Furthermore, conceptually constructing polygons from tessellation is intuitively simple. However, coding and improving algorithms is beyond statistical research (check books titled or key-worded partially by computational geometry). Good news is that for computation and getting results, there are some freely available softwares, packages, and modules in various forms. Continue reading ‘[ArXiv] Voronoi Tessellations’ »

Scatter plots and ANCOVA

Astronomers rely on scatter plots to illustrate correlations and trends among many pairs of variables more than any scientists[1]. Pages of scatter plots with regression lines are often found from which the slope of regression line and errors bars are indicators of degrees of correlation. Sometimes, too many of such scatter plots makes me think that, overall, resources for drawing nice scatter plots and papers where those plots are printed are wasted. Why not just compute correlation coefficients and its error and publicize the processed data for computing correlations, not the full data, so that others can verify the computation results for the sake of validation? A couple of scatter plots are fine but when I see dozens of them, I lost my focus. This is another cultural difference. Continue reading ‘Scatter plots and ANCOVA’ »

  1. This is not an assuring absolute statement but a personal impression after reading articles of various fields in addition to astronomy. My readings of other fields tell that many rely on correlation statistics but less scatter plots by adding straight lines going through data sets for the purpose of imposing relationships within variable pairs[]

Goodness-of-fit tests

When it comes to applying statistics for measuring goodness-of-fit, the Pearson χ2 test is the dominant player in a race and the Kolmogorov-Smirnoff test statistic trails far behind. Although it seems almost invisible in this race, there are more various non-parametric statistics for testing goodness-of-fit and for comparing the sampling distribution to a reference distribution as legitimate race participants trained by many statisticians. Listing their names probably useful to some astronomers when they find the underlying assumptions for the χ2 test do not match the data. Perhaps, some astronomers want to try other nonparametric test statistics other than the K-S test. I’ve seen other test statistics in astronomical journals from time to time. Depending on data and statistical properties, one test statistic could work better than the other; therefore, it’s worthwhile to keep the variety in one’s mind that there are other tests beyond the χ2 test goodness-of-fit test statistic. Continue reading ‘Goodness-of-fit tests’ »

[MADS] data depth

How would you assign orders to multivariate data? If you have your strategy to achieve this ordering task, I’d like to ask, “is your strategy affine invariant?” meaning that shift and rotation invariant. Continue reading ‘[MADS] data depth’ »

Robust Statistics

My understandings of “robustness” from the education in statistics and from communicating with astronomers are hard to find a mutual interest. Can anyone help me to build a robust bridge to get over this abyss? Continue reading ‘Robust Statistics’ »

[MADS] Semiparametric

There were (only) four articles from ADS whose abstracts contain the word semiparametric (none in titles). Therefore, semiparametric is not exactly [MADS] but almost [MADS]. One would like to say it is virtually [MADS] or quasi [MADS]. By introducing the term and providing rare examples in astronomy, I hope this scarce term semiparametric to be used adequately against its misguidance of astronomers to inappropriate usage for statistical inference with their data. Continue reading ‘[MADS] Semiparametric’ »

missing data

The notions of missing data are overall different between two communities. I tend to think missing data carry as good amount of information as observed data. Astronomers…I’m not sure how they think but my impression so far is that a missing value in one attribute/variable from a object/observation/informant, all other attributes related to that object become useless because that object is not considered in scientific data analysis or model evaluation process. For example, it is hard to find any discussion about imputation in astronomical publication or statistical justification of missing data with respect to inference strategies. On the contrary, they talk about incompleteness within different variables. Putting this vague argument with a concrete example, consider a catalog of multiple magnitudes. To draw a color magnitude diagram, one needs both color and magnitude. If one attribute is missing, that star will not appear in the color magnitude diagram and any inference methods from that diagram will not include that star. Nonetheless, one will trying to understand how different proportions of stars are observed according to different colors and magnitudes. Continue reading ‘missing data’ »

[ArXiv] 2nd week, Nov. 2007

There should be at least one paper that drags your attention. Various statistics topics appeared in astro-ph this week.
Continue reading ‘[ArXiv] 2nd week, Nov. 2007’ »

[ArXiv] An unbiased estimator, May 29, 2007

From arxiv/astro-ph:0705.4199v1
In search of an unbiased temperature estimator for statistically poor X-ray spectra
A. Leccardi and S. Molendi

There was a delay of writing about this paper, which by accident was lying under the pile of papers irrelevant to astrostatistics. (It has been quite overwhelming to track papers with various statistical applications and papers with rooms left for statistical improvements from arxiv:astro-ph). Although there is a posting about this paper (see Vinay’s posting), I’d like to give a shot. I was very excited because I haven’t seen any astronomical papers discussing unbiased estimators solely.
Continue reading ‘[ArXiv] An unbiased estimator, May 29, 2007’ »

Astrostatistics: Goodness-of-Fit and All That!

During the International X-ray Summer School, as a project presentation, I tried to explain the inadequate practice of χ^2 statistics in astronomy. If your best fit is biased (any misidentification of a model easily causes such bias), do not use χ^2 statistics to get 1σ error for the 68% chance of capturing the true parameter.

Later, I decided to do further investigation on that subject and this paper came along: Astrostatistics: Goodness-of-Fit and All That! by Babu and Feigelson.
Continue reading ‘Astrostatistics: Goodness-of-Fit and All That!’ »

[ArXiv] SDSS DR6, July 23, 2007

From arxiv/astro-ph:0707.3413
The Sixth Data Release of the Sloan Digital Sky Survey by … many people …

The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at Additionally, Catalog Archive Service (CAS) and
SQL interface to access the catalog would be useful to data searching statisticians. Simple SQL commends, which are well documented, could narrow down the size of data and the spatial coverage.
Continue reading ‘[ArXiv] SDSS DR6, July 23, 2007’ »