The full description is given http://cxc.harvard.edu/ciao3.4/ahelp/bayes.html about “bayes” under sherpa/ciao. Some sentences kept bothering me and here’s my account for the reason given outside of quotes.
A Bayesian maximum likelihood function.
Maximum likelihood function is common for both Bayesian and frequentist methods. I don’t know get the point why “Bayesian” is particularly added with “maximum likelihood function.”
We can relate this likelihood to the Bayesian posterior density for S(i) and B(i)
using Bayes’ Theorem:
p[S(i),B(i) | N(i,S)] = p[S(i)|B(i)] * p[B(i)] * p[N(i,S) | S(i),B(i)] / p[D] .
The factor p[S(i)|B(i)] is the Bayesian prior probability for the source model
amplitude, which is assumed to be constant, and p[D] is an ignorable normalization
constant. The prior probability p[B(i)] is treated differently; we can specify it
using the posterior probability for B(i) off-source:
p[B(i)] = [ A (A B(i))^N(i,B) / N(i,B)! ] * exp[-A B(i)] ,
where A is an “area” factor that rescales the number of predicted background
counts B(i) to the off-source region.
IMPORTANT: this formula is derived assuming that the background is constant as a
function of spatial area, time, etc. If the background is not constant, the Bayes
function should not be used.
Why not? If I rephrase it, what it said is that B(i) is a constant. Then why one bothers to write p[B(i)], a probability density of a constant? The statement sounds self contradictory to me. I guess B(i) is a constant parameter. It would be suitable to write that Background is homogeneous and the Background is describable with homogeneous Poisson process if the above pdf is a correct model for Background. Also, a slight notation change is required. Assuming the Poisson process, we could estimate the background rate (constant parameter) and its density p[B(i)], and this estimate is a constant as stated for p[S(i)|B(i)], a prior probability for the constant source model amplitude.
I think the reason for “Bayes should not used” is that the current sherpa is not capable of executing hierarchical modeling. Nevertheless, I believe one can script the MCMC methodologies with S-Lang/Python to be aggregated with existing sherpa tools to incorporate a possible space dependent density, p[B(i,x,y)]. I was told that currently a constant background regardless of locations and background subtraction is commonly practiced.
To take into account all possible values of B(i), we integrate, or marginalize,
the posterior density p[S(i),B(i) | N(i,S)] over all allowed values of B(i):
p[S(i) | N(i,S)] = (integral)_0^(infinity) p[S(i),B(i) | N(i,S)] dB(i) .
For the constant background case, this integral may be done analytically. We do
not show the final result here; see Loredo. The function -log p[S(i)|N(i,S)] is
minimized to find the best-fit value of S(i). The magnitude of this function
depends upon the number of bins included in the fit and the values of the data
themselves. Hence one cannot analytically assign a `goodness-of-fit’ measure to a
given value of this function. Such a measure can, in principle, be computed by
performing Monte Carlo simulations. One would repeatedly sample new datasets from
the best-fit model, and fit them, and note where the observed function minimum
lies within the derived distribution of minima. (The ability to perform Monte
Carlo simulations is a feature that will be included in a future version of
Note on Background Subtraction
Bayesian computation means one way or the other that one is able to get posterior distributions in the presence of various parameters regardless of their kinds: source or background. I wonder why there’s a discrimination such that source parameter has uncertainty whereas the background is constant and is subtracted (yet marginalization is emulated by subtracting different background counts with corresponding weights). It fell awkward to me. Background counts as well as source counts are Poisson random. I would like to know what justifies constant background while one uses probabilistic approaches via Bayesian methods. I would like to know why the mixture model approach – a mixture of source model and background model with marginalization over background by treating B(i) as a nuisance parameter – has not been tried. By casting eye sights broadly on Bayesian modeling methods and basics of probability, more robustly estimating the source model and their parameters is tractable without subtracting background prior to fitting a source model.
The background should not be subtracted from the data when this function is used
The background only needs to be specified, as in this example:
Specify the fitting statistic and then confirm it has been set. The method is then
changed from “Levenberg-Marquardt” (the default), since this statistic does not
work with that algorithm.
sherpa> STATISTIC BAYES
sherpa> SHOW STATISTIC
sherpa> METHOD POWELL
I would like to know why it’s not working with Levenberg-Marquardt (LM) but working with Powell. Any references that explain why LM does not work with Bayes?
I do look forward your comments and references, particularly reasons for Bayesian maximum likelihood function and Bugs with LM. Also, I look forward to see off the norm approaches such as modeling fully in Bayesian ways (like van Dyk et al. 2001, yet I see its application rarely) or marginalizing Background without subtraction but simultaneously fitting the source model. There are plenty of rooms to be improved in source model fitting under contamination and distortion of x-ray photon incidents through space, telescope, and signal transmission.
- Note that the current sherpa is beta under ciao 4.0 not under ciao 3.4 and a description about “bayes” from the most recent sherpa is not available yet, which means this post needs updates one new release is available[↩]