Gamma-Poisson Model

Regression
Count Data
GLM
An extension of the Poisson model for count data that exhibits overdispersion.

General Principles

To model the relationship between a count outcome variable and one or more independent variables with overdispersion 🛈, we can use the Negative Binomial model.

Considerations

Caution
  • We have the same considerations as for the Poisson model.

  • Overdispersion is handled because the Gamma-Poisson model assumes that each Poisson count observation has its own rate. This is an additional parameter specified in the model (in the code, it is log_days).

Example

Below is an example code snippet demonstrating a Bayesian Gamma-Poisson model using the Bayesian Inference (BI) package.

Code
from BI import bi
import jax.numpy as jnp

# Setup device ------------------------------------------------
m = bi(platform='cpu') # Import

# Import Data & Data Manipulation ------------------------------------------------
# Import
from importlib.resources import files
data_path =  m.load.sim_gamma_poisson(only_path=True)
m.data(data_path, sep=',') 


# Define model ------------------------------------------------
def model(log_days, monastery, y):
    a = m.dist.normal(0, 1, name = 'a', shape=(1,))
    b = m.dist.normal(0, 1, name = 'b', shape=(1,))
    phi = m.dist.exponential(1, name = 'phi', shape=(1,))
    mu = jnp.exp(log_days + a + b * monastery)
    Lambda =  m.dist.gamma(rate = mu*phi, concentration = phi, name = 'Lambda')
    m.dist.poisson(rate = Lambda, obs=y)
# Run MCMC ------------------------------------------------
m.fit(model, progress_bar=False) # Optimize model parameters through MCMC sampling

# Summary ------------------------------------------------
m.summary() # Get posterior distributions
/home/sosa/work/3.12venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning:

IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
BI v 0.0.45 package loaded
jax.local_device_count 32
mean sd hdi_5.5% hdi_94.5% mcse_mean mcse_sd ess_bulk ess_tail r_hat
Lambda[0] 1.46 0.34 0.92 2.01 0.00 0.01 7637.67 2212.89 1.00
Lambda[1] 1.47 0.34 0.89 1.96 0.00 0.01 7582.21 2835.22 1.00
Lambda[2] 1.63 0.36 1.04 2.17 0.00 0.01 6870.91 2699.35 1.00
Lambda[3] 1.55 0.35 0.99 2.06 0.00 0.01 7910.03 2574.77 1.00
Lambda[4] 1.46 0.35 0.92 2.01 0.00 0.01 8511.18 2837.00 1.01
... ... ... ... ... ... ... ... ... ...
Lambda[3398] 3.13 0.72 2.01 4.26 0.01 0.01 6698.62 2692.70 1.00
Lambda[3399] 2.97 0.71 1.94 4.17 0.01 0.01 5646.74 2587.29 1.00
a[0] -0.41 0.02 -0.44 -0.38 0.00 0.00 593.86 1081.71 1.01
b[0] -2.75 0.03 -2.80 -2.69 0.00 0.00 1049.48 1921.53 1.00
phi[0] 17.39 2.28 14.09 21.36 0.22 0.10 111.96 238.76 1.02

3403 rows × 9 columns

library(BayesianInference)

# Setup platform------------------------------------------------
m=importBI(platform='cpu')

# Import data ------------------------------------------------
m$data(m$load$sim_gamma_poisson(only_path=T), sep = '')
m$data_to_model(list('log_days', 'monastery', 'y' )) # Send to model (convert to jax array)

# Define model ------------------------------------------------
model <- function(log_days, monastery, y){
  # Parameter prior distributions
  alpha = bi.dist.normal(0, 1, name='alpha', shape=c(1))
  beta = bi.dist.normal(0, 1, name='beta', shape=c(1))
  phi = bi.dist.exponential(1, name='phi', shape=c(1))
  mu = jnp$exp(log_days + alpha + beta * monastery)
  Lambda =  bi.dist.gamma(rate = mu*phi, concentration = phi, name = 'Lambda')
  # Likelihood
  bi.dist.poisson(rate=Lambda, obs=y)
}
# Run MCMC ------------------------------------------------
m$fit(model) # Optimize model parameters through MCMC sampling

# Summary ------------------------------------------------
m$summary() # Get posterior distributions
using BayesianInference

# Setup device------------------------------------------------
m = importBI(platform="cpu")

# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_gamma_poisson(only_path = true)
m.data(data_path, sep=',')

# Define model ------------------------------------------------
@BI function model(log_days, monastery, y)
    a = m.dist.normal(0, 1, name = "a", shape=(1,))
    b = m.dist.normal(0, 1, name = "b", shape=(1,))
    phi = m.dist.exponential(1, name = "phi", shape=(1,))
    mu = jnp.exp(log_days + a + b * monastery)
    Lambda =  m.dist.gamma(rate = mu*phi, concentration = phi, name = "Lambda")
    m.dist.poisson(rate = Lambda, obs=y)
end

# Run mcmc ------------------------------------------------
m.fit(model)  # Optimize model parameters through MCMC sampling

# Summary ------------------------------------------------
m.summary() # Get posterior distributions

Mathematical Details

Bayesian model

In the Bayesian formulation, we define each parameter with priors 🛈. We can express the Bayesian regression model accounting for prior distributions as follows:

Y_i \sim \text{Poisson}(\lambda_i)

\lambda_i \sim \text{Gamma}(\mu_i \phi, \phi)

\log(\mu_i) = \text{rates}_i + \alpha + \beta X_i

\alpha \sim \text{Normal}(0,1)

\beta \sim \text{Normal}(0,1)

\phi \sim \text{Exponential}(1)

Where:

  • Y_i is the dependent variable for observation i.

  • \lambda_i is the rate parameter of the Poisson distribution for observation i, assuming that each Poisson count observation has its own rate_i.

  • \mu_i is the mean rate parameter.

  • \phi controls the level of overdispersion in the rates.

  • \alpha is the intercept term.

  • \beta is the regression coefficient.

  • X_i is the value of the predictor variable for observation i.

Notes

Note

Reference(s)