Modeling binary outcomes (e.g., successes/failures) across multiple independent trials.
General Principles
To model the relationship between a binary dependent variable—e.g., counts of successes/failures, yes/no, or 1/0—and one or more independent variables, we can use a Binomial model.
This is the first model for which we need a link function: e.g., the logit function. The logit link function converts the linear combination of predictor variables into probabilities, making it suitable for modeling the probability of binary outcomes. It helps to estimate the relationship between predictors and the probability of success, ensuring that model predictions fall within the bounds of the binomial distribution’s success parameter \in(0,1).
Example
Below is an example code snippet that demonstrates Bayesian binomial regression using the Bayesian Inference (BI) package. The data consist of one binary dependent variable (pulled_left), which represents which lever each chimpanzee pulled in an experimental setup. The goal is to evaluate the probability of pulling the left side. This example is based on McElreath (2018).
from BI import bi# setup platform------------------------------------------------m = bi(platform='cpu')# import data ------------------------------------------------# Importfrom importlib.resources import filesdata_path = m.load.chimpanzees(only_path =True)m.data(data_path, sep=';')m.data_to_model(['pulled_left'])# Define model ------------------------------------------------def model(pulled_left): alpha = m.dist.normal( 0, 10, name ='alpha') m.dist.binomial(total_count =1, logits=alpha, obs=pulled_left)# Run sampler ------------------------------------------------m.fit(model, progress_bar=False)# Diagnostic ------------------------------------------------m.summary()
/home/sosa/.local/lib/python3.10/site-packages/arviz/__init__.py:39: FutureWarning:
ArviZ is undergoing a major refactor to improve flexibility and extensibility while maintaining a user-friendly interface.
Some upcoming changes may be backward incompatible.
For details and migration guidance, visit: https://python.arviz.org/en/latest/user_guide/migration_guide.html
BI v 0.0.26 package loaded
E0424 14:56:29.718087 1316322 cuda_dnn.cc:523] Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.8.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
E0424 14:56:29.722156 1316322 cuda_dnn.cc:523] Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.8.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
jax.local_device_count 32
mean
sd
hdi_5.5%
hdi_94.5%
mcse_mean
mcse_sd
ess_bulk
ess_tail
r_hat
alpha
0.32
0.09
0.18
0.47
0.0
0.0
1758.15
2257.24
1.0
library(BayesianInference)# setup platform------------------------------------------------m=importBI(platform='cpu')# import data ------------------------------------------------m$data(m$load$chimpanzees(only_path = T), sep=';')m$data_to_model(list('pulled_left')) # Send to model (convert to jax array)# Define model ------------------------------------------------model <-function(pulled_left){# Parameters priors distributions alpha =bi.dist.normal( 0, 10, name ='alpha')# Likelihoodbi.dist.binomial(total_count =1, logits = alpha, obs=pulled_left)}# Run MCMC ------------------------------------------------m$fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m$summary() # Get posterior distribution
usingBayesianInference# Setup device------------------------------------------------m =importBI(platform="cpu")# Import Data & Data Manipulation ------------------------------------------------# Importdata_path = m.load.chimpanzees(only_path =true)m.data(data_path, sep=';')# Define model ------------------------------------------------@BIfunctionmodel(pulled_left) a = m.dist.normal( 0, 10, shape=(1,), name ="a") m.dist.binomial(total_count =1, logits=a[0], obs=pulled_left)end# Run mcmc ------------------------------------------------m.fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m.summary() # Get posterior distributions
Mathematical Details
Bayesian formulation
We can express the Bayesian Binomial regression model including prior distributions as follows:
Y_i \sim \text{Binomial}(N_i, p_i)
logit(p_i) = \alpha + \beta X_i
\alpha \sim \text{Normal}(0,1)
\beta \sim \text{Normal}(0,1)
Where:
Y_i is the count of successes for observation i (often a binary 0 or 1).
N_i is the count of trials for observation i (1 in the case of binary outcomes, as in the example for total_count above).
p_i is the probability of success (0 < p_i < 1) for observation i, the probability of a success.
logit(p_i) is the log-odds of success, calculated as the log of the odds ratio of success. Through this link function, the relationship between the independent variables and the log-odds of success is modeled linearly, allowing us to interpret the effect of each independent variable on the log-odds of success for observation i.
\beta and \alpha are the regression coefficient and intercept, respectively.
Notes
Note
We can apply multiple variables similarly to chapter 2.
We can apply interaction terms similarly to chapter 3.
We can apply categorical variables similarly to chapter 4.
Below is an example code snippet demonstrating a Bayesian binomial model for multiple categorical variables using the Bayesian Inference (BI) package. The data consist of one binary dependent variable (pulled_left), which represents which side individuals pulled, and three independent variables (actor, side, cond). The goal is to evaluate, for each individual, the probability of pulling the left side, accounting for whether the individual is left-handed or right-handed, as well as for the different conditions.
from BI import bim = bi(platform='cpu')m.data('../resources/data/chimpanzees.csv', sep=';')m.df['treatment'] = m.df.prosoc_left +2* m.df.conditionm.df['actor'] = m.df['actor'] -1m.data_to_model(['actor', 'treatment', 'pulled_left'])def model(actor, treatment, pulled_left): a = m.dist.normal(0, 1.5, shape = (7,), name='a') b = m.dist.normal(0, 0.5, shape = (4,), name='b') p = a[actor] + b[treatment] m.dist.binomial(total_count =1, logits=p, obs=pulled_left)# Run sampler ------------------------------------------------m.fit(model)# Diagnostic ------------------------------------------------m.summary()
library(BayesianInference)# setup platform------------------------------------------------m=importBI(platform='cpu')# import data ------------------------------------------------m$data(paste(system.file(package ="BayesianInference"),"/data/chimpanzees.csv", sep =''), sep=';')m$data_to_model(list('pulled_left')) # Send to model (convert to jax array)# Define model ------------------------------------------------model <-function(pulled_left){# Parameters priors distributions a =bi.dist.normal( 0, 1.5, name ='a') b =bi.dist.normal( 0, 0.5, name ='b') p = a[actor] + b[treatment]# Likelihood m$binomial(total_count =1, logits = alpha, obs=pulled_left)}# Run MCMC ------------------------------------------------m$fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m$summary() # Get posterior distribution
Reference(s)
McElreath, Richard. 2018. Statistical Rethinking: A Bayesian course with examples in R and Stan. Chapman; Hall/CRC.