Extending linear regression to model a continuous outcome using multiple predictor variables.
General Principles
To study relationships between multiple continuous independent variables (e.g., the effect of weight and age on height), we can use a multiple regression approach. Essentially, we extend Linear Regression for continuous variable by adding a regression coefficient \beta_x for each continuous variable (e.g., \beta_{weight} and \beta_{age}).
The model interpretation of the regression coefficients \beta_x is considered for fixed values of the other independent variable(s)’ regression coefficients—i.e., for a given age, \beta_{weight} represents the expected change in the dependent variable (height) for each one-unit increase in weight, holding all other variables (e.g., age) constant.
Example
Below is example code demonstrating Bayesian multiple linear regression using the Bayesian Inference (BI) package. Data consist of three continuous variables (height, weight, age), and the goal is to estimate the effect of weight and age on height. This example is based on McElreath (2018).
from BI import bi# Setup device------------------------------------------------m = bi(platform='cpu')# Import Data & Data Manipulation ------------------------------------------------from importlib.resources import files# Importdata_path = m.load.howell1(only_path =True)m.data(data_path, sep=';') m.df = m.df[m.df.age >18] # Subset data to adultsm.scale(['weight', 'age']) # Normalize# Define model ------------------------------------------------def model(height, weight, age):# Parameter prior distributions alpha = m.dist.normal(0, 0.5, name ='alpha') beta1 = m.dist.normal(0, 0.5, name ='beta1') beta2 = m.dist.normal(0, 0.5, name ='beta2') sigma = m.dist.uniform(0, 50, name ='sigma')# Likelihood m.dist.normal(alpha + beta1 * weight + beta2 * age, sigma, obs = height)# Run MCMC ------------------------------------------------m.fit(model, progress_bar=False) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m.summary()
/home/sosa/work/3.12venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning:
IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
BI v 0.0.45 package loaded
jax.local_device_count 32
mean
sd
hdi_5.5%
hdi_94.5%
mcse_mean
mcse_sd
ess_bulk
ess_tail
r_hat
alpha
5.17
0.49
4.39
5.94
0.01
0.01
4409.28
2652.03
1.0
beta1
0.19
0.50
-0.52
1.09
0.01
0.01
4201.55
2920.05
1.0
beta2
-0.03
0.49
-0.80
0.75
0.01
0.01
3731.14
2847.59
1.0
sigma
49.98
0.02
49.96
50.00
0.00
0.00
3204.17
1962.63
1.0
library(BayesianInference)m=importBI(platform='cpu')# Import Data & Data Manipulation ------------------------------------------------m$data(m$load$howell1(only_path = T), sep=';')# Importm$df = m$df[m$df$age >18,] # Subset data to adultsm$scale(list('weight', 'age')) # Normalizem$data_to_model(list('weight', 'height', 'age')) # Send to model (convert to jax array)# Define model ------------------------------------------------model <-function(height, weight, age){# Parameter prior distributions alpha =bi.dist.normal(0, 0.5, name ='a') beta1 =bi.dist.normal(0, 0.5, name ='b1') beta2 =bi.dist.normal(0, 0.5, name ='b2') sigma =bi.dist.uniform(0, 50, name ='s')# Likelihoodbi.dist.normal(alpha + beta1 * weight + beta2 * age, sigma, obs=height)}# Run MCMC ------------------------------------------------m$fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m$summary() # Get posterior distributions
usingBayesianInference# Setup device------------------------------------------------m =importBI(platform="cpu")# Import Data & Data Manipulation ------------------------------------------------# Importdata_path = m.load.howell1(only_path =true)m.data(data_path, sep=';') m.df = m.df[m.df.age >18] # Subset data to adultsm.scale(["weight", "age"]) # Normalize# Define model ------------------------------------------------@BIfunctionmodel(height, weight, age)# Parameter prior distributions alpha = m.dist.normal(0, 0.5, name ="alpha") beta1 = m.dist.normal(0, 0.5, name ="beta1") beta2 = m.dist.normal(0, 0.5, name ="beta2") sigma = m.dist.uniform(0, 50, name ="sigma")# Likelihood m.dist.normal(alpha + beta1 * weight + beta2 * age, sigma, obs = height)end# Run mcmc ------------------------------------------------m.fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m.summary() # Get posterior distributions
Caution
For R users, if you create the regression coefficient in a single call:
betas = bi.dist.normal(0, 0.5, name ='regression_coefficients', shape = (2,))
We model the relationship between the independent variables (X_{1i}, X_{2i}, ..., X_{[K,i]}) and the dependent variable Y using the following equation:
X_{[1,i]}, X_{[2,i]}, …, X_{[K,i]} are the values of the independent variables for observation i.
\beta_1, \beta_2, …, \beta_K are the regression coefficients.
\epsilon_i is the error term for observation i, and the vector of the error terms, \epsilon, are assumed to be independent and identically distributed.
Bayesian formulation
In the Bayesian formulation, we define each parameter with priors 🛈. We can express the Bayesian model as follows: