Multivariate Linear Regression

Regression
Extending linear regression to model a continuous outcome using multiple predictor variables.

General Principles

To study the relationship between a continuous outcome and multiple continuous independent variables (e.g., the effect of weight and age on height), we can use a multiple regression approach. Essentially, we extend the linear regression for a single continuous variable by adding a regression coefficient \beta_x for each additional continuous predictor (e.g., \beta_{weight} and \beta_{age}).
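Concretely, the linear predictor simply gains one coefficient per predictor. A minimal sketch in plain NumPy (the variable names and numbers below are illustrative, not fitted values):

```python
import numpy as np

def linear_predictor(X, alpha, betas):
    """mu_i = alpha + sum_k beta_k * X[i, k] -- one coefficient per predictor."""
    return alpha + X @ np.asarray(betas)

# e.g., two predictors (weight, age) for three observations
X = np.array([[62.0, 30.0],
              [75.0, 45.0],
              [58.0, 25.0]])
mu = linear_predictor(X, alpha=100.0, betas=[0.9, 0.1])  # one predicted mean per row
```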

Considerations

Note
  • The same considerations apply as for the regression with a single continuous variable.

  • Each regression coefficient \beta_x is interpreted holding the other independent variable(s) fixed. That is, for a given age, \beta_{weight} represents the expected change in the dependent variable (height) for each one-unit increase in weight, holding all other variables (e.g., age) constant.
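This "holding other variables constant" interpretation can be checked directly on the linear predictor. With illustrative (not fitted) coefficient values, a one-unit increase in weight at any fixed age shifts the prediction by exactly \beta_{weight}:

```python
alpha, beta_weight, beta_age = 100.0, 0.9, 0.1  # illustrative values

def mu(weight, age):
    """Predicted mean height under the linear model."""
    return alpha + beta_weight * weight + beta_age * age

# Holding age fixed at 40, a one-unit increase in weight changes the
# prediction by beta_weight, regardless of the fixed age chosen.
diff = mu(weight=63.0, age=40.0) - mu(weight=62.0, age=40.0)  # equals beta_weight
```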

Example

Below is example code demonstrating Bayesian multiple linear regression using the Bayesian Inference (BI) package. The data consist of three continuous variables (height, weight, age), and the goal is to estimate the effects of weight and age on height. This example is based on McElreath (2018).

Code
from BI import bi

# Setup device------------------------------------------------
m = bi(platform='cpu')

# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.howell1(only_path = True)
m.data(data_path, sep=';') 
m.df = m.df[m.df.age > 18] # Subset data to adults
m.scale(['weight', 'age']) # Normalize

# Define model ------------------------------------------------
def model(height, weight, age):
    # Parameter prior distributions
    alpha = m.dist.normal(0, 0.5, name = 'alpha')    
    beta1 = m.dist.normal(0, 0.5, name = 'beta1')
    beta2 = m.dist.normal(0, 0.5, name = 'beta2')
    sigma = m.dist.uniform(0, 50, name = 'sigma')
    # Likelihood
    m.dist.normal(alpha + beta1 * weight + beta2 * age, sigma, obs = height)

# Run MCMC ------------------------------------------------
m.fit(model) # Estimate posterior distributions via MCMC sampling

# Summary ------------------------------------------------
m.summary()
         mean    sd  hdi_5.5%  hdi_94.5%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
alpha    5.20  0.49      4.46       6.06       0.02     0.02    469.94    349.45    NaN
beta1    0.20  0.51     -0.60       1.03       0.02     0.03    570.81    264.55    NaN
beta2   -0.02  0.49     -0.89       0.69       0.02     0.02    576.00    338.15    NaN
sigma   49.98  0.02     49.96      50.00       0.00     0.00    579.62    271.13    NaN
Code (R)
library(BayesianInference)

# Setup device ------------------------------------------------
m = importBI(platform='cpu')

# Import Data & Data Manipulation ------------------------------------------------
m$data(m$load$howell1(only_path = T), sep=';') # Import
m$df = m$df[m$df$age > 18,] # Subset data to adults
m$scale(list('weight', 'age')) # Normalize
m$data_to_model(list('weight', 'height', 'age')) # Send to model (convert to jax array)

# Define model ------------------------------------------------
model <- function(height, weight, age){
  # Parameter prior distributions
  alpha = bi.dist.normal(0, 0.5, name = 'a')
  beta1 = bi.dist.normal(0, 0.5, name = 'b1')
  beta2 = bi.dist.normal(0, 0.5, name = 'b2')   
  sigma = bi.dist.uniform(0, 50, name = 's')
  # Likelihood
  bi.dist.normal(alpha + beta1 * weight + beta2 * age, sigma, obs=height)
}

# Run MCMC ------------------------------------------------
m$fit(model) # Estimate posterior distributions via MCMC sampling

# Summary ------------------------------------------------
m$summary() # Get posterior distributions
Code (Julia)
using BayesianInference

# Setup device------------------------------------------------
m = importBI(platform="cpu")

# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.howell1(only_path = true)
m.data(data_path, sep=';') 
m.df = m.df[m.df.age > 18] # Subset data to adults
m.scale(["weight", "age"]) # Normalize

# Define model ------------------------------------------------
@BI function model(height, weight, age)
    # Parameter prior distributions
    alpha = m.dist.normal(0, 0.5, name = "alpha")    
    beta1 = m.dist.normal(0, 0.5, name = "beta1")
    beta2 = m.dist.normal(0, 0.5, name = "beta2")
    sigma = m.dist.uniform(0, 50, name = "sigma")
    # Likelihood
    m.dist.normal(alpha + beta1 * weight + beta2 * age, sigma, obs = height)
end

# Run MCMC ------------------------------------------------
m.fit(model)  # Estimate posterior distributions via MCMC sampling

# Summary ------------------------------------------------
m.summary() # Get posterior distributions
Caution

For R users: if you create the regression coefficients in a single vectorized call:

betas = bi.dist.normal(0, 0.5, name = 'regression_coefficients', shape = (2,))

you need to index them starting at 0 (not 1, as usual in R, since the underlying library follows Python indexing):

bi.dist.normal(alpha + betas[0] * weight + betas[1] * age, sigma, obs = height)
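For comparison, the equivalent vectorized model in the Python interface indexes the coefficients the same way (a sketch against the same assumed `shape` argument; it relies on the `m` object from the example above and is not runnable on its own):

```
def model(height, weight, age):
    alpha = m.dist.normal(0, 0.5, name = 'alpha')
    betas = m.dist.normal(0, 0.5, name = 'regression_coefficients', shape = (2,))
    sigma = m.dist.uniform(0, 50, name = 'sigma')
    # Likelihood: betas[0] and betas[1] are the weight and age coefficients
    m.dist.normal(alpha + betas[0] * weight + betas[1] * age, sigma, obs = height)
```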

Mathematical Details

Frequentist formulation

We model the relationship between the independent variables (X_{[1,i]}, X_{[2,i]}, ..., X_{[K,i]}) and the dependent variable Y using the following equation:

π‘Œ_i = \alpha +\beta_1 𝑋_{[1,i]} + \beta_2 𝑋_{[2,i]} + ... + \beta_n 𝑋_{[K,i]} + \epsilon_i

Where:

  • Y_i is the dependent variable for observation i.

  • \alpha is the intercept term.

  • X_{[1,i]}, X_{[2,i]}, …, X_{[K,i]} are the values of the independent variables for observation i.

  • \beta_1, \beta_2, …, \beta_K are the regression coefficients.

  • \epsilon_i is the error term for observation i; the error terms \epsilon are assumed to be independent and identically distributed, typically \epsilon_i \sim \text{Normal}(0, \sigma^2).
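Under this formulation the coefficients are typically estimated by ordinary least squares. A self-contained sketch on synthetic data (plain NumPy, not the BI package; the true parameter values are made up for illustration):

```python
import numpy as np

# Synthetic standardized data with known parameters:
# alpha = 5.0, beta1 = 0.9, beta2 = 0.1, i.i.d. Normal(0, 1) errors
rng = np.random.default_rng(0)
n = 200
weight = rng.normal(0.0, 1.0, n)
age = rng.normal(0.0, 1.0, n)
height = 5.0 + 0.9 * weight + 0.1 * age + rng.normal(0.0, 1.0, n)

# Design matrix: intercept column plus K = 2 predictor columns
X = np.column_stack([np.ones(n), weight, age])

# Least-squares estimates: [alpha_hat, beta1_hat, beta2_hat]
coef, *_ = np.linalg.lstsq(X, height, rcond=None)
```

With n = 200 observations, the estimates should land close to the generating values.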

Bayesian formulation

In the Bayesian formulation, we define each parameter with a prior distribution. We can express the Bayesian model as follows:

π‘Œ_i \sim \text{Normal}(\alpha + \sum_{k=1}^K \beta_k X_{[K,i]}, Οƒ)

\alpha \sim \text{Normal}(0,1)

\beta_k \sim \text{Normal}(0,1)

\sigma \sim \text{Uniform}(0, 50)

Where:

  • Y_i is the dependent variable for observation i.

  • \alpha is the intercept term, which in this case has a unit-normal prior.

  • \beta_k are slope coefficients for the K distinct independent variables, which also have unit-normal priors.

  • X_{[1,i]}, X_{[2,i]}, …, X_{[K,i]} are the values of the independent variables for observation i.

  • \sigma is a standard deviation parameter, which here has a Uniform prior that constrains it to be positive.
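To make the Bayesian formulation concrete without the BI package, here is a minimal random-walk Metropolis sampler in plain NumPy for exactly these priors (an illustrative sketch on synthetic data, not a production sampler; the BI package uses a more efficient MCMC scheme):

```python
import numpy as np

# Synthetic standardized data (illustrative, not the Howell1 data set)
rng = np.random.default_rng(1)
n = 100
weight = rng.normal(0.0, 1.0, n)
age = rng.normal(0.0, 1.0, n)
height = 2.0 + 0.9 * weight + 0.1 * age + rng.normal(0.0, 1.0, n)

def log_posterior(theta):
    """Log posterior for (alpha, beta1, beta2, sigma) under the model above."""
    alpha, b1, b2, sigma = theta
    if not (0.0 < sigma < 50.0):                   # sigma ~ Uniform(0, 50)
        return -np.inf
    mu = alpha + b1 * weight + b2 * age
    # Likelihood: Y_i ~ Normal(mu_i, sigma), up to an additive constant
    log_lik = -0.5 * np.sum(((height - mu) / sigma) ** 2) - n * np.log(sigma)
    log_prior = -0.5 * (alpha**2 + b1**2 + b2**2)  # alpha, beta_k ~ Normal(0, 1)
    return log_lik + log_prior

# Random-walk Metropolis, started from data-informed values
theta = np.array([height.mean(), 0.0, 0.0, height.std()])
logp = log_posterior(theta)
draws = []
for _ in range(8000):
    proposal = theta + rng.normal(0.0, 0.05, size=4)
    logp_prop = log_posterior(proposal)
    if np.log(rng.uniform()) < logp_prop - logp:   # accept/reject step
        theta, logp = proposal, logp_prop
    draws.append(theta.copy())
posterior = np.array(draws)[4000:]                 # discard burn-in
```

The retained draws approximate the joint posterior; column means should sit near the generating values, with slight shrinkage toward 0 from the Normal(0, 1) priors.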

Reference(s)

McElreath, Richard. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman & Hall/CRC.