Models that relate a set of observable variables to a set of unobserved (latent) variables.
General Principles
In some scenarios, the observed data does not directly reflect the underlying structure or factors influencing the outcome. Instead, latent variables—variables that are not directly observed but are inferred from the data—can help model this hidden structure. These latent variables capture unobserved factors that affect the relationship between predictors (X) and the outcome (Y).
We model the relationship between the predictor variables (X) and the outcome variable (Y) with a latent variable (Z) as follows:
Y = f(X, Z) + \epsilon
Where:
- Y is the observed outcome variable.
- X is the observed predictor variable(s).
- Z is the latent (unobserved) variable, which we aim to infer.
- f(X, Z) is the function that relates X and Z to Y.
- \epsilon is the error term, typically assumed to be normally distributed with mean 0 and variance \sigma^2.
The latent variable Z can represent various phenomena, such as group-level effects, time-varying trends, or individual-level factors, that are not captured by the observed predictors alone.
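To make the roles of X, Z, and \epsilon concrete, the following is a minimal simulation sketch in plain Python. It assumes a linear form f(X, Z) = beta * X + Z, with Z acting as a group-level intercept. The function name `simulate`, the linear form, and all parameter values are illustrative assumptions, not part of the general model above.

```python
import random
import statistics

def simulate(n_groups=5, n_per_group=200, beta=2.0, sigma=0.5, seed=0):
    """Draw data from Y = beta * X + Z[group] + eps (an assumed linear f)."""
    rng = random.Random(seed)
    # Latent group-level intercepts: in a real study these are never observed.
    z = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
    rows = []
    for g in range(n_groups):
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)        # observed predictor
            eps = rng.gauss(0.0, sigma)    # error term with variance sigma^2
            rows.append((g, x, beta * x + z[g] + eps))
    return z, rows

z, rows = simulate()

# Residuals after removing only the observed-predictor part, beta * X.
# For simplicity the true beta (2.0) is used here rather than an estimate.
resid_by_group = {}
for g, x, y in rows:
    resid_by_group.setdefault(g, []).append(y - 2.0 * x)

# Print group index, true latent value, and mean residual for that group.
for g, r in sorted(resid_by_group.items()):
    print(g, round(z[g], 2), round(statistics.mean(r), 2))
```

In this sketch the group-wise residual means stay close to the simulated latent values, which is the kind of structure that X alone cannot explain and that a latent variable is meant to absorb.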
Considerations
In Bayesian regression with latent variables, we consider the uncertainty in both the observed and latent variables. We declare prior distributions for the latent variables, in addition to the usual priors for regression coefficients and intercepts. Latent variables are often given Gaussian (Normal) priors, or a Multivariate Normal prior when correlations among the latent variables need to be modeled.
The goal is to infer the posterior distribution over both the parameters and the latent variables, given the observed data.
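For intuition about what a posterior over a latent variable looks like, here is a small sketch of a case with a closed-form answer: a single latent offset z with prior Normal(0, tau^2) and observations y_i ~ Normal(z, sigma^2), with sigma treated as known. This toy model, the function name `latent_posterior`, and the numbers are assumptions chosen for illustration. Models with several latent variables and unknown scales generally have no closed form and need a sampler.

```python
import math

def latent_posterior(y, sigma, tau):
    """Posterior of z given y_i ~ Normal(z, sigma^2) and z ~ Normal(0, tau^2).

    Returns (posterior mean, posterior standard deviation).
    """
    # Precisions (inverse variances) of prior and likelihood add up.
    precision = 1.0 / tau**2 + len(y) / sigma**2
    # Posterior mean is a precision-weighted combination of the
    # prior mean (0) and the data.
    mean = (sum(y) / sigma**2) / precision
    return mean, math.sqrt(1.0 / precision)

y = [1.2, 0.8, 1.5, 1.1]
post_mean, post_sd = latent_posterior(y, sigma=1.0, tau=1.0)
print(post_mean, post_sd)
```

With these numbers the sample mean is 1.15, while the posterior mean is about 0.92: the prior pulls the estimate of z toward 0, and the posterior standard deviation (about 0.45) is smaller than the prior's.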
Example
Below is an example code snippet demonstrating Bayesian regression with latent variables using the BI package with JAX:
from BI import bi
import jax.numpy as jnp

# Setup device ------------------------------------------------
m = bi(platform='cpu')

# Data Simulation ------------------------------------------------
NY = 4  # Number of dependent variables or outcomes (e.g., dimensions for latent variables)
NV = 8  # Number of observations or individual-level data points (e.g., subjects)
N = 100
K = 5
a = 0.5

# Generate the means and offsets for the data
# means: Generate random normal means for each of the NY outcomes
# offsets: Generate random normal offsets for each of the NV observations
means = m.dist.normal(0, 1, shape=(NY,), sample=True, seed=10)
offsets = m.dist.normal(0, 1, shape=(NV, 1), sample=True, seed=20)
Y2 = offsets + means

# Simulate individual-level random effects (e.g., random slopes or intercepts)
# b_individual: A matrix of size (N, K) where N is the number of individuals and K is the number of covariates
b_individual = m.dist.normal(0, 1, shape=(N, K), sample=True, seed=0)

# mu: Add an additional effect 'a' to the individual-level random effects 'b_individual'
# 'a' could represent a population-level effect or a baseline
mu = b_individual + a

# Convert Y2 to a JAX array for further computation in a JAX-based framework
Y2 = jnp.array(Y2)

# Set data ------------------------------------------------
dat = dict(
    NY=NY,
    NV=NV,
    Y2=Y2
)
m.data_on_model = dat

# Define model ------------------------------------------------
def model(NY, NV, Y2):
    means = m.dist.normal(0, 1, shape=(NY,), name='means')
    offset = m.dist.normal(0, 1, shape=(NV, 1), name='offset')
    sigma = m.dist.exponential(1, shape=(NY,), name='sigma')
    tmp = jnp.tile(means, (NV, 1)).reshape(NV, NY)
    mu_l = tmp + offset
    m.dist.normal(mu_l, jnp.tile(sigma, [NV, 1]), obs=Y2)

# Run sampler ------------------------------------------------
m.fit(model, progress_bar=False)

# Summary ------------------------------------------------
m.summary()
BI v 0.0.26 package loaded