Atoms Module#
Atoms for modeling optimization problems.
Base Classes#
- class rlaopt.atoms.Atom(exprs, buffers, variable_names=None)[source]#
Bases:
Expression,ABCAbstract base class for optimization atoms.
An atom represents a mathematical function that can be used in optimization problems. Atoms have various properties (smooth, proxable) and can be composed to form more complex objective functions.
- Atoms extend Expression with:
Input registration system for Variables and Expressions
Buffer management for constants and hyperparameters
- Subclasses must implement:
is_smooth() - whether the function is differentiable everywhere
is_proxable() - whether the proximal operator is computable
forward() - evaluation of the atom
_prox() - prox operator of the atom
- Parameters:
- __init__(exprs, buffers, variable_names=None)[source]#
Initialize the atom.
Subclasses should call this constructor to ensure proper initialization.
- abstractmethod is_proxable()[source]#
Check if the expression has a computable proximal operator.
The proximal operator is used in proximal gradient methods and ADMM for non-smooth optimization. An expression is proxable if its proximal operator can be computed efficiently in closed form.
- Returns:
True if the expression is proxable, False otherwise.
- Return type:
Regularizers#
- class rlaopt.atoms.L1Norm(x, scaling=1.0)[source]#
Bases:
NonsmoothRegularizerL1-norm regularization atom.
Computes the scaled L1-norm: scaling * ||x||₁ = scaling * Σᵢ |xᵢ|
- Parameters:
x (Expression) – Expression to apply the L1-norm to.
scaling (float | Tensor) – Scaling factor for the L1-norm (default: 1.0).
Examples
>>> x = Variable((100,), name='weights') >>> l1 = L1Norm(x, scaling=0.01) >>> penalty = l1.forward()
- __init__(x, scaling=1.0)[source]#
Initializes the L1-norm atom with optional scaling.
- Parameters:
x (Expression) – Expression to apply the L1-norm to.
scaling (float | Tensor) – Scaling factor for the L1-norm (default: 1.0).
- class rlaopt.atoms.L2Norm(x, scaling=1.0)[source]#
Bases:
NonsmoothRegularizerL2-norm regularization atom.
Computes the scaled L2-norm: scaling * ||x||₂ = scaling * sqrt(Σᵢ xᵢ²)
- Parameters:
x (Expression) – Expression to apply the L2-norm to.
scaling (float | Tensor) – Scaling factor for the L2-norm (default: 1.0).
Examples
>>> x = Variable((100,), name='weights') >>> l2 = L2Norm(x, scaling=0.01) >>> penalty = l2.forward()
- __init__(x, scaling=1.0)[source]#
Initializes the L2-norm atom with optional scaling.
- Parameters:
x (Expression) – Expression to apply the L2-norm to.
scaling (float | Tensor) – Scaling factor for the L2-norm (default: 1.0).
- class rlaopt.atoms.SumSquares(x)[source]#
Bases:
AtomSum of squared elements atom.
- Parameters:
x (Expression)
- __init__(x)[source]#
Initializes the sum squared atom.
- Parameters:
x (Expression) – Expression to apply the sum of squares to.
- class rlaopt.atoms.ElasticNet(x, l1_scaling=1.0, l2_scaling=1.0)[source]#
Bases:
NonsmoothRegularizerElastic net regularization combining L1 and L2 penalties.
- The elastic net penalty is defined as:
l1_scaling * ||x||₁ + (l2_scaling / 2) * ||x||₂²
- Parameters:
x (Expression) – Expression to apply the elastic net penalty to.
l1_scaling (float | Tensor) – Scaling factor for the L1-norm penalty. Defaults to 1.0.
l2_scaling (float | Tensor) – Scaling factor for the L2-norm penalty. Defaults to 1.0.
- Raises:
TypeError – If x is not an Expression.
Examples
>>> x = Variable((100,), name='weights') >>> # Standard elastic net with equal L1 and L2 contribution >>> elastic = ElasticNet(x, l1_scaling=0.5, l2_scaling=0.5) >>> penalty = elastic.forward()
>>> # Lasso-like (emphasize sparsity) >>> elastic_lasso = ElasticNet(x, l1_scaling=1.0, l2_scaling=0.1)
>>> # Ridge-like (emphasize smoothness) >>> elastic_ridge = ElasticNet(x, l1_scaling=0.1, l2_scaling=1.0)
- __init__(x, l1_scaling=1.0, l2_scaling=1.0)[source]#
Initialize the elastic net atom.
- Parameters:
x (Expression) – Expression to apply the elastic net penalty to.
l1_scaling (float | Tensor) – Scaling factor for the L1-norm penalty. Defaults to 1.0.
l2_scaling (float | Tensor) – Scaling factor for the L2-norm penalty. Defaults to 1.0.
- class rlaopt.atoms.NucNorm(x, scaling=1.0)[source]#
Bases:
NonsmoothRegularizerNuclear norm (sum of singular values) of a matrix variable.
The nuclear norm is defined as the sum of the singular values of a matrix. It is commonly used as a convex relaxation of the rank function in low-rank matrix optimization problems.
The atom computes: scaling * ||X||_* = scaling * Σᵢ σᵢ(X) where σᵢ(X) are the singular values of X.
- Parameters:
x (Expression) – Expression to apply the nuclear norm to.
scaling (float | Tensor) – Scaling factor for the nuclear norm. Defaults to 1.0.
- Raises:
TypeError – If x is not an Expression.
Examples
>>> X = Variable((10, 5), name='X') >>> nuc_norm = NucNorm(X, scaling=0.1) >>> loss = nuc_norm.forward()
- __init__(x, scaling=1.0)[source]#
Initialize the nuclear norm atom.
- Parameters:
x (Expression) – 2D matrix expression to apply the nuclear norm to.
scaling (float | Tensor) – Scaling factor for the nuclear norm. Defaults to 1.0.
Constraints#
- class rlaopt.atoms.Box(x, lower=None, upper=None)[source]#
Bases:
PolyhedronBox constraint atom representing elementwise bounds.
A box constraint restricts each element of a variable to lie within specified lower and upper bounds: lower <= x <= upper.
This is a special case of Polyhedra with identity inequality constraints (C = I), but with an efficient closed-form proximal operator (projection via clamping).
- Parameters:
Examples
>>> # Standard box constraint: 0 <= x <= 1 >>> x = Variable((10,), name='x') >>> box = Box(x, lower=0.0, upper=1.0)
>>> # One-sided constraint: x >= 0 (non-negativity) >>> box_nonneg = Box(x, lower=0.0)
>>> # One-sided constraint: x <= 1 >>> box_upper = Box(x, upper=1.0)
>>> # Use proximal operator for projection >>> out_of_bounds = torch.randn(10) >>> projected = box.prox(out_of_bounds, prox_scaling=1.0)
- class rlaopt.atoms.NonNegative(x)[source]#
Bases:
BoxNon-negativity constraint atom enforcing x >= 0.
A convenience class for box constraints with lower bound of zero and no upper bound, enforcing that all elements of the variable are non-negative.
This is equivalent to Box(x, lower=0, upper=None) but provides a more semantic interface for this common constraint.
- Parameters:
x (Variable) – Variable to constrain to be non-negative.
Examples
>>> # Constrain variable to be non-negative >>> x = Variable((10,), name='x') >>> nonneg = NonNegative(x)
>>> # Use proximal operator for projection >>> point_with_negatives = torch.randn(10) >>> projected = nonneg.prox(point_with_negatives, prox_scaling=1.0) >>> # All negative values are clamped to 0
- class rlaopt.atoms.LinearEquality(x, A, b)[source]#
Bases:
PolyhedronLinear equality constraint atom enforcing A @ x = b.
Represents a system of linear equality constraints. Unlike the general Polyhedron class, this provides an efficient closed-form proximal operator (projection onto the affine subspace) via QR factorization. Checks for feasibility at initialization, by determining the rank of A via QR factorization. If A is not full-row rank, then an error is raised. When A is full-row rank, the R factor is cached and used for efficienctly computing the projection onto the constraint set.
The projection solves: argmin_z ||z - location||² subject to A @ z = b
- Parameters:
Examples
>>> # Single equality constraint: x[0] + x[1] = 1 >>> x = Variable((2,), name='x') >>> A = torch.tensor([[1.0, 1.0]]) >>> b = torch.tensor([1.0]) >>> constraint = LinearEquality(x, A, b)
>>> # Multiple equality constraints >>> x = Variable((5,), name='x') >>> A = torch.randn(3, 5) >>> b = torch.randn(3) >>> constraint = LinearEquality(x, A, b)
>>> # Use proximal operator for projection onto affine subspace >>> unconstrained_point = torch.randn(5) >>> projected = constraint.prox(unconstrained_point, prox_scaling=1.0) >>> # Verify: A @ projected should equal b
- class rlaopt.atoms.Halfspace(x, c, upper)[source]#
Bases:
PolyhedronHalfspace constraint atom representing a linear inequality.
A halfspace constraint restricts a variable to satisfy a linear inequality: c^T x <= upper, which defines a half-space in the variable space.
This is a special case of Polyhedra with a single linear inequality constraint, but with an efficient closed-form proximal operator (projection onto the halfspace).
- Parameters:
Examples
>>> # Constraint: c^T x <= 1 >>> x = Variable((5,), name='x') >>> c = torch.randn(5) >>> halfspace = Halfspace(x, c=c, upper=1.0)
>>> # Non-negativity for first coordinate: x[0] >= 0 >>> # Rewritten as: -x[0] <= 0, so c = [-1, 0, 0, ...], upper = 0 >>> c = torch.zeros(5) >>> c[0] = -1.0 >>> nonneg = Halfspace(x, c=c, upper=torch.tensor(0.0))
>>> # Use proximal operator for projection >>> violating_point = torch.randn(5) >>> projected = halfspace.prox(violating_point, prox_scaling=1.0)
- class rlaopt.atoms.Polyhedron(x, A=None, b=None, C=None, lower=None, upper=None)[source]#
Bases:
AtomPolyhedral constraint atom for linear equality and inequality constraints.
- A polyhedron is defined by:
Equality constraints: A @ x = b
Inequality constraints: lower <= C @ x <= upper
The atom evaluates to 0 if all constraints are satisfied, and infinity otherwise (indicator function of the polyhedral set).
- Parameters:
x (Variable) – Variable to constrain.
A (Tensor | None) – Equality constraint matrix (optional). If provided, b must also be provided.
b (Tensor | None) – Equality constraint vector (optional). Required if A is provided.
C (Tensor | None) – Inequality constraint matrix (optional). If None, uses identity (box constraints).
lower (Tensor | int | float | None) – Lower bound vector for inequalities (optional).
upper (Tensor | int | float | None) – Upper bound vector for inequalities (optional).
- Raises:
ValueError – If A is provided but b is None.
ValueError – If constraint dimensions are inconsistent.
ValueError – If no constraints are provided (trivial polyhedron).
- __init__(x, A=None, b=None, C=None, lower=None, upper=None)[source]#
Initialize the polyhedral constraint atom.
- Parameters:
- Raises:
ValueError – If A is provided but b is None.
ValueError – If constraint dimensions are inconsistent.
ValueError – If no constraints are provided.
- forward()[source]#
Evaluate the polyhedral constraint at the current variable value.
- Returns:
0.0 if constraints are satisfied, infinity otherwise.
- Return type:
Norm Balls#
- class rlaopt.atoms.L1NormBall(x, radius=1.0)[source]#
Bases:
_NormBallL1-norm ball constraint enforcing ||x||_1 <= radius.
- This atom represents the indicator function of the L1-norm ball:
0 if ||x||_1 <= radius, +inf otherwise.
- Parameters:
x (Expression) – Expression to constrain.
radius (float | int | Tensor) – Non-negative radius of the L1-norm ball (default: 1.0).
- class rlaopt.atoms.L2NormBall(x, radius=1.0)[source]#
Bases:
_NormBallL2-norm (Euclidean) ball constraint enforcing ||x||_2 <= radius.
- This atom represents the indicator function of the Euclidean ball:
0 if ||x||_2 <= radius, +inf otherwise.
- Parameters:
x (Expression) – Expression to constrain.
radius (float | int | Tensor) – Non-negative radius of the L2-norm ball (default: 1.0).
Regression Models#
- class rlaopt.atoms.LinearRegression(beta, dataloader, fit_intercept=True)[source]#
Bases:
BaseRegressorOrdinary least squares (OLS) linear regression model.
Linear regression models the relationship between features and a continuous target variable by minimizing the sum of squared residuals. This is the most common form of regression analysis.
- The squared error loss is defined as:
L(y, ŷ) = (y - ŷ)^2
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
- __init__(beta, dataloader, fit_intercept=True)[source]#
Initialize linear regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader with training features and targets.
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
- class rlaopt.atoms.LogisticRegression(beta, dataloader, fit_intercept=True)[source]#
Bases:
BaseClassifierBinary logistic regression model.
Logistic regression is used for binary classification tasks. It models the probability that an instance belongs to the positive class using the logistic (sigmoid) function.
- The binary cross-entropy loss is defined as:
L(y, p) = -[y * log(p) + (1-y) * log(1-p)]
where p = sigmoid(X @ β).
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
- __init__(beta, dataloader, fit_intercept=True)[source]#
Initialize logistic regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and binary targets (0 or 1).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
- class rlaopt.atoms.PoissonRegression(beta, dataloader, fit_intercept=True)[source]#
Bases:
BaseGLMPoisson regression model with log link function.
Poisson regression is used for modeling count data and contingency tables. It assumes the target variable follows a Poisson distribution and uses a log link function to ensure predictions are positive.
- The Poisson loss (negative log-likelihood) is defined as:
L(y, ŷ) = ŷ - y * log(ŷ)
where ŷ = exp(X @ β) is the predicted rate parameter.
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
- __init__(beta, dataloader, fit_intercept=True)[source]#
Initialize Poisson regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and count targets (must be non-negative).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
- class rlaopt.atoms.GammaRegression(beta, dataloader, fit_intercept=True)[source]#
Bases:
BaseGLMGamma regression model with log link function.
Gamma regression is used for modeling continuous, positive-valued target variables that are skewed. It assumes the target variable follows a Gamma distribution and uses a log link function to ensure predictions are positive.
- The Gamma loss (negative log-likelihood) is defined as:
L(y, ŷ) = log(ŷ) + y/ŷ
where ŷ = exp(X @ β) is the predicted mean.
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
- __init__(beta, dataloader, fit_intercept=True)[source]#
Initialize Gamma regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and positive continuous targets.
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
- class rlaopt.atoms.InverseGaussianRegression(beta, dataloader, fit_intercept=True)[source]#
Bases:
BaseGLMInverse Gaussian regression model with log link function.
Inverse Gaussian regression is used for modeling continuous, positive-valued target variables that are right-skewed. It assumes the target variable follows an Inverse Gaussian distribution and uses a log link function to ensure predictions are positive.
- The Inverse Gaussian loss (negative log-likelihood) is defined as:
L(y, ŷ) = (y - ŷ)^2 / (2 * y * ŷ^2)
where ŷ = exp(X @ β) is the predicted mean.
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
- __init__(beta, dataloader, fit_intercept=True)[source]#
Initialize Inverse Gaussian regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and positive continuous targets.
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
- class rlaopt.atoms.CompoundPoissonGammaRegression(beta, dataloader, fit_intercept=True, power=1.5)[source]#
Bases:
BaseGLMCompound Poisson-Gamma (Tweedie) regression model with log link function.
The Compound Poisson-Gamma distribution, also known as the Tweedie distribution with power parameter p ∈ (1, 2), is particularly useful for modeling positive continuous data with a point mass at zero. This makes it ideal for scenarios where many observations are exactly zero, but non-zero values are continuous and positive.
- Common applications:
Insurance claims: Many policies have zero claims, non-zero claims are continuous
Rainfall modeling: Many days have zero rainfall, rainy days have continuous amounts
Customer spending: Many customers spend nothing, active customers spend varying amounts
Healthcare costs: Many patients incur zero costs, others have continuous expenses
- The model uses a log link function to ensure predictions are always positive:
ŷ = exp(X @ β)
- The Tweedie distribution combines:
A Poisson process governing the number of events (including zero events)
A Gamma distribution for the size of each event when it occurs
Results in a distribution with Var(Y) = φμ^p where p is in (1,2)
Note
Unlike pure Poisson (p=1) or Gamma (p=2) regression, the Compound Poisson-Gamma with p in (1, 2) handles the mixed discrete-continuous nature of the data with a moderate mean-variance relationship.
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
power (float)
- power#
The power parameter used to define the loss.
- __init__(beta, dataloader, fit_intercept=True, power=1.5)[source]#
Initialize Compound Poisson-Gamma regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and non-negative continuous targets (may include zeros).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
power (float) – Power parameter p ∈ (1, 2) defining the variance function. Defaults to 1.5.
- class rlaopt.atoms.HuberRegression(beta, dataloader, fit_intercept=True, delta=1.0)[source]#
Bases:
BaseRegressorHuber regression model (robust to outliers).
Huber regression combines the best properties of L2 (least squares) and L1 (absolute deviation) losses. It is quadratic for small residuals and linear for large residuals, making it robust to outliers while maintaining efficiency for normally distributed errors.
- The Huber loss is defined as:
L(r) = 0.5 * r^2 if |r| <= delta L(r) = delta * (|r| - 0.5*delta) if |r| > delta
- Parameters:
beta (Variable)
dataloader (DataLoader)
delta (float)
- delta#
The Huber loss threshold parameter.
- __init__(beta, dataloader, fit_intercept=True, delta=1.0)[source]#
Initialize Huber regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader with training features and targets.
fit_intercept – Whether to fit an intercept term. Defaults to True.
delta (float) – Threshold parameter that defines the point where the loss transitions from quadratic to linear. Smaller values increase robustness to outliers. Defaults to 1.0.
- property delta#
Returns Huber loss threshold parameter.
- class rlaopt.atoms.MultinomialRegression(beta, dataloader, fit_intercept=True)[source]#
Bases:
BaseClassifierMultinomial (softmax) regression model for multi-class classification.
Multinomial regression extends logistic regression to handle more than two classes. It uses the softmax function to model class probabilities and is trained using the cross-entropy loss.
- The cross-entropy loss is defined as:
L(y, p) = -log(p_y)
where p = softmax(X @ β) and p_y is the probability of the true class.
Note
For multinomial regression, beta should be a matrix of shape (n_features, n_classes) to produce logits for each class.
- Parameters:
beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
- __init__(beta, dataloader, fit_intercept=True)[source]#
Initialize multinomial regression model.
- Parameters:
beta (Variable) – Model parameters variable representing regression coefficients. Shape should be (n_features, n_classes) for multi-class classification.
dataloader (DataLoader) – DataLoader containing the training data with features and class labels (integers from 0 to n_classes-1).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.