Atoms Module

Atoms Module#

Atoms for modeling optimization problems.

Base Classes#

class rlaopt.atoms.Atom(exprs, buffers, variable_names=None)[source]#

Bases: Expression, ABC

Abstract base class for optimization atoms.

An atom represents a mathematical function that can be used in optimization problems. Atoms have various properties (smooth, proxable) and can be composed to form more complex objective functions.

Atoms extend Expression with:

Input registration system for Variables and Expressions
Buffer management for constants and hyperparameters

Subclasses must implement:

is_smooth() - whether the function is differentiable everywhere
is_proxable() - whether the proximal operator is computable
forward() - evaluation of the atom
_prox() - prox operator of the atom

Parameters:

exprs (dict[str, Expression])
buffers (dict[str, Tensor | float | None])
variable_names (list[str] | None)

__init__(exprs, buffers, variable_names=None)[source]#

Initialize the atom.

Subclasses should call this constructor to ensure proper initialization.

Parameters:

exprs (dict[str, Expression])
buffers (dict[str, Tensor | float | None])
variable_names (list[str] | None)

abstractmethod is_proxable()[source]#

Check if the expression has a computable proximal operator.

The proximal operator is used in proximal gradient methods and ADMM for non-smooth optimization. An expression is proxable if its proximal operator can be computed efficiently in closed form.

Returns:: True if the expression is proxable, False otherwise.
Return type:: bool

tree()[source]#

Return tree representation for Atom.

If the atom has input expressions, includes them in the tree. Otherwise, returns just the atom class name as a leaf node.

Returns:: Tree with atom class name and optional input child.
Return type:: ExprTree

Regularizers#

class rlaopt.atoms.L1Norm(x, scaling=1.0)[source]#

Bases: _Norm

L1-norm regularization atom.

Computes the scaled L1-norm: scaling * ||x||₁ = scaling * Σᵢ |xᵢ|

Parameters:

x (Expression) – Expression to apply the L1-norm to.
scaling (float | Tensor) – Scaling factor for the L1-norm (default: 1.0).

Examples

>>> x = Variable((100,), name='weights')
>>> l1 = L1Norm(x, scaling=0.01)
>>> penalty = l1.forward()

class rlaopt.atoms.L2Norm(x, scaling=1.0)[source]#

Bases: _Norm

L2-norm regularization atom.

Computes the scaled L2-norm: scaling * ||x||₂ = scaling * sqrt(Σᵢ xᵢ²)

Parameters:

x (Expression) – Expression to apply the L2-norm to.
scaling (float | Tensor) – Scaling factor for the L2-norm (default: 1.0).

Examples

>>> x = Variable((100,), name='weights')
>>> l2 = L2Norm(x, scaling=0.01)
>>> penalty = l2.forward()

class rlaopt.atoms.LInfNorm(x, scaling=1.0)[source]#

Bases: _Norm

L-infinity-norm regularization atom.

Computes the scaled L-infinity norm: scaling * ||x||_∞ = scaling * maxᵢ |xᵢ|

Parameters:

x (Expression) – Expression to apply the L-infinity norm to.
scaling (float | Tensor) – Scaling factor for the L-infinity norm (default: 1.0).

Examples

>>> x = Variable((100,), name='weights')
>>> linf = LInfNorm(x, scaling=0.01)
>>> penalty = linf.forward()

class rlaopt.atoms.SumSquares(x)[source]#

Bases: _QuadForm

Sum of squares atom, f(x) = sum(x**2).

Parameters:: x (Expression)

__init__(x)[source]#

Initializes the sum of squares atom.

Parameters:: x (Expression) – Expression to apply the sum of squares to.

is_proxable()[source]#

Returns True if the input is a Variable.

Return type:: bool

class rlaopt.atoms.ElasticNet(x, l1_scaling=1.0, l2_scaling=1.0)[source]#

Bases: NonsmoothRegularizer

Elastic net regularization combining L1 and L2 penalties.

The elastic net penalty is defined as:: l1_scaling * ||x||₁ + (l2_scaling / 2) * ||x||₂²

Parameters:

x (Expression) – Expression to apply the elastic net penalty to.
l1_scaling (float | Tensor) – Scaling factor for the L1-norm penalty. Defaults to 1.0.
l2_scaling (float | Tensor) – Scaling factor for the L2-norm penalty. Defaults to 1.0.

Raises:

TypeError – If x is not an Expression.

Examples

>>> x = Variable((100,), name='weights')
>>> # Standard elastic net with equal L1 and L2 contribution
>>> elastic = ElasticNet(x, l1_scaling=0.5, l2_scaling=0.5)
>>> penalty = elastic.forward()

>>> # Lasso-like (emphasize sparsity)
>>> elastic_lasso = ElasticNet(x, l1_scaling=1.0, l2_scaling=0.1)

>>> # Ridge-like (emphasize smoothness)
>>> elastic_ridge = ElasticNet(x, l1_scaling=0.1, l2_scaling=1.0)

__init__(x, l1_scaling=1.0, l2_scaling=1.0)[source]#

Initialize the elastic net atom.

Parameters:

x (Expression) – Expression to apply the elastic net penalty to.
l1_scaling (float | Tensor) – Scaling factor for the L1-norm penalty. Defaults to 1.0.
l2_scaling (float | Tensor) – Scaling factor for the L2-norm penalty. Defaults to 1.0.

forward()[source]#

Evaluate the elastic net penalty at the current variable value.

Returns:

The elastic net penalty value:: l1_scaling * ||x||₁ + (l2_scaling / 2) * ||x||₂²

Return type:

torch.Tensor

class rlaopt.atoms.NucNorm(x, scaling=1.0)[source]#

Bases: NonsmoothRegularizer

Nuclear norm (sum of singular values) of a matrix variable.

The nuclear norm is defined as the sum of the singular values of a matrix. It is commonly used as a convex relaxation of the rank function in low-rank matrix optimization problems.

The atom computes: scaling * ||X||_* = scaling * Σᵢ σᵢ(X) where σᵢ(X) are the singular values of X.

Parameters:

x (Expression) – Expression to apply the nuclear norm to.
scaling (float | Tensor) – Scaling factor for the nuclear norm. Defaults to 1.0.

Raises:

TypeError – If x is not an Expression.

Examples

>>> X = Variable((10, 5), name='X')
>>> nuc_norm = NucNorm(X, scaling=0.1)
>>> loss = nuc_norm.forward()

__init__(x, scaling=1.0)[source]#

Initialize the nuclear norm atom.

Parameters:

x (Expression) – 2D matrix expression to apply the nuclear norm to.
scaling (float | Tensor) – Scaling factor for the nuclear norm. Defaults to 1.0.

forward()[source]#

Evaluate the nuclear norm at the registered variable value.

Returns:: The scaled sum of singular values.
Return type:: torch.Tensor

Smooth Functions#

class rlaopt.atoms.SumSquares(x)[source]#

Sum of squares atom, f(x) = sum(x**2).

Parameters:: x (Expression)

__init__(x)[source]#

Initializes the sum of squares atom.

Parameters:: x (Expression) – Expression to apply the sum of squares to.

is_proxable()[source]#

Returns True if the input is a Variable.

Return type:: bool

class rlaopt.atoms.QuadForm(x, Q)[source]#

Quadratic form atom, f(x) = x^T Q x.

Parameters:

x (Expression)
Q (Tensor)

__init__(x, Q)[source]#

Initializes the quadratic form atom.

Parameters:

x (Expression) – Expression to apply the quadratic form to.
Q (Tensor) – Matrix defining the quadratic form.

is_proxable()[source]#

Quadratic form is not proxable.

Return type:: bool

Constraints#

class rlaopt.atoms.Box(x, lower=None, upper=None)[source]#

Bases: Polyhedron

Box constraint atom representing elementwise bounds.

A box constraint restricts each element of a variable to lie within specified lower and upper bounds: lower <= x <= upper.

This is a special case of Polyhedra with identity inequality constraints (C = I), but with an efficient closed-form proximal operator (projection via clamping).

Parameters:

x (Variable) – Variable to constrain.
lower (Tensor | int | float | None) – Lower bound vector (optional). If None, defaults to -infinity.
upper (Tensor | int | float | None) – Upper bound vector (optional). If None, defaults to +infinity.

Examples

>>> # Standard box constraint: 0 <= x <= 1
>>> x = Variable((10,), name='x')
>>> box = Box(x, lower=0.0, upper=1.0)

>>> # One-sided constraint: x >= 0 (non-negativity)
>>> box_nonneg = Box(x, lower=0.0)

>>> # One-sided constraint: x <= 1
>>> box_upper = Box(x, upper=1.0)

>>> # Use proximal operator for projection
>>> out_of_bounds = torch.randn(10)
>>> projected = box.prox(out_of_bounds, prox_scaling=1.0)

__init__(x, lower=None, upper=None)[source]#

Initialize the box constraint atom.

Parameters:

x (Variable) – Variable to constrain.
lower (torch.Tensor | int | float) – Lower bound vector (optional).
upper (Tensor | int | float | None) – (torch.Tensor | int | float): Upper bound vector (optional).

is_proxable()[source]#

Check if the box constraint has a computable proximal operator.

Returns:

Always True, as box constraints have a closed-form: proximal operator (projection via clamping).

Return type:

bool

class rlaopt.atoms.NonNegative(x)[source]#

Bases: Box

Non-negativity constraint atom enforcing x >= 0.

A convenience class for box constraints with lower bound of zero and no upper bound, enforcing that all elements of the variable are non-negative.

This is equivalent to Box(x, lower=0, upper=None) but provides a more semantic interface for this common constraint.

Parameters:: x (Variable) – Variable to constrain to be non-negative.

Examples

>>> # Constrain variable to be non-negative
>>> x = Variable((10,), name='x')
>>> nonneg = NonNegative(x)

>>> # Use proximal operator for projection
>>> point_with_negatives = torch.randn(10)
>>> projected = nonneg.prox(point_with_negatives, prox_scaling=1.0)
>>> # All negative values are clamped to 0

__init__(x)[source]#

Initialize the non-negativity constraint atom.

Parameters:: x (Variable) – Variable to constrain to be non-negative.

class rlaopt.atoms.LinearEquality(x, A, b)[source]#

Bases: Polyhedron

Linear equality constraint atom enforcing A @ x = b.

Represents a system of linear equality constraints. Unlike the general Polyhedron class, this provides an efficient closed-form proximal operator (projection onto the affine subspace) via QR factorization. Checks for feasibility at initialization, by determining the rank of A via QR factorization. If A is not full-row rank, then an error is raised. When A is full-row rank, the R factor is cached and used for efficienctly computing the projection onto the constraint set.

The projection solves: argmin_z ||z - location||² subject to A @ z = b

Parameters:

x (Variable) – Variable to constrain.
A (Tensor) – Constraint matrix defining the linear system.
b (Tensor) – Right-hand side vector of the equality constraints.

Examples

>>> # Single equality constraint: x[0] + x[1] = 1
>>> x = Variable((2,), name='x')
>>> A = torch.tensor([[1.0, 1.0]])
>>> b = torch.tensor([1.0])
>>> constraint = LinearEquality(x, A, b)

>>> # Multiple equality constraints
>>> x = Variable((5,), name='x')
>>> A = torch.randn(3, 5)
>>> b = torch.randn(3)
>>> constraint = LinearEquality(x, A, b)

>>> # Use proximal operator for projection onto affine subspace
>>> unconstrained_point = torch.randn(5)
>>> projected = constraint.prox(unconstrained_point, prox_scaling=1.0)
>>> # Verify: A @ projected should equal b

__init__(x, A, b)[source]#

Initialize the affine equality constraint atom.

Parameters:

x (Variable) – Variable to constrain.
A (Tensor) – Constraint matrix defining the linear system.
b (Tensor) – Right-hand side vector of the equality constraints.

is_proxable()[source]#

Check if the constraint has a computable proximal operator.

Returns:

Always True, as affine equality constraints have a: closed-form proximal operator (projection via Cholesky).

Return type:

bool

class rlaopt.atoms.Halfspace(x, c, upper)[source]#

Bases: Polyhedron

Halfspace constraint atom representing a linear inequality.

A halfspace constraint restricts a variable to satisfy a linear inequality: c^T x <= upper, which defines a half-space in the variable space.

This is a special case of Polyhedra with a single linear inequality constraint, but with an efficient closed-form proximal operator (projection onto the halfspace).

Parameters:

x (Variable) – Variable to constrain.
c (Tensor) – Normal vector defining the halfspace orientation.
upper (float) – Upper bound for the linear form c^T x.

Examples

>>> # Constraint: c^T x <= 1
>>> x = Variable((5,), name='x')
>>> c = torch.randn(5)
>>> halfspace = Halfspace(x, c=c, upper=1.0)

>>> # Non-negativity for first coordinate: x[0] >= 0
>>> # Rewritten as: -x[0] <= 0, so c = [-1, 0, 0, ...], upper = 0
>>> c = torch.zeros(5)
>>> c[0] = -1.0
>>> nonneg = Halfspace(x, c=c, upper=torch.tensor(0.0))

>>> # Use proximal operator for projection
>>> violating_point = torch.randn(5)
>>> projected = halfspace.prox(violating_point, prox_scaling=1.0)

__init__(x, c, upper)[source]#

Initialize the halfspace constraint atom.

Parameters:

x (Variable) – Variable to constrain.
c (Tensor) – Normal vector defining the halfspace orientation.
upper (float) – Upper bound for the linear form c^T x.

is_proxable()[source]#

Check if the halfspace constraint has a computable proximal operator.

Returns:

Always True, as halfspace constraints have a closed-form: proximal operator (projection onto the halfspace).

Return type:

bool

class rlaopt.atoms.Polyhedron(x, A=None, b=None, C=None, lower=None, upper=None)[source]#

Bases: Atom

Polyhedral constraint atom for linear equality and inequality constraints.

A polyhedron is defined by:

Equality constraints: A @ x = b
Inequality constraints: lower <= C @ x <= upper

The atom evaluates to 0 if all constraints are satisfied, and infinity otherwise (indicator function of the polyhedral set).

Parameters:

x (Variable) – Variable to constrain.
A (Tensor | None) – Equality constraint matrix (optional). If provided, b must also be provided.
b (Tensor | None) – Equality constraint vector (optional). Required if A is provided.
C (Tensor | None) – Inequality constraint matrix (optional). If None, uses identity (box constraints).
lower (Tensor | int | float | None) – Lower bound vector for inequalities (optional).
upper (Tensor | int | float | None) – Upper bound vector for inequalities (optional).

Raises:

ValueError – If A is provided but b is None.
ValueError – If constraint dimensions are inconsistent.
ValueError – If no constraints are provided (trivial polyhedron).

__init__(x, A=None, b=None, C=None, lower=None, upper=None)[source]#

Initialize the polyhedral constraint atom.

Parameters:

x (Variable) – Variable to constrain.
A (Tensor | None) – Equality constraint matrix (optional).
b (Tensor | None) – Equality constraint vector (optional).
C (Tensor | None) – Inequality constraint matrix (optional).
lower (Tensor | int | float | None) – Lower bound vector for inequalities (optional).
upper (Tensor | int | float | None) – Upper bound vector for inequalities (optional).

Raises:

ValueError – If A is provided but b is None.
ValueError – If constraint dimensions are inconsistent.
ValueError – If no constraints are provided.

forward()[source]#

Evaluate the polyhedral constraint at the current variable value.

Returns:: 0.0 if constraints are satisfied, infinity otherwise.
Return type:: torch.Tensor

is_smooth()[source]#

Check if the polyhedral constraint is smooth.

Returns:: Always False, as indicator functions are non-smooth.
Return type:: bool

is_proxable()[source]#

Check if the polyhedral constraint has a computable proximal operator.

Returns:: Always False (general polyhedral projection not implemented).
Return type:: bool

Norm Balls#

class rlaopt.atoms.L1NormBall(x, radius=1.0)[source]#

Bases: _NormBall

L1-norm ball constraint enforcing ||x||_1 <= radius.

This atom represents the indicator function of the L1-norm ball:: 0 if ||x||_1 <= radius, +inf otherwise.

Parameters:

x (Expression) – Expression to constrain.
radius (float | int | Tensor) – Non-negative radius of the L1-norm ball (default: 1.0).

class rlaopt.atoms.L2NormBall(x, radius=1.0)[source]#

Bases: _NormBall

L2-norm (Euclidean) ball constraint enforcing ||x||_2 <= radius.

This atom represents the indicator function of the Euclidean ball:: 0 if ||x||_2 <= radius, +inf otherwise.

Parameters:

x (Expression) – Expression to constrain.
radius (float | int | Tensor) – Non-negative radius of the L2-norm ball (default: 1.0).

class rlaopt.atoms.LInfNormBall(x, radius=1.0)[source]#

Bases: _NormBall

L-infinity norm ball constraint enforcing ||x||_inf <= radius.

This atom represents the indicator function of the L-infinity norm ball:: 0 if ||x||_inf <= radius, +inf otherwise.

Parameters:

x (Expression) – Expression to constrain.
radius (float | int | Tensor) – Non-negative radius of the L-infinity norm ball (default: 1.0).

Regression Models#

class rlaopt.atoms.LinearRegression(beta, dataloader, fit_intercept=True)[source]#

Bases: BaseRegressor

Ordinary least squares (OLS) linear regression model.

Linear regression models the relationship between features and a continuous target variable by minimizing the sum of squared residuals. This is the most common form of regression analysis.

The squared error loss is defined as:: L(y, ŷ) = (y - ŷ)^2

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)

__init__(beta, dataloader, fit_intercept=True)[source]#

Initialize linear regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader with training features and targets.
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.

class rlaopt.atoms.LogisticRegression(beta, dataloader, fit_intercept=True)[source]#

Bases: BaseClassifier

Binary logistic regression model.

Logistic regression is used for binary classification tasks. It models the probability that an instance belongs to the positive class using the logistic (sigmoid) function.

The binary cross-entropy loss is defined as:: L(y, p) = -[y * log(p) + (1-y) * log(1-p)]

where p = sigmoid(X @ β).

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)

__init__(beta, dataloader, fit_intercept=True)[source]#

Initialize logistic regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and binary targets (0 or 1).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.

class rlaopt.atoms.PoissonRegression(beta, dataloader, fit_intercept=True)[source]#

Bases: BaseGLM

Poisson regression model with log link function.

Poisson regression is used for modeling count data and contingency tables. It assumes the target variable follows a Poisson distribution and uses a log link function to ensure predictions are positive.

The Poisson loss (negative log-likelihood) is defined as:: L(y, ŷ) = ŷ - y * log(ŷ)

where ŷ = exp(X @ β) is the predicted rate parameter.

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)

__init__(beta, dataloader, fit_intercept=True)[source]#

Initialize Poisson regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and count targets (must be non-negative).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.

class rlaopt.atoms.GammaRegression(beta, dataloader, fit_intercept=True)[source]#

Bases: BaseGLM

Gamma regression model with log link function.

Gamma regression is used for modeling continuous, positive-valued target variables that are skewed. It assumes the target variable follows a Gamma distribution and uses a log link function to ensure predictions are positive.

The Gamma loss (negative log-likelihood) is defined as:: L(y, ŷ) = log(ŷ) + y/ŷ

where ŷ = exp(X @ β) is the predicted mean.

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)

__init__(beta, dataloader, fit_intercept=True)[source]#

Initialize Gamma regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and positive continuous targets.
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.

class rlaopt.atoms.InverseGaussianRegression(beta, dataloader, fit_intercept=True)[source]#

Bases: BaseGLM

Inverse Gaussian regression model with log link function.

Inverse Gaussian regression is used for modeling continuous, positive-valued target variables that are right-skewed. It assumes the target variable follows an Inverse Gaussian distribution and uses a log link function to ensure predictions are positive.

The Inverse Gaussian loss (negative log-likelihood) is defined as:: L(y, ŷ) = (y - ŷ)^2 / (2 * y * ŷ^2)

where ŷ = exp(X @ β) is the predicted mean.

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)

__init__(beta, dataloader, fit_intercept=True)[source]#

Initialize Inverse Gaussian regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and positive continuous targets.
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.

class rlaopt.atoms.CompoundPoissonGammaRegression(beta, dataloader, fit_intercept=True, power=1.5)[source]#

Bases: BaseGLM

Compound Poisson-Gamma (Tweedie) regression model with log link function.

The Compound Poisson-Gamma distribution, also known as the Tweedie distribution with power parameter p ∈ (1, 2), is particularly useful for modeling positive continuous data with a point mass at zero. This makes it ideal for scenarios where many observations are exactly zero, but non-zero values are continuous and positive.

Common applications:

Insurance claims: Many policies have zero claims, non-zero claims are continuous
Rainfall modeling: Many days have zero rainfall, rainy days have continuous amounts
Customer spending: Many customers spend nothing, active customers spend varying amounts
Healthcare costs: Many patients incur zero costs, others have continuous expenses

The model uses a log link function to ensure predictions are always positive:

ŷ = exp(X @ β)

The Tweedie distribution combines:

A Poisson process governing the number of events (including zero events)
A Gamma distribution for the size of each event when it occurs
Results in a distribution with Var(Y) = φμ^p where p is in (1,2)

Note

Unlike pure Poisson (p=1) or Gamma (p=2) regression, the Compound Poisson-Gamma with p in (1, 2) handles the mixed discrete-continuous nature of the data with a moderate mean-variance relationship.

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)
power (float)

power#: The power parameter used to define the loss.

__init__(beta, dataloader, fit_intercept=True, power=1.5)[source]#

Initialize Compound Poisson-Gamma regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader containing the training data with features and non-negative continuous targets (may include zeros).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.
power (float) – Power parameter p ∈ (1, 2) defining the variance function. Defaults to 1.5.

property power: float#: Returns power used to define the loss.

class rlaopt.atoms.HuberRegression(beta, dataloader, fit_intercept=True, delta=1.0)[source]#

Bases: BaseRegressor

Huber regression model (robust to outliers).

Huber regression combines the best properties of L2 (least squares) and L1 (absolute deviation) losses. It is quadratic for small residuals and linear for large residuals, making it robust to outliers while maintaining efficiency for normally distributed errors.

The Huber loss is defined as:: L(r) = 0.5 * r^2 if |r| <= delta L(r) = delta * (|r| - 0.5*delta) if |r| > delta

Parameters:

beta (Variable)
dataloader (DataLoader)
delta (float)

delta#: The Huber loss threshold parameter.

__init__(beta, dataloader, fit_intercept=True, delta=1.0)[source]#

Initialize Huber regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients.
dataloader (DataLoader) – DataLoader with training features and targets.
fit_intercept – Whether to fit an intercept term. Defaults to True.
delta (float) – Threshold parameter that defines the point where the loss transitions from quadratic to linear. Smaller values increase robustness to outliers. Defaults to 1.0.

property delta#: Returns Huber loss threshold parameter.

class rlaopt.atoms.MultinomialRegression(beta, dataloader, fit_intercept=True)[source]#

Bases: BaseClassifier

Multinomial (softmax) regression model for multi-class classification.

Multinomial regression extends logistic regression to handle more than two classes. It uses the softmax function to model class probabilities and is trained using the cross-entropy loss.

The cross-entropy loss is defined as:: L(y, p) = -log(p_y)

where p = softmax(X @ β) and p_y is the probability of the true class.

Note

For multinomial regression, beta should be a matrix of shape (n_features, n_classes) to produce logits for each class.

Parameters:

beta (Variable)
dataloader (DataLoader)
fit_intercept (bool)

__init__(beta, dataloader, fit_intercept=True)[source]#

Initialize multinomial regression model.

Parameters:

beta (Variable) – Model parameters variable representing regression coefficients. Shape should be (n_features, n_classes) for multi-class classification.
dataloader (DataLoader) – DataLoader containing the training data with features and class labels (integers from 0 to n_classes-1).
fit_intercept (bool) – Whether to fit an intercept term. Defaults to True.

Atoms Module

Contents

Atoms Module#

Base Classes#

Regularizers#

Smooth Functions#

Constraints#

Norm Balls#

Regression Models#