Approximations#

Classes and associated functionality to approximate kernels.

When a dataset is very large, methods which have to evaluate all pairwise combinations of the data, such as gramian_row_mean(), can become prohibitively expensive. To reduce this computational cost, such methods can instead be approximated (providing suitable approximation error can be achieved).

The ApproximateKernels in this module provide the functionality required to override specific methods of a base_kernel with their approximate counterparts. Because ApproximateKernels inherit from Kernel, with all functionality provided through composition with a base_kernel, they can be freely used in any place where a standard Kernel is expected.

coreax.approximation._random_indices(key, num_data_points, num_select, mode='kernel')[source]#

Select a random subset of indices.

Parameters:

key (ArrayLike) – RNG key for seeding the random selection
num_data_points (int) – The total number of indexable data points
num_select (int) – The number of indices to select
mode (Literal['kernel', 'train']) – The selection mode, used for error message formatting

Returns:

A randomly selected subset of indices, of size num_samples, for a dataset with num_data_points indexable entries.

coreax.approximation._random_least_squares(key, data, features, num_indices, target_map=<function <lambda>>)[source]#

Solve the least-square problem on a random subset of the system.

A linear system \(Ax = b\), solved via least-squares as \(x = A^+ b\), can be approximated by random least-square as x approx hat{x} = hat{A}^+ hat{b}, where \(\hat{A} = A_i\ \text{and}\ \hat{b} = b_i\, \forall i \in I]\). I is a random subset of indices for the original system of equations.

Parameters:

key (ArrayLike) – RNG key for seeding the random selection
data (Array) – The data \(z\); yields \(b\) when pushed through the target map
features (Array) – The feature matrix \(A\)
num_indices (int) – The size of the random subset of indices \(I\)
target_map (Callable[[Array], Array]) – The target map \(\phi\) which defines \(b := \phi(z)\), where \(z\) is the input data

Return type:

Array

Returns:

The push-forward of the approximate solution \(A\hat{x}\)

class coreax.approximation.ApproximateKernel(base_kernel)[source]#

Base class for approximated kernels.

Provides approximations of the methods in the base_kernel.

The gramian_row_mean() method is particularly amenable to approximation, with significant performance improvements possible depending on the acceptable levels of error.

Parameters:: base_kernel (Kernel) – a Kernel whose attributes/methods are to be approximated

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.approximation.RandomRegressionKernel(base_kernel, random_key, num_kernel_points=10000, num_train_points=10000)[source]#

An approximate kernel that requires the attributes for random regression.

Parameters:

base_kernel (Kernel) – a Kernel whose attributes/methods are to be approximated
random_key (ArrayLike) – Key for random number generation
num_kernel_points (int) – Number of kernel evaluation points
num_train_points (int) – Number of training points used to fit kernel regression

class coreax.approximation.MonteCarloApproximateKernel(base_kernel, random_key, num_kernel_points=10000, num_train_points=10000)[source]#

Approximate a base kernel via random subset selection.

Only the Gramian row-mean is approximated here, all other methods are inherited directly from the base_kernel.

Parameters:

base_kernel (Kernel) – a Kernel whose attributes/methods are to be approximated
random_key (ArrayLike) – Key for random number generation
num_kernel_points (int) – Number of kernel evaluation points
num_train_points (int) – Number of training points used to fit kernel regression

gramian_row_mean(x, **kwargs)[source]#

Approximate the Gramian row-mean by Monte-Carlo sampling.

A uniform random subset of x is used to approximate the base kernel’s Gramian row-mean.

Parameters:: x (Union[Array, ndarray, bool_, number, bool, int, float, complex, Data]) – Data matrix, \(n \times d\)
Return type:: Array
Returns:: Approximation of the base kernel’s Gramian row-mean

class coreax.approximation.ANNchorApproximateKernel(base_kernel, random_key, num_kernel_points=10000, num_train_points=10000)[source]#

Approximate a base kernel via random kernel regression on ANNchor selected points.

Only the base kernel’s Gramian row-mean is approximated here, all other methods are inherited directly from the base_kernel.

Parameters:

base_kernel (Kernel) – a Kernel whose attributes/methods are to be approximated
random_key (ArrayLike) – Key for random number generation
num_kernel_points (int) – Number of kernel evaluation points
num_train_points (int) – Number of training points used to fit kernel regression

gramian_row_mean(x, **kwargs)[source]#

Approximate the Gramian row-mean by random regression on ANNchor points.

A subset of x is selected via the ANNchor approach and random kernel regression used to approximate the base kernel’s Gramian row-mean. The ANNchor implementation used can be found here.

Parameters:: x (Union[Array, ndarray, bool_, number, bool, int, float, complex, Data]) – Data matrix, \(n \times d\)
Return type:: Array
Returns:: Approximation of the base kernel’s Gramian row-mean

class coreax.approximation.NystromApproximateKernel(base_kernel, random_key, num_kernel_points=10000, num_train_points=10000)[source]#

Approximate a base kernel via Nystrom approximation.

Only the base kernel’s Gramian row-mean is approximated here, all other methods are inherited directly from the base_kernel.

Parameters:

base_kernel (Kernel) – a Kernel whose attributes/methods are to be approximated
random_key (ArrayLike) – Key for random number generation
num_kernel_points (int) – Number of kernel evaluation points
num_train_points (int) – Number of training points used to fit kernel regression

gramian_row_mean(x, **kwargs)[source]#

Approximate the Gramian row-mean by Nystrom approximation.

We consider a \(n \times d\) dataset, and wish to use an \(m \times d\) subset of this to approximate the base kernel’s Gramian row-mean. The m points are selected uniformly at random, and the Nystrom estimator, as defined in [chatalic2022nystrom] is computed using this subset.

Parameters:: x (Union[Array, ndarray, bool_, number, bool, int, float, complex, Data]) – Data matrix, \(n \times d\)
Return type:: Array
Returns:: Approximation of the base kernel’s Gramian row-mean