Kernels¶

Kernel functions, and related utilities, for generating coresets.

coreax.kernels.median_heuristic(x)[source]¶

Compute the median heuristic for setting kernel bandwidth.

Analysis of the performance of the median heuristic can be found in [garreau2018median].

Parameters: x (Union[Shaped[Array, 'n d'], Shaped[Array, 'n'], Shaped[Array, ''], float, int]) – Input array of vectors
Return type: Shaped[Array, '']
Returns: Bandwidth parameter, computed from the median heuristic, as a zero-dimensional array
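
As a rough illustration of the idea, and not the coreax implementation, one common form of the median heuristic sets the bandwidth to the square root of half the median squared pairwise distance. The helper name `median_heuristic_sketch` and the exact scaling constant are assumptions; the sketch uses only the Python standard library.

```python
import math
from itertools import combinations
from statistics import median


def median_heuristic_sketch(points):
    """Bandwidth from the median squared pairwise distance (assumed variant)."""
    # Squared Euclidean distance for every unordered pair of distinct points.
    squared_distances = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(points, 2)
    ]
    # Assumed scaling: sqrt(median / 2); other references use other constants.
    return math.sqrt(median(squared_distances) / 2.0)


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
bandwidth = median_heuristic_sketch(points)
```

The resulting value would typically be passed as the length scale of a distance-based kernel.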

class coreax.kernels.ScalarValuedKernel[source]¶

Bases: Module

Abstract base class for scalar-valued kernels.

compute(x, y)[source]¶

Evaluate the kernel on input data x and y.

The ‘data’ can be any of:

floating-point numbers (a single data point in one dimension)
zero-dimensional arrays (a single data point in one dimension)
a vector (a single data point in multiple dimensions)
an array (multiple vectors).

Evaluation is always vectorised.

Parameters:

x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(n \times d\) dataset (array) or a single value (point)
y (Union[Shaped[Array, 'm d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Union[Shaped[Array, 'n m'], Shaped[Array, '1 1']]

Returns:

Kernel evaluations between points in x and y. If x = y, then this is the Gram matrix corresponding to the RKHS inner product.
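
To make the output shape concrete, the following sketch builds the \(n \times m\) matrix of kernel evaluations by looping an element-wise kernel over all pairs. It is plain Python rather than the vectorised machinery coreax uses, and `squared_exponential` and `pairwise_kernel` are hypothetical helper names.

```python
import math


def squared_exponential(x, y, length_scale=1.0, output_scale=1.0):
    """Element-wise kernel: rho * exp(-||x - y||^2 / (2 * lambda^2))."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale * math.exp(-squared_norm / (2.0 * length_scale**2))


def pairwise_kernel(kernel, xs, ys):
    """Return the n x m nested list K[i][j] = kernel(xs[i], ys[j])."""
    return [[kernel(x, y) for y in ys] for x in xs]


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # n = 3 points in d = 2
ys = [(0.0, 0.0), (1.0, 1.0)]  # m = 2 points in d = 2
kernel_matrix = pairwise_kernel(squared_exponential, xs, ys)  # 3 x 2
gram_matrix = pairwise_kernel(squared_exponential, xs, xs)  # 3 x 3, symmetric
```

Passing the same data for both arguments gives the symmetric Gram matrix.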

abstract compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Shaped[Array, '']

Returns:

Kernel evaluated at (x, y)

grad_x(x, y)[source]¶

Evaluate the gradient (Jacobian) of the kernel function w.r.t. x.

The function is vectorised, so x or y can be any of:

floating-point numbers (a single data point in one dimension)
zero-dimensional arrays (a single data point in one dimension)
a vector (a single data point in multiple dimensions)
an array (multiple vectors).

Parameters:

x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(n \times d\) dataset (array) or a single value (point)
y (Union[Shaped[Array, 'm d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Union[Shaped[Array, 'n m d'], Shaped[Array, '1 1 d'], Shaped[Array, '1 1 1']]

Returns:

An \(n \times m \times d\) array of pairwise Jacobians

grad_y(x, y)[source]¶

Evaluate the gradient (Jacobian) of the kernel function w.r.t. y.

The function is vectorised, so x or y can be any of:

floating-point numbers (a single data point in one dimension)
zero-dimensional arrays (a single data point in one dimension)
a vector (a single data point in multiple dimensions)
an array (multiple vectors).

Parameters:

x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(n \times d\) dataset (array) or a single value (point)
y (Union[Shaped[Array, 'm d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Union[Shaped[Array, 'n m d'], Shaped[Array, '1 1 d'], Shaped[Array, '1 1 1']]

Returns:

An \(n \times m \times d\) array of pairwise Jacobians
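
The pairwise Jacobians hold one \(d\)-vector per pair \((x_i, y_j)\), giving the shape 'n m d' of the documented return type. A minimal sketch, assuming a squared exponential kernel with its analytic gradient; the helper names are hypothetical and this is not how coreax computes gradients internally.

```python
import math


def sq_exp(x, y, length_scale=1.0):
    """Squared exponential kernel with unit output scale."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_norm / (2.0 * length_scale**2))


def sq_exp_grad_x(x, y, length_scale=1.0):
    """Analytic gradient w.r.t. x: -(x - y) / lambda^2 * k(x, y)."""
    k = sq_exp(x, y, length_scale)
    return [-(a - b) / length_scale**2 * k for a, b in zip(x, y)]


def pairwise_grad_x(xs, ys):
    """Nested list J[i][j] = grad_x k(xs[i], ys[j]), of shape n x m x d."""
    return [[sq_exp_grad_x(x, y) for y in ys] for x in xs]


xs = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5)]  # n = 3, d = 2
ys = [(0.0, 1.0), (1.0, 1.0)]  # m = 2, d = 2
jacobians = pairwise_grad_x(xs, ys)
shape = (len(jacobians), len(jacobians[0]), len(jacobians[0][0]))

# Central finite-difference check of the first coordinate of one entry.
step = 1e-6
x, y = xs[1], ys[0]
x_plus = (x[0] + step, x[1])
x_minus = (x[0] - step, x[1])
numeric = (sq_exp(x_plus, y) - sq_exp(x_minus, y)) / (2 * step)
```

For kernels that depend only on \(x - y\), such as this one, the gradient w.r.t. y is the negative of the gradient w.r.t. x.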

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Union[Shaped[Array, 'd'], Shaped[Array, '']]

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Union[Shaped[Array, 'd'], Shaped[Array, '']]

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y(x, y)[source]¶

Evaluate the divergence operator w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). This function is vectorised, so it accepts vectors or arrays.

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(n \times d\) dataset (array) or a single value (point)
y (Union[Shaped[Array, 'm d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Union[Shaped[Array, 'n m'], Shaped[Array, '1 1']]

Returns:

An \(n \times m\) array of Laplace-style operator traces

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y (Union[Shaped[Array, 'd'], Shaped[Array, ''], float, int]) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Shaped[Array, '']

Returns:

Trace of the Laplace-style operator; a real number
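
As a worked example of this quantity, for a squared exponential kernel with unit output scale the trace has the closed form \(k(\mathbf{x}, \mathbf{y}) (d / \lambda^2 - ||\mathbf{x} - \mathbf{y}||^2 / \lambda^4)\). The sketch below checks that closed form against central finite differences of the analytic gradient w.r.t. y. All function names are hypothetical.

```python
import math


def sq_exp(x, y, lam=1.0):
    """Squared exponential kernel with unit output scale."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-r2 / (2.0 * lam**2))


def grad_y(x, y, lam=1.0):
    """Analytic gradient w.r.t. y: (x - y) / lambda^2 * k(x, y)."""
    k = sq_exp(x, y, lam)
    return [(a - b) / lam**2 * k for a, b in zip(x, y)]


def divergence_numeric(x, y, lam=1.0, step=1e-5):
    """Sum over i of d/dx_i [grad_y k]_i, using central differences in x_i."""
    total = 0.0
    for i in range(len(x)):
        x_plus = [v + step if j == i else v for j, v in enumerate(x)]
        x_minus = [v - step if j == i else v for j, v in enumerate(x)]
        difference = grad_y(x_plus, y, lam)[i] - grad_y(x_minus, y, lam)[i]
        total += difference / (2 * step)
    return total


def divergence_closed_form(x, y, lam=1.0):
    """k(x, y) * (d / lambda^2 - ||x - y||^2 / lambda^4)."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return sq_exp(x, y, lam) * (len(x) / lam**2 - r2 / lam**4)


x, y = [0.3, -0.2, 1.0], [0.0, 0.5, 0.4]
numeric = divergence_numeric(x, y, lam=1.5)
closed = divergence_closed_form(x, y, lam=1.5)
```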

gramian_row_mean(x, *, block_size=None, unroll=1)[source]¶

Compute the (blocked) row-mean of the kernel’s Gramian matrix.

A convenience method for calling compute_mean(). Equivalent to the call compute_mean(x, x, axis=0, block_size=block_size, unroll=unroll).

Parameters:

x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int, Data]) – Data matrix, \(n \times d\)
block_size (Union[int, None, tuple[Optional[int], Optional[int]]]) – Block size parameter passed to compute_mean()
unroll (Union[int, bool, tuple[Union[int, bool], Union[int, bool]]]) – Unroll parameter passed to compute_mean()

Return type:

Shaped[Array, 'n']

Returns:

Gramian ‘row/column-mean’, \(\frac{1}{n}\sum_{i=1}^{n} G_{ij}\).

compute_mean(x, y, axis=None, *, block_size=None, unroll=1)[source]¶

Compute the (blocked) mean of the matrix \(K_{ij} = k(x_i, y_j)\).

The \(n \times m\) kernel matrix \(K_{ij} = k(x_i, y_j)\), where x and y are respectively \(n \times d\) and \(m \times d\) (weighted) data matrices, has the following (weighted) means:

mean (axis=None) \(\frac{1}{n m}\sum_{i,j=1}^{n, m} K_{ij}\)
row-mean (axis=0) \(\frac{1}{n}\sum_{i=1}^{n} K_{ij}\)
column-mean (axis=1) \(\frac{1}{m}\sum_{j=1}^{m} K_{ij}\)

If x and y are of type Data, their weights are used to compute the weighted mean as defined in jax.numpy.average().

Note

The conventional ‘mean’ is a scalar, the ‘row-mean’ is an \(m\)-vector, while the ‘column-mean’ is an \(n\)-vector.

To avoid materialising the entire matrix (memory cost \(\mathcal{O}(n m)\)), we accumulate the mean over blocks (memory cost \(\mathcal{O}(B_x B_y)\)), where B_x and B_y are user-specified block sizes for blocking the x and y parameters respectively.

Note

The data x and/or y are padded with zero-valued and zero-weighted data points when B_x and/or B_y do not divide n and/or m exactly. Padding does not alter the result, but does provide the block shape stability required by jax.lax.scan() (used for block iteration).

Parameters:

x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int, Data]) – Data matrix, \(n \times d\)
y (Union[Shaped[Array, 'm d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int, Data]) – Data matrix, \(m \times d\)
axis (Optional[int]) – Which axis of the kernel matrix to compute the mean over; a value of None computes the mean over both axes
block_size (Union[int, None, tuple[Optional[int], Optional[int]]]) – Size of matrix blocks to process; a value of None sets \(B_x = n\) and \(B_y = m\), effectively disabling the block accumulation; an integer value B sets \(B_y = B_x = B\); a tuple allows different sizes to be specified for B_x and B_y; to reduce overheads, it is often sensible to select the largest block size that does not exhaust the available memory resources
unroll (Union[int, bool, tuple[Union[int, bool], Union[int, bool]]]) – Unrolling parameter for the outer and inner jax.lax.scan() calls, allows for trade-offs between compilation and runtime cost; consult the JAX docs for further information

Return type:

Union[Shaped[Array, 'n'], Shaped[Array, 'm'], Shaped[Array, '']]

Returns:

The (weighted) mean of the kernel matrix \(K_{ij}\)
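
The block accumulation can be sketched in plain Python for the unweighted, axis=None case. This is an illustration under stated assumptions rather than the coreax implementation: it uses ragged slices at the edges instead of zero-weighted padding, a toy scalar kernel, and hypothetical function names.

```python
def kernel(x, y):
    """Hypothetical linear kernel on scalars, k(x, y) = x * y."""
    return x * y


def full_mean(xs, ys):
    """Mean over all entries of K, visiting every K[i][j] in one pass."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def blocked_mean(xs, ys, block_x, block_y):
    """Same mean, accumulated one (block_x by block_y) block at a time."""
    total = 0.0
    for i in range(0, len(xs), block_x):
        for j in range(0, len(ys), block_y):
            # Only this block of kernel values is held in memory at once.
            block = [
                kernel(x, y)
                for x in xs[i : i + block_x]
                for y in ys[j : j + block_y]
            ]
            total += sum(block)
    return total / (len(xs) * len(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # n = 5
ys = [1.0, 2.0, 3.0]  # m = 3
reference = full_mean(xs, ys)
blocked = blocked_mean(xs, ys, block_x=2, block_y=2)
```

Larger blocks mean fewer loop iterations at the cost of more memory per block, which is the trade-off described for block_size.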

class coreax.kernels.UniCompositeKernel(base_kernel)[source]¶

Bases: ScalarValuedKernel

Abstract base class for kernels that compose/wrap one scalar-valued kernel.

Parameters: base_kernel (ScalarValuedKernel) – Kernel to be wrapped/used in composition

base_kernel: ScalarValuedKernel¶

class coreax.kernels.PowerKernel(base_kernel, power)[source]¶

Bases: UniCompositeKernel, ScalarValuedKernel

Define a kernel function which is an integer power of a base kernel function.

Given a kernel function \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) define the power kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) = k(x,y)^n\) and \(n\in\mathbb{N}\).

Parameters:

base_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
power (int) – Power \(n\) to which the base kernel function is raised

power: int¶
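
By the chain rule, the gradient of \(p = k^n\) is \(n k^{n-1} \nabla k\). The sketch below illustrates this with a hypothetical base kernel \(1 + x^Ty\); whether coreax evaluates the gradient this way is not stated here, so treat it as a worked example only.

```python
def base_kernel(x, y):
    """Hypothetical base kernel: 1 + x . y."""
    return 1.0 + sum(a * b for a, b in zip(x, y))


def base_grad_x(x, y):
    """Gradient of the base kernel w.r.t. x, which is simply y."""
    return list(y)


def power_kernel(x, y, power):
    """p(x, y) = k(x, y) ** power."""
    return base_kernel(x, y) ** power


def power_grad_x(x, y, power):
    """Chain rule: power * k^(power - 1) * grad_x k."""
    scale = power * base_kernel(x, y) ** (power - 1)
    return [scale * g for g in base_grad_x(x, y)]


x, y = [1.0, 2.0], [0.5, -1.0]
value = power_kernel(x, y, 3)  # base kernel is -0.5, so (-0.5) ** 3
gradient = power_grad_x(x, y, 3)  # 3 * (-0.5) ** 2 * y
```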

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.DuoCompositeKernel(first_kernel, second_kernel)[source]¶

Bases: ScalarValuedKernel

Abstract base class for kernels that compose/wrap two scalar-valued kernels.

Parameters:

first_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
second_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel

first_kernel: ScalarValuedKernel¶

second_kernel: ScalarValuedKernel¶

class coreax.kernels.AdditiveKernel(first_kernel, second_kernel)[source]¶

Bases: DuoCompositeKernel

Define a kernel which is a summation of two kernels.

Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the additive kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) := k(x,y) + l(x,y)\)

Parameters:

first_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
second_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.ProductKernel(first_kernel, second_kernel)[source]¶

Bases: DuoCompositeKernel, ScalarValuedKernel

Define a kernel which is a product of two kernels.

Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the product kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) = k(x,y)l(x,y)\)

Parameters:

first_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
second_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
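
For a product \(p = k l\), the product rule gives \(\nabla_\mathbf{x} p = l \nabla_\mathbf{x} k + k \nabla_\mathbf{x} l\). A minimal sketch with two hypothetical factor kernels, checked against central finite differences; it does not claim to mirror the coreax implementation.

```python
import math


def first_kernel(x, y):
    """Hypothetical first factor: linear kernel x . y."""
    return sum(a * b for a, b in zip(x, y))


def second_kernel(x, y):
    """Hypothetical second factor: squared exponential with unit scales."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)


def product(x, y):
    """p(x, y) = first_kernel(x, y) * second_kernel(x, y)."""
    return first_kernel(x, y) * second_kernel(x, y)


def product_grad_x(x, y):
    """Product rule: second * grad_x(first) + first * grad_x(second)."""
    first, second = first_kernel(x, y), second_kernel(x, y)
    grad_first = list(y)  # grad_x of x . y
    grad_second = [-(a - b) * second for a, b in zip(x, y)]
    return [second * gf + first * gs for gf, gs in zip(grad_first, grad_second)]


x, y = [0.5, -1.0], [1.0, 2.0]
analytic = product_grad_x(x, y)

# Central finite differences in each coordinate of x, as an independent check.
step = 1e-6
numeric = []
for i in range(len(x)):
    x_plus = [v + step if j == i else v for j, v in enumerate(x)]
    x_minus = [v - step if j == i else v for j, v in enumerate(x)]
    numeric.append((product(x_plus, y) - product(x_minus, y)) / (2 * step))
```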

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.LinearKernel(output_scale=1.0, constant=0.0)[source]¶

Bases: ScalarValuedKernel

Define a linear kernel.

Given \(\rho =\) output_scale and \(a =\) constant, the linear kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = a + \rho x^T y\).

Parameters:

output_scale (Any) – Kernel normalisation constant, \(\rho\), must be positive
constant (Any) – Additive constant, \(a\), must be non-negative

output_scale: float = 1.0¶

constant: float = 0.0¶
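
For the linear kernel the derivatives are simple enough to write by hand: \(\nabla_\mathbf{x} k = \rho \mathbf{y}\), \(\nabla_\mathbf{y} k = \rho \mathbf{x}\) and \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k = \rho d\). The sketch below evaluates these in plain Python; the function names are hypothetical and it is not the coreax code.

```python
def linear_kernel(x, y, output_scale=1.0, constant=0.0):
    """k(x, y) = a + rho * x . y."""
    return constant + output_scale * sum(a * b for a, b in zip(x, y))


def linear_grad_x(x, y, output_scale=1.0):
    """grad_x k = rho * y."""
    return [output_scale * b for b in y]


def linear_divergence_x_grad_y(x, y, output_scale=1.0):
    """grad_y k = rho * x, so its divergence w.r.t. x is rho * d."""
    return output_scale * len(x)


x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
value = linear_kernel(x, y, output_scale=2.0, constant=1.0)  # 1 + 2 * 32
gradient = linear_grad_x(x, y, output_scale=2.0)
divergence = linear_divergence_x_grad_y(x, y, output_scale=2.0)  # 2 * 3
```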

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.PolynomialKernel(output_scale=1.0, constant=0.0, degree=2)[source]¶

Bases: ScalarValuedKernel

Define a polynomial kernel.

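Given \(\rho =\) output_scale, \(c =\) constant, and \(d=\) degree, the polynomial kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho (x^Ty + c)^d\). Note that the exponent \(d\) denotes the degree, which is unrelated to the input dimension in \(\mathbb{R}^d\).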

Parameters:

output_scale (float) – Kernel normalisation constant, \(\rho\), must be positive
constant (float) – Additive constant, \(c\), must be non-negative
degree (int) – Degree of kernel, \(d\), must be an integer greater than 1

output_scale: float = 1.0¶

constant: float = 0.0¶

degree: int = 2¶
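
A small worked example of the formula above and its gradient w.r.t. x, which by the chain rule is \(\rho \cdot \text{degree} \cdot (x^Ty + c)^{\text{degree} - 1} \mathbf{y}\). The function names are hypothetical and the sketch is independent of the coreax implementation.

```python
def polynomial_kernel(x, y, output_scale=1.0, constant=0.0, degree=2):
    """k(x, y) = rho * (x . y + c) ** degree."""
    inner = sum(a * b for a, b in zip(x, y))
    return output_scale * (inner + constant) ** degree


def polynomial_grad_x(x, y, output_scale=1.0, constant=0.0, degree=2):
    """Chain rule: rho * degree * (x . y + c) ** (degree - 1) * y."""
    inner = sum(a * b for a, b in zip(x, y))
    scale = output_scale * degree * (inner + constant) ** (degree - 1)
    return [scale * b for b in y]


x, y = [1.0, 2.0], [3.0, 4.0]  # x . y = 11
value = polynomial_kernel(x, y, output_scale=0.5, constant=1.0, degree=2)
gradient = polynomial_grad_x(x, y, output_scale=0.5, constant=1.0, degree=2)
```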

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.ExponentialKernel(length_scale=1.0, output_scale=1.0)[source]¶

Bases: ScalarValuedKernel

Define an exponential kernel.

Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the exponential kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Warning

The exponential kernel is not differentiable when \(x=y\).

Parameters:

length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (float) – Kernel normalisation constant, \(\rho\), must be positive

length_scale: float = 1.0¶

output_scale: float = 1.0¶
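
The non-differentiability at \(x = y\) noted in the warning can be seen numerically: in one dimension the one-sided slopes at the kink have opposite signs. A minimal sketch using the formula above, with a hypothetical function name.

```python
import math


def exponential_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """k(x, y) = rho * exp(-||x - y|| / (2 * lambda^2)), with the L2 norm."""
    norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return output_scale * math.exp(-norm / (2.0 * length_scale**2))


step = 1e-6
y = [0.0]
at_zero = exponential_kernel([0.0], y)

# One-sided difference quotients either side of x = y.
right_slope = (exponential_kernel([step], y) - at_zero) / step
left_slope = (at_zero - exponential_kernel([-step], y)) / step
```

With the default parameters the right-hand slope is close to -0.5 and the left-hand slope close to +0.5, so no single gradient exists at that point.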

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.LaplacianKernel(length_scale=1.0, output_scale=1.0)[source]¶

Bases: ScalarValuedKernel

Define a Laplacian kernel.

Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the Laplacian kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||_1}{2 \lambda^2})\) where \(||\cdot||_1\) is the \(L_1\)-norm.

Parameters:

length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (Any) – Kernel normalisation constant, \(\rho\), must be positive

length_scale: float = 1.0¶

output_scale: float = 1.0¶
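
Because the \(L_1\)-norm is a sum over coordinates, the Laplacian kernel with unit output scale factorises into a product of one-dimensional kernels. The sketch below evaluates the formula above and checks that property; `laplacian_kernel` is a hypothetical name, not the coreax method.

```python
import math


def laplacian_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """k(x, y) = rho * exp(-||x - y||_1 / (2 * lambda^2))."""
    l1_norm = sum(abs(a - b) for a, b in zip(x, y))
    return output_scale * math.exp(-l1_norm / (2.0 * length_scale**2))


x, y = [1.0, 2.0], [0.0, 0.0]
value = laplacian_kernel(x, y)  # L1 distance is |1| + |2| = 3

# Product of the per-coordinate kernels (valid here as output_scale is 1).
per_coordinate = laplacian_kernel([1.0], [0.0]) * laplacian_kernel([2.0], [0.0])
```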

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.SquaredExponentialKernel(length_scale=1.0, output_scale=1.0)[source]¶

Bases: ScalarValuedKernel

Define a squared exponential kernel.

Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the squared exponential kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||^2}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Parameters:

length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (Any) – Kernel normalisation constant, \(\rho\), must be positive

length_scale: float = 1.0¶

output_scale: float = 1.0¶

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.PCIMQKernel(length_scale=1.0, output_scale=1.0)[source]¶

Bases: ScalarValuedKernel

Define a pre-conditioned inverse multi-quadric (PCIMQ) kernel.

Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the PCIMQ kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \frac{\rho}{\sqrt{1 + \frac{||x-y||^2}{2 \lambda^2}}}\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Parameters:

length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (Any) – Kernel normalisation constant, \(\rho\), must be positive

length_scale: float = 1.0¶

output_scale: float = 1.0¶
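
A short sketch of the PCIMQ formula, set beside a squared exponential kernel to show its much slower decay at large distances. Both function names are hypothetical and the code is a standalone illustration rather than the coreax implementation.

```python
import math


def pcimq_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """k(x, y) = rho / sqrt(1 + ||x - y||^2 / (2 * lambda^2))."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale / math.sqrt(1.0 + squared_norm / (2.0 * length_scale**2))


def squared_exponential_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """k(x, y) = rho * exp(-||x - y||^2 / (2 * lambda^2))."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale * math.exp(-squared_norm / (2.0 * length_scale**2))


near = pcimq_kernel([1.0, 1.0], [0.0, 0.0])  # squared distance 2
far_pcimq = pcimq_kernel([10.0], [0.0])  # polynomial decay
far_sq_exp = squared_exponential_kernel([10.0], [0.0])  # exponential decay
```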

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
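One way to sanity-check an element-wise gradient is to compare a closed form against central finite differences. The sketch below does this for the PCIMQ formula, where differentiating by hand gives \(\nabla_\mathbf{x} k = -\frac{\rho (\mathbf{x} - \mathbf{y})}{2\lambda^2} s^{-3/2}\) with \(s = 1 + \frac{||x-y||^2}{2\lambda^2}\). All function names are hypothetical and nothing here calls coreax.

```python
import math


def pcimq(x, y, lam=1.0, rho=1.0):
    """PCIMQ kernel value for two vectors given as lists of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return rho / math.sqrt(1.0 + sq / (2.0 * lam**2))


def pcimq_grad_x(x, y, lam=1.0, rho=1.0):
    """Hand-derived gradient w.r.t. x: -rho * (x - y) / (2 lam^2) * s^(-3/2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    s = 1.0 + sq / (2.0 * lam**2)
    coeff = -rho / (2.0 * lam**2) * s**-1.5
    return [coeff * (a - b) for a, b in zip(x, y)]


def finite_difference_grad_x(kernel, x, y, step=1e-6):
    """Central finite-difference approximation of the gradient w.r.t. x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        grad.append((kernel(up, y) - kernel(down, y)) / (2.0 * step))
    return grad


x = [0.5, -1.0]
y = [1.5, 0.25]
analytic = pcimq_grad_x(x, y, lam=1.5, rho=2.0)
numeric = finite_difference_grad_x(lambda a, b: pcimq(a, b, 1.5, 2.0), x, y)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
```

Because this kernel depends on x and y only through \(x - y\), the gradient w.r.t. y is the negative of the gradient w.r.t. x.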

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number
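To make the trace interpretation concrete, the sketch below sums the diagonal mixed partial derivatives \(\partial^2 k / \partial x_i \partial y_i\) by finite differences and compares the result with a hand-derived closed form for the PCIMQ formula, \(\frac{d \rho}{2\lambda^2} s^{-3/2} - \frac{3 \rho ||x-y||^2}{4 \lambda^4} s^{-5/2}\) with \(s = 1 + \frac{||x-y||^2}{2\lambda^2}\). The helper names are hypothetical and the code does not use coreax.

```python
import math


def pcimq(x, y, lam=1.0, rho=1.0):
    """PCIMQ kernel value for two vectors given as lists of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return rho / math.sqrt(1.0 + sq / (2.0 * lam**2))


def pcimq_divergence(x, y, lam=1.0, rho=1.0):
    """Hand-derived trace of the matrix of mixed partials d^2 k / dx_i dy_j."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    s = 1.0 + sq / (2.0 * lam**2)
    return d * rho / (2.0 * lam**2) * s**-1.5 - 3.0 * rho * sq / (4.0 * lam**4) * s**-2.5


def numeric_divergence(kernel, x, y, step=1e-3):
    """Sum over i of a central-difference estimate of d^2 k / dx_i dy_i."""
    total = 0.0
    for i in range(len(x)):
        def shifted(dx, dy):
            xs = list(x)
            ys = list(y)
            xs[i] += dx
            ys[i] += dy
            return kernel(xs, ys)

        total += (
            shifted(step, step)
            - shifted(step, -step)
            - shifted(-step, step)
            + shifted(-step, -step)
        ) / (4.0 * step**2)
    return total


x = [0.5, -1.0, 0.2]
y = [1.5, 0.25, -0.3]
analytic = pcimq_divergence(x, y, lam=1.5, rho=2.0)
numeric = numeric_divergence(lambda a, b: pcimq(a, b, 1.5, 2.0), x, y)
at_same_point = pcimq_divergence([0.0, 0.0], [0.0, 0.0])
```

At \(x = y\) the second term vanishes and the divergence reduces to \(d \rho / (2 \lambda^2)\).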

class coreax.kernels.RationalQuadraticKernel(length_scale=1.0, output_scale=1.0, relative_weighting=1.0)[source]¶

Bases: ScalarValuedKernel

Define a rational quadratic kernel.

Given \(\lambda =\) length_scale, \(\rho =\) output_scale, and \(\alpha =\) relative_weighting, the rational quadratic kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * (1 + \frac{||x-y||^2}{2 \alpha \lambda^2})^{-\alpha}\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Parameters:

length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (float) – Kernel normalisation constant, \(\rho\), must be positive
relative_weighting (float) – Parameter controlling the relative weighting of large-scale and small-scale variations, \(\alpha\). As \(\alpha \to \infty\) the rational quadratic kernel converges to the squared exponential kernel. Must be non-negative

length_scale: float = 1.0¶

output_scale: float = 1.0¶

relative_weighting: float = 1.0¶

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)
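The limiting behaviour in \(\alpha\) can be checked numerically. The sketch below evaluates the rational quadratic formula for a small and a very large relative_weighting and compares both with a squared exponential of the assumed form \(\rho \exp(-||x-y||^2 / (2\lambda^2))\). It is a standalone illustration with hypothetical function names and does not call coreax.

```python
import math


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rational_quadratic(x, y, length_scale=1.0, output_scale=1.0, relative_weighting=1.0):
    """Evaluate rho * (1 + ||x - y||^2 / (2 * alpha * lambda^2)) ** (-alpha)."""
    scaled = squared_distance(x, y) / (2.0 * relative_weighting * length_scale**2)
    return output_scale * (1.0 + scaled) ** (-relative_weighting)


def squared_exponential(x, y, length_scale=1.0, output_scale=1.0):
    """Assumed squared exponential form: rho * exp(-||x - y||^2 / (2 * lambda^2))."""
    return output_scale * math.exp(-squared_distance(x, y) / (2.0 * length_scale**2))


x = [1.0, 2.0]
y = [0.0, 0.5]
limit_value = squared_exponential(x, y)

# The gap to the squared exponential shrinks as alpha grows.
gap_small_alpha = abs(rational_quadratic(x, y, relative_weighting=1.0) - limit_value)
gap_large_alpha = abs(rational_quadratic(x, y, relative_weighting=1e6) - limit_value)
```

Small values of \(\alpha\) give heavier tails than the squared exponential, so distant points remain more strongly correlated.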

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.MaternKernel(length_scale=1.0, output_scale=1.0, degree=1)[source]¶

Bases: ScalarValuedKernel

Define Matérn kernel with smoothness parameter a multiple of \(\frac{1}{2}\).

Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the Matérn kernel with smoothness parameter \(\nu\) set to be a multiple of \(\frac{1}{2}\), i.e. \(\nu = p + \frac{1}{2}\) where \(p =\) degree \(\in \mathbb{N}\), is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\),

\[k(x, y) = \rho^2 * \exp\left(-\frac{\sqrt{2p+1}||x-y||}{\lambda}\right) \frac{p!}{(2p)!}\sum_{i=0}^p\frac{(p+i)!}{i!(p-i)!} \left(2\sqrt{2p+1}\frac{||x-y||}{\lambda}\right)^{p-i}\]

where \(||\cdot||\) is the usual \(L_2\)-norm.

Parameters:

length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (Any) – Kernel normalisation constant, \(\rho\), must be positive
degree (int) – Kernel degree, \(p\), must be a non-negative integer

length_scale: float = 1.0¶

output_scale: float = 1.0¶

degree: int = 1¶

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)
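The finite sum in the Matérn definition can be evaluated directly. The sketch below transcribes the formula as stated in this section, including the \(\rho^2\) prefactor, using only the standard library, and compares degree 0 and degree 1 with their familiar closed forms. The function name matern_kernel is an assumption for this example and the code does not call coreax.

```python
import math


def matern_kernel(x, y, length_scale=1.0, output_scale=1.0, degree=1):
    """Evaluate the half-integer Matern formula as written in this section."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = math.sqrt(2 * degree + 1) * distance / length_scale
    prefactor = math.factorial(degree) / math.factorial(2 * degree)
    # Sum over i = 0..p of (p + i)! / (i! (p - i)!) * (2 * scaled)^(p - i).
    total = sum(
        math.factorial(degree + i)
        / (math.factorial(i) * math.factorial(degree - i))
        * (2.0 * scaled) ** (degree - i)
        for i in range(degree + 1)
    )
    return output_scale**2 * math.exp(-scaled) * prefactor * total


x = [0.0, 0.0]
y = [0.6, 0.8]  # Euclidean distance 1

# Degree 0 reduces to rho^2 * exp(-r / lambda).
degree_zero = matern_kernel(x, y, length_scale=2.0, output_scale=3.0, degree=0)

# Degree 1 reduces to rho^2 * (1 + sqrt(3) r / lambda) * exp(-sqrt(3) r / lambda).
degree_one = matern_kernel(x, y, length_scale=2.0, output_scale=3.0, degree=1)

# At x = y the sum collapses to (2p)! / p!, which cancels the prefactor.
at_same_point = matern_kernel(x, x, output_scale=3.0, degree=2)
```

Higher degrees give smoother kernels; the degree 0 case is the exponential kernel, which is continuous but not differentiable at \(x = y\).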

class coreax.kernels.PeriodicKernel(length_scale=1.0, output_scale=1.0, periodicity=1.0)[source]¶

Bases: ScalarValuedKernel

Define a periodic kernel.

Given \(\lambda =\) length_scale, \(\rho =\) output_scale, and \(p =\) periodicity, the periodic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(\frac{-2 \sin^2(\pi ||x-y||/p)}{\lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Warning

The periodic kernel is not differentiable when \(x=y\).

Parameters:

length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (float) – Kernel normalisation constant, \(\rho\), must be positive
periodicity (float) – Parameter controlling the periodicity of the kernel \(p\)

length_scale: float = 1.0¶

output_scale: float = 1.0¶

periodicity: float = 1.0¶

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)
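A minimal sketch of the periodic formula, assuming plain Python lists for the inputs and a hypothetical function name, shows that the kernel value repeats whenever the distance between the points changes by a whole multiple of the periodicity. It does not call coreax.

```python
import math


def periodic_kernel(x, y, length_scale=1.0, output_scale=1.0, periodicity=1.0):
    """Evaluate rho * exp(-2 * sin^2(pi * ||x - y|| / p) / lambda^2)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    sine = math.sin(math.pi * distance / periodicity)
    return output_scale * math.exp(-2.0 * sine**2 / length_scale**2)


# Distances 0.3 and 1.3 differ by exactly one period (p = 1).
near = periodic_kernel([0.3], [0.0], length_scale=0.7, output_scale=2.0)
one_period_later = periodic_kernel([1.3], [0.0], length_scale=0.7, output_scale=2.0)

# Half a period apart the sine term is 1, giving the minimum rho * exp(-2 / lambda^2).
half_period = periodic_kernel([0.5], [0.0], length_scale=0.7, output_scale=2.0)
```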

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.LocallyPeriodicKernel(periodic_length_scale=1.0, periodic_output_scale=1.0, periodicity=1.0, squared_exponential_length_scale=1.0, squared_exponential_output_scale=1.0)[source]¶

Bases: ProductKernel

Define a locally periodic kernel.

The locally periodic kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = r(x,y)l(x,y)\) where \(r\) is the periodic kernel and \(l\) is the squared exponential kernel.

Warning

The locally periodic kernel is not differentiable when \(x=y\).

Parameters:

periodic_length_scale (float) – Periodic kernel smoothing/bandwidth parameter
periodic_output_scale (float) – Periodic kernel normalisation constant
periodicity (float) – Parameter controlling the periodicity of the Periodic kernel
squared_exponential_length_scale (float) – SquaredExponential kernel smoothing/bandwidth parameter
squared_exponential_output_scale (float) – SquaredExponential Kernel normalisation constant
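The product structure can be sketched in a few lines. The example below multiplies the periodic formula by a squared exponential of the assumed form \(\rho \exp(-||x-y||^2 / (2\lambda^2))\); the helper names are hypothetical, only the parameter names mirror the constructor above, and coreax itself is not called.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def periodic(x, y, length_scale, output_scale, periodicity):
    """rho * exp(-2 * sin^2(pi * ||x - y|| / p) / lambda^2)."""
    sine = math.sin(math.pi * euclidean_distance(x, y) / periodicity)
    return output_scale * math.exp(-2.0 * sine**2 / length_scale**2)


def squared_exponential(x, y, length_scale, output_scale):
    """Assumed form: rho * exp(-||x - y||^2 / (2 * lambda^2))."""
    r = euclidean_distance(x, y)
    return output_scale * math.exp(-(r**2) / (2.0 * length_scale**2))


def locally_periodic(
    x,
    y,
    periodic_length_scale=1.0,
    periodic_output_scale=1.0,
    periodicity=1.0,
    squared_exponential_length_scale=1.0,
    squared_exponential_output_scale=1.0,
):
    """Product k(x, y) = r(x, y) * l(x, y) of the two kernels above."""
    return periodic(
        x, y, periodic_length_scale, periodic_output_scale, periodicity
    ) * squared_exponential(
        x, y, squared_exponential_length_scale, squared_exponential_output_scale
    )


# At x = y both factors equal their output scales, so the product is 2 * 3.
at_same_point = locally_periodic(
    [0.0], [0.0], periodic_output_scale=2.0, squared_exponential_output_scale=3.0
)

# One full period apart the periodic factor is back to 1, leaving only the
# squared exponential decay exp(-2^2 / 2).
one_period_apart = locally_periodic([2.0], [0.0], periodicity=2.0)
```

The squared exponential factor damps the repeating pattern, so points many periods apart are only weakly correlated.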

class coreax.kernels.PoissonKernel(index=0.5, output_scale=1.0)[source]¶

Bases: ScalarValuedKernel

Define a Poisson kernel.

Given \(r=\) index, \(0 < r < 1\), and \(\rho =\) output_scale, the Poisson kernel is defined as \(k: [0, 2\pi) \times [0, 2\pi) \to \mathbb{R}\), \(k(x, y) = \frac{\rho}{1 - 2r\cos(x-y) + r^2}\).

Warning

Unlike many other kernels in Coreax, the Poisson kernel is not defined on arbitrary \(\mathbb{R}^d\), but instead a subset of the positive real line \([0, 2\pi)\). We do not check that inputs to methods in this class lie in the correct domain, therefore unexpected behaviour may occur. For example, passing \(n\)-vectors to the compute method will be interpreted as one observation of an \(n\)-dimensional vector, and not \(n\) observations of a one-dimensional vector, and therefore would be an invalid use of this kernel function.

Parameters:

index (Any) – Kernel parameter indexing the family of Poisson kernel functions
output_scale (Any) – Kernel normalisation constant, \(\rho\), must be positive

index: float = 0.5¶

output_scale: float = 1.0¶

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)
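Since the Poisson kernel acts on angles rather than vectors, a scalar sketch is the natural illustration. The example below evaluates the formula for inputs in \([0, 2\pi)\) with the default index of 0.5; the function name is hypothetical and the code does not call coreax.

```python
import math


def poisson_kernel(x, y, index=0.5, output_scale=1.0):
    """Evaluate rho / (1 - 2 r cos(x - y) + r^2) for scalar angles x and y."""
    return output_scale / (1.0 - 2.0 * index * math.cos(x - y) + index**2)


# Equal angles: the denominator is (1 - r)^2 = 0.25.
at_same_angle = poisson_kernel(0.0, 0.0)

# Opposite angles: the denominator is (1 + r)^2 = 2.25.
at_opposite_angle = poisson_kernel(math.pi, 0.0)
```

The kernel is largest when the two angles coincide and smallest when they are half a turn apart; an index closer to 1 sharpens the peak.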

grad_x_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]¶

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:

x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernels.SteinKernel(base_kernel, score_function)[source]¶

Bases: UniCompositeKernel

Define the Stein kernel, i.e. the application of the Stein operator.

\[\mathcal{A}_\mathbb{P}(g(\mathbf{x})) := \nabla_\mathbf{x} g(\mathbf{x}) + g(\mathbf{x}) \nabla_\mathbf{x} \log f_X(\mathbf{x})^\intercal\]

w.r.t. probability measure \(\mathbb{P}\) to the base kernel \(k(\mathbf{x}, \mathbf{y})\). Here, differentiable vector-valued \(g: \mathbb{R}^d \to \mathbb{R}^d\), and \(\nabla_\mathbf{x} \log f_X(\mathbf{x})\) is the score function of measure \(\mathbb{P}\).

\(\mathbb{P}\) is assumed to admit a density function \(f_X\) w.r.t. d-dimensional Lebesgue measure. The score function is assumed to be Lipschitz.

The key property of a Stein operator is zero expectation under \(\mathbb{P}\), i.e. \(\mathbb{E}_\mathbb{P}[\mathcal{A}_\mathbb{P} g(\mathbf{x})] = 0\), for positive differentiable \(f_X\).

The Stein kernel for base kernel \(k(\mathbf{x}, \mathbf{y})\) is defined as

\[k_\mathbb{P}(\mathbf{x}, \mathbf{y}) = \nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{y} \log f_X(\mathbf{y}) \cdot \nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) + (\nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} \log f_X(\mathbf{y})) k(\mathbf{x}, \mathbf{y}).\]

This kernel requires a ‘base’ kernel to evaluate. The base kernel can be any other implemented subclass of the ScalarValuedKernel abstract base class; even another Stein kernel.

The score function \(\nabla_\mathbf{x} \log f_X: \mathbb{R}^d \to \mathbb{R}^d\) can be any suitable Lipschitz score function, e.g. one that is learned from score matching (ScoreMatching), computed explicitly from a density function, or known analytically.

Parameters:

base_kernel (ScalarValuedKernel) – Initialised kernel object with which to evaluate the Stein kernel
score_function (Callable[[Shaped[Array, 'n d']], Shaped[Array, 'n d']]) – A vector-valued callable defining a score function \(\mathbb{R}^d \to \mathbb{R}^d\)

score_function: Callable[[Shaped[Array, 'n d']], Shaped[Array, 'n d']]¶

compute_elementwise(x, y)[source]¶

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:

x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Returns:

Kernel evaluated at (x, y)
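The four-term Stein kernel formula and its zero-expectation property can be illustrated in one dimension. The sketch below assumes a PCIMQ base kernel with hand-derived derivatives and a standard normal target, whose score function is \(-x\); it then integrates \(k_\mathbb{P}(x, y) f_X(x)\) over \(x\) with the trapezoidal rule. Every function name is hypothetical, the restriction to \(d = 1\) is a simplification for this example, and coreax is not called.

```python
import math

LAM = 1.0  # length scale of the assumed PCIMQ base kernel
RHO = 1.0  # output scale of the assumed PCIMQ base kernel


def _s(x, y):
    return 1.0 + (x - y) ** 2 / (2.0 * LAM**2)


def base_kernel(x, y):
    return RHO * _s(x, y) ** -0.5


def base_grad_x(x, y):
    return -RHO * (x - y) / (2.0 * LAM**2) * _s(x, y) ** -1.5


def base_grad_y(x, y):
    return -base_grad_x(x, y)


def base_divergence(x, y):
    # In one dimension the divergence is the single mixed partial d^2 k / dx dy.
    s = _s(x, y)
    return RHO / (2.0 * LAM**2) * s**-1.5 - 3.0 * RHO * (x - y) ** 2 / (4.0 * LAM**4) * s**-2.5


def score(x):
    """Score of the standard normal density: d/dx log f_X(x) = -x."""
    return -x


def stein_kernel(x, y):
    """Sum of the four terms in the Stein kernel definition, for scalars."""
    return (
        base_divergence(x, y)
        + score(x) * base_grad_y(x, y)
        + score(y) * base_grad_x(x, y)
        + score(x) * score(y) * base_kernel(x, y)
    )


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expectation_over_x(y, lower=-10.0, upper=10.0, n=4001):
    """Trapezoidal estimate of the integral of stein_kernel(x, y) * f_X(x) dx."""
    h = (upper - lower) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lower + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * stein_kernel(x, y) * normal_pdf(x)
    return total * h


mean_at_y = expectation_over_x(0.7)
symmetry_gap = abs(stein_kernel(0.3, -1.2) - stein_kernel(-1.2, 0.3))
```

The estimated expectation is zero up to numerical error for any fixed y, which is the property that makes Stein kernels useful for comparing a sample against a target known only through its score function.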