Kernels#

Classes and associated functionality to use kernel functions.

A kernel is a non-negative, real-valued integrable function that can take two inputs, x and y, and returns a value that decreases as x and y move further away in space from each other. Note that further here may account for cyclic behaviour in the data, for example.

In this library, we often use kernels as a smoothing tool: given a dataset of distinct points, we can reconstruct the underlying data generating distribution through smoothing of the data with kernels.

Some kernels are parameterizable and may represent other well known kernels when given appropriate parameter values. For example, the SquaredExponentialKernel,

\[k(x,y) = \text{output_scale} * \exp (-||x-y||^2/2 * \text{length_scale}^2),\]

which if parameterized with an output_scale of \(\frac{1}{\sqrt{2\pi} \,*\, \text{length_scale}}\), yields the well known Gaussian kernel.

A Kernel must implement a compute_elementwise() method, which returns the floating point value after evaluating the kernel on two floats, x and y. Additional methods, such as Kernel.grad_x_elementwise(), can optionally be overridden to improve performance. The canonical example is when a suitable closed-form representation of a higher-order gradient can be used to avoid the expense of performing automatic differentiation.

Such an example can be seen in SteinKernel, where the analytic forms of Kernel.divergence_x_grad_y() are significantly cheaper to compute that the automatic differentiated default.

coreax.kernel.median_heuristic(x)[source]#

Compute the median heuristic for setting kernel bandwidth.

Analysis of the performance of the median heuristic can be found in [garreau2018median].

Parameters:

x (ArrayLike) – Input array of vectors

Return type:

Array

Returns:

Bandwidth parameter, computed from the median heuristic, as a zero-dimensional array

class coreax.kernel.Kernel[source]#

Abstract base class for kernels.

compute(x, y)[source]#

Evaluate the kernel on input data x and y.

The ‘data’ can be any of:
  • floating numbers (so a single data-point in 1-dimension)

  • zero-dimensional arrays (so a single data-point in 1-dimension)

  • a vector (a single-point in multiple dimensions)

  • array (multiple vectors).

Evaluation is always vectorised.

Parameters:
  • x (ArrayLike) – An \(n \times d\) dataset (array) or a single value (point)

  • y (ArrayLike) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Array

Returns:

Kernel evaluations between points in x and y. If x = y, then this is the Gram matrix corresponding to the RKHS inner product.

abstract compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x(x, y)[source]#

Evaluate the gradient (Jacobian) of the kernel function w.r.t. x.

The function is vectorised, so x or y can be any of:
  • floating numbers (so a single data-point in 1-dimension)

  • zero-dimensional arrays (so a single data-point in 1-dimension)

  • a vector (a single-point in multiple dimensions)

  • array (multiple vectors).

Parameters:
  • x (ArrayLike) – An \(n \times d\) dataset (array) or a single value (point)

  • y (ArrayLike) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Array

Returns:

An \(n \times m \times d\) array of pairwise Jacobians

grad_y(x, y)[source]#

Evaluate the gradient (Jacobian) of the kernel function w.r.t. y.

The function is vectorised, so x or y can be any of:
  • floating numbers (so a single data-point in 1-dimension)

  • zero-dimensional arrays (so a single data-point in 1-dimension)

  • a vector (a single-point in multiple dimensions)

  • array (multiple vectors).

Parameters:
  • x (ArrayLike) – An \(n \times d\) dataset (array) or a single value (point)

  • y (ArrayLike) – An \(m \times d\) dataset (array) or a single value (point)

Return type:

Array

Returns:

An \(m \times n \times d\) array of pairwise Jacobians

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y(x, y)[source]#

Evaluate the divergence operator w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). This function is vectorised, so it accepts vectors or arrays.

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Array of Laplace-style operator traces \(n \times m\) array

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

gramian_row_mean(x, *, block_size=None, unroll=1)[source]#

Compute the (blocked) row-mean of the kernel’s Gramian matrix.

A convenience method for calling compute_mean(). Equivalent to the call compute_mean(x, x, axis=0, block_size=block_size, unroll=unroll).

Parameters:
Return type:

Array

Returns:

Gramian ‘row/column-mean’, \(\frac{1}{n}\sum_{i=1}^{n} G_{ij}\).

compute_mean(x, y, axis=None, *, block_size=None, unroll=1)[source]#

Compute the (blocked) mean of the matrix \(K_{ij} = k(x_i, y_j)\).

The \(n \times m\) kernel matrix \(K_{ij} = k(x_i, y_j)\), where x and y are respectively \(n \times d\) and \(m \times d\) (weighted) data matrices, has the following (weighted) means:

  • mean (axis=None) \(\frac{1}{n m}\sum_{i,j=1}^{n, m} K_{ij}\)

  • row-mean (axis=0) \(\frac{1}{n}\sum_{i=1}^{n} K_{ij}\)

  • column-mean (axis=1) \(\frac{1}{m}\sum_{j=1}^{m} K_{ij}\)

If x and y are of type Data, their weights are used to compute the weighted mean as defined in jax.numpy.average().

Note

The conventional ‘mean’ is a scalar, the ‘row-mean’ is an \(m\)-vector, while the ‘column-mean’ is an \(n\)-vector.

To avoid materializing the entire matrix (memory cost \(\mathcal{O}(n m)\)), we accumulate the mean over blocks (memory cost \(\mathcal{O}(B_x B_y)\), where B_x and B_y are user-specified block-sizes for blocking the x and y parameters respectively.

Note

The data x and/or y are padded with zero-valued and zero-weighted data points, when B_x and/or B_y are non-integer divisors of n and/or m. Padding does not alter the result, but does provide the block shape stability required by jax.lax.scan() (used for block iteration).

Parameters:
  • x (Union[Array, ndarray, bool_, number, bool, int, float, complex, Data]) – Data matrix, \(n \times d\)

  • y (Union[Array, ndarray, bool_, number, bool, int, float, complex, Data]) – Data matrix, \(m \times d\)

  • axis (Optional[int]) – Which axis of the kernel matrix to compute the mean over; a value of None computes the mean over both axes

  • block_size (Union[int, None, tuple[Optional[int], Optional[int]]]) – Size of matrix blocks to process; a value of None sets \(B_x = n\) and \(B_y = m\), effectively disabling the block accumulation; an integer value B sets \(B_y = B_x = B\); a tuple allows different sizes to be specified for B_x and B_y; to reduce overheads, it is often sensible to select the largest block size that does not exhaust the available memory resources

  • unroll (Union[int, bool, tuple[Union[int, bool], Union[int, bool]]]) – Unrolling parameter for the outer and inner jax.lax.scan() calls, allows for trade-offs between compilation and runtime cost; consult the JAX docs for further information

Return type:

Array

Returns:

The (weighted) mean of the kernel matrix \(K_{ij}\)

class coreax.kernel.PairedKernel(first_kernel, second_kernel)[source]#

Abstract base class for kernels that compose/wrap two kernels.

Parameters:
class coreax.kernel.AdditiveKernel(first_kernel, second_kernel)[source]#

Define a kernel which is a summation of two kernels.

Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the additive kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) := k(x,y) + l(x,y)\)

Parameters:
compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.ProductKernel(first_kernel, second_kernel)[source]#

Define a kernel which is a product of two kernels.

Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the product kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) = k(x,y)l(x,y)\)

Parameters:
compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.LinearKernel(output_scale=1.0, constant=0.0)[source]#

Define a linear kernel.

Given \(\rho`=\)’output_scale` and \(c =\)’constant’, the linear kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho x^Ty + c\).

Parameters:
  • output_scale (Any) – Kernel normalisation constant, \(\rho\)

  • constant (Any) – Additive constant, \(c\)

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.PolynomialKernel(output_scale=1.0, constant=0.0, degree=2)[source]#

Define a polynomial kernel.

Given \(\rho =``output_scale\), \(c =\)’constant’, and \(d=\)’degree’, the polynomial kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho (x^Ty + c)^d\).

Parameters:
  • output_scale (float) – Kernel normalisation constant, \(\rho\)

  • constant (float) – Additive constant, \(c\)

  • degree (int) – Degree of kernel, must be a positive integer greater than 1.

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.SquaredExponentialKernel(length_scale=1.0, output_scale=1.0)[source]#

Define a squared exponential kernel.

Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the squared exponential kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||^2}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Parameters:
  • length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\)

  • output_scale (Any) – Kernel normalisation constant, \(\rho\)

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.ExponentialKernel(length_scale=1.0, output_scale=1.0)[source]#

Define an exponential kernel.

Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the exponential kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Note

Note that the Exponential kernel is not differentiable when \(x=y\).

Parameters:
  • length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\)

  • output_scale (float) – Kernel normalisation constant, \(\rho\)

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.RationalQuadraticKernel(length_scale=1.0, output_scale=1.0, relative_weighting=1.0)[source]#

Define a rational quadratic kernel.

Given \(\lambda =``length_scale\), \(\rho =``output_scale\), and \(\alpha =``relative_weighting\), the rational quadratic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * (1 + \frac{||x-y||^2}{2 \alpha \lambda^2})^{-\alpha}\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Parameters:
  • length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\)

  • output_scale (float) – Kernel normalisation constant, \(\rho\)

  • relative_weighting (float) – Parameter controlling the relative weighting of large-scale and small-scale variations, \(\alpha\). As \(alpha \to \infty\) the rational quadratic kernel is identical to the squared exponential kernel.

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.PeriodicKernel(length_scale=1.0, output_scale=1.0, periodicity=1.0)[source]#

Define a periodic kernel.

Given \(\lambda =``length_scale\), \(\rho =``output_scale\), and \(\p =``periodicity\), the periodic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(\frac{-2 \sin^2(\pi ||x-y||/p)}{\lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.

Note

Note that the Periodic kernel is not differentiable when \(x=y\).

Parameters:
  • length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\)

  • output_scale (float) – Kernel normalisation constant, \(\rho\)

  • periodicity (float) – Parameter controlling the periodicity of the kernel. :math: p

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.LocallyPeriodicKernel(periodic_length_scale=1.0, periodic_output_scale=1.0, periodicity=1.0, squared_exponential_length_scale=1.0, squared_exponential_output_scale=1.0)[source]#

Define a locally periodic kernel.

The periodic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = r(x,y)l(x,y)\) where \(r\) is the periodic kernel and \(l\) is the squared exponential kernel.

Note

Note that the Periodic kernel is not differentiable when \(x=y\).

Parameters:
  • periodic_length_scale (float) – Periodic kernel smoothing/bandwidth parameter

  • periodic_output_scale (float) – Periodic kernel normalisation constant

  • periodicity (float) – Parameter controlling the periodicity of the Periodic kernel

  • squared_exponential_length_scale (float) – SquaredExponential kernel smoothing/bandwidth parameter]

  • squared_exponential_output_scale (float) – SquaredExponential Kernel normalisation constant

class coreax.kernel.LaplacianKernel(length_scale=1.0, output_scale=1.0)[source]#

Define a Laplacian kernel.

Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the Laplacian kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||_1}{2 \lambda^2})\) where \(||\cdot||_1\) is the \(L_1\)-norm.

Parameters:
  • length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\)

  • output_scale (Any) – Kernel normalisation constant, \(\rho\)

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.PCIMQKernel(length_scale=1.0, output_scale=1.0)[source]#

Define a pre-conditioned inverse multi-quadric (PCIMQ) kernel.

Given \(\lambda =``length_scale\) and \(\rho =``output_scale\), the PCIMQ kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \frac{\rho}{\sqrt{1 + \frac{||x-y||^2}{2 \lambda^2}}} where :math:`||\cdot||\) is the usual \(L_2\)-norm.

Parameters:
  • length_scale (Any) – Kernel smoothing/bandwidth parameter, \(\lambda\)

  • output_scale (Any) – Kernel normalisation constant, \(\rho\)

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)

grad_x_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. x.

The gradient (Jacobian) of the kernel function w.r.t. x.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_x() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

grad_y_elementwise(x, y)[source]#

Evaluate the element-wise gradient of the kernel function w.r.t. y.

The gradient (Jacobian) of the kernel function w.r.t. y.

Only accepts single vectors x and y, i.e. not arrays. coreax.kernel.Kernel.grad_y() provides a vectorised version of this method for arrays.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\).

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\).

Return type:

Array

Returns:

Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)

divergence_x_grad_y_elementwise(x, y)[source]#

Evaluate the element-wise divergence w.r.t. x of Jacobian w.r.t. y.

\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().

This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).

Parameters:
  • x (ArrayLike) – First vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Second vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Trace of the Laplace-style operator; a real number

class coreax.kernel.CompositeKernel(base_kernel)[source]#

Abstract base class for kernels that compose/wrap one kernel.

Parameters:

base_kernel (Kernel) – kernel to be wrapped/used in composition

class coreax.kernel.SteinKernel(base_kernel, score_function)[source]#

Define the Stein kernel, i.e. the application of the Stein operator.

\[\mathcal{A}_\mathbb{P}(g(\mathbf{x})) := \nabla_\mathbf{x} g(\mathbf{x}) + g(\mathbf{x}) \nabla_\mathbf{x} \log f_X(\mathbf{x})^\intercal\]

w.r.t. probability measure \(\mathbb{P}\) to the base kernel \(k(\mathbf{x}, \mathbf{y})\). Here, differentiable vector-valued \(g: \mathbb{R}^d \to \mathbb{R}^d\), and \(\nabla_\mathbf{x} \log f_X(\mathbf{x})\) is the score function of measure \(\mathbb{P}\).

\(\mathbb{P}\) is assumed to admit a density function \(f_X\) w.r.t. d-dimensional Lebesgue measure. The score function is assumed to be Lipschitz.

The key property of a Stein operator is zero expectation under \(\mathbb{P}\), i.e. \(\mathbb{E}_\mathbb{P}[\mathcal{A}_\mathbb{P} f(\mathbf{x})]\), for positive differentiable \(f_X\).

The Stein kernel for base kernel \(k(\mathbf{x}, \mathbf{y})\) is defined as

\[k_\mathbb{P}(\mathbf{x}, \mathbf{y}) = \nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{y} \log f_X(\mathbf{y}) \cdot \nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) + (\nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} \log f_X(\mathbf{y})) k(\mathbf{x}, \mathbf{y}).\]

This kernel requires a ‘base’ kernel to evaluate. The base kernel can be any other implemented subclass of the Kernel abstract base class; even another Stein kernel.

The score function \(\nabla_\mathbf{x} \log f_X: \mathbb{R}^d \to \mathbb{R}^d\) can be any suitable Lipschitz score function, e.g. one that is learned from score matching (ScoreMatching), computed explicitly from a density function, or known analytically.

Parameters:
  • base_kernel (Kernel) – Initialised kernel object with which to evaluate the Stein kernel

  • score_function (Callable[[ArrayLike], Array]) – A vector-valued callable defining a score function \(\mathbb{R}^d \to \mathbb{R}^d\)

compute_elementwise(x, y)[source]#

Evaluate the kernel on individual input vectors x and y, not-vectorised.

Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.

Parameters:
  • x (ArrayLike) – Vector \(\mathbf{x} \in \mathbb{R}^d\)

  • y (ArrayLike) – Vector \(\mathbf{y} \in \mathbb{R}^d\)

Return type:

Array

Returns:

Kernel evaluated at (x, y)