Kernels¶
Kernel functions used when generating coresets.
- coreax.kernels.median_heuristic(x)[source]¶
Compute the median heuristic for setting kernel bandwidth.
Analysis of the performance of the median heuristic can be found in [garreau2018median].
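As a rough illustration of what a median heuristic computes, the sketch below uses one common convention: the bandwidth is the square root of half the median squared pairwise distance. That convention, and the name `median_heuristic_sketch`, are assumptions made for illustration; the library's exact definition should be checked in its source.

```python
import math
from statistics import median


def median_heuristic_sketch(points):
    """Return sqrt(median(squared pairwise distances) / 2) for a list of points.

    Hypothetical helper: the factor of 1/2 is an assumed convention.
    """
    squared_distances = [
        sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i in range(len(points))
        for j in range(i + 1, len(points))  # distinct pairs only
    ]
    return math.sqrt(median(squared_distances) / 2.0)


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
# Squared distances are 1, 4 and 5, so the median is 4.
bandwidth = median_heuristic_sketch(points)
```

The resulting value would typically be passed as the `length_scale` of a kernel such as the squared exponential.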
- class coreax.kernels.ScalarValuedKernel[source]¶
Bases: Module
Abstract base class for scalar-valued kernels.
- compute(x, y)[source]¶
Evaluate the kernel on input data x and y.
The ‘data’ can be any of:
- floating-point numbers (a single data point in one dimension)
- zero-dimensional arrays (a single data point in one dimension)
- a vector (a single data point in multiple dimensions)
- an array (multiple vectors).
Evaluation is always vectorised.
- Parameters:
- Return type:
Union[Shaped[Array, 'n m'], Shaped[Array, '1 1']]
- Returns:
Kernel evaluations between points in x and y. If x = y, then this is the Gram matrix corresponding to the RKHS inner product.
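The relationship between an element-wise kernel and its vectorised evaluation can be sketched in plain Python. This is not the library's JAX implementation; `se_elementwise` and `compute_pairwise` are hypothetical names, and nested lists stand in for arrays.

```python
import math


def se_elementwise(x, y, length_scale=1.0, output_scale=1.0):
    """Squared exponential kernel between two single vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale * math.exp(-sq_dist / (2.0 * length_scale**2))


def compute_pairwise(kernel, xs, ys):
    """Return the n x m matrix K[i][j] = kernel(xs[i], ys[j])."""
    return [[kernel(x, y) for y in ys] for x in xs]


xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # n = 3 points in 2 dimensions
ys = [[0.0, 0.0], [2.0, 0.0]]              # m = 2 points in 2 dimensions
K = compute_pairwise(se_elementwise, xs, ys)      # shape 3 x 2
gram = compute_pairwise(se_elementwise, xs, xs)   # x = y gives the Gram matrix
```

Passing the same data for both arguments yields a symmetric matrix with the kernel's value at zero distance on the diagonal.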
- abstract compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- grad_x(x, y)[source]¶
Evaluate the gradient (Jacobian) of the kernel function w.r.t. x.
The function is vectorised, so x or y can be any of:
- floating-point numbers (a single data point in one dimension)
- zero-dimensional arrays (a single data point in one dimension)
- a vector (a single data point in multiple dimensions)
- an array (multiple vectors).
- Parameters:
- Return type:
Union[Shaped[Array, 'n m d'], Shaped[Array, '1 1 d'], Shaped[Array, '1 1 1']]
- Returns:
An \(n \times m \times d\) array of pairwise Jacobians
- grad_y(x, y)[source]¶
Evaluate the gradient (Jacobian) of the kernel function w.r.t. y.
The function is vectorised, so x or y can be any of:
- floating-point numbers (a single data point in one dimension)
- zero-dimensional arrays (a single data point in one dimension)
- a vector (a single data point in multiple dimensions)
- an array (multiple vectors).
- Parameters:
- Return type:
Union[Shaped[Array, 'n m d'], Shaped[Array, '1 1 d'], Shaped[Array, '1 1 1']]
- Returns:
An \(n \times m \times d\) array of pairwise Jacobians, matching the return type above
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
- Return type:
Union[Shaped[Array, 'd'], Shaped[Array, '']]
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
- Return type:
Union[Shaped[Array, 'd'], Shaped[Array, '']]
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y(x, y)[source]¶
Evaluate the divergence operator w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
This function is vectorised, so it accepts vectors or arrays.
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
- Return type:
Union[Shaped[Array, 'n m'], Shaped[Array, '1 1']]
- Returns:
An \(n \times m\) array of Laplace-style operator traces
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
- Return type:
Shaped[Array, '']
- Returns:
Trace of the Laplace-style operator; a real number
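To make the ‘trace of the pseudo-Hessian’ concrete, the sketch below evaluates \(\sum_i \partial^2 k / \partial x_i \partial y_i\) for a squared exponential kernel in two ways: a closed form derived by hand, \(\frac{k(x,y)}{\lambda^2}\left(d - \frac{||x-y||^2}{\lambda^2}\right)\), and central finite differences. All function names are hypothetical and the code is plain Python, not the library's implementation.

```python
import math


def se_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """Squared exponential kernel, rho * exp(-||x - y||^2 / (2 * lambda^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale * math.exp(-sq_dist / (2.0 * length_scale**2))


def se_divergence_closed_form(x, y, length_scale=1.0, output_scale=1.0):
    """Closed form: k(x, y) / lambda^2 * (d - ||x - y||^2 / lambda^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    k = se_kernel(x, y, length_scale, output_scale)
    return k / length_scale**2 * (len(x) - sq_dist / length_scale**2)


def shifted(values, index, delta):
    """Copy of `values` with `delta` added to one coordinate."""
    out = list(values)
    out[index] += delta
    return out


def divergence_finite_difference(kernel, x, y, step=1e-4):
    """Sum over i of the mixed partial d^2 k / (dx_i dy_i), by central differences."""
    total = 0.0
    for i in range(len(x)):
        total += (
            kernel(shifted(x, i, step), shifted(y, i, step))
            - kernel(shifted(x, i, step), shifted(y, i, -step))
            - kernel(shifted(x, i, -step), shifted(y, i, step))
            + kernel(shifted(x, i, -step), shifted(y, i, -step))
        ) / (4.0 * step**2)
    return total


x, y = [0.5, -1.0], [1.5, 0.25]
exact = se_divergence_closed_form(x, y)
approx = divergence_finite_difference(se_kernel, x, y)
```

The two values should agree to within the finite-difference error, which gives a quick sanity check when implementing this method for a new kernel.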
- gramian_row_mean(x, *, block_size=None, unroll=1)[source]¶
Compute the (blocked) row-mean of the kernel’s Gramian matrix.
A convenience method for calling compute_mean(). Equivalent to the call compute_mean(x, x, axis=0, block_size=block_size, unroll=unroll).
- Parameters:
x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int, Data]) – Data matrix, \(n \times d\)
block_size (Union[int, None, tuple[Optional[int], Optional[int]]]) – Block size parameter passed to compute_mean()
unroll (Union[int, bool, tuple[Union[int, bool], Union[int, bool]]]) – Unroll parameter passed to compute_mean()
- Return type:
Shaped[Array, 'n']
- Returns:
Gramian ‘row/column-mean’, \(\frac{1}{n}\sum_{i=1}^{n} G_{ij}\).
- compute_mean(x, y, axis=None, *, block_size=None, unroll=1)[source]¶
Compute the (blocked) mean of the matrix \(K_{ij} = k(x_i, y_j)\).
The \(n \times m\) kernel matrix \(K_{ij} = k(x_i, y_j)\), where x and y are respectively \(n \times d\) and \(m \times d\) (weighted) data matrices, has the following (weighted) means:
- mean (axis=None) \(\frac{1}{n m}\sum_{i,j=1}^{n, m} K_{ij}\)
- row-mean (axis=0) \(\frac{1}{n}\sum_{i=1}^{n} K_{ij}\)
- column-mean (axis=1) \(\frac{1}{m}\sum_{j=1}^{m} K_{ij}\)
If x and y are of type Data, their weights are used to compute the weighted mean as defined in jax.numpy.average().
Note
The conventional ‘mean’ is a scalar, the ‘row-mean’ is an \(m\)-vector, while the ‘column-mean’ is an \(n\)-vector.
To avoid materializing the entire matrix (memory cost \(\mathcal{O}(n m)\)), we accumulate the mean over blocks (memory cost \(\mathcal{O}(B_x B_y)\)), where B_x and B_y are user-specified block-sizes for blocking the x and y parameters respectively.
Note
The data x and/or y are padded with zero-valued and zero-weighted data points when B_x and/or B_y are not integer divisors of n and/or m. Padding does not alter the result, but does provide the block shape stability required by jax.lax.scan() (used for block iteration).
- Parameters:
x (Union[Shaped[Array, 'n d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int, Data]) – Data matrix, \(n \times d\)
y (Union[Shaped[Array, 'm d'], Shaped[Array, 'd'], Shaped[Array, ''], float, int, Data]) – Data matrix, \(m \times d\)
axis (Optional[int]) – Which axis of the kernel matrix to compute the mean over; a value of None computes the mean over both axes
block_size (Union[int, None, tuple[Optional[int], Optional[int]]]) – Size of matrix blocks to process; a value of None sets \(B_x = n\) and \(B_y = m\), effectively disabling the block accumulation; an integer value B sets \(B_y = B_x = B\); a tuple allows different sizes to be specified for B_x and B_y; to reduce overheads, it is often sensible to select the largest block size that does not exhaust the available memory resources
unroll (Union[int, bool, tuple[Union[int, bool], Union[int, bool]]]) – Unrolling parameter for the outer and inner jax.lax.scan() calls, allows for trade-offs between compilation and runtime cost; consult the JAX docs for further information
- Return type:
Union[Shaped[Array, 'n'], Shaped[Array, 'm'], Shaped[Array, '']]
- Returns:
The (weighted) mean of the kernel matrix \(K_{ij}\)
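The block accumulation idea can be sketched without JAX. The version below handles unweighted data only and slices a shorter final block instead of padding, so it illustrates the memory argument rather than the library's `jax.lax.scan` implementation; `blocked_mean` and `row_mean` are hypothetical names.

```python
def linear_kernel(x, y):
    """Plain dot product, used here only to give easily checked numbers."""
    return sum(a * b for a, b in zip(x, y))


def blocked_mean(kernel, xs, ys, block_x, block_y):
    """Mean of K[i][j] = kernel(xs[i], ys[j]), holding one tile at a time."""
    total = 0.0
    for i0 in range(0, len(xs), block_x):
        for j0 in range(0, len(ys), block_y):
            # Materialise at most a block_x by block_y tile of K.
            tile = [
                [kernel(x, y) for y in ys[j0 : j0 + block_y]]
                for x in xs[i0 : i0 + block_x]
            ]
            total += sum(sum(row) for row in tile)
    return total / (len(xs) * len(ys))


def row_mean(kernel, xs, ys):
    """axis=0 convention from the text: average over i, one entry per y_j."""
    return [sum(kernel(x, y) for x in xs) / len(xs) for y in ys]


xs = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # n = 5
ys = [[1.0], [2.0], [3.0]]                # m = 3
overall = blocked_mean(linear_kernel, xs, ys, block_x=2, block_y=2)
row_means = row_mean(linear_kernel, xs, ys)  # an m-vector, as noted above
```

Choosing `block_x = n` and `block_y = m` reduces this to a single tile, which mirrors the effect described for `block_size=None`.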
- class coreax.kernels.UniCompositeKernel(base_kernel)[source]¶
Bases: ScalarValuedKernel
Abstract base class for kernels that compose/wrap one scalar-valued kernel.
- Parameters:
base_kernel (ScalarValuedKernel) – kernel to be wrapped/used in composition
- base_kernel: ScalarValuedKernel¶
- class coreax.kernels.PowerKernel(base_kernel, power)[source]¶
Bases: UniCompositeKernel, ScalarValuedKernel
Define a kernel function which is an integer power of a base kernel function.
Given a kernel function \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the power kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) = k(x,y)^n\) and \(n\in\mathbb{N}\) is given by power.
- Parameters:
base_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
power (int)
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
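For the power kernel \(p(x,y) = k(x,y)^n\) defined in this class, the gradient follows from the chain rule, \(\nabla_\mathbf{x} p = n\, k^{n-1} \nabla_\mathbf{x} k\). The plain-Python sketch below applies that rule to a dot-product base kernel; whether the library uses this formula or automatic differentiation is not stated here, and all names are hypothetical.

```python
def linear_kernel(x, y):
    """Base kernel for the example: the dot product x . y."""
    return sum(a * b for a, b in zip(x, y))


def linear_grad_x(x, y):
    """Gradient of x . y with respect to x is y."""
    return list(y)


def power_kernel(base, power):
    """Wrap `base` so that the result is base(x, y) ** power."""
    def kernel(x, y):
        return base(x, y) ** power
    return kernel


def power_grad_x(base, base_grad_x, power):
    """Chain rule: power * base^(power - 1) * grad_x(base)."""
    def grad(x, y):
        scale = power * base(x, y) ** (power - 1)
        return [scale * g for g in base_grad_x(x, y)]
    return grad


x, y = [1.0, 2.0], [3.0, 4.0]   # x . y = 11
cubed = power_kernel(linear_kernel, 3)
cubed_grad_x = power_grad_x(linear_kernel, linear_grad_x, 3)
value = cubed(x, y)             # 11 ** 3
gradient = cubed_grad_x(x, y)   # 3 * 11 ** 2 * y
```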
- class coreax.kernels.DuoCompositeKernel(first_kernel, second_kernel)[source]¶
Bases: ScalarValuedKernel
Abstract base class for kernels that compose/wrap two scalar-valued kernels.
- Parameters:
first_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
second_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
- first_kernel: ScalarValuedKernel¶
- second_kernel: ScalarValuedKernel¶
- class coreax.kernels.AdditiveKernel(first_kernel, second_kernel)[source]¶
Bases: DuoCompositeKernel
Define a kernel which is a summation of two kernels.
Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the additive kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) := k(x,y) + l(x,y)\).
- Parameters:
first_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
second_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
- class coreax.kernels.ProductKernel(first_kernel, second_kernel)[source]¶
Bases: DuoCompositeKernel, ScalarValuedKernel
Define a kernel which is a product of two kernels.
Given kernel functions \(k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) and \(l:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), define the product kernel \(p:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) where \(p(x,y) = k(x,y)l(x,y)\).
- Parameters:
first_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
second_kernel (ScalarValuedKernel) – Instance of ScalarValuedKernel
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
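For the product kernel \(p(x,y) = k(x,y)\,l(x,y)\) defined in this class, the gradient obeys the product rule, \(\nabla_\mathbf{x} p = l\,\nabla_\mathbf{x} k + k\,\nabla_\mathbf{x} l\). The sketch below checks that rule against central finite differences for a dot-product kernel multiplied by a squared exponential kernel. It is a plain-Python illustration with hypothetical names, not the library's code.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_grad_x(x, y):
    return list(y)


def se_kernel(x, y, length_scale=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale**2))


def se_grad_x(x, y, length_scale=1.0):
    k = se_kernel(x, y, length_scale)
    return [-(a - b) / length_scale**2 * k for a, b in zip(x, y)]


def product_kernel(x, y):
    return linear_kernel(x, y) * se_kernel(x, y)


def product_grad_x(x, y):
    """Product rule: l * grad_x(k) + k * grad_x(l)."""
    k, l = linear_kernel(x, y), se_kernel(x, y)
    return [l * dk + k * dl for dk, dl in zip(linear_grad_x(x, y), se_grad_x(x, y))]


def finite_difference_grad_x(kernel, x, y, step=1e-6):
    """Central-difference approximation of the gradient w.r.t. x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grad.append((kernel(up, y) - kernel(down, y)) / (2.0 * step))
    return grad


x, y = [0.3, -0.7], [1.1, 0.4]
analytic = product_grad_x(x, y)
numeric = finite_difference_grad_x(product_kernel, x, y)
```

The additive kernel is simpler still: its gradient is the sum of the two component gradients.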
- class coreax.kernels.LinearKernel(output_scale=1.0, constant=0.0)[source]¶
Bases: ScalarValuedKernel
Define a linear kernel.
Given \(\rho =\) output_scale and \(a =\) constant, the linear kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = a + \rho x^T y\).
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
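For the linear kernel \(k(x, y) = a + \rho\, x^T y\) defined in this class, every documented quantity has a simple closed form: \(\nabla_\mathbf{x} k = \rho\,\mathbf{y}\), \(\nabla_\mathbf{y} k = \rho\,\mathbf{x}\), and \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k = \rho\, d\). The sketch below writes these out in plain Python; the function names are hypothetical and the derivations are worked by hand from the formula above.

```python
def linear_kernel(x, y, output_scale=1.0, constant=0.0):
    """k(x, y) = constant + output_scale * (x . y)."""
    return constant + output_scale * sum(a * b for a, b in zip(x, y))


def linear_grad_x(x, y, output_scale=1.0):
    """Gradient w.r.t. x: output_scale * y."""
    return [output_scale * b for b in y]


def linear_grad_y(x, y, output_scale=1.0):
    """Gradient w.r.t. y: output_scale * x."""
    return [output_scale * a for a in x]


def linear_divergence_x_grad_y(x, y, output_scale=1.0):
    """Trace of the d x d matrix output_scale * I, i.e. output_scale * d."""
    return output_scale * len(x)


x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]   # x . y = 32
value = linear_kernel(x, y, output_scale=2.0, constant=0.5)
grad_x = linear_grad_x(x, y, output_scale=2.0)
grad_y = linear_grad_y(x, y, output_scale=2.0)
divergence = linear_divergence_x_grad_y(x, y, output_scale=2.0)
```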
- class coreax.kernels.PolynomialKernel(output_scale=1.0, constant=0.0, degree=2)[source]¶
Bases: ScalarValuedKernel
Define a polynomial kernel.
Given \(\rho =\) output_scale, \(c =\) constant, and \(d =\) degree, the polynomial kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho (x^Ty + c)^d\). Note that \(d\) in the exponent denotes the degree, which is separate from the data dimension in \(\mathbb{R}^d\).
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
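For the polynomial kernel \(k(x, y) = \rho (x^Ty + c)^{\text{degree}}\) defined in this class, the gradient w.r.t. \(\mathbf{x}\) is \(\rho \cdot \text{degree} \cdot (x^Ty + c)^{\text{degree}-1}\,\mathbf{y}\), obtained by hand from the chain rule. A plain-Python sketch with hypothetical function names:

```python
def polynomial_kernel(x, y, output_scale=1.0, constant=0.0, degree=2):
    """k(x, y) = output_scale * (x . y + constant) ** degree."""
    inner = sum(a * b for a, b in zip(x, y)) + constant
    return output_scale * inner**degree


def polynomial_grad_x(x, y, output_scale=1.0, constant=0.0, degree=2):
    """Chain rule: output_scale * degree * (x . y + constant)^(degree - 1) * y."""
    inner = sum(a * b for a, b in zip(x, y)) + constant
    scale = output_scale * degree * inner ** (degree - 1)
    return [scale * b for b in y]


x, y = [1.0, 2.0], [3.0, 4.0]   # x . y = 11, so x . y + 1 = 12
value = polynomial_kernel(x, y, output_scale=2.0, constant=1.0, degree=2)
gradient = polynomial_grad_x(x, y, output_scale=2.0, constant=1.0, degree=2)
```

With `degree=1` the expression reduces to a linear kernel whose constant term is \(\rho c\).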
- class coreax.kernels.ExponentialKernel(length_scale=1.0, output_scale=1.0)[source]¶
Bases: ScalarValuedKernel
Define an exponential kernel.
Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the exponential kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.
Warning
The exponential kernel is not differentiable when \(x=y\).
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
- class coreax.kernels.LaplacianKernel(length_scale=1.0, output_scale=1.0)[source]¶
Bases: ScalarValuedKernel
Define a Laplacian kernel.
Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the Laplacian kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||_1}{2 \lambda^2})\) where \(||\cdot||_1\) is the \(L_1\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
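The exponential and Laplacian kernels share the same functional form and differ only in the norm applied to \(x - y\): \(L_2\) for the former, \(L_1\) for the latter. The plain-Python sketch below (hypothetical function names) evaluates both formulas as written in their class descriptions on a point pair where the two norms differ.

```python
import math


def exponential_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """rho * exp(-||x - y||_2 / (2 * lambda^2))."""
    l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return output_scale * math.exp(-l2 / (2.0 * length_scale**2))


def laplacian_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """rho * exp(-||x - y||_1 / (2 * lambda^2))."""
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    return output_scale * math.exp(-l1 / (2.0 * length_scale**2))


x, y = [0.0, 0.0], [3.0, 4.0]
k_exp = exponential_kernel(x, y)   # L2 distance is 5
k_lap = laplacian_kernel(x, y)     # L1 distance is 7
```

In one dimension the two norms coincide, so the two kernels return the same value.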
- class coreax.kernels.SquaredExponentialKernel(length_scale=1.0, output_scale=1.0)[source]¶
Bases: ScalarValuedKernel
Define a squared exponential kernel.
Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the squared exponential kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(-\frac{||x-y||^2}{2 \lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
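For the squared exponential kernel, differentiating the formula in the class description by hand gives \(\nabla_\mathbf{x} k = -\frac{\mathbf{x}-\mathbf{y}}{\lambda^2} k(\mathbf{x}, \mathbf{y})\) and \(\nabla_\mathbf{y} k = -\nabla_\mathbf{x} k\). The plain-Python sketch below checks the first expression against central finite differences; the names are hypothetical and this is not the library's implementation.

```python
import math


def se_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """rho * exp(-||x - y||^2 / (2 * lambda^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale * math.exp(-sq_dist / (2.0 * length_scale**2))


def se_grad_x(x, y, length_scale=1.0, output_scale=1.0):
    """-(x - y) / lambda^2 * k(x, y)."""
    k = se_kernel(x, y, length_scale, output_scale)
    return [-(a - b) / length_scale**2 * k for a, b in zip(x, y)]


def se_grad_y(x, y, length_scale=1.0, output_scale=1.0):
    """(x - y) / lambda^2 * k(x, y), the negative of the x-gradient."""
    k = se_kernel(x, y, length_scale, output_scale)
    return [(a - b) / length_scale**2 * k for a, b in zip(x, y)]


def finite_difference_grad_x(kernel, x, y, step=1e-6):
    """Central-difference approximation of the gradient w.r.t. x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grad.append((kernel(up, y) - kernel(down, y)) / (2.0 * step))
    return grad


x, y = [0.2, -0.4], [1.0, 0.5]
grad_x = se_grad_x(x, y, length_scale=0.8, output_scale=2.0)
grad_y = se_grad_y(x, y, length_scale=0.8, output_scale=2.0)
numeric = finite_difference_grad_x(lambda a, b: se_kernel(a, b, 0.8, 2.0), x, y)
```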
- class coreax.kernels.PCIMQKernel(length_scale=1.0, output_scale=1.0)[source]¶
Bases: ScalarValuedKernel
Define a pre-conditioned inverse multi-quadric (PCIMQ) kernel.
Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the PCIMQ kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \frac{\rho}{\sqrt{1 + \frac{||x-y||^2}{2 \lambda^2}}}\) where \(||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
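The PCIMQ formula \(k(x, y) = \rho / \sqrt{1 + ||x-y||^2 / (2\lambda^2)}\) decays polynomially in the distance, so it assigns noticeably more weight to distant pairs than a squared exponential kernel with the same length scale. The plain-Python sketch below (hypothetical names) evaluates both formulas at one point pair to show the difference.

```python
import math


def pcimq_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """rho / sqrt(1 + ||x - y||^2 / (2 * lambda^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale / math.sqrt(1.0 + sq_dist / (2.0 * length_scale**2))


def se_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """rho * exp(-||x - y||^2 / (2 * lambda^2)), for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return output_scale * math.exp(-sq_dist / (2.0 * length_scale**2))


x, y = [0.0, 0.0, 0.0], [1.0, 1.0, 2.0]   # squared distance 6
k_pcimq = pcimq_kernel(x, y)   # 1 / sqrt(1 + 3)
k_se = se_kernel(x, y)         # exp(-3)
```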
- class coreax.kernels.RationalQuadraticKernel(length_scale=1.0, output_scale=1.0, relative_weighting=1.0)[source]¶
Bases: ScalarValuedKernel
Define a rational quadratic kernel.
Given \(\lambda =\) length_scale, \(\rho =\) output_scale, and \(\alpha =\) relative_weighting, the rational quadratic kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * (1 + \frac{||x-y||^2}{2 \alpha \lambda^2})^{-\alpha}\) where \(||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
length_scale (float) – Kernel smoothing/bandwidth parameter, \(\lambda\), must be positive
output_scale (float) – Kernel normalisation constant, \(\rho\), must be positive
relative_weighting (float) – Parameter controlling the relative weighting of large-scale and small-scale variations, \(\alpha\). As \(\alpha \to \infty\), the rational quadratic kernel converges to the squared exponential kernel. Must be non-negative
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors
xandy, not-vectorised.Vectorisation only becomes relevant in terms of computational speed when we have multiple
xory.- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (
x,y)
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient of the kernel function w.r.t.
x.The gradient (Jacobian) of the kernel function w.r.t.
x.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernels.ScalarValuedKernel.grad_x()provides a vectorised version of this method for arrays.- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient of the kernel function w.r.t.
y.The gradient (Jacobian) of the kernel function w.r.t.
y.Only accepts single vectors
xandy, i.e. not arrays.coreax.kernels.ScalarValuedKernel.grad_y()provides a vectorised version of this method for arrays.- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t.
xof Jacobian w.r.t.y.\(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\). Only accepts vectors
xandy. A vectorised version for arrays is computed indivergence_x_grad_y().This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
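To make the quantity \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\) concrete, the following sketch approximates it with central finite differences as \(\sum_i \partial^2 k / \partial x_i \partial y_i\). It is plain Python and independent of coreax. The helper names are hypothetical, and a unit-bandwidth squared exponential kernel is used only as a stand-in because its divergence has a simple closed form.

```python
import math


def squared_exponential(x, y, length_scale=1.0):
    """Stand-in base kernel (an assumption): exp(-||x-y||^2 / (2 * lambda^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale**2))


def shifted(v, i, h):
    """Copy of v with h added to coordinate i."""
    out = list(v)
    out[i] += h
    return out


def divergence_x_grad_y_fd(kernel, x, y, h=1e-3):
    """Sum over i of d^2 k / (dx_i dy_i), each term by a central difference."""
    total = 0.0
    for i in range(len(x)):
        total += (
            kernel(shifted(x, i, h), shifted(y, i, h))
            - kernel(shifted(x, i, h), shifted(y, i, -h))
            - kernel(shifted(x, i, -h), shifted(y, i, h))
            + kernel(shifted(x, i, -h), shifted(y, i, -h))
        ) / (4.0 * h**2)
    return total


x, y = [0.3, -0.2, 1.0], [0.1, 0.4, 0.5]
numeric = divergence_x_grad_y_fd(squared_exponential, x, y)

# Closed form for the stand-in kernel with lambda = 1: k * (d - ||x-y||^2).
sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
analytic = squared_exponential(x, y) * (len(x) - sq_dist)
```

The result is a single real number per pair of vectors, matching the documented return value. Finite differences are only a checking device here; an analytic or automatic-differentiation route is preferable in practice.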
- class coreax.kernels.MaternKernel(length_scale=1.0, output_scale=1.0, degree=1)[source]¶
Bases: ScalarValuedKernel
Define a Matérn kernel with half-integer smoothness parameter, \(\nu = p + \frac{1}{2}\).
Given \(\lambda =\) length_scale and \(\rho =\) output_scale, the Matérn kernel with smoothness parameter \(\nu = p + \frac{1}{2}\), where \(p =\) degree \(\in \mathbb{N}\), is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\),
\[k(x, y) = \rho^2 \exp\left(-\frac{\sqrt{2p+1}||x-y||}{\lambda}\right) \frac{p!}{(2p)!}\sum_{i=0}^p\frac{(p+i)!}{i!(p-i)!} \left(2\sqrt{2p+1}\frac{||x-y||}{\lambda}\right)^{p-i}\]
where \(||\cdot||\) is the usual \(L_2\)-norm.
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not-vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
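The Matérn formula given for this class can be transcribed term by term. The sketch below is plain Python rather than the library's implementation, the function name is hypothetical, and it assumes that degree 0 is admissible. It is useful mainly for checking the well-known low-degree special cases.

```python
import math


def matern_half_integer(x, y, length_scale=1.0, output_scale=1.0, degree=1):
    """Direct transcription of the documented formula with nu = degree + 1/2."""
    p = degree
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = math.sqrt(2 * p + 1) * r / length_scale
    prefactor = math.factorial(p) / math.factorial(2 * p)
    series = sum(
        math.factorial(p + i)
        / (math.factorial(i) * math.factorial(p - i))
        * (2.0 * scaled) ** (p - i)
        for i in range(p + 1)
    )
    return output_scale**2 * math.exp(-scaled) * prefactor * series


x, y = [0.0, 0.0], [0.6, 0.8]  # ||x - y|| = 1
r = 1.0

# Known special cases for unit length scale and output scale.
exponential = math.exp(-r)                                       # degree 0
matern_three_halves = (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)  # degree 1
matern_five_halves = (
    1.0 + math.sqrt(5.0) * r + 5.0 * r**2 / 3.0
) * math.exp(-math.sqrt(5.0) * r)                                # degree 2

values = [matern_half_integer(x, y, degree=p) for p in (0, 1, 2)]
```

Degrees 0, 1 and 2 reduce to the exponential, Matérn-3/2 and Matérn-5/2 kernels respectively, which is a quick sanity check on the sum and the factorial prefactor.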
- class coreax.kernels.PeriodicKernel(length_scale=1.0, output_scale=1.0, periodicity=1.0)[source]¶
Bases: ScalarValuedKernel
Define a periodic kernel.
Given \(\lambda =\) length_scale, \(\rho =\) output_scale, and \(p =\) periodicity, the periodic kernel is defined as \(k: \mathbb{R}^d\times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = \rho * \exp(\frac{-2 \sin^2(\pi ||x-y||/p)}{\lambda^2})\) where \(||\cdot||\) is the usual \(L_2\)-norm.
Warning
The periodic kernel is not differentiable when \(x=y\).
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not-vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
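A minimal sketch of the periodic kernel formula stated for this class, written in plain Python and not taken from the library source. The function name is hypothetical, and the output scale enters linearly exactly as the formula above is written. The example shows that the value depends on \(||x-y||\) only modulo the periodicity.

```python
import math


def periodic_kernel(x, y, length_scale=1.0, output_scale=1.0, periodicity=1.0):
    """rho * exp(-2 * sin^2(pi * ||x-y|| / p) / lambda^2), as documented."""
    r = math.dist(x, y)
    return output_scale * math.exp(
        -2.0 * math.sin(math.pi * r / periodicity) ** 2 / length_scale**2
    )


x = [0.0]
near = periodic_kernel(x, [0.3], periodicity=2.0)
one_period_further = periodic_kernel(x, [2.3], periodicity=2.0)

# At a separation of exactly one period the kernel returns to its maximum, rho.
at_full_period = periodic_kernel(x, [2.0], output_scale=3.0, periodicity=2.0)
```

Points separated by 0.3 and by 2.3 receive the same kernel value when the periodicity is 2, so the kernel cannot distinguish separations that differ by whole periods.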
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
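For illustration, applying the chain rule to the periodic kernel formula with \(r = ||x-y||\) gives \(\nabla_\mathbf{x} k = -k \, \frac{2\pi}{p \lambda^2} \sin(2\pi r / p) \, \frac{\mathbf{x}-\mathbf{y}}{r}\) and \(\nabla_\mathbf{y} k = -\nabla_\mathbf{x} k\). This derivation and the plain-Python sketch below are not taken from the library source, and all function names are hypothetical. The expression divides by \(r\), so the sketch simply refuses the input \(x = y\); how the library behaves at that point is not specified here.

```python
import math


def periodic_kernel(x, y, length_scale=1.0, output_scale=1.0, periodicity=1.0):
    """rho * exp(-2 * sin^2(pi * ||x-y|| / p) / lambda^2), as documented."""
    r = math.dist(x, y)
    return output_scale * math.exp(
        -2.0 * math.sin(math.pi * r / periodicity) ** 2 / length_scale**2
    )


def periodic_grad_x(x, y, length_scale=1.0, output_scale=1.0, periodicity=1.0):
    """Chain-rule gradient w.r.t. x; undefined as written when x == y."""
    r = math.dist(x, y)
    if r == 0.0:
        raise ValueError("the chain-rule expression divides by ||x - y||")
    k = periodic_kernel(x, y, length_scale, output_scale, periodicity)
    radial = (
        -k
        * (2.0 * math.pi / (periodicity * length_scale**2))
        * math.sin(2.0 * math.pi * r / periodicity)
    )
    return [radial * (a - b) / r for a, b in zip(x, y)]


def periodic_grad_y(x, y, length_scale=1.0, output_scale=1.0, periodicity=1.0):
    """The kernel depends on x - y only, so the y-gradient is the negation."""
    return [-g for g in periodic_grad_x(x, y, length_scale, output_scale, periodicity)]


def central_difference_x(kernel, x, y, h=1e-5):
    """Numerical gradient w.r.t. x, used only to check the formula above."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grads.append((kernel(up, y) - kernel(down, y)) / (2.0 * h))
    return grads


x, y = [0.2, -0.1], [0.5, 0.3]
analytic = periodic_grad_x(x, y, length_scale=0.8, periodicity=1.5)
numeric = central_difference_x(
    lambda a, b: periodic_kernel(a, b, length_scale=0.8, periodicity=1.5), x, y
)
```

Both gradients are vectors in \(\mathbb{R}^d\), one entry per coordinate of the corresponding input.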
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
- class coreax.kernels.LocallyPeriodicKernel(periodic_length_scale=1.0, periodic_output_scale=1.0, periodicity=1.0, squared_exponential_length_scale=1.0, squared_exponential_output_scale=1.0)[source]¶
Bases: ProductKernel
Define a locally periodic kernel.
The locally periodic kernel is defined as \(k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), \(k(x, y) = r(x,y)l(x,y)\) where \(r\) is the periodic kernel and \(l\) is the squared exponential kernel.
Warning
The locally periodic kernel is not differentiable when \(x=y\).
- Parameters:
periodic_length_scale (float) – Periodic kernel smoothing/bandwidth parameter
periodic_output_scale (float) – Periodic kernel normalisation constant
periodicity (float) – Parameter controlling the periodicity of the periodic kernel
squared_exponential_length_scale (float) – Squared exponential kernel smoothing/bandwidth parameter
squared_exponential_output_scale (float) – Squared exponential kernel normalisation constant
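The product structure \(k(x, y) = r(x,y)l(x,y)\) can be sketched as follows. This is plain Python, not the library's ProductKernel machinery, and the function names are hypothetical. The periodic factor follows the formula documented for PeriodicKernel; the squared exponential factor assumes the common form \(\rho^2 \exp(-||x-y||^2 / (2\lambda^2))\), which may differ from the library's scaling.

```python
import math


def periodic_kernel(x, y, length_scale=1.0, output_scale=1.0, periodicity=1.0):
    """rho * exp(-2 * sin^2(pi * ||x-y|| / p) / lambda^2), as documented."""
    r = math.dist(x, y)
    return output_scale * math.exp(
        -2.0 * math.sin(math.pi * r / periodicity) ** 2 / length_scale**2
    )


def squared_exponential_kernel(x, y, length_scale=1.0, output_scale=1.0):
    """Assumed form: rho^2 * exp(-||x-y||^2 / (2 * lambda^2))."""
    return output_scale**2 * math.exp(-math.dist(x, y) ** 2 / (2.0 * length_scale**2))


def locally_periodic_kernel(
    x,
    y,
    periodic_length_scale=1.0,
    periodic_output_scale=1.0,
    periodicity=1.0,
    squared_exponential_length_scale=1.0,
    squared_exponential_output_scale=1.0,
):
    """Pointwise product of the periodic and squared exponential factors."""
    return periodic_kernel(
        x, y, periodic_length_scale, periodic_output_scale, periodicity
    ) * squared_exponential_kernel(
        x, y, squared_exponential_length_scale, squared_exponential_output_scale
    )


x = [0.0]
same_point = locally_periodic_kernel(x, [0.0], periodicity=2.0)
one_period_away = locally_periodic_kernel(x, [2.0], periodicity=2.0)
```

Unlike the purely periodic kernel, the value one full period away is damped by the squared exponential factor, so similarity repeats locally but decays with distance.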
- class coreax.kernels.PoissonKernel(index=0.5, output_scale=1.0)[source]¶
Bases: ScalarValuedKernel
Define a Poisson kernel.
Given \(r=\) index, \(0 < r < 1\), and \(\rho =\) output_scale, the Poisson kernel is defined as \(k: [0, 2\pi) \times [0, 2\pi) \to \mathbb{R}\), \(k(x, y) = \frac{\rho}{1 - 2r\cos(x-y) + r^2}\).
Warning
Unlike many other kernels in Coreax, the Poisson kernel is not defined on arbitrary \(\mathbb{R}^d\), but only on the subset \([0, 2\pi)\) of the real line. We do not check that inputs to methods in this class lie in the correct domain, so unexpected behaviour may occur. For example, passing an \(n\)-vector to the compute method will be interpreted as one observation of an \(n\)-dimensional vector, not \(n\) observations of a one-dimensional vector, and would therefore be an invalid use of this kernel function.
- Parameters:
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not-vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
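The Poisson kernel formula for this class acts on pairs of scalar angles. The plain-Python sketch below uses a hypothetical function name and is not the library's code. It evaluates the formula pairwise over a list of angles, treating each angle as one scalar observation. The range check on the index is a choice made for this sketch, not a statement about what the library validates.

```python
import math


def poisson_kernel(x, y, index=0.5, output_scale=1.0):
    """rho / (1 - 2 r cos(x - y) + r^2) for scalar angles x and y."""
    if not 0.0 < index < 1.0:
        raise ValueError("index must lie strictly between 0 and 1")
    return output_scale / (1.0 - 2.0 * index * math.cos(x - y) + index**2)


# Three scalar observations in [0, 2*pi), compared pairwise.
angles = [0.0, math.pi / 2.0, math.pi]
gram = [[poisson_kernel(a, b) for b in angles] for a in angles]
```

With the default index of 0.5, the diagonal entries equal \(1 / (1 - r)^2 = 4\) and the entry for antipodal angles equals \(1 / (1 + r)^2 \approx 0.444\).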
- grad_x_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. x.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_x() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Jacobian \(\nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise gradient (Jacobian) of the kernel function w.r.t. y.
Only accepts single vectors x and y, i.e. not arrays. coreax.kernels.ScalarValuedKernel.grad_y() provides a vectorised version of this method for arrays.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\).
y – Vector \(\mathbf{y} \in \mathbb{R}^d\).
- Returns:
Jacobian \(\nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^d\)
- divergence_x_grad_y_elementwise(x, y)[source]¶
Evaluate the element-wise divergence w.r.t. x of the Jacobian w.r.t. y, \(\nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
Only accepts vectors x and y. A vectorised version for arrays is computed in divergence_x_grad_y().
This is the trace of the ‘pseudo-Hessian’, i.e. the trace of the Jacobian matrix \(\nabla_\mathbf{x} \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y})\).
- Parameters:
x – First vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Second vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Trace of the Laplace-style operator; a real number
- class coreax.kernels.SteinKernel(base_kernel, score_function)[source]¶
Bases: UniCompositeKernel
Define the Stein kernel, i.e. the application of the Stein operator
\[\mathcal{A}_\mathbb{P}(g(\mathbf{x})) := \nabla_\mathbf{x} g(\mathbf{x}) + g(\mathbf{x}) \nabla_\mathbf{x} \log f_X(\mathbf{x})^\intercal\]
w.r.t. probability measure \(\mathbb{P}\) to the base kernel \(k(\mathbf{x}, \mathbf{y})\). Here, \(g: \mathbb{R}^d \to \mathbb{R}^d\) is a differentiable vector-valued function, and \(\nabla_\mathbf{x} \log f_X(\mathbf{x})\) is the score function of measure \(\mathbb{P}\).
\(\mathbb{P}\) is assumed to admit a density function \(f_X\) w.r.t. \(d\)-dimensional Lebesgue measure. The score function is assumed to be Lipschitz.
The key property of a Stein operator is zero expectation under \(\mathbb{P}\), i.e. \(\mathbb{E}_\mathbb{P}[\mathcal{A}_\mathbb{P} g(\mathbf{x})] = 0\), for positive differentiable \(f_X\).
The Stein kernel for base kernel \(k(\mathbf{x}, \mathbf{y})\) is defined as
\[k_\mathbb{P}(\mathbf{x}, \mathbf{y}) = \nabla_\mathbf{x} \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} k(\mathbf{x}, \mathbf{y}) + \nabla_\mathbf{y} \log f_X(\mathbf{y}) \cdot \nabla_\mathbf{x} k(\mathbf{x}, \mathbf{y}) + (\nabla_\mathbf{x} \log f_X(\mathbf{x}) \cdot \nabla_\mathbf{y} \log f_X(\mathbf{y})) k(\mathbf{x}, \mathbf{y}).\]
This kernel requires a ‘base’ kernel to evaluate. The base kernel can be any other implemented subclass of the ScalarValuedKernel abstract base class, even another Stein kernel.
The score function \(\nabla_\mathbf{x} \log f_X: \mathbb{R}^d \to \mathbb{R}^d\) can be any suitable Lipschitz score function, e.g. one that is learned from score matching (ScoreMatching), computed explicitly from a density function, or known analytically.
- Parameters:
base_kernel (ScalarValuedKernel) – Initialised kernel object with which to evaluate the Stein kernel
score_function (Callable[[Shaped[Array, 'n d']], Shaped[Array, 'n d']]) – A vector-valued callable defining a score function \(\mathbb{R}^d \to \mathbb{R}^d\)
- compute_elementwise(x, y)[source]¶
Evaluate the kernel on individual input vectors x and y, not-vectorised.
Vectorisation only becomes relevant in terms of computational speed when we have multiple x or y.
- Parameters:
x – Vector \(\mathbf{x} \in \mathbb{R}^d\)
y – Vector \(\mathbf{y} \in \mathbb{R}^d\)
- Returns:
Kernel evaluated at (x, y)
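The four-term Stein kernel formula can be assembled by hand for a simple case. The plain-Python sketch below is not the library's implementation and every function name in it is hypothetical. It assumes a unit-bandwidth squared exponential base kernel \(\exp(-||x-y||^2/2)\) with its closed-form derivatives, and the analytically known score \(-\mathbf{x}\) of a standard normal distribution. It then estimates \(\mathbb{E}_{x \sim \mathbb{P}}[k_\mathbb{P}(x, y)]\) in one dimension by the trapezoid rule to illustrate the zero-expectation property.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def se_kernel(x, y):
    """Assumed base kernel: exp(-||x-y||^2 / 2)."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))


def se_grad_x(x, y):
    k = se_kernel(x, y)
    return [-k * (a - b) for a, b in zip(x, y)]


def se_grad_y(x, y):
    return [-g for g in se_grad_x(x, y)]


def se_divergence_x_grad_y(x, y):
    """Closed form for the assumed base kernel: k * (d - ||x-y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return se_kernel(x, y) * (len(x) - sq_dist)


def standard_normal_score(x):
    """Score of N(0, I): gradient of the log-density, which is -x."""
    return [-a for a in x]


def stein_kernel(x, y, score=standard_normal_score):
    """Sum of the four terms in the Stein kernel definition."""
    score_x, score_y = score(x), score(y)
    return (
        se_divergence_x_grad_y(x, y)
        + dot(score_x, se_grad_y(x, y))
        + dot(score_y, se_grad_x(x, y))
        + dot(score_x, score_y) * se_kernel(x, y)
    )


def expectation_under_standard_normal_1d(y, lower=-10.0, upper=10.0, steps=4000):
    """Trapezoid-rule estimate of E_{x ~ N(0, 1)}[k_P(x, y)] for scalar y."""
    width = (upper - lower) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lower + j * width
        weight = 0.5 if j in (0, steps) else 1.0
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * stein_kernel([x], [y]) * density
    return total * width


value = stein_kernel([1.0], [0.0])
mean_at_y = expectation_under_standard_normal_1d(0.7)
```

In one dimension with these choices the kernel simplifies to \(k(x, y)\,(1 - 2(x-y)^2 + xy)\), so `value` is \(-e^{-1/2}\), and `mean_at_y` is zero up to quadrature error.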