Score Matching

Classes and associated functionality to perform score matching.

The score function of a distribution with density \(p\) is the gradient of its log-PDF with respect to the input, \(\nabla_x \log p(x)\). Score matching aims to determine a model by matching the score function of the model to that of the data. Exactly how the score function is modelled is specific to each child class of the abstract base class ScoreMatching.
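To check the definition concretely, the short JAX sketch below differentiates the log-density of an isotropic Gaussian with jax.grad and compares the result with the closed-form score \(-(x - \mu)/\sigma^2\). The helper name gaussian_log_pdf is illustrative and not part of coreax.

```python
import jax
import jax.numpy as jnp


def gaussian_log_pdf(x, mean, std):
    """Exact log-density of an isotropic Gaussian N(mean, std^2 I)."""
    d = x.shape[-1]
    return (
        -0.5 * jnp.sum((x - mean) ** 2) / std**2
        - d * jnp.log(std)
        - 0.5 * d * jnp.log(2 * jnp.pi)
    )


# The score is the gradient of the log-density with respect to x
# (jax.grad differentiates with respect to the first argument by default).
score = jax.grad(gaussian_log_pdf)

x = jnp.array([1.0, -2.0])
mean = jnp.array([0.0, 0.0])
std = 2.0
autodiff_score = score(x, mean, std)
analytic_score = -(x - mean) / std**2  # closed form for a Gaussian
```

The normalising constant does not depend on \(x\), so it drops out of the score. This is why score-based methods are convenient when a density is only known up to a constant.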

One use of score matching arises when working with a SteinKernel, which requires a score function as input. If the score function is known analytically, it can be provided exactly. Otherwise, an approximation to the score function is required, which can be obtained using ScoreMatching.
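To see why a SteinKernel needs a score function, the sketch below builds the standard Langevin Stein kernel from a base kernel \(k\) and a score \(s\): \(k_p(x, y) = \nabla_x \cdot \nabla_y k + \nabla_x k \cdot s(y) + \nabla_y k \cdot s(x) + k\, s(x) \cdot s(y)\). It is an illustrative sketch assuming a Gaussian base kernel; coreax's SteinKernel may organise the computation differently, and gaussian_kernel and stein_kernel are hypothetical names.

```python
import jax
import jax.numpy as jnp


def gaussian_kernel(x, y, length_scale=1.0):
    # Assumed convention: exp(-||x - y||^2 / (2 l^2)).
    return jnp.exp(-jnp.sum((x - y) ** 2) / (2 * length_scale**2))


def stein_kernel(x, y, base_kernel, score_fn):
    """Langevin Stein kernel built from a base kernel and a score function."""
    s_x, s_y = score_fn(x), score_fn(y)
    k = base_kernel(x, y)
    grad_x = jax.grad(base_kernel, argnums=0)(x, y)
    grad_y = jax.grad(base_kernel, argnums=1)(x, y)
    # Mixed second derivatives d^2 k / (dx_i dy_j); the trace is div_x . grad_y k.
    mixed = jax.jacfwd(jax.grad(base_kernel, argnums=0), argnums=1)(x, y)
    return jnp.trace(mixed) + grad_x @ s_y + grad_y @ s_x + k * (s_x @ s_y)


def standard_normal_score(z):
    return -z  # exact score of N(0, I)


x = jnp.array([1.0, 2.0])
y = jnp.array([0.5, -1.0])
k_xy = stein_kernel(x, y, gaussian_kernel, standard_normal_score)
k_xx = stein_kernel(x, x, gaussian_kernel, standard_normal_score)
```

With the standard normal score and a unit length scale, \(k_p(x, x) = d + \lVert x \rVert^2\), so for \(x = (1, 2)\) the diagonal value is 7.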

When using SlicedScoreMatching, the score function is approximated by a neural network, whereas KernelDensityMatching approximates it by fitting a kernel density estimate to the data and then differentiating the logarithm of that estimate.

class coreax.score_matching.ScoreMatching

Base class for score matching algorithms.

The score function of some data is the gradient of the log-PDF with respect to the data. Score matching aims to determine a model by ‘matching’ the score function of the model to that of the data. Exactly how the score function is modelled is specific to each child class of this base class.

abstract match(x)

Match some model score function to that of a dataset x.

Parameters:

x (ArrayLike) – The \(n \times d\) data vectors

Return type:

Callable[[ArrayLike], Array]
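The contract of match (take \(n \times d\) data, return a callable score function) can be illustrated with a hypothetical child class that fits a Gaussian to the data and returns its exact score \(-\Sigma^{-1}(y - \mu)\). The sketch defines its own abstract base rather than importing coreax, so ScoreMatchingSketch and GaussianFitMatching are assumptions for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Callable

import jax
import jax.numpy as jnp


class ScoreMatchingSketch(ABC):
    """Hypothetical stand-in mirroring the documented ScoreMatching interface."""

    @abstractmethod
    def match(self, x) -> Callable:
        """Return a score function fitted to the n x d data x."""


class GaussianFitMatching(ScoreMatchingSketch):
    """Hypothetical child class: fit a Gaussian and return its exact score."""

    def match(self, x):
        x = jnp.asarray(x)
        mean = jnp.mean(x, axis=0)
        precision = jnp.linalg.inv(jnp.cov(x, rowvar=False))

        def score_function(y):
            # Score of N(mean, cov) is -cov^{-1} (y - mean); y may be (d,) or (m, d).
            return -(jnp.asarray(y) - mean) @ precision.T

        return score_function


data = jax.random.normal(jax.random.PRNGKey(0), (5000, 2))
score_fn = GaussianFitMatching().match(data)
```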

class coreax.score_matching.SlicedScoreMatching(random_key, random_generator, noise_conditioning=True, use_analytic=False, num_random_vectors=1, learning_rate=0.001, num_epochs=10, batch_size=64, hidden_dims=(128, 128, 128), optimiser=<function adamw>, num_noise_models=100, sigma=1.0, gamma=0.95)

Implementation of sliced score matching, defined in [ssm].

The score function of some data is the gradient of the log-PDF with respect to the data. Score matching aims to determine a model by ‘matching’ the score function of the model to that of the data. Exactly how the score function is modelled is specific to each child class of ScoreMatching.

With sliced score matching, we train a neural network to directly approximate the score function of the data. The approach is outlined in detail in [ssm].

Note

If num_random_vectors or num_noise_models is given a value less than 1, it is set to 1.

Parameters:
  • random_key (ArrayLike) – Key for random number generation

  • random_generator (Callable[[ArrayLike, Sequence[int], Union[str, type[Any], dtype, SupportsDType]], Array]) – Distribution sampler (key, shape, dtype) \(\rightarrow\) Array, e.g. a sampler from jax.random such as jax.random.rademacher or jax.random.normal

  • noise_conditioning (bool) – Whether to use the noise-conditional version of score matching. Defaults to True.

  • use_analytic (bool) – Whether to use the analytic (reduced-variance) objective. Defaults to False.

  • num_random_vectors (int) – The number of random vectors to use per data vector. Defaults to 1.

  • learning_rate (float) – Optimiser learning rate. Defaults to 1e-3.

  • num_epochs (int) – Number of epochs for training. Defaults to 10.

  • batch_size (int) – Size of mini-batch. Defaults to 64.

  • hidden_dims (Sequence[int]) – Sequence of ScoreNetwork hidden layer sizes. Defaults to (128, 128, 128), denoting 3 hidden layers of 128 nodes each.

  • optimiser (Callable[[float], GradientTransformation]) – The optax optimiser to use. Defaults to optax.adamw.

  • num_noise_models (int) – Number of noise models to use in noise conditional score matching. Defaults to 100.

  • sigma (float) – Initial noise standard deviation of the geometric progression of noise levels used in noise-conditional score matching. Defaults to 1.0.

  • gamma (float) – Common ratio of the geometric progression of noise standard deviations. Defaults to 0.95.

random_key: ArrayLike
random_generator: Callable[[ArrayLike, Sequence[int], Union[str, type[Any], dtype, SupportsDType]], Array]
noise_conditioning: bool
use_analytic: bool
num_random_vectors: int
learning_rate: float
num_epochs: int
batch_size: int
hidden_dims: Sequence[int]
optimiser: Callable[[float], GradientTransformation]
num_noise_models: int
sigma: float
gamma: float
_objective_function(random_direction_vector, grad_score_times_random_direction_matrix, score_matrix)

Compute the score matching loss function.

Two objectives are proposed in [ssm]: a general objective, and a reduced-variance simplification that holds under particular assumptions on the random vectors. The choice between the two is determined by the boolean use_analytic set when the class is instantiated.

Parameters:
  • random_direction_vector (ArrayLike) – \(d\)-dimensional random vector

  • grad_score_times_random_direction_matrix (ArrayLike) – Product of the gradient of score_matrix (w.r.t. x) and the random_direction_vector

  • score_matrix (ArrayLike) – Gradients of log-density

Returns:

Evaluation of score matching objective, see equations 7 and 8 in [ssm]

static _analytic_objective(random_direction_vector, grad_score_times_random_direction_matrix, score_matrix)

Compute reduced variance score matching loss function.

This is valid for particular random measures, e.g. standard normal and Rademacher (more generally, whenever the random vectors satisfy \(\mathbb{E}[v v^\top] = I\)). If this assumption does not hold, SlicedScoreMatching._general_objective() should be used instead.
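The reduced-variance objective rests on one step, sketched here writing \(s_\theta(x)\) for the model score (score_matrix) and \(v\) for random_direction_vector, and assuming \(\mathbb{E}[v v^\top] = I\), which holds for standard normal and Rademacher vectors:

\[
\mathbb{E}_{v}\!\left[\tfrac{1}{2}\big(v^\top s_\theta(x)\big)^2\right]
= \tfrac{1}{2}\, s_\theta(x)^\top\, \mathbb{E}_{v}\!\left[v v^\top\right] s_\theta(x)
= \tfrac{1}{2}\,\lVert s_\theta(x) \rVert^2 .
\]

The random second term of the general objective can therefore be replaced by its expectation without changing the expected loss:

\[
\mathbb{E}_{v}\!\left[v^\top \nabla_x s_\theta(x)\, v + \tfrac{1}{2}\big(v^\top s_\theta(x)\big)^2\right]
= \mathbb{E}_{v}\!\left[v^\top \nabla_x s_\theta(x)\, v\right] + \tfrac{1}{2}\,\lVert s_\theta(x) \rVert^2 .
\]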

Parameters:
  • random_direction_vector (Array) – \(d\)-dimensional random vector

  • grad_score_times_random_direction_matrix (Array) – Product of the gradient of score_matrix (w.r.t. x) and the random_direction_vector

  • score_matrix (Array) – Gradients of log-density

Return type:

Array

Returns:

Evaluation of score matching objective, see equation 8 in [ssm]

static _general_objective(random_direction_vector, grad_score_times_random_direction_matrix, score_matrix)

Compute general score matching loss function.

Use this when the random vectors cannot be assumed to be standard normal or Rademacher. When those assumptions do hold, this objective generally has higher variance than SlicedScoreMatching._analytic_objective().
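The variance difference can be checked numerically on a toy problem. The sketch below assumes the score of a standard normal, \(s(x) = -x\) (so the Jacobian-vector product is simply \(-v\)), and Rademacher random vectors, and evaluates both objectives over many draws of \(v\). The function names mirror the documented methods but are standalone re-implementations, not coreax code.

```python
import jax
import jax.numpy as jnp


def general_objective(v, grad_score_v, score):
    # General form (equation 7 in [ssm]): v^T (grad s) v + 0.5 (v^T s)^2
    return v @ grad_score_v + 0.5 * (v @ score) ** 2


def analytic_objective(v, grad_score_v, score):
    # Reduced-variance form (equation 8 in [ssm]): v^T (grad s) v + 0.5 ||s||^2
    return v @ grad_score_v + 0.5 * score @ score


x = jnp.array([1.0, 2.0, -0.5])
score = -x  # standard normal score s(x) = -x, so (grad s) v = -v
vs = jax.random.rademacher(jax.random.PRNGKey(0), (20000, 3)).astype(jnp.float32)
grad_score_vs = -vs

# Map over the random vectors; the score is shared (in_axes=None).
general = jax.vmap(general_objective, in_axes=(0, 0, None))(vs, grad_score_vs, score)
analytic = jax.vmap(analytic_objective, in_axes=(0, 0, None))(vs, grad_score_vs, score)
```

Both sample means agree up to Monte Carlo error. In this toy case the analytic objective is exactly constant because \(\lVert v \rVert^2 = d\) for Rademacher vectors; in general the reduction comes only from replacing the random second term by its expectation.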

Parameters:
  • random_direction_vector (Array) – \(d\)-dimensional random vector

  • grad_score_times_random_direction_matrix (Array) – Product of the gradient of score_matrix (w.r.t. x) and the random_direction_vector

  • score_matrix (Array) – Gradients of log-density

Return type:

Array

Returns:

Evaluation of score matching objective, see equation 7 in [ssm]

_loss_element(x, v, score_network)

Compute element-wise loss function.

Computes the loss function from Section 3.2 of Song et al.’s paper on sliced score matching [ssm].

Parameters:
  • x (ArrayLike) – \(d\)-dimensional data vector

  • v (ArrayLike) – \(d\)-dimensional random vector

  • score_network (Callable) – Function that calls the neural network on x

Return type:

float

Returns:

Objective function output for single x and v inputs
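The product \(\nabla_x s(x)\, v\) (grad_score_times_random_direction_matrix) can be obtained without forming the full Jacobian, using a single forward-mode Jacobian-vector product. A minimal sketch with jax.jvp, in which score_network is any callable mapping a \(d\)-vector to a \(d\)-vector and the use_analytic argument is a hypothetical stand-in for the class attribute:

```python
import jax
import jax.numpy as jnp


def loss_element(x, v, score_network, use_analytic=False):
    # One forward-mode pass returns s(x) and the Jacobian-vector product (grad s) v.
    score, grad_score_v = jax.jvp(score_network, (x,), (v,))
    first_term = v @ grad_score_v  # v^T (grad s) v
    if use_analytic:
        return first_term + 0.5 * score @ score  # reduced-variance form
    return first_term + 0.5 * (v @ score) ** 2  # general form


def toy_score(z):
    return -z  # score of a standard normal, standing in for the network


x = jnp.array([1.0, 2.0, -0.5])
v = jnp.array([1.0, -1.0, 1.0])
general = loss_element(x, v, toy_score)
analytic = loss_element(x, v, toy_score, use_analytic=True)
```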

_loss(score_network)

Compute the vectorised loss function for arbitrarily many x and v vectors.

In the context of score matching, the objective function is evaluated on the data vectors x and random vectors v, using the score neural network.

Parameters:

score_network (Callable) – Function that calls the neural network on x

Return type:

Callable

Returns:

Callable vectorised sliced score matching loss function
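Vectorising over data and random vectors can be sketched with two nested jax.vmap calls, assuming random vectors of shape \(n \times m \times d\) as documented for the training methods: the inner map pairs one \(x\) with its \(m\) random vectors, and the outer map runs over the \(n\) data vectors. This is an illustrative re-implementation using the analytic objective, not the coreax source.

```python
import jax
import jax.numpy as jnp


def loss_element(x, v, score_network):
    # Analytic objective for one (x, v) pair: v^T (grad s) v + 0.5 ||s(x)||^2.
    score, grad_score_v = jax.jvp(score_network, (x,), (v,))
    return v @ grad_score_v + 0.5 * score @ score


def make_loss(score_network):
    def element(x, v):
        return loss_element(x, v, score_network)

    # Inner map: one data vector x (d,) against its m random vectors (m, d).
    per_x = jax.vmap(element, in_axes=(None, 0))
    # Outer map: n data vectors (n, d) with random vectors (n, m, d) -> losses (n, m).
    return jax.vmap(per_x, in_axes=(0, 0))


x = jax.random.normal(jax.random.PRNGKey(0), (4, 3))
v = jax.random.rademacher(jax.random.PRNGKey(1), (4, 2, 3)).astype(jnp.float32)
losses = make_loss(lambda z: -z)(x, v)  # toy score of a standard normal
```

The resulting \(n \times m\) array would typically be averaged to give a scalar training loss.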

_train_step(state, x, random_vectors)

Apply a single training step that updates model parameters using the loss gradient.

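Parameters:
  • state (TrainState) – The TrainState object

  • x (Array) – The \(n \times d\) data vectors

  • random_vectors (Array) – The \(n \times m \times d\) random vectors

Return type: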

tuple[TrainState, float]

Returns:

The updated TrainState object and the loss value
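A single training step computes the batch loss and its gradient with respect to the model parameters, then updates them. To stay self-contained, the sketch below assumes a hypothetical linear score model \(s(x) = W x + b\) in place of ScoreNetwork, and plain gradient descent in place of the optax optimiser and TrainState; it returns (params, loss), mirroring the documented tuple[TrainState, float].

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # plain gradient descent step size (stand-in for optax)


def linear_score(params, x):
    # Hypothetical linear score model s(x) = W x + b, standing in for ScoreNetwork.
    return params["W"] @ x + params["b"]


def batch_loss(params, x, v):
    def element(xi, vi):
        # jvp gives s(x) and (grad s) v; analytic objective v^T (grad s) v + 0.5 ||s||^2.
        score, grad_score_v = jax.jvp(lambda y: linear_score(params, y), (xi,), (vi,))
        return vi @ grad_score_v + 0.5 * score @ score

    return jnp.mean(jax.vmap(element)(x, v))


@jax.jit
def train_step(params, x, v):
    # One update: evaluate the loss and its gradient, then take a gradient step.
    loss, grads = jax.value_and_grad(batch_loss)(params, x, v)
    params = jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)
    return params, loss


key_x, key_v = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(key_x, (2000, 2))  # standard normal data, true score -x
v = jax.random.rademacher(key_v, (2000, 2)).astype(jnp.float32)
params = {"W": jnp.zeros((2, 2)), "b": jnp.zeros(2)}
for _ in range(300):
    params, loss = train_step(params, x, v)
```

For standard normal data the true score is \(-x\), so after training \(W\) should be close to \(-I\) and \(b\) close to zero.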

_noise_conditional_loop_body(i, obj, state, params, x, random_vectors, sigmas)

Add the objective for the i-th noise level, evaluated on noise-perturbed inputs, to the running sum.

Inputs are perturbed by Gaussian random noise to improve performance of score matching. See [improved_sgm] for details.

Parameters:
  • i (int) – Loop index

  • obj (float) – Running objective, i.e. the current partial sum

  • state (TrainState) – The TrainState object

  • params (dict) – The current parameter values

  • x (Array) – The \(n \times d\) data vectors

  • random_vectors (ArrayLike) – The \(n \times m \times d\) random vectors

  • sigmas (Array) – The geometric progression of noise standard deviations

Return type:

float

Returns:

The updated objective, i.e. partial sum
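Assuming the noise levels form the geometric progression \(\sigma_i = \sigma \gamma^i\) for \(i = 0, \dots,\) num_noise_models \(- 1\) (starting at sigma), the loop body adds one level's contribution to a running sum. The sketch below shows that structure with jax.lax.fori_loop. per_level_loss is a hypothetical placeholder (half the squared norm of a toy score on perturbed inputs), not the sliced score-matching objective. The levels are summed unweighted because any weighting used by coreax is not stated here.

```python
import jax
import jax.numpy as jnp

sigma, gamma, num_noise_models = 1.0, 0.95, 100  # documented defaults
# Assumed schedule: sigma_i = sigma * gamma**i, starting at sigma.
sigmas = sigma * gamma ** jnp.arange(num_noise_models)


def per_level_loss(x, noise, s):
    # Hypothetical placeholder for one noise level: perturb the inputs with
    # Gaussian noise of standard deviation s, then take half the squared norm
    # of the toy score -x_perturbed. Substitute the sliced score-matching loss.
    x_perturbed = x + s * noise
    return jnp.mean(0.5 * jnp.sum(x_perturbed**2, axis=-1))


def total_objective(x, noise, sigmas):
    def body(i, obj):
        # Loop body: add level i's contribution to the running partial sum.
        return obj + per_level_loss(x, noise[i], sigmas[i])

    return jax.lax.fori_loop(0, sigmas.shape[0], body, jnp.zeros(()))


x = jax.random.normal(jax.random.PRNGKey(0), (50, 2))
noise = jax.random.normal(jax.random.PRNGKey(1), (num_noise_models, 50, 2))
total = total_objective(x, noise, sigmas)
```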

_noise_conditional_train_step(state, x, random_vectors, sigmas)

Apply a single noise-conditional training step that updates model parameters using the loss gradient.

Parameters:
  • state (TrainState) – The TrainState object

  • x (Array) – The \(n \times d\) data vectors

  • random_vectors (Array) – The \(n \times m \times d\) random vectors

  • sigmas (Array) – Array of noise standard deviations to use in objective function

Return type:

tuple[TrainState, float]

Returns:

The updated TrainState object and the loss value

match(x)

Learn a score function using sliced score matching, as described in Song et al.’s paper [ssm].

We currently use the ScoreNetwork neural network to approximate the score function. Alternative network architectures can be considered.

Parameters:

x (ArrayLike) – The \(n \times d\) data vectors

Return type:

Callable

Returns:

A function that applies the learned score function to input x

class coreax.score_matching.KernelDensityMatching(length_scale)

Implementation of score function estimation via kernel density estimation.

The score function of some data is the gradient of the log-PDF with respect to the data. Score matching aims to determine a model by ‘matching’ the score function of the model to that of the data. Exactly how the score function is modelled is specific to each child class of ScoreMatching.

With kernel density matching, we approximate the underlying distribution function from a dataset using kernel density estimation, and then differentiate this to compute an estimate of the score function. A Gaussian kernel is used to construct the kernel density estimate.
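A kernel density score can be sketched directly: build the log of a sum of Gaussian kernels centred on the data and differentiate it with jax.grad. The kernel convention \(\exp(-\lVert x - y \rVert^2 / (2\ell^2))\) and the name make_kde_score are assumptions; coreax's Gaussian kernel may be parameterised differently.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def make_kde_score(data, length_scale):
    """Return a score function for a Gaussian KDE built from data of shape (n, d)."""

    def log_kde(y):
        # Log of the sum of Gaussian kernels exp(-||y - x_i||^2 / (2 l^2)).
        # The 1/n factor and normalising constants do not depend on y, so
        # they are omitted: they would not change the gradient.
        sq_dists = jnp.sum((data - y) ** 2, axis=-1)
        return logsumexp(-sq_dists / (2 * length_scale**2))

    # Gradient w.r.t. the query point, vectorised over query points of shape (m, d).
    return jax.vmap(jax.grad(log_kde))


score_single = make_kde_score(jnp.zeros((1, 2)), length_scale=0.5)
score_pair = make_kde_score(jnp.array([[-1.0, 0.0], [1.0, 0.0]]), length_scale=1.0)
```

With a single data point at the origin the estimate reduces to one Gaussian, whose score is \(-y/\ell^2\); with two symmetric points, the score vanishes midway between them.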

Parameters:

length_scale (float) – Length scale of the Gaussian kernel used to build the kernel density estimate

kernel: Kernel
match(x)

Learn a score function using kernel density estimation to model a distribution.

For the kernel density matching approach, the score function is determined by fitting a kernel density estimate to samples from the underlying distribution and then differentiating it. Learning in this context therefore amounts to defining the kernel density estimate from the data x and returning a function that evaluates its score at any points of interest.

Parameters:

x (ArrayLike) – Set of \(n \times d\) samples from the underlying distribution that are used to build the kernel density estimate

Return type:

Callable[[ArrayLike], Array]

Returns:

A function that applies the learned score function to input x

coreax.score_matching.convert_stein_kernel(x, kernel, score_matching)

Convert the kernel to a SteinKernel.

Parameters:
  • x (ArrayLike) – The data used to call score_matching.match(x)

  • kernel (Kernel) – Kernel instance implementing a kernel function \(k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}\); if ‘kernel’ is a SteinKernel and score_matching is not None, a new instance of the kernel will be generated where the score function is given by score_matching.match(x)

  • score_matching (Optional[ScoreMatching]) – Specifies or overrides the score function of the implied or passed SteinKernel; if None, defaults to KernelDensityMatching unless ‘kernel’ is a SteinKernel, in which case the kernel’s existing score function is used.

Return type:

SteinKernel

Returns:

The (potentially) converted/updated SteinKernel.
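The branching described for convert_stein_kernel can be restated as a short sketch. All names here (SteinKernelSketch, convert_stein_kernel_sketch, default_score_matching) are hypothetical stand-ins. In particular, the length scale of the default KernelDensityMatching is not specified in this documentation, so the sketch takes the default matcher as an explicit argument.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SteinKernelSketch:
    """Hypothetical stand-in for SteinKernel: a base kernel plus a score function."""

    base_kernel: Any
    score_function: Callable


def convert_stein_kernel_sketch(x, kernel, score_matching, default_score_matching):
    """Hypothetical restatement of the documented branching logic."""
    if isinstance(kernel, SteinKernelSketch):
        if score_matching is None:
            return kernel  # keep the kernel's existing score function
        # New instance with the same base kernel and a freshly matched score.
        return SteinKernelSketch(kernel.base_kernel, score_matching.match(x))
    # Not yet a Stein kernel: wrap it, defaulting to kernel density matching.
    if score_matching is None:
        score_matching = default_score_matching
    return SteinKernelSketch(kernel, score_matching.match(x))


class ConstantMatching:
    """Toy matcher whose score function always returns a fixed value."""

    def __init__(self, value):
        self.value = value

    def match(self, x):
        return lambda y: self.value
```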