Coresets

Module for defining coreset data structures.

class coreax.coreset.AbstractCoreset[source]

Bases: Module, Generic[_TPointsData_co, _TOriginalData_co]

Abstract base class for coresets.

A coreset is a reduced set of \(\hat{n}\) (potentially weighted) data points, \(\hat{X} := \{(\hat{x}_i, \hat{w}_i)\}_{i=1}^{\hat{n}}\) that, in some sense, best represent the “important” properties of a larger set of \(n > \hat{n}\) (potentially weighted) data points \(X := \{(x_i, w_i)\}_{i=1}^n\).

\(\hat{x}_i, x_i \in \Omega\) represent the data points/nodes and \(\hat{w}_i, w_i \in \mathbb{R}\) represent the associated weights.
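To make the definition concrete, here is a minimal sketch in plain Python. It does not use the coreax API; the data values and the helper name weighted_mean are assumptions made up for illustration. It shows a weighted set of \(\hat{n} = 3\) points reproducing one "important" property (the mean) of a set of \(n = 6\) points.

```python
def weighted_mean(points, weights):
    """Weighted mean of scalar points; the weights need not sum to one."""
    total = sum(weights)
    return sum(x * w for x, w in zip(points, weights)) / total


# Full dataset X with n = 6 uniformly weighted points (illustrative values).
full_points = [1.0, 1.0, 1.0, 5.0, 5.0, 9.0]
full_weights = [1.0] * len(full_points)

# Hypothetical coreset with n_hat = 3 points. Each weight records how many
# original points the coreset point stands in for.
coreset_points = [1.0, 5.0, 9.0]
coreset_weights = [3.0, 2.0, 1.0]

full_mean = weighted_mean(full_points, full_weights)
coreset_mean = weighted_mean(coreset_points, coreset_weights)
```

In this toy case the two means agree exactly. In practice a coreset only approximates the chosen property, and which property counts as important depends on the method used to build it.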

abstract property points: _TPointsData_co

The coreset points.

abstract property pre_coreset_data: _TOriginalData_co

The original data that this coreset is based on.

abstract property nodes: Data

Deprecated alias for indices or points, depending on subclass.

abstract solve_weights(solver, **solver_kwargs)[source]

Return a copy of ‘self’ with weights solved by ‘solver’.

Return type:

Self

Parameters:

solver (WeightsOptimiser[Data])
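The "return a copy" contract can be sketched with a frozen dataclass. This is only an illustration of the pattern: ToyCoreset and uniform_solver are hypothetical names, and the solver signature used here (a callable taking the points) is an assumption, not the WeightsOptimiser interface.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ToyCoreset:
    """Hypothetical stand-in for a coreset; not a coreax class."""

    points: Tuple[float, ...]
    weights: Tuple[float, ...]

    def solve_weights(
        self, solver: Callable[[Sequence[float]], Sequence[float]]
    ) -> "ToyCoreset":
        """Return a new instance with weights from `solver`; `self` is unchanged."""
        return replace(self, weights=tuple(solver(self.points)))


def uniform_solver(points):
    """Toy solver (assumption): equal weights that sum to one."""
    return [1.0 / len(points)] * len(points)


original = ToyCoreset(points=(1.0, 5.0, 9.0, 2.0), weights=(1.0, 1.0, 1.0, 1.0))
reweighted = original.solve_weights(uniform_solver)
```

Returning a new object rather than mutating in place keeps the original coreset usable for comparison, which suits immutable, functional-style code.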

compute_metric(metric, **metric_kwargs)[source]

Return the metric distance between self.pre_coreset_data and self.points.

Return type:

Shaped[Array, '']

Parameters:

metric (Metric[Data])

property coreset: _TPointsData_co

Deprecated alias for .points.

class coreax.coreset.PseudoCoreset(nodes, pre_coreset_data)[source]

Bases: AbstractCoreset[Data, _TOriginalData_co], Generic[_TOriginalData_co]

Data structure for representing a pseudo-coreset.

The points of a pseudo-coreset are not necessarily points in the original dataset.

Parameters:
  • nodes (Data) – The (weighted) coreset nodes, \(\hat{X}\); these can be accessed via PseudoCoreset.points.

  • pre_coreset_data (Any) – The dataset \(X\) used to construct the coreset.

classmethod build(nodes, pre_coreset_data)[source]

Construct a PseudoCoreset from Data or raw Arrays.

Return type:

Union[PseudoCoreset[Data], PseudoCoreset[SupervisedData], PseudoCoreset[Any]]

property points: Data

Materialised coreset.

property pre_coreset_data

The original data that this coreset is based on.

property nodes: Data

Deprecated alias for points.

solve_weights(solver, **solver_kwargs)[source]

Return a copy of ‘self’ with weights solved by ‘solver’.

Return type:

Self

Parameters:

solver (WeightsOptimiser[Data])

class coreax.coreset.Coreset(nodes, pre_coreset_data)[source]

Bases: PseudoCoreset

Deprecated - split into AbstractCoreset and PseudoCoreset.

Parameters:
  • nodes (Data)

  • pre_coreset_data (_TOriginalData_co)

class coreax.coreset.Coresubset(indices, pre_coreset_data)[source]

Bases: AbstractCoreset[_TOriginalData_co, _TOriginalData_co], Generic[_TOriginalData_co]

Data structure for representing a coresubset.

A coresubset is a coreset with the additional condition that the coreset data points/nodes must be a subset of the original data points/nodes, such that

\[\hat{x}_i = x_i, \forall i \in I, I \subset \{1, \dots, n\}, \text{card}(I) = \hat{n}.\]

Thus, a coresubset, unlike a coreset, ensures that feasibility constraints on the support of the measure are maintained [litterer2012recombination].

In coresubsets, the dataset reduction can be implicit (setting weights/nodes to zero for all \(i \notin I\)) or explicit (removing entries from the weight/node arrays). The implicit approach is useful when input/output array shape stability is required (e.g. for some JAX transformations); the explicit approach is more similar to a standard coreset.
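The two styles of reduction can be compared in a short sketch. It uses plain Python lists as stand-ins for arrays and does not call coreax; the data values, the index set, and the helper weighted_sum are illustrative assumptions.

```python
def weighted_sum(points, weights):
    """Sum of points scaled by their weights."""
    return sum(x * w for x, w in zip(points, weights))


data = [10.0, 20.0, 30.0, 40.0, 50.0]
weights = [1.0] * len(data)

# Index set I. Zero-based here, whereas the formula above is one-based.
selected = [0, 3]
selected_set = set(selected)

# Implicit reduction: keep all n entries and zero the weights outside I,
# so the length (shape) is the same as the original data.
implicit_weights = [
    w if i in selected_set else 0.0 for i, w in enumerate(weights)
]

# Explicit reduction: drop the entries outside I, so the result has
# card(I) entries.
explicit_points = [data[i] for i in selected]
explicit_weights = [weights[i] for i in selected]

implicit_total = weighted_sum(data, implicit_weights)
explicit_total = weighted_sum(explicit_points, explicit_weights)
```

Both forms give the same weighted sum. They differ only in whether the container keeps the original length, which is the property that matters when array shapes must stay fixed.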

Parameters:
  • indices (Data) – The (weighted) coresubset node indices, \(I\); the materialised coresubset nodes should only be accessed via Coresubset.points.

  • pre_coreset_data (Any) – The dataset \(X\) used to construct the coreset.

classmethod build(indices, pre_coreset_data)[source]

Construct a Coresubset from Data or raw Arrays.

Return type:

Union[Coresubset[Data], Coresubset[SupervisedData], Coresubset[Any]]

property points: _TOriginalData_co

Materialise the coresubset from the indices and original data.
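Materialisation amounts to looking up the stored indices in the original data. The sketch below shows that idea with a toy class. ToyCoresubset and its index_weights field are hypothetical and say nothing about how coreax stores or gathers the data internally.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyCoresubset:
    """Hypothetical illustration only; not the coreax implementation."""

    indices: Tuple[int, ...]
    index_weights: Tuple[float, ...]
    pre_coreset_data: Tuple[float, ...]

    @property
    def points(self):
        """Look up the selected entries; returns (values, weights)."""
        values = tuple(self.pre_coreset_data[i] for i in self.indices)
        return values, self.index_weights


toy = ToyCoresubset(
    indices=(4, 0),
    index_weights=(0.5, 0.5),
    pre_coreset_data=(10.0, 20.0, 30.0, 40.0, 50.0),
)
values, point_weights = toy.points
```

Storing only the indices keeps the object small and guarantees that every materialised point is an element of the original dataset.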

property unweighted_indices: Shaped[Array, 'n']

Unweighted Coresubset indices - attribute access helper.

property pre_coreset_data

The original data that this coreset is based on.

property indices: Data

The (possibly weighted) Coresubset indices.

property nodes: Data

Deprecated alias for indices.

solve_weights(solver, **solver_kwargs)[source]

Return a copy of ‘self’ with weights solved by ‘solver’.

Return type:

Self

Parameters:

solver (WeightsOptimiser[Data])