Coreax (v0.1.0)

Coreax is a library for coreset algorithms, written in JAX for fast execution and GPU support.

Setup

Coreax depends on JAX, so install the JAX build suited to your system before installing Coreax.

  1. Install JAX, noting that there are (currently) different setup paths for CPU and GPU use.

  2. Install Coreax:

$ python3 -m pip install coreax

  3. Optionally, install additional dependencies required to run the examples:

$ python3 -m pip install "coreax[test]"
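
Because the CPU and GPU setup paths for JAX differ, it can be worth confirming which backend JAX actually picked up. The following is a minimal sketch, not part of Coreax: the helper name `describe_jax_backend` is made up for illustration, and it relies only on the public `jax.default_backend()` and `jax.devices()` calls. It returns `None` rather than failing when JAX is absent.

```python
def describe_jax_backend():
    """Return JAX's default backend and visible devices, or None without JAX."""
    try:
        import jax
    except ImportError:
        # JAX is not installed in this environment.
        return None
    return {
        # Name of the platform JAX will use by default, e.g. "cpu" or "gpu".
        "backend": jax.default_backend(),
        # Human-readable description of every device JAX can see.
        "devices": [str(device) for device in jax.devices()],
    }


if __name__ == "__main__":
    info = describe_jax_backend()
    print("JAX is not installed" if info is None else info)
```

If a GPU build was intended but the reported backend is `cpu`, the JAX installation step is the likely place to look.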

If the installation fails, try again using the stable pinned package versions listed in requirements.txt. These versions may be somewhat outdated, although we endeavour to avoid versions with known vulnerabilities. To install Coreax this way:

$ python3 -m pip install --no-dependencies -r requirements.txt

To run the examples, use requirements-test.txt instead.
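
Whichever installation route is used, it may help to see which versions ended up in the environment, for example to compare them with the pinned ones. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` and the default package list are assumptions made for illustration, not something Coreax provides.

```python
from importlib import metadata


def installed_versions(packages=("jax", "jaxlib", "coreax")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```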

Release Cycle

We anticipate two release types: feature releases and security releases. Security releases will be issued as needed in accordance with the security policy. Feature releases will be issued as appropriate, dependent on the feature pipeline and development priorities.

Coming Soon

Planned features include:

  • Coordinate bootstrapping for high-dimensional data.

  • Other coreset-style algorithms, including kernel thinning and recombination, as a means of reducing a large dataset whilst maintaining properties of the underlying distribution.
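
To make the goal of "reducing a large dataset whilst maintaining properties of the underlying distribution" concrete, here is a small pure-Python sketch of greedy kernel herding (after Chen, Welling and Smola), together with a biased squared-MMD estimate to judge the result. It is an illustration only: it is neither kernel thinning nor recombination, it does not use or mirror the Coreax API, and the names `rbf`, `herd` and `mmd_squared`, the kernel choice and the toy data are all assumptions. It also assumes `size` is no larger than the dataset.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale**2))


def herd(data, size, length_scale=1.0):
    """Greedily pick `size` distinct indices of `data` by kernel herding."""
    n = len(data)
    gram = [[rbf(x, y, length_scale) for y in data] for x in data]
    # Kernel mean of the full dataset, evaluated at each candidate point.
    kernel_mean = [sum(row) / n for row in gram]
    # Running sum of k(x_i, x_s) over the points s selected so far.
    selected_sum = [0.0] * n
    chosen = []
    for step in range(size):
        candidates = [i for i in range(n) if i not in chosen]
        # Favour points that are typical of the data but far from earlier picks.
        best = max(
            candidates,
            key=lambda i: kernel_mean[i] - selected_sum[i] / (step + 1),
        )
        chosen.append(best)
        for i in range(n):
            selected_sum[i] += gram[i][best]
    return chosen


def mmd_squared(data, subset, length_scale=1.0):
    """Biased estimate of the squared MMD between two point sets."""

    def mean_kernel(a, b):
        total = sum(rbf(x, y, length_scale) for x in a for y in b)
        return total / (len(a) * len(b))

    return (
        mean_kernel(data, data)
        - 2.0 * mean_kernel(data, subset)
        + mean_kernel(subset, subset)
    )


if __name__ == "__main__":
    # Made-up data: two well-separated clusters on the real line.
    data = [(-2.0 + 0.01 * i,) for i in range(50)]
    data += [(2.0 + 0.01 * i,) for i in range(50)]
    coreset = [data[i] for i in herd(data, size=4)]
    print("herded MMD^2: ", mmd_squared(data, coreset))
    print("first-4 MMD^2:", mmd_squared(data, data[:4]))
```

On this toy data the herded points are drawn from both clusters, so their squared MMD to the full dataset is smaller than that of the first four points, which all sit in one cluster.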
