David Map Reduce Weighted

Example coreset generation using an image of the statue of David.

This example showcases how a coreset can be generated from image data. In this context, a coreset is a set of pixels that best capture the information in the original image.

The coreset is generated using scalable Stein kernel herding, with a pre-conditioned inverse multi-quadric (PCIMQ) base kernel. The score function (the gradient of the log-density function) required by the Stein kernel is estimated by applying kernel density estimation (KDE) to the data and then taking gradients of the log of the resulting density estimate.
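The score estimate described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code the example itself runs: the function name `kde_score`, the Gaussian KDE kernel and the `bandwidth` parameter are all assumptions made for the sketch.

```python
import math


def kde_score(x, data, bandwidth):
    """Estimate grad log p(x) from a Gaussian KDE fitted to `data`.

    Hypothetical helper for illustration; assumes an isotropic Gaussian
    kernel with standard deviation `bandwidth`.
    """
    h2 = bandwidth**2
    # Squared distance from x to every data point.
    sq_dists = [sum((xj - dj) ** 2 for xj, dj in zip(x, d)) for d in data]
    # Shift exponents by the smallest distance for numerical stability;
    # the shift cancels in the ratio below.
    shift = min(sq_dists)
    kernel_vals = [math.exp(-(s - shift) / (2 * h2)) for s in sq_dists]
    total = sum(kernel_vals)
    # grad log p(x) = sum_i k_i * (x_i - x) / (h^2 * sum_j k_j)
    return [
        sum(k * (d[j] - x[j]) for k, d in zip(kernel_vals, data)) / (h2 * total)
        for j in range(len(x))
    ]
```

With a single data point the estimate reduces to `(data[0] - x) / bandwidth**2`, which points from `x` towards the data, as a score function should.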

The initial coreset generated by this procedure is then weighted. The weights are chosen so that the weighted coreset has a lower maximum mean discrepancy with respect to the original dataset than the unweighted coreset.
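One way to picture the weighting step is as a small linear solve. The sketch below computes the unconstrained weights that minimise the squared maximum mean discrepancy between a weighted coreset and the data. It is an illustration under assumptions: the names `rbf`, `solve` and `mmd_weights` are hypothetical, a Gaussian kernel is swapped in for simplicity, and the weight optimiser used by the example itself may impose extra constraints (such as non-negative weights that sum to one).

```python
import math


def rbf(x, y, length_scale=1.0):
    """Gaussian kernel, used here as a stand-in kernel (assumption)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * length_scale**2))


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution.
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def mmd_weights(coreset, data, jitter=1e-8):
    """Weights w minimising w'Kw - 2w'z, i.e. the solution of K w = z."""
    # K: kernel Gram matrix of the coreset, with a small diagonal jitter.
    gram = [
        [rbf(ci, cj) + (jitter if i == j else 0.0) for j, cj in enumerate(coreset)]
        for i, ci in enumerate(coreset)
    ]
    # z: mean kernel value between each coreset point and the full data.
    mean_embedding = [sum(rbf(c, x) for x in data) / len(data) for c in coreset]
    return solve(gram, mean_embedding)
```

As a sanity check, when the coreset is the whole dataset the minimising weights are uniform, so `mmd_weights(data, data)` returns values close to `1 / len(data)`.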

To reduce computational requirements, a map reduce approach is used, splitting the original dataset into distinct segments, with each segment handled on a different process.
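The partitioning strategy can be sketched as follows. This is a simplified, single-process illustration: `map_reduce`, `reduce_partition`, `coreset_size` and `leaf_size` are hypothetical names, and the reducer is a deliberately trivial placeholder (evenly strided selection) standing in for Stein kernel herding. In a real run, the `mapper` argument could be replaced by a process pool's `map` so that each segment is handled on a different process.

```python
from functools import partial


def reduce_partition(points, size):
    """Placeholder reducer: keep `size` evenly strided points.

    Stands in for the real per-segment coreset algorithm (assumption).
    """
    if len(points) <= size:
        return list(points)
    step = len(points) / size
    return [points[int(i * step)] for i in range(size)]


def map_reduce(data, coreset_size, leaf_size, reducer=reduce_partition, mapper=map):
    """Repeatedly split, reduce each segment, and merge the results."""
    if leaf_size <= coreset_size:
        raise ValueError("leaf_size must exceed coreset_size")
    points = list(data)
    while len(points) > coreset_size:
        # Map: split into distinct segments of at most leaf_size points.
        partitions = [
            points[i : i + leaf_size] for i in range(0, len(points), leaf_size)
        ]
        # Each segment is reduced independently of the others.
        reduced = mapper(partial(reducer, size=coreset_size), partitions)
        # Reduce: merge the per-segment coresets into one candidate set.
        points = [p for part in reduced for p in part]
    return points
```

Requiring `leaf_size` to exceed `coreset_size` guarantees that every round strictly shrinks the candidate set, so the loop terminates.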

The coreset attained from Stein kernel herding is compared to a coreset generated via uniform random sampling. Coreset quality is measured using maximum mean discrepancy (MMD).
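For reference, the quality metric can be written down directly. The sketch below computes a biased (V-statistic) estimate of the MMD between two point sets using only the standard library. The names `rbf` and `mmd` are hypothetical and the Gaussian kernel is an assumption made for illustration; the example itself may use a different kernel and implementation.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Gaussian kernel, used here as a stand-in kernel (assumption)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * length_scale**2))


def mmd(xs, ys, kernel=rbf):
    """Biased estimate of the maximum mean discrepancy between xs and ys."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        return sum(kernel(p, q) for p in a for q in b) / (len(a) * len(b))

    squared = mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))
```

A lower value means the candidate set is a closer summary of the data: a set identical to the data gives an MMD of zero, and a point far from all of the data gives a larger MMD than a point near its centre.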

examples.david_map_reduce_weighted.main(in_path=PosixPath('../examples/data/david_orig.png'), out_path=None, downsampling_factor=1)

Run the ‘david’ example for image sampling.

Take an image of the statue of David and then generate a coreset using scalable Stein kernel herding.

The initial coreset generated by this procedure is then weighted. The weights are chosen so that the weighted coreset has a lower maximum mean discrepancy with respect to the original dataset than the unweighted coreset.

To reduce computational requirements, a map reduce approach is used, splitting the original dataset into distinct segments, with each segment handled on a different process.

Compare this result with a coreset generated via uniform random sampling. Coreset quality is measured using maximum mean discrepancy (MMD).

Parameters:
  • in_path (Path) – Path to the input image. A relative path is taken to be relative to this module file.

  • out_path (Optional[Path]) – Path to save output to. If not None, a relative path is taken to be relative to this module file.

  • downsampling_factor (int) – Window size over which the image is averaged (downsampled).

Return type:

tuple[float, float]

Returns:

Coreset MMD, random sample MMD