anscombe-transform Codec
This specification defines an array->array codec that encodes an input array using the Anscombe transform followed by an optional data type cast, and decodes by inverting the type cast and then applying the inverse Anscombe transform. This transformation is generally not lossless, but it is useful for conditioning data prior to compression.
Anscombe transform
The Anscombe transform is a bijection that maps a Poisson-distributed variable to an approximately Gaussian-distributed variable with a variance of approximately 1.
This transformation is useful in sensing applications to mitigate shot noise. Shot noise is typically modelled as a Poisson process. The variance of a Poisson-distributed signal scales with its mean. The Anscombe transform maps a Poisson-distributed signal to a Gaussian-distributed signal with a variance near 1. Decoupling the mean of the signal from its variance facilitates noise removal and data compression, the latter of which is the intended application of this codec.
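The variance-stabilizing behaviour described above can be illustrated with a small numerical experiment. This is a minimal sketch, not part of the codec: the helper names `anscombe` and `variance_before_and_after`, the sample size, and the seed are assumptions chosen for illustration, and the plain transform $2\sqrt{x + 3/8}$ is used without the scaling and offset that the codec adds.

```python
import numpy as np


def anscombe(counts):
    """Plain Anscombe transform: 2 * sqrt(counts + 3/8)."""
    return 2.0 * np.sqrt(np.asarray(counts, dtype=np.float64) + 3.0 / 8.0)


def variance_before_and_after(mean, n_samples=200_000, seed=0):
    """Draw Poisson samples and return (raw variance, transformed variance)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam=mean, size=n_samples)
    return counts.var(), anscombe(counts).var()


for mean in (5, 20, 100):
    raw_var, stabilized_var = variance_before_and_after(mean)
    # The raw variance tracks the mean; the transformed variance stays near 1.
    print(f"mean={mean:>3}  raw variance={raw_var:8.2f}  transformed variance={stabilized_var:.3f}")
```

The raw variance grows with the mean, while the transformed variance stays close to 1 for all three means, which is the property the codec relies on.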
Codec algorithm
Encoding
Parameters
In addition to the input array, the encoding procedure takes the following parameters:
| name | type |
|---|---|
| conversion_gain | positive real number |
| zero_level | real number |
| beta | positive real number |
| encoded_dtype | Zarr V3 data type |
Algorithm
For each element $x$ of the input array, an output value $y$ is generated via the following procedure:
-
$x$ is normalized by subtracting $\text{zero_level}$ and then dividing by $\text{conversion_gain}$. The result of this transformation, called $x_{\text{norm}}$, now represents a quantity of observed events.
Schematically:
$x_{\text{norm}} := \frac{x - \text{zero_level}}{\text{conversion_gain}}$
-
If $x_{\text{norm}}$ is non-negative, we apply the Anscombe transform, multiply by a scaling factor, and add an offset, and bind $\text{result}$ to the result. Schematically, the transformation is as follows:
$$ \text{result} := \frac{1}{\text{beta}} \left(\frac{\text{zero_level}}{\text{conversion_gain} \, * \sqrt{3/8}} + 2 \left( \sqrt{x_{\text{norm}} + \tfrac{3}{8}} - \sqrt{\tfrac{3}{8}} \right)\right) $$
The additional scaling and offset terms make the transform continuous at $x_{\text{norm}} = 0$, where it meets the linear extrapolation used for negative values of $x_{\text{norm}}$. With the formula above, an input equal to $\text{zero_level}$ maps to $\frac{\text{zero_level}}{\text{beta} * \text{conversion_gain} * \sqrt{3/8}}$, and the linear branch maps an input of $0$ to $0$.
When $x_{\text{norm}}$ is negative, we bind $\text{result}$ to $x$ divided by the product of $\text{beta}$, $\text{conversion_gain}$, and $\sqrt{3/8}$. This is a linear extrapolation of the transform from $x_{\text{norm}} = 0$ in the negative direction. Schematically:
$$ \text{result} := \frac {x} {\text{beta} * \text{conversion_gain} *\sqrt{3/8} } $$
If encoded_dtype denotes an integer data type, then $\text{result}$ is rounded before the data type casting procedure.
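The continuity of the two branches at $x_{\text{norm}} = 0$ can be checked directly from the formulas above. The symbols $c$, $f$, and $g$ are introduced here only for this check and are not part of the specification. Write $c = \frac{\text{zero_level}}{\text{conversion_gain} * \sqrt{3/8}}$, and use $x = \text{conversion_gain} * x_{\text{norm}} + \text{zero_level}$ to express both branches in terms of $x_{\text{norm}}$:

$$ f(x_{\text{norm}}) = \frac{1}{\text{beta}} \left( c + 2 \left( \sqrt{x_{\text{norm}} + \tfrac{3}{8}} - \sqrt{\tfrac{3}{8}} \right) \right), \qquad g(x_{\text{norm}}) = \frac{x}{\text{beta} * \text{conversion_gain} * \sqrt{3/8}} = \frac{1}{\text{beta}} \left( c + \frac{x_{\text{norm}}}{\sqrt{3/8}} \right) $$

The values agree at the boundary:

$$ f(0) = \frac{c}{\text{beta}} = g(0) $$

The slopes agree as well:

$$ f'(x_{\text{norm}}) = \frac{1}{\text{beta} \sqrt{x_{\text{norm}} + \tfrac{3}{8}}} \quad \Rightarrow \quad f'(0) = \frac{1}{\text{beta} \sqrt{3/8}} = g'(x_{\text{norm}}) $$

So the linear branch is the tangent line of the Anscombe branch at $x_{\text{norm}} = 0$.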
Reference Python function
The above procedure is implemented in the following reference Python function:
# /// script
# requires-python = ">=3.11"
# dependencies = ["zarr>=3.1.1", "numpy==2.2"]
# ///
import numpy as np
from zarr.core.dtype import ZDType


def anscombe_transform(x, conversion_gain: float, zero_level: float, beta: float, encoded_dtype: ZDType):
    # Convert to event units; np.asarray lets x be a scalar or an array
    event_rate = (np.asarray(x, dtype=np.float64) - zero_level) / conversion_gain
    zero_slope = 1.0 / (beta * np.sqrt(3.0 / 8.0))
    offset = zero_level * zero_slope / conversion_gain
    # Linear extrapolation, selected where event_rate is negative
    linear = offset + event_rate * zero_slope
    # Anscombe transform, selected where event_rate is non-negative;
    # clamping at 0 keeps the square root real for the elements that use the linear branch
    clamped = np.maximum(event_rate, 0.0)
    anscombe = offset + (2.0 / beta) * (np.sqrt(clamped + 3.0 / 8.0) - np.sqrt(3.0 / 8.0))
    result = np.where(event_rate < 0, linear, anscombe)
    # When converting from a floating point to an integer data type,
    # values should be rounded prior to type conversion
    np_dtype = encoded_dtype.to_native_dtype()
    if np_dtype.kind in {"i", "u"}:
        return np.round(result).astype(np_dtype)
    return result.astype(np_dtype)
Decoding
Algorithm
To decode Anscombe-transformed data, invert the encoding algorithm. Depending on the choice of output data type, the decoded data may not exactly match the input.
Parameters
In addition to the input array, the decoding procedure takes the following parameters:
| name | type |
|---|---|
| conversion_gain | positive real number |
| zero_level | real number |
| beta | positive real number |
| decoded_dtype | Zarr V3 data type |
These are the same as the parameters used for the encoding procedure minus the encoded_dtype; the decoded_dtype is required instead.
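The decoding procedure is described above only as the inverse of the encoding algorithm, so the following is a minimal sketch of one way to write that inverse, not a normative implementation. The names `encode`, `decode`, and `SQRT_3_8` are hypothetical, plain NumPy dtypes stand in for Zarr V3 data types, and rounding before a cast to an integer `decoded_dtype` is an assumption that mirrors the encoding rule. An `encode` helper restating the encoding formulas is included only so that the round trip can be run without other dependencies.

```python
import numpy as np

SQRT_3_8 = np.sqrt(3.0 / 8.0)


def encode(x, conversion_gain, zero_level, beta, encoded_dtype):
    """NumPy-only restatement of the encoding formulas (hypothetical helper)."""
    event_rate = (np.asarray(x, dtype=np.float64) - zero_level) / conversion_gain
    zero_slope = 1.0 / (beta * SQRT_3_8)
    offset = zero_level * zero_slope / conversion_gain
    linear = offset + event_rate * zero_slope
    anscombe = offset + (2.0 / beta) * (np.sqrt(np.maximum(event_rate, 0.0) + 3.0 / 8.0) - SQRT_3_8)
    result = np.where(event_rate < 0, linear, anscombe)
    dtype = np.dtype(encoded_dtype)
    if dtype.kind in {"i", "u"}:
        result = np.round(result)
    return result.astype(dtype)


def decode(y, conversion_gain, zero_level, beta, decoded_dtype):
    """Invert `encode`: undo the offset, then undo the linear or Anscombe branch."""
    zero_slope = 1.0 / (beta * SQRT_3_8)
    offset = zero_level * zero_slope / conversion_gain
    shifted = np.asarray(y, dtype=np.float64) - offset
    # Encoded values below the offset came from the linear branch (event_rate < 0)
    linear = shifted / zero_slope
    # Otherwise solve shifted = (2 / beta) * (sqrt(e + 3/8) - sqrt(3/8)) for e
    root = np.maximum(shifted, 0.0) * beta / 2.0 + SQRT_3_8
    anscombe = root**2 - 3.0 / 8.0
    event_rate = np.where(shifted < 0, linear, anscombe)
    x = event_rate * conversion_gain + zero_level
    dtype = np.dtype(decoded_dtype)
    if dtype.kind in {"i", "u"}:
        # Assumption: round before casting to an integer type, as in encoding
        x = np.round(x)
    return x.astype(dtype)


# Illustrative parameter values, not taken from the specification
params = dict(conversion_gain=2.5, zero_level=100.0, beta=0.5)
x = np.array([90.0, 100.0, 150.0, 400.0, 4000.0])
encoded = encode(x, encoded_dtype="int16", **params)
decoded = decode(encoded, decoded_dtype="int16", **params)
print(encoded)
print(decoded)
```

With a floating point `encoded_dtype` the round trip is exact up to floating point error. With an integer `encoded_dtype`, rounding the encoded values introduces an error that grows with the signal level.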
Codec metadata
| field | type | required | notes |
|---|---|---|---|
| name | literal "anscombe-transform" | yes | |
| configuration | anscombe transform configuration | yes | |
Configuration metadata
| field | type | required | notes |
|---|---|---|---|
| zero_level | number | yes | The value in the input array that corresponds to 0 detected events. |
| beta | positive number | yes | Ratio of quantization step to noise. Typical values are between 0.5 and 2. |
| conversion_gain | positive number | yes | The magnitude of a single recorded event in the input data. |
| decoded_dtype | Zarr V3 data type metadata | yes | The Zarr data type of the input array. |
| encoded_dtype | Zarr V3 data type metadata | yes | The Zarr data type of the output array. |
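As an illustration of the two tables above, the following sketch builds a codec metadata document as a Python dictionary and serializes it to JSON. All configuration values are hypothetical and chosen only for illustration; the data types are written as the string form that Zarr V3 uses for its core integer types.

```python
import json

# Hypothetical example values; only the field names come from the tables above
codec_metadata = {
    "name": "anscombe-transform",
    "configuration": {
        "zero_level": 100,
        "beta": 0.5,
        "conversion_gain": 2.5,
        "decoded_dtype": "int16",
        "encoded_dtype": "uint8",
    },
}

print(json.dumps(codec_metadata, indent=2))
```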
Supported array data types
This codec is compatible with array data types that model real numbers or a subset thereof.