This notebook demonstrates training data generation for a 2D denoising task, where corresponding pairs of low- and high-quality images can be acquired.
The high-SNR images are acquisitions of human U2OS cells taken from the Broad Bioimage Benchmark Collection; the low-SNR images were created by synthetically adding strong read-out and shot noise (and additionally applying 2x2 pixel binning), thus mimicking acquisitions at a very low light level.
Each image pair should be registered, which in a real application setting is best achieved by acquiring both images interleaved, i.e. as different channels that correspond to the different exposure/laser settings. Since the image pairs were synthetically created in this example, they are already perfectly aligned.
More documentation is available at http://csbdeep.bioimagecomputing.com/doc/.
from __future__ import print_function, unicode_literals, absolute_import, division
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
from tifffile import imread
from csbdeep.utils import download_and_extract_zip_file, plot_some
from csbdeep.data import RawData, create_patches
The example data consists of low-SNR and high-SNR 2D images of human U2OS cells.
Note that GT stands for ground truth and denotes the high signal-to-noise ratio (SNR) images.
download_and_extract_zip_file (
url = 'http://csbdeep.bioimagecomputing.com/example_data/snr_7_binning_2.zip',
targetdir = 'data',
verbose = 1,
)
Files missing, downloading... extracting... done.
The data set is already split into a train and test set, each containing low SNR ("low") and corresponding high SNR ("GT") images.
We can plot some training images:
y = imread('data/train/GT/img_0010.tif')
x = imread('data/train/low/img_0010.tif')
print('image size =', x.shape)
plt.figure(figsize=(13,5))
plt.subplot(1,2,1)
plt.imshow(x, cmap ="magma")
plt.colorbar()
plt.title("low")
plt.subplot(1,2,2)
plt.imshow(y, cmap ="magma")
plt.colorbar()
plt.title("high");
image size = (256, 256)
We first need to create a RawData object, which defines how to get the pairs of low/high-SNR images and the semantics of each axis (e.g. which one is considered a color channel, etc.).
Here we have two folders "low" and "GT", where corresponding low and high-SNR TIFF images have identical filenames.
For this case, we can simply use RawData.from_folder and set axes = 'YX' to indicate the semantic order of the image axes (i.e. we have typical 2-dimensional images).
raw_data = RawData.from_folder (
basepath = 'data/train',
source_dirs = ['low'],
target_dir = 'GT',
axes = 'YX',
)
From corresponding images, the function create_patches will now generate lots of paired patches that will be used for training the CARE model later. create_patches returns values (X, Y, XY_axes).
By convention, the variable name X (or x) refers to an input variable for a machine learning model, whereas Y (or y) indicates an output variable.
As a general rule, use a patch size that is a power of two along all axes, or which is at least divisible by 8. For this example we will use patches of size 128x128.
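The divisibility rule above is easy to check up front. This small helper is just an illustrative sketch (it is not part of CSBDeep):

```python
def valid_patch_size(patch_size, divisor=8):
    """True if every axis length of the patch is divisible by `divisor`."""
    return all(s % divisor == 0 for s in patch_size)

print(valid_patch_size((128, 128)))  # powers of two are always fine
print(valid_patch_size((100, 100)))  # 100 is not divisible by 8
```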
An important aspect is data normalization, i.e. the rescaling of corresponding patches to a dynamic range of ~ (0,1). By default, this is automatically provided via percentile normalization, which can be adapted if needed.
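The idea behind percentile normalization can be illustrated with plain NumPy. This is only a simplified sketch: CSBDeep's norm_percentiles by default samples the percentile values per patch, whereas fixed values are used here for clarity.

```python
import numpy as np

def norm_percentile(img, pmin=2, pmax=99.8, eps=1e-20):
    """Rescale `img` so the [pmin, pmax] intensity percentiles map to ~(0, 1)."""
    lo = np.percentile(img, pmin)
    hi = np.percentile(img, pmax)
    return (img - lo) / (hi - lo + eps)

rng = np.random.default_rng(0)
img = rng.normal(100.0, 10.0, (64, 64))
normed = norm_percentile(img)
```

Values below the lower percentile become slightly negative and values above the upper percentile exceed 1, which is intentional: the transform is a pure rescaling, not a clipping operation.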
By default, patches are only sampled from non-background regions, i.e. regions whose intensity lies above a relative threshold that can be passed to the function below. Since most image regions in this dataset already contain foreground pixels, we effectively disable this filter by setting the threshold to 0.
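The principle of such a relative-threshold filter can be sketched as follows. Note this is a simplification: the actual no_background_patches applies a maximum filter of the patch size over the whole image and requires a minimum fraction of above-threshold pixels, but the core criterion is the same.

```python
import numpy as np

def is_foreground_patch(patch, image_max, threshold=0.4):
    """Simplified foreground test: keep the patch if its brightest pixel
    exceeds a fraction (`threshold`) of the full image's maximum."""
    return patch.max() > threshold * image_max

img = np.zeros((256, 256), dtype=np.float32)
img[100:140, 100:140] = 1.0  # a bright "cell" on a dark background

print(is_foreground_patch(img[96:160, 96:160], img.max()))  # overlaps the cell
print(is_foreground_patch(img[0:64, 0:64], img.max()))      # pure background
```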
from csbdeep.data import no_background_patches, norm_percentiles, sample_percentiles
X, Y, XY_axes = create_patches (
raw_data = raw_data,
patch_size = (128,128),
patch_filter = no_background_patches(0),
n_patches_per_image = 2,
save_file = 'data/my_training_data.npz',
)
==================================================================
2457 raw images x 1 transformations = 2457 images
2457 images x 2 patches per image = 4914 patches in total
==================================================================
Input data:
data/train: target='GT', sources=['low'], axes='YX', pattern='*.tif*'
==================================================================
Transformations:
1 x Identity
==================================================================
Patch size:
128 x 128
==================================================================
100%|██████████| 2457/2457 [00:09<00:00, 249.16it/s]
Saving data to data/my_training_data.npz.
assert X.shape == Y.shape
print("shape of X,Y =", X.shape)
print("axes of X,Y =", XY_axes)
shape of X,Y = (4914, 1, 128, 128)
axes  of X,Y = SCYX
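In the follow-up training notebook, the saved file is typically read back with csbdeep.io.load_training_data, which also sets aside a validation fraction. The split itself amounts to something like this plain-NumPy sketch (the array sizes here are made up for illustration):

```python
import numpy as np

# stand-in arrays with the same (S, C, Y, X) layout as the saved patches
X = np.random.rand(100, 1, 128, 128).astype(np.float32)
Y = np.random.rand(100, 1, 128, 128).astype(np.float32)

validation_split = 0.1
n_val = int(round(validation_split * len(X)))

# hold out the last n_val patch pairs for validation
X_train, X_val = X[:-n_val], X[-n_val:]
Y_train, Y_val = Y[:-n_val], Y[-n_val:]
print(X_train.shape, X_val.shape)
```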
This shows some of the generated patch pairs (even rows: input, odd rows: target):
for i in range(2):
plt.figure(figsize=(16,4))
sl = slice(8*i, 8*(i+1)), 0
plot_some(X[sl],Y[sl],title_list=[np.arange(sl[0].start,sl[0].stop)])
plt.show()
None;