Datasets

`build_pipeline`	Build an albumentations pipeline for image augmentation.
`build_dataset`	Build a dataset.
`ImageFolder`	Dataset in the ImageFolder style.
`image_folder_collate_fn`	Collate function for `saliency_metrics.datasets.image_folder.ImageFolder`.

Dataset Classes

ImageFolder

class saliency_metrics.datasets.ImageFolder(img_root, pipeline, smap_root=None, smap_extension='.png', cls_to_ind_file=None)

Dataset in the ImageFolder style.

Unlike torchvision.datasets.ImageFolder, this class can load an image and its corresponding saliency map (abbreviated as “smap”) simultaneously. The dataset folder is assumed to have the following hierarchy:

# images
root/images/dog/dog_0.jpg
root/images/dog/dog_1.jpg
...

root/images/cat/cat_0.jpg
root/images/cat/cat_1.jpg
...

# saliency maps
root/smaps/dog/dog_0.png
root/smaps/dog/dog_1.png
...

root/smaps/cat/cat_0.png
root/smaps/cat/cat_1.png
...

Note

An image and its corresponding saliency map must have the same spatial size. Please pre-process the images and saliency maps in advance.
The file names (without extensions) of an image and its corresponding saliency map must be consistent, e.g. "dog_0.jpg" and "dog_0.png".
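
The naming convention above can be sketched with pathlib. This is only an illustration of how an image path might be paired with its saliency map path, not the library's own code; the helper name `smap_path_for` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def smap_path_for(
    img_path: str,
    img_root: str,
    smap_root: str,
    smap_extension: Optional[str] = ".png",
) -> str:
    """Hypothetical helper: map an image path to its saliency map path."""
    img = PurePosixPath(img_path)
    # Keep the class folder and file name relative to the image root.
    relative = img.relative_to(img_root)
    # Reuse the image extension when no saliency map extension is given.
    extension = img.suffix if smap_extension is None else smap_extension
    return str(PurePosixPath(smap_root) / relative.with_suffix(extension))


print(smap_path_for("root/images/dog/dog_0.jpg", "root/images", "root/smaps"))
# root/smaps/dog/dog_0.png
```

Only the root folder and the extension change, so any mismatch in class folder or file stem would leave an image without a matching saliency map.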

Each sample is a dict containing the following fields:

"img": (Union[torch.Tensor, numpy.ndarray]) Transformed image. The image is converted to torch.Tensor with shape (num_channels, height, width) if ToTensorV2 (or ToTensor) is in the transform pipeline. Otherwise, it is a numpy.ndarray with shape (height, width, num_channels).
"smap": (numpy.ndarray) Saliency map with shape (height, width). This field exists only when smap_root is not None.
"target": (int) Ground truth label.
"meta": (dict) A dictionary containing meta information like image path (with key "img_path") and original size (with key "ori_size") of the image.

Parameters

img_root (str) – Root of the image folders.
pipeline (List[Dict]) – Config of transform pipeline.
smap_root (Optional[str]) – Root of the saliency map folders. If None, no saliency maps will be loaded.
smap_extension (Optional[str]) – File extension of the saliency maps. This argument only has an effect when smap_root is not None. If smap_extension is None, the extension of the images is used, which assumes that all images share the same extension.
cls_to_ind_file (Optional[str]) – Path of a file (json, yaml etc.) that can be de-serialized to a dictionary mapping class names to indices. If None, the class names (folder names under img_root) are sorted and mapped to consecutive indices starting from 0. For example, ["a", "b"] will be mapped to [0, 1], respectively.
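
A minimal sketch of the two behaviours described for cls_to_ind_file. The helper `load_cls_to_ind` is hypothetical and handles JSON only; the library's actual loading code is not shown on this page.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def load_cls_to_ind(
    class_names: List[str], cls_to_ind_file: Optional[str] = None
) -> Dict[str, int]:
    """Hypothetical helper mirroring the documented behaviour."""
    if cls_to_ind_file is None:
        # Default: sort the folder names and number them in that order.
        return {name: index for index, name in enumerate(sorted(class_names))}
    # Otherwise read the mapping from the file (only JSON is handled here).
    with open(cls_to_ind_file, encoding="utf-8") as f:
        return json.load(f)


default_mapping = load_cls_to_ind(["dog", "cat"])
print(default_mapping)  # {'cat': 0, 'dog': 1}

# An explicit file overrides the alphabetical order.
with tempfile.TemporaryDirectory() as tmp_dir:
    mapping_file = os.path.join(tmp_dir, "cls_to_ind_file.json")
    with open(mapping_file, "w", encoding="utf-8") as f:
        json.dump({"dog": 0, "cat": 1}, f)
    explicit_mapping = load_cls_to_ind(["dog", "cat"], mapping_file)
print(explicit_mapping)  # {'dog': 0, 'cat': 1}
```

An explicit file is useful when the indices must match those of a pretrained classifier rather than the alphabetical folder order.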

Examples

from saliency_metrics.datasets import ImageFolder, build_dataset

pipeline = [
    dict(type="Resize", height=5, width=5),
    dict(type="Normalize", mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    dict(type="ToTensorV2"),
]

cfg = dict(
    type="ImageFolder",
    img_root="path/to/data/images/",
    pipeline=pipeline,
    smap_root="path/to/data/smaps/",
    smap_extension=".png",
    cls_to_ind_file="path/to/data/cls_to_ind_file.json",
)

dataset = build_dataset(cfg)
assert isinstance(dataset, ImageFolder)

get_cls_to_ind()

Get the dictionary mapping class names to indices.

Return type: Dict[str, int]
Returns: A dict mapping class names to indices.

get_ind_to_cls()

Get the dictionary mapping indices to class names.

Return type: Dict[int, str]
Returns: A dict mapping indices to class names.
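
Judging by their return types, the two mappings are presumably inverses of each other. The sketch below shows that relationship on a plain dict; `invert_mapping` is a hypothetical helper, and it assumes every class has a unique index.

```python
from typing import Dict


def invert_mapping(cls_to_ind: Dict[str, int]) -> Dict[int, str]:
    """Hypothetical helper: swap keys and values of a class-to-index dict."""
    ind_to_cls = {index: name for name, index in cls_to_ind.items()}
    # A one-to-one mapping keeps its size when inverted.
    if len(ind_to_cls) != len(cls_to_ind):
        raise ValueError("class indices must be unique")
    return ind_to_cls


cls_to_ind = {"cat": 0, "dog": 1}
ind_to_cls = invert_mapping(cls_to_ind)
print(ind_to_cls[1])  # dog
```

The index-to-class direction is the one typically needed to turn a predicted label back into a readable class name.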

Functions

build_dataset

saliency_metrics.datasets.build_dataset(cfg, default_args=None)

Build a dataset.

Parameters

cfg (Dict) – A config dict. It should at least contain the field “type”, which is the registered name of the dataset.
default_args (Optional[Dict]) – Other default arguments.

Return type

Dataset

Returns

An instance of torch.utils.data.Dataset.
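
To illustrate what "registered name" means for the "type" field, here is a toy registry-based builder. Everything in it (`DATASETS`, `register_dataset`, `build_from_cfg`, `ToyDataset`) is hypothetical, and the rule that values in cfg take precedence over default_args is an assumption borrowed from common registry designs; this page does not specify it.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry; the library's own registry is not shown on this page.
DATASETS: Dict[str, Callable[..., Any]] = {}


def register_dataset(cls):
    """Register a class under its own name."""
    DATASETS[cls.__name__] = cls
    return cls


def build_from_cfg(cfg: Dict, default_args: Optional[Dict] = None) -> Any:
    """Toy builder: look up cfg["type"] and pass the rest as keyword arguments."""
    if "type" not in cfg:
        raise KeyError('cfg must contain the field "type"')
    kwargs = dict(cfg)  # copy so the caller's config is left untouched
    for key, value in (default_args or {}).items():
        kwargs.setdefault(key, value)  # assumption: cfg wins over defaults
    dataset_cls = DATASETS[kwargs.pop("type")]
    return dataset_cls(**kwargs)


@register_dataset
class ToyDataset:
    def __init__(self, img_root: str, pipeline: list):
        self.img_root = img_root
        self.pipeline = pipeline


toy = build_from_cfg(
    dict(type="ToyDataset", img_root="path/to/images"),
    default_args=dict(pipeline=[]),
)
print(type(toy).__name__, toy.img_root, toy.pipeline)
# ToyDataset path/to/images []
```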

build_pipeline

saliency_metrics.datasets.build_pipeline(cfg, default_args=None)

Build an albumentations pipeline for image augmentation.

import numpy as np
import albumentations as A
from saliency_metrics.datasets import build_pipeline

# random grayscale image
img = np.random.randint(0, 256, (250, 250), dtype=np.uint8)
# pascal_voc boxes are [x_min, y_min, x_max, y_max]
bboxes = [[10, 10, 100, 100]]
labels = [0]

# build single augmentation
cfg_1 = dict(type="GaussianBlur", blur_limit=(3, 7), p=0.5)
pipeline_1 = build_pipeline(cfg_1)
img_1 = pipeline_1(image=img)["image"]

# build multiple augmentations and perform transformation for bounding boxes
cfg_2 = [
    dict(type="RandomCrop", height=200, width=200),
    dict(type="Resize", height=224, width=224),
    dict(type="ToTensorV2")
]
default_args = dict(bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]))
pipeline_2 = build_pipeline(cfg_2, default_args=default_args)
output_2 = pipeline_2(image=img, bboxes=bboxes, labels=labels)
img_2, bboxes_2, labels_2 = output_2["image"], output_2["bboxes"], output_2["labels"]

Parameters

cfg (Union[Dict, List]) – Config dictionary. If cfg is a dict, the function returns a single albumentations augmentation. If cfg is a list of dicts, the function first builds each augmentation and then composes them into an albumentations.Compose.
default_args (Optional[Dict]) – Other default arguments.

Return type: Union[object, Compose]
Returns: A single albumentations augmentation or albumentations.Compose.
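
The dict-versus-list behaviour of cfg can be sketched without albumentations. The toy transforms, `TRANSFORMS`, `build_single`, `compose`, and `build_toy_pipeline` below are all hypothetical stand-ins, meant only to show the dispatch logic described above.

```python
from typing import Callable, Dict, List, Union

Transform = Callable[[List[float]], List[float]]

# Hypothetical stand-ins for registered augmentations.
TRANSFORMS: Dict[str, Callable[..., Transform]] = {
    "Scale": lambda factor: (lambda values: [v * factor for v in values]),
    "Shift": lambda offset: (lambda values: [v + offset for v in values]),
}


def build_single(cfg: Dict) -> Transform:
    """Build one transform from a config dict with a "type" field."""
    kwargs = dict(cfg)
    return TRANSFORMS[kwargs.pop("type")](**kwargs)


def compose(transforms: List[Transform]) -> Transform:
    """Chain transforms so that each one receives the previous output."""
    def apply_all(values: List[float]) -> List[float]:
        for transform in transforms:
            values = transform(values)
        return values
    return apply_all


def build_toy_pipeline(cfg: Union[Dict, List[Dict]]) -> Transform:
    # A dict yields a single transform; a list is built item by item and composed.
    if isinstance(cfg, dict):
        return build_single(cfg)
    return compose([build_single(item) for item in cfg])


single = build_toy_pipeline(dict(type="Scale", factor=2.0))
chained = build_toy_pipeline(
    [dict(type="Scale", factor=2.0), dict(type="Shift", offset=1.0)]
)
print(single([1.0, 2.0]))   # [2.0, 4.0]
print(chained([1.0, 2.0]))  # [3.0, 5.0]
```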

image_folder_collate_fn

saliency_metrics.datasets.image_folder_collate_fn(batch, smap_as_tensor=True)

Collate function for saliency_metrics.datasets.image_folder.ImageFolder.

The collated batch is a dict that contains:

"img": (Tensor) images with shape (batch_size, num_channels, height, width).
"target": (Tensor) targets with shape (batch_size,).
"smap": (Optional[Union[Tensor, ndarray]]) saliency maps with shape (batch_size, height, width).
"meta": A dict that contains:
- "img_path": (List[str]) image paths with the length of batch_size.
- "ori_size": (List[Tuple[int, int]]) list of original spatial sizes of images.

Parameters

batch (List[Dict]) – A batch of data with length of batch_size.
smap_as_tensor (bool) – If True, batch the saliency maps into a torch.Tensor. Otherwise, batch them into a numpy.ndarray.

Return type

Dict

Returns

Collated batch.
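
The layout of the collated batch can be sketched in plain Python. This simplified stand-in (`toy_collate`, a hypothetical name) gathers fields into lists instead of stacking tensors or arrays, and it assumes that "smap" is simply omitted when the samples carry no saliency maps; the real function may handle that case differently.

```python
from typing import Any, Dict, List


def toy_collate(batch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Simplified stand-in: gathers fields into lists instead of stacking tensors."""
    collated: Dict[str, Any] = {
        "img": [sample["img"] for sample in batch],
        "target": [sample["target"] for sample in batch],
        # Meta fields are regrouped per key, one entry per sample.
        "meta": {
            "img_path": [sample["meta"]["img_path"] for sample in batch],
            "ori_size": [sample["meta"]["ori_size"] for sample in batch],
        },
    }
    # Assumption: "smap" is only included when every sample provides one.
    if all("smap" in sample for sample in batch):
        collated["smap"] = [sample["smap"] for sample in batch]
    return collated


# Placeholder samples; nested lists stand in for image and saliency map arrays.
samples = [
    {"img": [[0.0]], "smap": [[1.0]], "target": 0,
     "meta": {"img_path": "root/images/cat/cat_0.jpg", "ori_size": (5, 5)}},
    {"img": [[0.5]], "smap": [[0.2]], "target": 1,
     "meta": {"img_path": "root/images/dog/dog_0.jpg", "ori_size": (7, 9)}},
]
collated = toy_collate(samples)
print(collated["target"])            # [0, 1]
print(collated["meta"]["ori_size"])  # [(5, 5), (7, 9)]
```

With a real dataset, the library function would typically be passed as the collate_fn argument of torch.utils.data.DataLoader; binding smap_as_tensor=False would require something like functools.partial.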