deepdisc.data_format.file_io

Attributes

logger

Classes

DDLoader

A base deepdisc data loader class

NpEncoder

Extensible JSON <https://json.org> encoder for Python data structures.

Functions

get_data_from_json(filename)

Open a JSON text file, and return encoded data as dictionary.

convert_to_json(dict_list, output_file[, allow_cached])

Converts dataset into COCO format and saves it to a json file.

Module Contents

logger

class DDLoader

A base deepdisc data loader class

filedict = None

dataset = None

get_dataset()

Retrieves the list of dataset_dicts, if one has been generated.

generate_filedict(dirpath, filters, img_files, mask_files, subdirs=False, filt_loc=0, n_samples=None)

Generates a path dictionary from a directory of files.

Parameters:
  • dirpath (str, path-like) – The path to the data directory.

  • filters (list) – A list of filters available in the dataset. Each filter name should match a string identifier in the filename itself. E.g. img_r.fits will be matched to a filter with label “r”.

  • img_files (str) – The name pattern of the image files to collect. It should contain a “*” to collect all image files in the dataset. E.g. 001_img.fits can be matched using img_files = “*_img.fits”.

  • mask_files (str) – The name pattern of the mask files to collect. It should contain a “*” to collect all mask files in the dataset. E.g. 001_mask.fits can be matched using mask_files = “*_mask.fits”.

  • subdirs (bool) – Indicates whether the data is stored within subdirectories within the dirpath. If True, will recursively search for files.

  • filt_loc (int) – The integer location of the filter within the image name, used to split files across filters accordingly. E.g. 001_img_r.fits would have a filt_loc of 8 (or -6).

  • n_samples (int) – If specified, filters down to a subset of the dataset that contains n_samples image files per filter.

Returns:

self – A DDLoader with a filename dictionary stored in DDLoader.filedict

Return type:

DDLoader
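The interplay between the “*” patterns and filt_loc can be illustrated with a short, self-contained sketch. It does not call deepdisc and is not the library's implementation; the filenames and the by_filter grouping are assumptions made for illustration, following the naming scheme used in the parameter descriptions above.

```python
from fnmatch import fnmatch

# Hypothetical filenames that follow the naming scheme in the examples above.
filenames = ["001_img_r.fits", "001_img_g.fits", "001_mask.fits", "002_img_r.fits"]

# A "*" pattern selects every file whose name matches the rest of the pattern.
image_names = [name for name in filenames if fnmatch(name, "*_img_*.fits")]
mask_names = [name for name in filenames if fnmatch(name, "*_mask.fits")]

# filt_loc is a character index into the filename.
# For "001_img_r.fits", index 8 and index -6 address the same character.
filt_loc = 8
by_filter = {}
for name in image_names:
    by_filter.setdefault(name[filt_loc], []).append(name)
```

A negative filt_loc counts from the end of the name, which stays valid when the numeric prefix varies in length.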

generate_dataset_dict(func=None, filedict=None, filters=True, **kwargs)

Generates a list of dictionaries by applying a user-defined annotation generator function to each set of image files and its mask. The format of each dictionary is determined by the user-defined function.

Parameters:
  • func (function) – A user-defined function that operates on a set of images and a mask file to generate a dictionary of annotations. The DDLoader expects this function to take arguments as follows: (image_files, mask_file, **kwargs), where image_files is a list of paths to image files (each image corresponds to one band) and mask_file points to a single mask file.

  • filedict (dict) – A dictionary with image and mask filepaths defined, generated by DDLoader.generate_filedict. If not specified, attempts to use a filedict stored within the DDLoader instance.

  • filters (bool) – Determines whether the list of filters is passed along to the annotation function. If True, the function is called as (images, mask, index, filters, other kwargs).

Returns:

self – A DDLoader with a dataset dictionary generated. Access it using DDLoader.get_dataset().

Return type:

DDLoader
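A minimal sketch of an annotation function with the (image_files, mask_file, **kwargs) shape described above may help. The function name annotate, the keyword names idx and filters, and the keys of the returned dictionary are all assumptions chosen for illustration; the real format is whatever the user-defined function returns.

```python
def annotate(image_files, mask_file, **kwargs):
    """Hypothetical annotation generator: one dictionary per image/mask set."""
    record = {
        "filenames": list(image_files),  # one path per band
        "mask": mask_file,               # a single mask path
    }
    # Copy optional extras (assumed keyword names) into the record when present.
    for key in ("idx", "filters"):
        if key in kwargs:
            record[key] = kwargs[key]
    return record


# Direct call with made-up paths, independent of any loader.
example = annotate(
    ["001_img_g.fits", "001_img_r.fits"],
    "001_mask.fits",
    idx=0,
    filters=["g", "r"],
)
```

Accepting **kwargs keeps the function usable whether or not the filter list is forwarded.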

load_coco_json_file(file)

Open a JSON text file, and return encoded data as dictionary.

Assumes JSON data is in the COCO format.

Parameters:

file (str) – Path to the JSON file.

Return type:

dictionary of encoded data

_verify_input_file_count(filenames_dict)

Makes sure that there is the same number of images for each filter.
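The check described above can be sketched as follows. This is not the method's actual code: the function name verify_file_counts, the assumption that the dictionary maps each filter name to a list of filenames, and the choice of ValueError are all illustrative.

```python
def verify_file_counts(filenames_dict):
    """Raise ValueError unless every filter lists the same number of files.

    Assumes a mapping of filter name -> list of filenames.
    """
    counts = {filt: len(files) for filt, files in filenames_dict.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"Unequal file counts per filter: {counts}")
    return counts
```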

random_sample(outdir, filedict=None, sets=['train', 'test'], nfiles=[3, 1])

Generates randomly sampled subsets of the data, assuming that the scarlet output exists.

Parameters:
  • outdir (str) – Base output directory

  • filedict (dict) – Dictionary of files to be sampled

  • sets (list[str]) – Name of subsets

  • nfiles (list[int]) – How many files go in each subset, in the same order as sets
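How sets and nfiles pair up can be illustrated with a standalone sketch. It is not the method's implementation: the helper name split_random and its seed argument are hypothetical, and drawing disjoint subsets is an assumption, since the description above does not say whether subsets may overlap.

```python
import random


def split_random(files, sets=("train", "test"), nfiles=(3, 1), seed=None):
    """Draw disjoint random subsets of `files`, one per name in `sets`."""
    if len(sets) != len(nfiles):
        raise ValueError("sets and nfiles must have the same length")
    if sum(nfiles) > len(files):
        raise ValueError("not enough files for the requested subsets")
    rng = random.Random(seed)
    # Sample without replacement, then slice the draw into consecutive chunks.
    picked = rng.sample(list(files), sum(nfiles))
    subsets, start = {}, 0
    for name, count in zip(sets, nfiles):
        subsets[name] = picked[start:start + count]
        start += count
    return subsets


subsets = split_random([f"{i:03d}_img.fits" for i in range(10)], seed=0)
```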

get_data_from_json(filename)

Open a JSON text file, and return encoded data as dictionary.

Parameters:

filename (str) – The name of the file to load.

Return type:

dictionary of encoded data

Raises:

FileNotFoundError if the file cannot be found.
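The documented behavior matches what the standard library provides, as the following sketch shows. It uses only json and open and does not reproduce the function's actual body; read_json is a hypothetical stand-in name.

```python
import json
import os
import tempfile


def read_json(filename):
    """Return the decoded contents of a JSON file.

    open() raises FileNotFoundError when the file does not exist.
    """
    with open(filename, "r") as handle:
        return json.load(handle)


# Round-trip through a temporary file, then try a path that does not exist.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.json")
    with open(path, "w") as handle:
        json.dump({"images": [], "annotations": []}, handle)
    loaded = read_json(path)
    try:
        read_json(os.path.join(tmpdir, "missing.json"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```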

class NpEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.JSONEncoder

Extensible JSON <https://json.org> encoder for Python data structures.

Supports the following objects and types by default:

Python         JSON
dict           object
list, tuple    array
str            string
int, float     number
True           true
False          false
None           null

To extend this to recognize other objects, subclass and implement a .default() method with another method that returns a serializable object for o if possible, otherwise it should call the superclass implementation (to raise TypeError).
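As an illustration of this extension mechanism, the sketch below defines a hypothetical encoder, ToListEncoder, that converts any object exposing a tolist() or item() method. NumPy arrays and scalars provide these methods, but the example uses the standard library's array.array so that it runs without NumPy. It is not a reproduction of how NpEncoder itself is written.

```python
import json
from array import array


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder for objects that expose tolist() or item()."""

    def default(self, o):
        # Try the conversion methods in order; fall through if neither exists.
        for method_name in ("tolist", "item"):
            method = getattr(o, method_name, None)
            if callable(method):
                return method()
        # Let the base class raise TypeError for unsupported objects.
        return super().default(o)


encoded = json.dumps({"flux": array("d", [1.0, 2.5])}, cls=ToListEncoder)
```

Passing the class through the cls argument of json.dumps leaves the handling of built-in types to the base encoder.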

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)

convert_to_json(dict_list, output_file, allow_cached=True)

Converts a dataset into COCO format and saves it to a JSON file. The dictionaries in dict_list must be in detectron2’s standard dataset format.

Parameters:
  • dict_list – list of metadata dictionaries

  • output_file – path of json file that will be saved to

  • allow_cached – if True and the JSON file is already present, skip the conversion
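The caching behavior of allow_cached can be sketched in isolation. The helper write_json_once below is hypothetical and deliberately omits the COCO conversion step; it only shows the skip-if-present logic, under the assumption that an existing file is left untouched.

```python
import json
import os
import tempfile


def write_json_once(data, output_file, allow_cached=True):
    """Write `data` to `output_file`; return False when an existing file is reused."""
    if allow_cached and os.path.exists(output_file):
        return False  # keep the cached file untouched
    with open(output_file, "w") as handle:
        json.dump(data, handle)
    return True


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "dataset.json")
    first = write_json_once([{"file_name": "001_img_r.fits"}], target)
    second = write_json_once([], target)  # skipped: file already exists
    forced = write_json_once([], target, allow_cached=False)  # overwrites
```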