deepdisc.data_format.file_io

Attributes

logger

Classes

`DDLoader`	A base deepdisc data loader class
`NpEncoder`	Extensible JSON <https://json.org> encoder for Python data structures.

Functions

`get_data_from_json`(filename)	Open a JSON text file, and return encoded data as dictionary.
`convert_to_json`(dict_list, output_file[, allow_cached])	Converts dataset into COCO format and saves it to a json file.

Module Contents

logger[source]

class DDLoader[source]

A base deepdisc data loader class

filedict = None[source]

dataset = None[source]

get_dataset()[source]: Retrieves the list of dataset_dicts, if one has been generated.

generate_filedict(dirpath, filters, img_files, mask_files, subdirs=False, filt_loc=0, n_samples=None)[source]

Generates a path dictionary from a directory of files.

Parameters:

dirpath (str, path-like) – The path to the data directory.
filters (list) – A list of filters available in the dataset. Each filter name should match a string identifier in the file name itself. E.g. img_r.fits will be matched to a filter with label “r”.
img_files (str) – The name pattern of the image files to collect; it should contain a “*” to collect all image files in the dataset. E.g. 001_img.fits can be matched using img_files = “*_img.fits”.
mask_files (str) – The name pattern of the mask files to collect; it should contain a “*” to collect all mask files in the dataset. E.g. 001_mask.fits can be matched using mask_files = “*_mask.fits”.
subdirs (bool) – Indicates whether the data is stored within subdirectories within the dirpath. If True, will recursively search for files.
filt_loc (int) – The integer location of the filter within the image name, used to split files across filters accordingly. E.g. 001_img_r.fits would have a filt_loc of 8 (or -6).
n_samples (int) – If specified, filters down to a subset of the dataset that contains n_samples image files per filter.

Returns:

self – A DDLoader with a filename dictionary generated in DDLoader.filedict

Return type:

DDLoader
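
As a rough illustration of how a glob pattern such as img_files and an index such as filt_loc can work together, the sketch below groups file names by filter using only the standard library. It does not call deepdisc; the helper name group_by_filter and the file names are hypothetical, and it assumes the filter label starts exactly at index filt_loc of the file name.

```python
from fnmatch import fnmatch


def group_by_filter(filenames, pattern, filters, filt_loc):
    """Group file names by the filter label found at index filt_loc.

    Hypothetical helper for illustration only; not part of deepdisc.
    """
    grouped = {f: [] for f in filters}
    for name in sorted(filenames):
        if not fnmatch(name, pattern):
            continue  # skip files that do not match the glob, e.g. masks
        for f in filters:
            # Slicing works the same way for positive and negative filt_loc.
            if name[filt_loc:].startswith(f):
                grouped[f].append(name)
                break
    return grouped


names = [
    "001_img_g.fits",
    "001_img_r.fits",
    "002_img_g.fits",
    "002_img_r.fits",
    "001_mask.fits",
]
by_pos = group_by_filter(names, "*_img_*.fits", ["g", "r"], filt_loc=8)
by_neg = group_by_filter(names, "*_img_*.fits", ["g", "r"], filt_loc=-6)
```

Both calls produce the same grouping, since index 8 and index -6 point at the same character in a 14-character name like 001_img_r.fits.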

generate_dataset_dict(func=None, filedict=None, filters=True, **kwargs)[source]

Generates a list of dictionaries by applying a user-defined annotation generator function to each image file/mask. The format of each dictionary is determined by the user-defined function.

Parameters:

func (function) – A user-defined function that operates on a set of images and a mask file to generate a dictionary of annotations. The DDLoader expects this function to have the signature (image_files, mask_file, **kwargs), where image_files is a list of paths to image files (each image corresponds to one band) and mask_file is the path to a single mask file.
filedict (dict) – A dictionary with image and mask filepaths defined, generated by DDLoader.generate_filedict. If not specified, attempts to use a filedict stored within the DDLoader instance.
filters (bool) – Determines whether the list of filters is passed along to the annotation function. If True, the function is called as (images, mask, index, filters, other kwargs).

Returns:

self – A DDLoader with a dataset dictionary generated. Access using DDLoader.get_dataset().

Return type:

DDLoader
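
A minimal sketch of what a user-defined annotation function with the (image_files, mask_file, **kwargs) signature could look like. The function name annotate and every key in the returned dictionary are hypothetical; the real output format is whatever the user's function chooses. How the filter list is delivered (positionally or by keyword) is also an assumption here: this sketch reads it from kwargs.

```python
def annotate(image_files, mask_file, **kwargs):
    """Hypothetical annotation generator: builds one record per image set."""
    record = {
        "filenames": list(image_files),  # one path per band
        "mask": mask_file,               # path to the single mask file
        "annotations": [],               # would be filled from the mask in real use
    }
    # Assumption: optional extras such as the filter list arrive as keywords.
    if "filters" in kwargs:
        record["filters"] = list(kwargs["filters"])
    return record


record = annotate(
    ["001_img_g.fits", "001_img_r.fits"],
    "001_mask.fits",
    filters=["g", "r"],
)
```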

load_coco_json_file(file)[source]

Open a JSON text file, and return encoded data as dictionary.

Assumes JSON data is in the COCO format.

Parameters: file (str) – pointer to file
Return type: dictionary of encoded data

_verify_input_file_count(filenames_dict)[source]: Makes sure that there is the same number of images for each filter.
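
The kind of check described here can be sketched as follows. The layout of filenames_dict is not documented on this page, so the sketch assumes a plain mapping from filter label to a list of file paths; the function name counts_match is hypothetical.

```python
def counts_match(filenames_dict):
    """Return True if every filter has the same number of files.

    Assumes a mapping of filter label -> list of paths (an assumption,
    not the documented deepdisc structure).
    """
    counts = {len(paths) for paths in filenames_dict.values()}
    # Zero or one distinct count means all filters agree.
    return len(counts) <= 1


balanced = counts_match({"g": ["a_g.fits", "b_g.fits"], "r": ["a_r.fits", "b_r.fits"]})
unbalanced = counts_match({"g": ["a_g.fits"], "r": []})
```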

random_sample(outdir, filedict=None, sets=['train', 'test'], nfiles=[3, 1])[source]

Generates randomly sampled subsets of the data, assuming the scarlet output exists.

Parameters:

outdir (str) – Base output directory
filedict (dict) – Dictionary of files to be sampled
sets (list[str]) – Names of the subsets
nfiles (list[int]) – How many files go in each subset
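
One way to draw named subsets of fixed sizes from a list of files is sketched below with the standard library. It is not the deepdisc implementation: the helper name split_random and the seed parameter are hypothetical, and the sketch assumes the subsets should not overlap, which this page does not state.

```python
import random


def split_random(files, sets=("train", "test"), nfiles=(3, 1), seed=None):
    """Hypothetical helper: draw non-overlapping random subsets of files."""
    rng = random.Random(seed)
    total = sum(nfiles)
    if total > len(files):
        raise ValueError("not enough files for the requested subsets")
    # Sample once without replacement, then slice so subsets cannot overlap.
    picked = rng.sample(list(files), total)
    subsets = {}
    start = 0
    for name, n in zip(sets, nfiles):
        subsets[name] = picked[start:start + n]
        start += n
    return subsets


files = [f"{i:03d}_img.fits" for i in range(10)]
subsets = split_random(files, seed=0)
```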

get_data_from_json(filename)[source]

Open a JSON text file, and return encoded data as dictionary.

Parameters: filename (str) – The name of the file to load.
Return type: dictionary of encoded data
Raises: FileNotFoundError – if the file cannot be found.
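
The documented behavior (load a JSON file, return a dictionary, raise FileNotFoundError for a missing file) matches what the standard json module and open() give directly. The sketch below is a stand-in under that assumption; read_json is a hypothetical name and the file contents are made up for the demonstration.

```python
import json
import os
import tempfile


def read_json(filename):
    """Hypothetical stand-in: load a JSON file into Python objects."""
    # open() raises FileNotFoundError if the path does not exist.
    with open(filename, "r") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w") as f:
        json.dump({"images": [], "annotations": []}, f)
    loaded = read_json(path)

    try:
        read_json(os.path.join(tmp, "missing.json"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```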

class NpEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: json.JSONEncoder

Extensible JSON <https://json.org> encoder for Python data structures.

Supports the following objects and types by default:

Python	JSON
dict	object
list, tuple	array
str	string
int, float	number
True	true
False	false
None	null

To extend this to recognize other objects, subclass and override the .default() method so that it returns a serializable object for o if possible; otherwise it should call the superclass implementation (to raise TypeError).

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
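
The same extension pattern can handle array-like objects. The sketch below is not the NpEncoder source: ToListEncoder is a hypothetical class that converts any object exposing a callable tolist() method, demonstrated with the standard library's array.array so that it runs without NumPy. NumPy arrays also provide tolist(), so the same idea would apply to them.

```python
import array
import json


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes any object exposing tolist()."""

    def default(self, o):
        tolist = getattr(o, "tolist", None)
        if callable(tolist):
            return tolist()
        # Let the base class raise TypeError for unsupported objects.
        return super().default(o)


encoded = json.dumps({"ids": array.array("i", [1, 2, 3])}, cls=ToListEncoder)
```

Passing the class through the cls argument of json.dumps is what routes otherwise unsupported objects to default().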

convert_to_json(dict_list, output_file, allow_cached=True)[source]

Converts dataset into COCO format and saves it to a json file. dataset_name must be registered in DatasetCatalog and in detectron2’s standard format.

Parameters:

dict_list – List of metadata dictionaries.
output_file – Path of the JSON file to write.
allow_cached – If True and the JSON file already exists, skip the conversion.
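
The allow_cached behavior described above can be sketched as a simple existence check before writing. This stand-in only illustrates the caching logic and performs no COCO conversion; write_json and its boolean return value are hypothetical and not part of deepdisc.

```python
import json
import os
import tempfile


def write_json(dict_list, output_file, allow_cached=True):
    """Hypothetical stand-in; returns True if the file was written."""
    if allow_cached and os.path.exists(output_file):
        return False  # an existing file is reused and left untouched
    with open(output_file, "w") as f:
        json.dump(dict_list, f)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "dataset.json")
    first = write_json([{"image_id": 0}], out)    # file is created
    second = write_json([{"image_id": 1}], out)   # skipped, file exists
    forced = write_json([{"image_id": 1}], out, allow_cached=False)
    with open(out) as f:
        final = json.load(f)
```

One consequence of this design is that a stale file is silently kept unless allow_cached=False is passed.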