deepdisc.data_format.file_io
Attributes
Classes
DDLoader – A base deepdisc data loader class.
NpEncoder – Extensible JSON <https://json.org> encoder for Python data structures.
Functions
get_data_from_json – Open a JSON text file, and return encoded data as dictionary.
convert_to_json – Converts dataset into COCO format and saves it to a json file.
Module Contents
- class DDLoader[source]
A base deepdisc data loader class
- generate_filedict(dirpath, filters, img_files, mask_files, subdirs=False, filt_loc=0, n_samples=None)[source]
Generates a path dictionary from a directory of files.
- Parameters:
dirpath (str, path-like) – The path to the data directory.
filters (list) – A list of filters available in the dataset. Each filter name should match a string identifier in the file name itself. E.g. img_r.fits will be matched to a filter with label “r”.
img_files (str) – The name pattern of the image files to collect; it should contain a “*” to collect all image files in the dataset. E.g. 001_img.fits can be matched using img_files = “*_img.fits”
mask_files (str) – The name pattern of the mask files to collect; it should contain a “*” to collect all mask files in the dataset. E.g. 001_mask.fits can be matched using mask_files = “*_mask.fits”
subdirs (bool) – Indicates whether the data is stored within subdirectories within the dirpath. If True, will recursively search for files.
filt_loc (int) – The integer location of the filter within the image name, used to split files across filters accordingly. E.g. 001_img_r.fits would have a filt_loc of 8 (or -6).
n_samples (int) – If specified, filters down to a subset of the dataset that contains n_samples image files per filter.
- Returns:
self – A DDLoader with a filename dictionary generated in DDLoader.filedict
- Return type:
DDLoader
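The pattern and filt_loc conventions described above can be checked with the standard library alone. This is a minimal sketch that does not call deepdisc; the file names are hypothetical, and filt_loc is read here as a plain character index into the file name, which is an assumption based on the 001_img_r.fits example.

```python
from fnmatch import fnmatch

# Hypothetical file names, following the naming used in the parameter descriptions.
names = ["001_img_r.fits", "001_img_g.fits", "001_mask.fits"]

# A "*" pattern selects the image files and leaves the mask file out.
image_names = [n for n in names if fnmatch(n, "*_img_*.fits")]

# Assumption: filt_loc is a character index into the file name.
# For these names, 8 and -6 address the same character.
filt_loc = 8
filters_found = [n[filt_loc] for n in image_names]
same_from_end = [n[-6] for n in image_names]
```

Both indices pick out the filter label because the names are 14 characters long, so position 8 from the start is position 6 from the end.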
- generate_dataset_dict(func=None, filedict=None, filters=True, **kwargs)[source]
Generates a list of dictionaries by applying a user-defined annotation generator function to each image file/mask. The format is determined by the user-defined function.
- Parameters:
func (function) – A user-defined function that operates on a set of images and a mask file to generate a dictionary of annotations. The DDLoader expects this function to accept arguments as (image_files, mask_file, **kwargs), where image_files is a list of paths to image files (one image per band) and mask_file is the path to a single mask file.
filedict (dict) – A dictionary with image and mask filepaths defined, generated by DDLoader.generate_filedict. If not specified, attempts to use a filedict stored within the DDLoader instance.
filters (bool) – Determines whether the list of filters is passed along to the annotation function. If True, the arguments are passed along as (images, mask, index, filters, other kwargs).
- Returns:
self – A DDLoader with a dataset dictionary generated. Access using DDLoader.get_dataset().
- Return type:
DDLoader
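As an illustration of the (image_files, mask_file, **kwargs) form described for func, here is a minimal sketch of an annotation function. The name annotate_sketch and the keys of the returned dictionary are hypothetical; how the loader supplies the index and filter list is not shown here, so the sketch only covers the basic call form.

```python
def annotate_sketch(image_files, mask_file, **kwargs):
    """Hypothetical annotation function; the returned keys are illustrative only."""
    record = {
        "filenames": list(image_files),  # one path per band
        "mask": mask_file,               # path to the single mask file
    }
    # Any extra keyword arguments are stored alongside the paths.
    record.update(kwargs)
    return record


# Called directly here with made-up paths.
example = annotate_sketch(
    ["001_img_g.fits", "001_img_r.fits"], "001_mask.fits", height=64, width=64
)
```

A real annotation function would typically open the files and build per-object annotations; this one only records its inputs so the call signature is easy to see.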
- load_coco_json_file(file)[source]
Open a JSON text file, and return encoded data as dictionary.
Assumes JSON data is in the COCO format.
- Parameters:
file (str) – Path to the JSON file.
- Return type:
dictionary of encoded data
- _verify_input_file_count(filenames_dict)[source]
Makes sure that there are the same number of images for each filter.
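The kind of check described here could look like the following sketch. It assumes filenames_dict maps each filter label to a list of file names; the actual layout used by DDLoader is not documented on this page, and counts_match is a hypothetical helper, not the library's implementation.

```python
def counts_match(filenames_dict):
    """Return True when every filter lists the same number of files."""
    # Assumption: keys are filter labels, values are lists of file names.
    counts = {len(files) for files in filenames_dict.values()}
    return len(counts) <= 1


balanced = {
    "g": ["001_img_g.fits", "002_img_g.fits"],
    "r": ["001_img_r.fits", "002_img_r.fits"],
}
unbalanced = {"g": ["001_img_g.fits"], "r": []}
```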
- random_sample(outdir, filedict=None, sets=['train', 'test'], nfiles=[3, 1])[source]
Generates randomly sampled subsets of the data, assuming the scarlet output exists
- Parameters:
outdir (str) – Base output directory
filedict (dict) – Dictionary of files to be sampled
sets (list[str]) – Name of subsets
nfiles (list[int]) – How many files go in each subset
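One way to draw named random subsets of given sizes is sketched below. This is not the DDLoader implementation: the helper split_sketch, its seed argument, and the choice to make the subsets disjoint are all assumptions made for illustration, and the scarlet outputs mentioned above are not handled.

```python
import random


def split_sketch(files, sets=("train", "test"), nfiles=(3, 1), seed=0):
    """Draw disjoint random subsets of `files`, one per name in `sets`."""
    rng = random.Random(seed)
    # Sample everything needed at once, without replacement.
    picked = rng.sample(list(files), sum(nfiles))
    subsets, start = {}, 0
    for name, n in zip(sets, nfiles):
        subsets[name] = picked[start:start + n]
        start += n
    return subsets


# Ten made-up file names, split with the default sizes 3 and 1.
subsets = split_sketch([f"{i:03d}_img.fits" for i in range(10)])
```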
- get_data_from_json(filename)[source]
Open a JSON text file, and return encoded data as dictionary.
- Parameters:
filename (str) – The name of the file to load.
- Return type:
dictionary of encoded data
- Raises:
FileNotFoundError – If the file cannot be found.
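The behaviour described for get_data_from_json can be approximated with the standard json module. The sketch below uses a hypothetical stand-in, read_json_sketch, rather than the deepdisc function itself, and writes a small temporary file so that it runs on its own.

```python
import json
import os
import tempfile


def read_json_sketch(filename):
    """Return the decoded contents of a JSON file.

    open() raises FileNotFoundError when the file does not exist.
    """
    with open(filename, "r") as f:
        return json.load(f)


# Round trip through a temporary file, then try a path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    with open(path, "w") as f:
        json.dump({"images": [], "annotations": []}, f)
    loaded = read_json_sketch(path)
    try:
        read_json_sketch(os.path.join(tmp, "missing.json"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```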
- class NpEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Bases: json.JSONEncoder
Extensible JSON <https://json.org> encoder for Python data structures.
Supports the following objects and types by default:
Python | JSON
dict | object
list, tuple | array
str | string
int, float | number
True | true
False | false
None | null
To extend this to recognize other objects, subclass and implement a .default() method with another method that returns a serializable object for o if possible, otherwise it should call the superclass implementation (to raise TypeError).
- default(obj)[source]
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
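To show how such a subclass is used, the sketch below wraps the iterator example in a hypothetical class, IterEncoder, and passes it to json.dumps through the cls argument. It illustrates the default() mechanism only; the object types that NpEncoder itself converts are not listed on this page and are not reproduced here.

```python
import json


class IterEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes any iterable as a JSON array."""

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class raise TypeError for unsupported objects.
        return super().default(o)


# A range is not a list or tuple, so json falls back to default().
encoded = json.dumps({"x": range(3)}, cls=IterEncoder)

# Objects that are not iterable still end in a TypeError.
try:
    json.dumps(object(), cls=IterEncoder)
    unsupported_raises = False
except TypeError:
    unsupported_raises = True
```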
- convert_to_json(dict_list, output_file, allow_cached=True)[source]
Converts dataset into COCO format and saves it to a json file. dataset_name must be registered in DatasetCatalog and in detectron2’s standard format.
- Parameters:
dict_list – List of metadata dictionaries.
output_file – Path of the JSON file to save to.
allow_cached – If True, skip the conversion when the JSON file is already present.
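The allow_cached behaviour can be sketched as follows. The helper write_json_cached is hypothetical and covers only the skip-if-present logic; the conversion to COCO format is not reproduced, and the returned flag is an addition for illustration.

```python
import json
import os
import tempfile


def write_json_cached(dict_list, output_file, allow_cached=True):
    """Write dict_list to output_file; return False if an existing file was reused."""
    if allow_cached and os.path.exists(output_file):
        return False  # file already present, nothing written
    with open(output_file, "w") as f:
        json.dump(dict_list, f)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "dataset.json")
    first = write_json_cached([{"image_id": 0}], out)   # file is written
    second = write_json_cached([{"image_id": 1}], out)  # existing file reused
    forced = write_json_cached([{"image_id": 1}], out, allow_cached=False)  # overwritten
```

With caching enabled, a stale file is silently reused, so passing allow_cached=False is the safer choice after the underlying dataset has changed.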