Download decorators

Single files

Package datamaestro.download.single

datamaestro.download.single.filedownloader(filename: str, url: str, size: int = None, transforms=None, checker=None)

Base class for all download handlers

datamaestro.download.single.concatdownload(filename: str, url: str, transforms=None)

Concatenate all files in an archive
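As an illustration of what concatenating all files in an archive amounts to, here is a minimal standard-library sketch. It is not datamaestro's implementation: the helper names concat_archive_members and make_demo_archive are hypothetical, and the sketch assumes a tar archive held in memory, with members concatenated in archive order.

```python
import io
import tarfile


def concat_archive_members(archive_bytes: bytes) -> bytes:
    """Concatenate the contents of every regular file in a tar archive.

    Hypothetical helper for illustration only; members are appended in the
    order in which they appear in the archive.
    """
    out = io.BytesIO()
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz)
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            extracted = tar.extractfile(member)
            out.write(extracted.read())
    return out.getvalue()


def make_demo_archive() -> bytes:
    """Build a small in-memory tar archive with two text files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"hello "), ("b.txt", b"world")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


result = concat_archive_members(make_demo_archive())
```

Here result holds the bytes of a.txt followed by those of b.txt. A real handler would presumably stream to the destination file instead of buffering everything in memory.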

Archives

Package datamaestro.download.archive

The archive download methods associate a Path with the downloaded archive. The archive contents can be filtered with the files argument and with subpath (to include only a sub-folder of the archive).

datamaestro.download.archive.zipdownloader(varname, url: str, subpath: str = None, checker: FileChecker = None, files: Set[str] = None)

ZIP Archive handler

datamaestro.download.archive.tardownloader(varname, url: str, subpath: str = None, checker: FileChecker = None, files: Set[str] = None)

TAR archive handler
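The effect of the subpath and files arguments can be pictured with a small standard-library sketch. This is not the library's code: select_members is a hypothetical helper, and the sketch assumes that names in files are interpreted relative to subpath, which the documentation above does not state.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Iterable, List, Optional, Set


def select_members(
    names: Iterable[str],
    subpath: Optional[str] = None,
    files: Optional[Set[str]] = None,
) -> List[str]:
    """Return the member paths kept after filtering (hypothetical helper).

    Assumption: kept paths are expressed relative to `subpath`, and `files`
    lists paths relative to `subpath` as well.
    """
    selected = []
    for name in names:
        if name.endswith("/"):
            continue  # directory entry, nothing to extract
        path = PurePosixPath(name)
        if subpath is not None:
            try:
                path = path.relative_to(subpath)
            except ValueError:
                continue  # member lies outside the requested sub-folder
        if files is not None and str(path) not in files:
            continue  # member is not in the explicit file set
        selected.append(str(path))
    return selected


# Build a small in-memory ZIP archive to exercise the filter
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/train.csv", "a,b\n1,2\n")
    zf.writestr("data/test.csv", "a,b\n3,4\n")
    zf.writestr("README.md", "demo")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    all_names = zf.namelist()

only_data = select_members(all_names, subpath="data")
only_train = select_members(all_names, subpath="data", files={"train.csv"})
```

With subpath alone, README.md is dropped because it sits outside the data folder; adding files narrows the selection further to the named entries.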

Syncing

Package datamaestro.download.sync

datamaestro.download.sync.gsync(varname: str, url: str)

Google sync call

Utility functions

File hashes can be checked with the following checker classes:

class datamaestro.utils.FileChecker
class datamaestro.utils.HashCheck(hashstr: str, hasherfn=hashlib.md5)
__init__(hashstr: str, hasherfn=hashlib.md5)

Check a file against a hash

Parameters:
  • hashstr – The expected hash value

  • hasherfn – The hash function used to compute the file hash, defaults to hashlib.md5
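To make the check concrete, the following sketch shows one way to compare a file against an expected hash using only hashlib. It does not reproduce HashCheck itself: file_matches_hash and its parameters are hypothetical, and the sketch assumes the expected value is a hexadecimal digest string.

```python
import hashlib
import os
import tempfile


def file_matches_hash(path, expected_hex, hasher_factory=hashlib.md5, chunk_size=65536):
    """Return True if the file's digest equals `expected_hex`.

    Hypothetical helper; assumes `expected_hex` is a hexadecimal digest and
    `hasher_factory` is a hashlib-style constructor such as hashlib.md5.
    """
    hasher = hasher_factory()
    with open(path, "rb") as fp:
        # Read in chunks so large downloads are not loaded into memory at once
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex.lower()


# Demonstration on a temporary file with known contents
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name

try:
    expected = hashlib.md5(b"hello world").hexdigest()
    matches = file_matches_hash(demo_path, expected)
    mismatch = file_matches_hash(demo_path, "0" * 32)
    matches_sha256 = file_matches_hash(
        demo_path,
        hashlib.sha256(b"hello world").hexdigest(),
        hasher_factory=hashlib.sha256,
    )
finally:
    os.remove(demo_path)
```

Passing a different constructor, such as hashlib.sha256, mirrors the role of the hasherfn parameter described above.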