download

superduper.misc.download

download_content

download_content(db,
                 query: Union[superduper.backends.base.query.Query, Dict],
                 ids: Optional[Sequence[str]] = None,
                 documents: Optional[List[superduper.base.document.Document]] = None,
                 raises: bool = True,
                 n_workers: Optional[int] = None) -> Optional[Sequence[superduper.base.document.Document]]

| Parameter | Description |
|-----------|-------------|
| db | database instance |
| query | query to be executed |
| ids | ids to be downloaded |
| documents | documents to be downloaded |
| raises | whether to raise errors |
| n_workers | number of download workers |

Download content contained in uploaded data.

Items to be downloaded are identified via subdocuments of the form shown below. By default, items are downloaded to the database unless a download_update function is provided.

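>>> from superduper.misc.download import download_content
>>> # Subdocument marking content to be fetched from a URI
>>> d = {"_content": {"uri": "<uri>", "encoder": "<encoder-identifier>"}}
>>> # Example callback that writes the downloaded bytes to a local file
>>> def update(key, id, bytes):
...     with open(f'/tmp/{key}+{id}', 'wb') as f:
...         f.write(bytes)
>>> download_content(None, None, ids=["0"], documents=[d])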

download_from_one

download_from_one(r: superduper.base.document.Document)

| Parameter | Description |
|-----------|-------------|
| r | document to download from |

Download content from a single document.

This function will find all URIs in the document and download them.

gather_uris

gather_uris(documents: Sequence[superduper.base.document.Document],
            gather_ids: bool = True) -> Tuple[List[str], List[str], List[Any], List[str]]

| Parameter | Description |
|-----------|-------------|
| documents | list of dictionaries |
| gather_ids | if True then gather ids of documents |

Get the uris out of all documents as denoted by {"_content": ...}.
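To make the `{"_content": ...}` convention concrete, here is a minimal sketch of walking a nested dictionary and collecting every URI it marks. The helper `collect_content_uris` and its dotted key paths are hypothetical and written for illustration only; the library's `gather_uris` returns four lists (per its signature above) and may traverse documents differently.

```python
from typing import Any, List, Tuple


def collect_content_uris(node: Any, path: str = "") -> List[Tuple[str, str]]:
    """Return (key_path, uri) pairs for each {"_content": {"uri": ...}} found.

    Hypothetical helper: not part of superduper.
    """
    found: List[Tuple[str, str]] = []
    if not isinstance(node, dict):
        return found
    content = node.get("_content")
    if isinstance(content, dict) and "uri" in content:
        # This node is itself a content subdocument; record it and stop here.
        found.append((path, content["uri"]))
        return found
    for key, value in node.items():
        # Build a dotted path so nested hits can be located later.
        child_path = f"{path}.{key}" if path else key
        found.extend(collect_content_uris(value, child_path))
    return found


doc = {
    "img": {"_content": {"uri": "file:///a.png", "encoder": "x"}},
    "meta": {"thumb": {"_content": {"uri": "https://example.com/t.png"}}},
    "label": 3,
}
pairs = collect_content_uris(doc)
```

Here `pairs` holds one entry for `img` and one for `meta.thumb`; plain values such as `label` are ignored.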

timeout

timeout(seconds)

| Parameter | Description |
|-----------|-------------|
| seconds | seconds until timeout |

Context manager to set a timeout.

timeout_handler

timeout_handler(signum, frame)

| Parameter | Description |
|-----------|-------------|
| signum | signal number |
| frame | frame |

Timeout handler to raise a TimeoutException.
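The `(signum, frame)` handler signature suggests a signal-based timeout. The sketch below shows one common way such a context manager can be built with `signal.SIGALRM`. The names `time_limit`, `_on_alarm`, and `SketchTimeout` are stand-ins for `timeout`, `timeout_handler`, and `TimeoutException`; this is an assumption about the general technique, not the library's implementation. It works only on Unix and only in the main thread.

```python
import signal
import time
from contextlib import contextmanager


class SketchTimeout(Exception):
    """Stand-in exception raised when the guarded block runs too long."""


def _on_alarm(signum, frame):
    # Signal handlers receive the signal number and the current stack frame.
    raise SketchTimeout("operation timed out")


@contextmanager
def time_limit(seconds: float):
    """Raise SketchTimeout if the body takes longer than `seconds`."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the pending alarm and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


try:
    with time_limit(0.2):
        time.sleep(2)  # interrupted once the alarm fires
    timed_out = False
except SketchTimeout:
    timed_out = True
```

Restoring the previous handler in `finally` keeps nested or later uses of `SIGALRM` from being clobbered.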

BaseDownloader

BaseDownloader(self,
               uris: List[str],
               n_workers: int = 0,
               timeout: Optional[int] = None,
               headers: Optional[Dict] = None,
               raises: bool = True)

| Parameter | Description |
|-----------|-------------|
| uris | list of URIs / file names to fetch |
| n_workers | number of multiprocessing workers |
| timeout | seconds until a request times out |
| headers | dictionary of request headers passed to the requests package |
| raises | whether to raise errors (True/False) |

Base class for downloading files.

Downloader

Downloader(self,
           uris,
           update_one: Optional[Callable] = None,
           ids: Union[List[str], List[int], NoneType] = None,
           keys: Optional[List[str]] = None,
           datatypes: Optional[List[str]] = None,
           n_workers: int = 20,
           headers: Optional[Dict] = None,
           skip_existing: bool = True,
           timeout: Optional[int] = None,
           raises: bool = True)

| Parameter | Description |
|-----------|-------------|
| uris | list of URIs / file names to fetch |
| update_one | function to call to insert data into the table |
| ids | list of ids of rows / documents to update |
| keys | list of keys in rows / documents to insert to |
| datatypes | list of datatypes of rows / documents to insert to |
| n_workers | number of multiprocessing workers |
| headers | dictionary of request headers passed to the requests package |
| skip_existing | if True, skip data that is already present |
| timeout | seconds until a request times out |
| raises | whether to raise errors (True/False) |

Download files from a list of URIs.
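As an illustration of how `uris`, `ids`, `keys`, `update_one`, `n_workers`, and `raises` could fit together, here is a minimal sketch that fetches each URI and hands the bytes to a callback. It uses a thread pool rather than multiprocessing, and the function `download_all`, the injected `fetch` callable, and the `update_one(key, id, data)` call shape (borrowed from the `update` example earlier on this page) are all assumptions, not the library's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def download_all(
    uris: Sequence[str],
    ids: Sequence[str],
    keys: Sequence[str],
    fetch: Callable[[str], bytes],
    update_one: Callable[[str, str, bytes], None],
    n_workers: int = 4,
    raises: bool = True,
) -> List[bool]:
    """Fetch every URI and pass the bytes to update_one(key, id, data).

    Hypothetical helper: returns one success flag per URI.
    """

    def job(i: int) -> bool:
        try:
            data = fetch(uris[i])
        except Exception:
            if raises:
                raise
            return False  # swallow the error and report failure
        update_one(keys[i], ids[i], data)
        return True

    if n_workers <= 0:
        # Run sequentially when no workers are requested.
        return [job(i) for i in range(len(uris))]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order and re-raises worker exceptions.
        return list(pool.map(job, range(len(uris))))


store = {}


def fake_fetch(uri: str) -> bytes:
    # Toy fetcher so the example runs without network access.
    if uri == "bad":
        raise ValueError(uri)
    return uri.encode()


def save(key: str, id_: str, data: bytes) -> None:
    store[(key, id_)] = data


results = download_all(
    ["a", "bad", "c"], ["0", "1", "2"], ["img"] * 3,
    fake_fetch, save, n_workers=2, raises=False,
)
```

With `raises=False` the failing URI is reported as `False` and the others are still stored; with `raises=True` the first error propagates to the caller.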

Fetcher

Fetcher(self,
        headers: Optional[Dict] = None,
        n_workers: int = 0)

| Parameter | Description |
|-----------|-------------|
| headers | headers to be used for download |
| n_workers | number of download workers |

Fetches data from a URI.
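For readers who want a feel for what fetching bytes from a URI with custom headers involves, here is a minimal standard-library sketch. The function `fetch_bytes` is hypothetical and uses `urllib.request`, which handles `http(s)://` and `file://` URIs; the schemes and client library that `Fetcher` actually uses are not stated on this page, so treat this as an illustration only.

```python
import pathlib
import tempfile
import urllib.request
from typing import Dict, Optional


def fetch_bytes(
    uri: str,
    headers: Optional[Dict[str, str]] = None,
    timeout: Optional[float] = None,
) -> bytes:
    """Return the raw bytes behind a URI (hypothetical helper)."""
    request = urllib.request.Request(uri, headers=headers or {})
    # timeout=None means block until the transfer completes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage with a local file, so the example runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    payload = fetch_bytes(sample.as_uri(), headers={"User-Agent": "sketch"})
```

Headers are ignored for `file://` URIs and sent as request headers for HTTP(S).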

TimeoutException

TimeoutException(self, /, *args, **kwargs)

| Parameter | Description |
|-----------|-------------|
| args | *args of Exception |
| kwargs | **kwargs of Exception |

Timeout exception.

Updater

Updater(self, db, query)

| Parameter | Description |
|-----------|-------------|
| db | Datalayer instance |
| query | query to be executed |

Updater class to update the artifact.