Chunked storage utilities¶
This module provides an abstraction layer over generic chunked array storage libraries. Currently HDF5 (via h5py), bcolz and zarr are supported storage layers.
Different storage configurations can be used with the functions and classes defined below. Wherever a function or method takes a storage keyword argument, the value of the argument will determine the storage used for the output.
If storage is a string, it will be used to look up one of several predefined storage configurations via the storage registry, which is a dictionary located at allel.chunked.storage_registry. The default storage can be changed globally by setting the value of the ‘default’ key in the storage registry.
Alternatively, storage may be an instance of one of the storage classes
defined below, e.g., allel.chunked.storage_bcolz.BcolzMemStorage
or
allel.chunked.storage_hdf5.HDF5TmpStorage
, which allows custom
configuration of storage parameters such as compression type and level.
For example:
>>> from allel import chunked
>>> import numpy as np
>>> a = np.arange(10000000)
>>> chunked.copy(a)
<zarr.core.Array (10000000,) int64>
>>> chunked.copy(a, storage='bcolzmem')
carray((10000000,), int64)
...
>>> chunked.copy(a, storage='bcolztmp')
carray((10000000,), int64)
...
>>> chunked.copy(a, storage='zarrmem')
<zarr.core.Array (10000000,) int64>
>>> chunked.copy(a, storage='zarrtmp')
<zarr.core.Array (10000000,) int64>
>>> chunked.copy(a, storage=chunked.BcolzStorage(cparams=bcolz.cparams(cname='lz4')))
carray((10000000,), int64)
...
>>> chunked.copy(a, storage='hdf5mem_zlib1')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> chunked.copy(a, storage='hdf5tmp_zlib1')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> import h5py
>>> h5f = h5py.File('example.h5', mode='w')
>>> h5g = h5f.create_group('test')
>>> chunked.copy(a, storage='hdf5', group=h5g, name='data')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> h5f['test/data']
<HDF5 dataset "data": shape (10000000,), type "<i8">
Storage¶
bcolz¶
-
class
allel.chunked.storage_bcolz.
BcolzStorage
(**kwargs)[source]¶ Storage layer using bcolz carray and ctable.
-
allel.chunked.storage_bcolz.
bcolz_storage
= 'bcolz'¶ bcolz storage with default parameters
-
allel.chunked.storage_bcolz.
bcolzmem_storage
= 'bcolzmem'¶ bcolz in-memory storage with default compression
-
allel.chunked.storage_bcolz.
bcolztmp_storage
= 'bcolztmp'¶ bcolz temporary file storage with default compression
-
allel.chunked.storage_bcolz.
bcolz_zlib1_storage
= 'bcolz_zlib1'¶ bcolz storage with zlib level 1 compression
-
allel.chunked.storage_bcolz.
bcolzmem_zlib1_storage
= 'bcolzmem_zlib1'¶ bcolz in-memory storage with zlib level 1 compression
-
allel.chunked.storage_bcolz.
bcolztmp_zlib1_storage
= 'bcolztmp_zlib1'¶ bcolz temporary file storage with zlib level 1 compression
HDF5 (h5py)¶
-
class
allel.chunked.storage_hdf5.
HDF5Storage
(**kwargs)[source]¶ Storage layer using HDF5 dataset and group.
-
allel.chunked.storage_hdf5.
hdf5_storage
= 'hdf5'¶ HDF5 storage with default parameters
-
allel.chunked.storage_hdf5.
hdf5mem_storage
= 'hdf5mem'¶ HDF5 in-memory storage with default compression
-
allel.chunked.storage_hdf5.
hdf5tmp_storage
= 'hdf5tmp'¶ HDF5 temporary file storage with default compression
-
allel.chunked.storage_hdf5.
hdf5_zlib1_storage
= 'hdf5_zlib1'¶ HDF5 storage with zlib level 1 compression
-
allel.chunked.storage_hdf5.
hdf5mem_zlib1_storage
= 'hdf5mem_zlib1'¶ HDF5 in-memory storage with zlib level 1 compression
-
allel.chunked.storage_hdf5.
hdf5tmp_zlib1_storage
= 'hdf5tmp_zlib1'¶ HDF5 temporary file storage with zlib level 1 compression
-
allel.chunked.storage_hdf5.
hdf5_lzf_storage
= 'hdf5_lzf'¶ HDF5 storage with LZF compression
-
allel.chunked.storage_hdf5.
hdf5mem_lzf_storage
= 'hdf5mem_lzf'¶ HDF5 in-memory storage with LZF compression
-
allel.chunked.storage_hdf5.
hdf5tmp_lzf_storage
= 'hdf5tmp_lzf'¶ HDF5 temporary file storage with LZF compression
Functions¶
-
allel.chunked.core.
store
(data, arr, start=0, stop=None, offset=0, blen=None)[source]¶ Copy data block-wise into arr.
-
allel.chunked.core.
copy
(data, start=0, stop=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Copy data block-wise into a new array.
-
allel.chunked.core.
map_blocks
(data, f, blen=None, storage=None, create='array', **kwargs)[source]¶ Apply function f block-wise over data.
-
allel.chunked.core.
reduce_axis
(data, reducer, block_reducer, mapper=None, axis=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Apply an operation to data that reduces over one or more axes.
-
allel.chunked.core.
amax
(data, axis=None, mapper=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Compute the maximum value.
-
allel.chunked.core.
amin
(data, axis=None, mapper=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Compute the minimum value.
-
allel.chunked.core.
asum
(data, axis=None, mapper=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Compute the sum.
-
allel.chunked.core.
count_nonzero
(data, mapper=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Count the number of non-zero elements.
-
allel.chunked.core.
compress
(condition, data, axis=0, out=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Return selected slices of an array along given axis.
-
allel.chunked.core.
take
(data, indices, axis=0, out=None, mode='raise', blen=None, storage=None, create='array', **kwargs)[source]¶ Take elements from an array along an axis.
-
allel.chunked.core.
subset
(data, sel0=None, sel1=None, blen=None, storage=None, create='array', **kwargs)[source]¶ Return selected rows and columns of an array.
-
allel.chunked.core.
concatenate
(tup, axis=0, blen=None, storage=None, create='array', **kwargs)[source]¶ Concatenate arrays.
-
allel.chunked.core.
binary_op
(data, op, other, blen=None, storage=None, create='array', **kwargs)[source]¶ Compute a binary operation block-wise over data.
-
allel.chunked.core.
copy_table
(tbl, start=0, stop=None, blen=None, storage=None, create='table', **kwargs)[source]¶ Copy tbl block-wise into a new table.
-
allel.chunked.core.
compress_table
(condition, tbl, axis=None, out=None, blen=None, storage=None, create='table', **kwargs)[source]¶ Return selected rows of a table.
-
allel.chunked.core.
take_table
(tbl, indices, axis=None, out=None, mode='raise', blen=None, storage=None, create='table', **kwargs)[source]¶ Return selected rows of a table.
Classes¶
-
class
allel.chunked.core.
ChunkedArrayWrapper
(data)[source]¶ Wrapper class for chunked array-like data.
Parameters: - data : array_like
Data to be wrapped. May be a bcolz carray, h5py dataset, or anything providing a similar interface.
-
class
allel.chunked.core.
ChunkedTableWrapper
(data, names=None)[source]¶ Wrapper class for chunked table-like data.
Parameters: - data: table_like
Data to be wrapped. May be a tuple or list of columns (array-like), a dict mapping names to columns, a bcolz ctable, h5py group, numpy recarray, or anything providing a similar interface.
- names : sequence of strings
Column names.