Commit 81d268df authored by Marco Frailis

Adding jupyter notebook with examples

parent 283c6da9
%% Cell type:markdown id: tags:
## HDF5 Datasets
The following examples show how to create datasets in HDF5 (with h5py) and update their values.
### First dataset
%% Cell type:code id: tags:
``` python
import h5py
from timeit import timeit # To measure execution time
import numpy as np # this is the main python numerical library
f = h5py.File("testdata.hdf5",'w')
# We create a test 2-d array filled with 1s and with 10 rows and 6 columns
data = np.ones((10, 6))
f["dataset_one"] = data
# We now retrieve the dataset from file
dset = f["dataset_one"]
```
%% Cell type:code id: tags:
``` python
#The following instructions show some dataset metadata
print(dset)
print(dset.dtype)
print(dset.shape)
```
%% Cell type:markdown id: tags:
### Dataset slicing
With h5py, datasets support slicing operations analogous to those of NumPy arrays. h5py translates each selection into a portion of the dataset, and HDF5 then reads only that data from "disk". Slicing into a dataset object returns a NumPy array.
%% Cell type:code id: tags:
``` python
out = dset[...]
print(out)
type(out)
```
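%% Cell type:markdown id: tags:
As a minimal sketch of the slicing behaviour described above, the cell below reads a small block and a single column from a dataset and checks what comes back. The file name `slicing_demo.hdf5` and the dataset name `grid` are hypothetical, chosen only for this illustration, and the file is written to a temporary directory so that `testdata.hdf5` is left untouched.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slicing_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        # 10 rows x 6 columns holding the values 0..59
        dset = f.create_dataset("grid", data=np.arange(60).reshape(10, 6))
        block = dset[2:4, 1:3]  # only this 2x2 selection is requested from the file
        column = dset[:, 0]     # one full column, returned as a 1-d array
        block_type = type(block)
        block_values = block.tolist()
        column_shape = column.shape

print(block_type)
print(block_values)
print(column_shape)
```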
%% Cell type:code id: tags:
``` python
dset[1:5, 1] = 0.0
dset[...]
```
%% Cell type:code id: tags:
``` python
# random 2d distribution
data = np.random.rand(15, 10)*2 - 1
dset = f.create_dataset('random', data=data)
# print the first 5 even rows and the first two columns
out = dset[0:10:2, :2]
print(out)
# clipping to zero all negative values
dset[data<0] = 0
```
%% Cell type:markdown id: tags:
### Resizable datasets
If we don't know the dataset size in advance and need to append new data several times, we have to create a resizable dataset and then append the data in a scalable manner.
%% Cell type:code id: tags:
``` python
dset = f.create_dataset('dataset_two', (1,1000), dtype=np.float32,
                        maxshape=(None, 1000))
a = np.ones((1000,1000))
num_rows = dset.shape[0]
# grow the dataset once, then fill the new rows one at a time
dset.resize((num_rows+a.shape[0], 1000))
for row in a:
    dset[num_rows,:] = row
    num_rows += 1
print(dset[1000,:20])
```
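%% Cell type:markdown id: tags:
One possible way to append in a scalable manner is to grow the dataset once per incoming block and write the whole block with a single assignment, rather than row by row. The helper `append_rows`, the file name `append_demo.hdf5` and the dataset name `stream` are hypothetical names introduced for this sketch; it assumes the dataset is resizable along its first axis.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, rows):
    """Append a 2-d block of rows to a dataset that is resizable along axis 0."""
    start = dset.shape[0]
    dset.resize(start + rows.shape[0], axis=0)  # grow once per block
    dset[start:] = rows                         # one write per block
    return dset.shape[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "append_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        # start empty; maxshape=None on axis 0 makes that axis unlimited
        dset = f.create_dataset("stream", (0, 1000), dtype=np.float32,
                                maxshape=(None, 1000))
        for i in range(3):
            block = np.full((100, 1000), i, dtype=np.float32)
            total_rows = append_rows(dset, block)
        final_shape = dset.shape
        sample = float(dset[250, 0])  # a row that came from the third block

print(final_shape, sample)
```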
%% Cell type:markdown id: tags:
## Groups
We can create nested groups directly with a single instruction. For instance, to create the group 'nisp_frame', then its subgroup 'detectors', and finally the child group 'det11', we can use the instruction below.
%% Cell type:code id: tags:
``` python
grp = f.create_group('nisp_frame/detectors/det11')
grp['sci_image'] = np.zeros((2040,2040))
print(grp.name) # the group name property
print(grp.parent) # the parent group property
print(grp.file) # the file property
print(grp) # prints some group information. It has one member, the dataset
```
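%% Cell type:markdown id: tags:
To inspect a nested hierarchy like the one above, one option is to walk the file with `visititems`, which calls a function for every group and dataset below the starting point. The sketch below rebuilds a small version of the hierarchy in a temporary file; the file name `groups_demo.hdf5` and the callback name `record` are assumptions made for this illustration.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np

found = {}


def record(name, obj):
    """Store whether each visited object is a group or a dataset."""
    found[name] = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    # returning None tells h5py to keep walking the hierarchy


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "groups_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        grp = f.create_group("nisp_frame/detectors/det11")
        grp["sci_image"] = np.zeros((4, 4))  # small stand-in for the real image
        f.visititems(record)
        has_detectors = "nisp_frame/detectors" in f  # membership test by path

for name, kind in sorted(found.items()):
    print(kind, name)
```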
%% Cell type:markdown id: tags:
## Attributes
Attributes can be defined inside a group or in a dataset. Both have the **.attrs** property to access an attribute or define new attributes. With h5py, the attribute type is inferred from the passed value, but it is also possible to explicitly assign a type.
%% Cell type:code id: tags:
``` python
grp = f['nisp_frame']
grp.attrs['telescope'] = 'Euclid'
grp.attrs['instrument'] = 'NISP'
grp.attrs['pointing'] = np.array([8.48223045516, -20.4610801911, 64.8793517547])
grp.attrs.create('detector_id', '11', dtype="|S2")
print(grp.attrs['pointing'])
print(grp.attrs['detector_id'])
```
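%% Cell type:markdown id: tags:
The `.attrs` property behaves much like a dictionary, so attributes can also be listed, tested for, and read with a default. The following sketch illustrates this together with an explicitly typed attribute; the attribute names `exposure_id` and `filter` and the file name `attrs_demo.hdf5` are hypothetical and used only here.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "attrs_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        grp = f.create_group("nisp_frame")
        grp.attrs["instrument"] = "NISP"                        # type inferred
        grp.attrs["pointing"] = np.array([8.5, -20.5, 64.9])    # type inferred
        grp.attrs.create("exposure_id", 42, dtype=np.uint16)    # explicit type
        attr_names = sorted(grp.attrs.keys())
        instrument = grp.attrs["instrument"]
        exposure = grp.attrs["exposure_id"]
        exposure_dtype = exposure.dtype
        has_filter = "filter" in grp.attrs
        filter_name = grp.attrs.get("filter", "unknown")  # default if missing

print(attr_names)
print(instrument, int(exposure), exposure_dtype)
print(has_filter, filter_name)
```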
%% Cell type:code id: tags:
``` python
f.close()
!h5ls -vlr testdata.hdf5
```
%% Cell type:markdown id: tags:
## Tables (compound types)
Tables can be stored as datasets where the elements (rows) have the same compound type.
%% Cell type:code id: tags:
``` python
# reopen in read/write mode ('a'); recent h5py versions open files read-only by default
f = h5py.File("testdata.hdf5", 'a')
dt = np.dtype([('source_id', np.uint32), ('ra', np.float32), ('dec', np.float32), ('magnitude', np.float64)])
grp = f.create_group('source_catalog/det11')
dset = grp.create_dataset('star_catalog', (100,), dtype=dt)
dset['source_id', 0] = 1
print(dset['source_id', 'ra', :20])
print(dset[0])
```
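%% Cell type:markdown id: tags:
A table can also be filled in one step from a NumPy structured array with the same compound type. The sketch below assumes a five-row catalog with made-up values (they are illustrative, not real measurements) and a hypothetical file name `table_demo.hdf5`; it then reads one column and filters the rows in memory.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np

dt = np.dtype([("source_id", np.uint32), ("ra", np.float32),
               ("dec", np.float32), ("magnitude", np.float64)])

# Build the rows in memory first (illustrative values only)
rows = np.zeros(5, dtype=dt)
rows["source_id"] = np.arange(1, 6)
rows["ra"] = np.linspace(10.0, 11.0, 5)
rows["magnitude"] = [18.5, 19.0, 17.2, 21.3, 20.1]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        # the compound type of the dataset is taken from the structured array
        dset = f.create_dataset("star_catalog", data=rows)
        ids = dset["source_id"].tolist()   # read a single column
        table = dset[...]                  # read the whole table
        bright = table[table["magnitude"] < 19.5]
        bright_ids = bright["source_id"].tolist()

print(ids)
print(bright_ids)
```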
%% Cell type:markdown id: tags:
## References and Region references
The following instructions create a reference from the detector 11 scientific image to the corresponding star catalog, which is stored in the same file.
%% Cell type:code id: tags:
``` python
sci_image = f['/nisp_frame/detectors/det11/sci_image']
sci_image.attrs['star_catalog'] = dset.ref
cat_ref = sci_image.attrs['star_catalog']
print(cat_ref)
dset = f[cat_ref]
print(dset[0])
dt = h5py.special_dtype(ref=h5py.Reference)
# the above data type dt can be used to create a dataset of references or just an attribute
```
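%% Cell type:markdown id: tags:
As an illustration of the reference data type, the sketch below stores two object references in a small dataset and then resolves them back to their targets. The names `catalogs/det11`, `catalogs/det12`, `catalog_links` and `refs_demo.hdf5` are hypothetical and exist only in this example.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "refs_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        # intermediate group 'catalogs' is created automatically
        cat_a = f.create_dataset("catalogs/det11", data=np.arange(3))
        cat_b = f.create_dataset("catalogs/det12", data=np.arange(5))

        ref_dt = h5py.special_dtype(ref=h5py.Reference)
        links = f.create_dataset("catalog_links", (2,), dtype=ref_dt)
        links[0] = cat_a.ref
        links[1] = cat_b.ref

        # Read the references back and dereference each one through the file
        stored = links[...]
        resolved = [f[ref].name for ref in stored]
        sizes = [f[ref].shape[0] for ref in stored]

print(resolved)
print(sizes)
```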
%% Cell type:code id: tags:
``` python
roi = sci_image.regionref[15:20, 36:78]
sci_image[roi]
```
%% Cell type:markdown id: tags:
## Chunking
%% Cell type:code id: tags:
``` python
dset = f.create_dataset('chunked', (10,2048,2048), dtype=np.uint16, chunks=(1,64,64))
```
%% Cell type:code id: tags:
``` python
dset = f.require_dataset('auto_chunked', (2048,2048), dtype=np.float32, compression="gzip")
print(dset.compression)
print(dset.compression_opts)
print(dset.chunks)
```
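%% Cell type:markdown id: tags:
The following sketch combines an explicit chunk shape with gzip compression and then writes and reads back a single tile whose boundaries coincide with one chunk. The dataset name `frames`, the file name `chunk_demo.hdf5`, the reduced array size and the compression level 4 are assumptions chosen to keep the example small.
%% Cell type:code id: tags:
```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunk_demo.hdf5")  # hypothetical file name
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("frames", (4, 256, 256), dtype=np.uint16,
                                chunks=(1, 64, 64),
                                compression="gzip", compression_opts=4)
        # This selection covers exactly one 1x64x64 chunk
        dset[0, 0:64, 0:64] = 7
        tile = dset[0, 0:64, 0:64]
        tile_sum = int(tile.sum())
        untouched = int(dset[1, 0, 0])  # never written, so the fill value is read
        chunk_shape = dset.chunks
        compression = dset.compression
        level = dset.compression_opts

print(chunk_shape, compression, level)
print(tile_sum, untouched)
```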