# Using HDF5 files

## HDF5 Format

The data made available on the Big Data Knowledge Discovery database is stored in HDF5 format. HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.

See the HDF Group website for more details and to download the required tools and software.

https://www.hdfgroup.org/HDF5/

To view the structure and contents of an HDF5 file, you can use HDFView (https://www.hdfgroup.org/products/java/).
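If a graphical viewer is not convenient, the same overview can be printed from a script. The following is a minimal sketch, assuming the h5py package (introduced in the Python section of this page) is installed. The function names describe and list_structure are hypothetical helpers written for this illustration; they are not part of h5py.

```python
import h5py


def describe(name, obj):
    """Return a one-line description of a group or dataset."""
    if isinstance(obj, h5py.Dataset):
        kind = "dataset shape={} dtype={}".format(obj.shape, obj.dtype)
    else:
        kind = "group"
    return "/{} ({}) attributes: {}".format(name, kind, sorted(obj.attrs.keys()))


def list_structure(file_name):
    """Collect a description of the root group and every object below it."""
    lines = []
    with h5py.File(file_name, "r") as h5f:
        lines.append("/ (root) attributes: {}".format(sorted(h5f.attrs.keys())))
        # visititems calls the function once per group or dataset; the
        # callback returns None (from list.append), so the walk continues.
        h5f.visititems(lambda name, obj: lines.append(describe(name, obj)))
    return lines
```

Calling list_structure("AOM_0.256V_INJ_68.5mA.h5") and printing each returned line should give one line for the root group and one line per dataset, each with the names of its attributes.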

Example:

In this example, a single file from the “Semiconductor Laser with Optical Feedback – Bulk – 4GHz Detection BW – Optical Spectra” dataset will be used to demonstrate the process of accessing the contents using both Matlab and Python.

## Matlab

Matlab can read the data using its built-in HDF5 functions:

http://au.mathworks.com/help/matlab/hdf5-files.html

Example:

The filename is “AOM_0.256V_INJ_68.5mA.h5”. Ensure this file is in the Matlab working directory.

In the Matlab command window, type:

ds_info = h5info('AOM_0.256V_INJ_68.5mA.h5')

This creates a structure called ds_info that contains 7 fields. You can explore this structure in Matlab's Variables editor (double-click ds_info in the Workspace).

One of the fields is another structure called ds_info.Attributes. It contains names and values describing the system parameters under which the data was recorded.

For example:
var1_name = 'AOM Voltage (V)'
var1 = 0.256
var2_name = 'Injection Current (mA)'
var2 = 68.5

The field called ds_info.Datasets describes the data recorded under those conditions. In this example there are two datasets, named “OpticalSpectrum” and “TimeSeries”. Both have associated attributes containing information such as sampling intervals, axis labels and units.

This script demonstrates how to plot the data.

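%%  Read HDF5 file information
file_name = 'AOM_0.256V_INJ_68.5mA.h5';
ds_info = h5info(file_name);

% Read the system parameters stored as attributes of the root group.
% Attribute names here and below are those used by the Python script
% later on this page.
var1_name = h5readatt(file_name,'/','var1_name');
var1 = h5readatt(file_name,'/','var1');
var2_name = h5readatt(file_name,'/','var2_name');
var2 = h5readatt(file_name,'/','var2');

%% ------------------------ Time Series Data ------------------------------
TimeSeries = h5read(file_name,'/TimeSeries');
ts_x_int = h5readatt(file_name,'/TimeSeries','x_int');

% Create array of x-axis values
ts_x_vals = (0:length(TimeSeries)-1)*ts_x_int;

% Read x and y axis labels
ts_x_label = h5readatt(file_name,'/TimeSeries','x_label');
ts_y_label = h5readatt(file_name,'/TimeSeries','y_label');

%% ----------------------- Optical Spectrum Data --------------------------
OpticalSpectrum = h5read(file_name,'/OpticalSpectrum');
os_x_start = h5readatt(file_name,'/OpticalSpectrum','x_start');
os_x_finish = h5readatt(file_name,'/OpticalSpectrum','x_finish');

% Create array of x-axis values (evenly spaced, one per sample)
os_x_vals = linspace(os_x_start,os_x_finish,length(OpticalSpectrum));

% Read x and y axis labels
os_x_label = h5readatt(file_name,'/OpticalSpectrum','x_label');
os_y_label = h5readatt(file_name,'/OpticalSpectrum','y_label');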
%% --------------------------- Create figure ------------------------------
figure(1)
subplot(1,2,1)
plot(ts_x_vals,TimeSeries,'b')
xlabel(ts_x_label)
ylabel(ts_y_label)
legend('Time Series')

subplot(1,2,2)
plot(os_x_vals,OpticalSpectrum,'r')
xlabel(os_x_label)
ylabel(os_y_label)
legend('Optical Spectrum')

ha = axes('Position',[0 0 1 1],'Xlim',[0 1],'Ylim',[0 1],'Box',...
'off','Visible','off','Units','normalized', 'clipping' , 'off');

text(0.5, 1,...
['\bf Laser operating with ' var1_name ' = ' num2str(var1) ' and ' var2_name ' = ' num2str(var2)],...
'HorizontalAlignment','center','VerticalAlignment', 'top')

## Python

To access the data from Python you will need the h5py package (http://docs.h5py.org/en/latest/).

The following script can be used to plot the data (written for Python 3.5):

import numpy as np
import h5py
import matplotlib.pyplot as plt

file_name = "AOM_0.256V_INJ_68.5mA.h5"

h5f = h5py.File(file_name, 'r')
ts_group = h5f["/TimeSeries"]
os_group = h5f["/OpticalSpectrum"]
ts = h5f['TimeSeries'][:]
os = h5f['OpticalSpectrum'][:]
root_att = dict()
for item in h5f.attrs.keys():
    if isinstance(h5f.attrs[item], np.bytes_):
        root_att[item] = h5f.attrs[item].decode("utf-8")
    else:
        root_att[item] = h5f.attrs[item][0]

ts_group_att = dict()
for item in ts_group.attrs.keys():
    if isinstance(ts_group.attrs[item], np.bytes_):
        ts_group_att[item] = ts_group.attrs[item].decode("utf-8")
    else:
        ts_group_att[item] = ts_group.attrs[item]

os_group_att = dict()
for item in os_group.attrs.keys():
    if isinstance(os_group.attrs[item], np.bytes_):
        os_group_att[item] = os_group.attrs[item].decode("utf-8")
    else:
        os_group_att[item] = os_group.attrs[item]
h5f.close()

ts_x_vals = np.arange(len(ts))*ts_group_att['x_int']
os_x_vals = np.linspace(os_group_att['x_start'], os_group_att['x_finish'], num=len(os))

title_text = 'Laser operating with ' + str(root_att['var1_name']) + ' = ' + str(root_att['var1']) + ' and ' + str(root_att['var2_name']) + ' = ' + str(root_att['var2'])

plt.figure(1)
plt.suptitle(title_text)
plt.subplot(1, 2, 1)
plt.plot(ts_x_vals, ts)
plt.xlabel(ts_group_att['x_label'])
plt.ylabel(ts_group_att['y_label'])
plt.subplot(1, 2, 2)
plt.plot(os_x_vals, os)
plt.xlabel(os_group_att['x_label'])
plt.ylabel(os_group_att['y_label'])
plt.show()
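
The script above repeats the same attribute-reading loop three times. As an optional tidy-up, the loops could be folded into one helper. This is a sketch only: attrs_to_dict is a hypothetical name, not an h5py function, and unlike the script it converts every one-element value to a plain Python scalar, on the assumption that the scalar attributes in these files hold a single value each.

```python
def attrs_to_dict(attrs):
    """Copy HDF5 attributes into a plain dict of Python-friendly values."""
    result = {}
    for key, value in attrs.items():
        # NumPy scalars and one-element NumPy arrays expose .size and .item();
        # unwrap them into the corresponding built-in Python value.
        if getattr(value, "size", None) == 1 and hasattr(value, "item"):
            value = value.item()
        # Byte strings (np.bytes_ is a subclass of bytes) are decoded to str.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        result[key] = value
    return result
```

While the file is still open, root_att = attrs_to_dict(h5f.attrs), ts_group_att = attrs_to_dict(ts_group.attrs) and os_group_att = attrs_to_dict(os_group.attrs) would then stand in for the three loops. The helper accepts any mapping, so it can also be tried on an ordinary dict without a file.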