If you implement chunking correctly, writing this much data should be "relatively fast" (minutes, not hours). Without chunking, this could take a very long time.
To demonstrate how to use chunking (and provide a timing benchmark), I wrote a short code segment that populates a dataset with random data (of type np.float32). I create and write the data incrementally because I don't have enough RAM to hold an array of this size (16813, 60, 44, 257) in memory.
This example runs in ~6 minutes on an old Windows system with 24 GB RAM and a mechanical HDD (6 Gbps @ 7200 rpm). The resulting file size is 42.7 GB. You should get much faster times with an SSD. Snapshot of output:
Creating slice: 0
Writing slice: 0; Time to write: 0.13
...
Creating slice: 256
Writing slice: 256; Time to write: 0.91
Total time: 377.47
See code below. Chunk size was set at (100, 60, 44, 1), which is ~1 MB for np.float32. The h5py docs recommend keeping chunk size between 10 KB and 1 MB -- larger for larger datasets. Ref: h5py Chunked Storage. You should adjust the chunk shape to match the typical read/write data block shape you will use. Note: I did not add compression. Compression reduces the on-disk file size, but increases I/O time to compress/uncompress the data on the fly; a sketch of a compressed variant follows the code below.
Code below:
import time

import h5py
import numpy as np

with h5py.File('sequences.h5', 'w') as h5f:
    # Chunk shape (100, 60, 44, 1) is ~1 MB per chunk for np.float32
    ds = h5f.create_dataset('X_train', shape=(16813, 60, 44, 257),
                            chunks=(100, 60, 44, 1), dtype=np.float32)
    start = time.time()
    for i in range(257):
        print(f'Creating slice: {i}')
        arr = np.random.random(16813*60*44).astype(np.float32).reshape(16813, 60, 44, 1)
        incr = time.time()
        print(f'Writing slice: {i}; ', end='')
        ds[:, :, :, i:i+1] = arr   # write one slice along the last (chunked) axis
        print(f'Time to write: {time.time()-incr:.2f}')
    print(f'Total time: {time.time()-start:.2f}')
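
As a follow-up to the compression note above, here is a minimal, untested sketch showing how the same dataset could be created with gzip compression, and how a chunk-aligned slice can be read back. The gzip level of 4 and the file name 'sequences_gzip.h5' are illustrative choices, not part of the benchmark above.

import h5py
import numpy as np

# Variant of create_dataset with gzip compression: smaller file on disk,
# but extra CPU time to compress/uncompress each chunk during I/O.
with h5py.File('sequences_gzip.h5', 'w') as h5f:
    ds = h5f.create_dataset('X_train', shape=(16813, 60, 44, 257),
                            chunks=(100, 60, 44, 1), dtype=np.float32,
                            compression='gzip', compression_opts=4)

# Reading a block that matches the chunk shape only touches the chunks that
# hold that slice, so it stays fast and memory-friendly (~178 MB here).
with h5py.File('sequences.h5', 'r') as h5f:
    slice0 = h5f['X_train'][:, :, :, 0:1]   # shape (16813, 60, 44, 1)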