After preparing data from a dataset, I want to save the prepared data using h5py. The data is a float32 numpy array of shape (16813, 60, 44, 257). Preparing the data is very fast: it takes only a few seconds to prepare 13 GB of data. But when I try to write the data to disk (500 MB/s SSD) using h5py, it gets very slow (I waited for hours) and it even freezes/crashes the computer.

    hf = h5py.File('sequences.h5', 'a')
    hf.create_dataset('X_train', data=X_train)
    hf.create_dataset('Y_train', data=Y_train)
    hf.close()

I calculated that the data in memory should be around 160 GB. Why is it so slow? I tried multiple things like compressing, chunking, predefining the shape, and writing while preparing.

1 Answer

If you implement chunking correctly, writing this much data should be "relatively fast" (minutes, not hours). Without chunking, this could take a very long time.

To demonstrate how to use chunking (and provide a timing benchmark), I wrote a short code segment that populates a dataset with some random data (of type np.float32). I create and write the data incrementally because I don't have enough RAM to hold an array of shape (16813, 60, 44, 257) in memory.

This example runs in ~6 minutes on an old Windows system with 24 GB RAM and a mechanical HDD (6 Gbps @ 7200 rpm). File size is 42.7 GB. You should get much faster times with an SSD.

Snapshot of output:

    Creating slice: 0
    Writing slice: 0; Time to write: 0.13
    ...
    Creating slice: 256
    Writing slice: 256; Time to write: 0.91
    Total time: 377.47

Chunk shape was set to (100, 60, 44, 1), which is ~1 MB for np.float32. The h5py docs recommend keeping the chunk size between 10 KB and 1 MB, larger for larger datasets. Ref: h5py Chunked Storage. You should adjust the chunk shape to match the shape of the data blocks you will typically read or write.

Note: I did not add compression. Compression reduces the on-disk file size, but increases I/O time because the data has to be compressed/uncompressed on the fly.

Code below:

    import time

    import h5py
    import numpy as np

    with h5py.File('sequences.h5', 'w') as h5f:
        # Chunked dataset: each chunk holds 100 rows of one slice along the last axis
        ds = h5f.create_dataset('X_train', shape=(16813, 60, 44, 257),
                                chunks=(100, 60, 44, 1), dtype=np.float32)
        start = time.time()
        for i in range(257):
            print(f'Creating slice: {i}')
            # Random data standing in for one prepared slice
            arr = np.random.random(16813*60*44).astype(np.float32).reshape(16813, 60, 44, 1)
            incr = time.time()
            print(f'Writing slice: {i}; ', end='')
            # Write one slice along the last axis, aligned with the chunk shape
            ds[:, :, :, i:i+1] = arr
            print(f'Time to write: {time.time()-incr:.2f}')
        print(f'Total time: {time.time()-start:.2f}')
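The size figures above can be checked with a little arithmetic. The following is a minimal sketch using only the standard library; the helper names `nbytes` and `chunk_grid` are made up for illustration and are not part of h5py. It assumes 4 bytes per float32 element and that partially filled edge chunks are stored at full chunk size when no compression is used.

```python
from math import ceil, prod

FLOAT32_BYTES = 4  # one np.float32 element occupies 4 bytes


def nbytes(shape, itemsize=FLOAT32_BYTES):
    """Raw size in bytes of an array (or a chunk) with the given shape."""
    return prod(shape) * itemsize


def chunk_grid(shape, chunks):
    """Number of chunks needed along each axis (edge chunks count as whole)."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


shape = (16813, 60, 44, 257)   # dataset shape from the question
chunks = (100, 60, 44, 1)      # chunk shape used in the answer

total_bytes = nbytes(shape)                    # raw data size
chunk_bytes = nbytes(chunks)                   # size of a single chunk
grid = chunk_grid(shape, chunks)               # chunks per axis
allocated_bytes = prod(grid) * chunk_bytes     # space if every chunk is stored whole

print(f"raw data:     {total_bytes / 2**30:.1f} GiB")
print(f"one chunk:    {chunk_bytes / 1e6:.2f} MB")
print(f"chunk grid:   {grid} -> {prod(grid)} chunks")
print(f"chunked size: {allocated_bytes / 2**30:.1f} GiB")
```

Under these assumptions the raw array is about 42.5 GiB and the chunked layout about 42.7 GiB, which would be consistent with the reported file size if that figure is in binary units. Each chunk comes out at roughly 1 MB, the upper end of the range quoted from the h5py docs.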
