If you implement chunking correctly, writing this much data should be "relatively fast" (minutes, not hours). Without chunking, this could take a very long time.
To demonstrate how to use chunking (and provide a timing benchmark), I wrote a short code segment that populates a dataset with random data (of type np.float32). I create and write the data incrementally because I don't have enough RAM to hold an array of this size (16813, 60, 44, 257) in memory.
This example runs in ~6 minutes on an old Windows system with 24 GB RAM and a mechanical HDD (6 Gbps @ 7200 rpm). The resulting file size is 42.7 GB. You should get much faster times with an SSD. Snapshot of output:
Creating slice: 0
Writing slice: 0; Time to write: 0.13
...
Creating slice: 256
Writing slice: 256; Time to write: 0.91
Total time: 377.47
See code below. Chunk size was set at (100, 60, 44, 1), which is ~1 MB for np.float32. The h5py docs recommend keeping chunk size between 10 KB and 1 MB -- larger for larger datasets. Ref: h5py Chunked Storage. You should adjust the chunk shape to match the typical read/write data block shape you will use. Note: I did not add compression. Compression reduces the on-disk file size, but increases I/O time to compress/uncompress the data on the fly; a sketch of a compressed variant follows the code below.
Code below:
import time

import h5py
import numpy as np

with h5py.File('sequences.h5', 'w') as h5f:
    # Chunk shape (100, 60, 44, 1) is ~1 MB per chunk for np.float32
    ds = h5f.create_dataset('X_train', shape=(16813, 60, 44, 257),
                            chunks=(100, 60, 44, 1), dtype=np.float32)
    start = time.time()
    for i in range(257):
        print(f'Creating slice: {i}')
        arr = np.random.random(16813*60*44).astype(np.float32).reshape(16813, 60, 44, 1)
        incr = time.time()
        print(f'Writing slice: {i}; ', end='')
        ds[:, :, :, i:i+1] = arr   # write one slice along the last (chunked) axis
        print(f'Time to write: {time.time()-incr:.2f}')
    print(f'Total time: {time.time()-start:.2f}')
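
As a follow-up to the compression note above, here is a minimal, untested sketch showing how the same dataset could be created with gzip compression, and how a chunk-aligned slice can be read back. The gzip level of 4 and the file name 'sequences_gzip.h5' are illustrative choices, not part of the benchmark above.

import h5py
import numpy as np

# Variant of create_dataset with gzip compression: smaller file on disk,
# but extra CPU time to compress/uncompress each chunk during I/O.
with h5py.File('sequences_gzip.h5', 'w') as h5f:
    ds = h5f.create_dataset('X_train', shape=(16813, 60, 44, 257),
                            chunks=(100, 60, 44, 1), dtype=np.float32,
                            compression='gzip', compression_opts=4)

# Reading a block that matches the chunk shape only touches the chunks that
# hold that slice, so it stays fast and memory-friendly (~178 MB here).
with h5py.File('sequences.h5', 'r') as h5f:
    slice0 = h5f['X_train'][:, :, :, 0:1]   # shape (16813, 60, 44, 1)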