in Education by
What is the most efficient way to remove duplicate items from an array under the constraint that axillary memory usage must be to a minimum, preferably small enough to not even require any heap allocations? Sorting seems like the obvious choice, but this is clearly not asymptotically efficient. Is there a better algorithm that can be done in place or close to in place? If sorting is the best choice, what kind of sort would be best for something like this? JavaScript questions and answers, JavaScript questions pdf, JavaScript question bank, JavaScript questions and answers pdf, mcq on JavaScript pdf, JavaScript questions and solutions, JavaScript mcq Test , Interview JavaScript questions, JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)

1 Answer

0 votes
by
I'll answer my own question since, after posting, I came up with a really clever algorithm to do this. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in axillary space (the recursion is a tail call), and is typically O(N) time complexity. The algorithm is as follows: Take the first element of the array, this will be the sentinel. Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to sentinel. Move all elements for which the index is equal to the hash to the beginning of the array. Move all elements that are equal to sentinel, except the first element of the array, to the end of the array. What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements. This can be shown to be O(N) provided no pathological scenario in the hashing: Even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n) where small n is the amount of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast. Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion. Here's an implementation in the D programming language: void uniqueInPlace(T)(ref T[] dataIn) { uniqueInPlaceImpl(dataIn, 0); } void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) { if(dataIn.length - start < 2) return; invariant T sentinel = dataIn[start]; T[] data = dataIn[start + 1..$]; static hash_t getHash(T elem) { static if(is(T == uint) || is(T == int)) { return cast(hash_t) elem; } else static if(__traits(compiles, elem.toHash)) { return elem.toHash; } else { static auto ti = typeid(typeof(elem)); return ti.getHash(&elem); } } for(size_t index = 0; index < data.length;) { if(data[index] == sentinel) { index++; continue; } auto hash = getHash(data[index]) % data.length; if(index == hash) { index++; continue; } if(data[index] == data[hash]) { data[index] = sentinel; index++; continue; } if(data[hash] == sentinel) { swap(data[hash], data[index]); index++; continue; } auto hashHash = getHash(data[hash]) % data.length; if(hashHash != hash) { swap(data[index], data[hash]); if(hash < index) index++; } else { index++; } } size_t swapPos = 0; foreach(i; 0..data.length) { if(data[i] != sentinel && i == getHash(data[i]) % data.length) { swap(data[i], data[swapPos++]); } } size_t sentinelPos = data.length; for(size_t i = swapPos; i < sentinelPos;) { if(data[i] == sentinel) { swap(data[i], data[--sentinelPos]); } else { i++; } } dataIn = dataIn[0..sentinelPos + start + 1]; uniqueInPlaceImpl(dataIn, start + swapPos + 1); }

Related questions

0 votes
    What is the most efficient way to remove duplicate items from an array under the constraint that axillary ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Mar 10, 2022 in Education by JackTerrance
0 votes
    What is the most efficient way to remove duplicate items from an array under the constraint that axillary ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Mar 10, 2022 in Education by JackTerrance
0 votes
    What is the most efficient way to remove duplicate items from an array under the constraint that axillary ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Mar 8, 2022 in Education by JackTerrance
0 votes
    Main memory is also known as __ memory? (a) Secondary memory (b) Auxiliary memory (c) Primary memory (d) None of the above Select the correct answer from above options...
asked Dec 13, 2021 in Education by JackTerrance
0 votes
    I am newbie to android development and learning it to my own. I have a very strange problem of ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Jun 18, 2022 in Education by JackTerrance
0 votes
    I am newbie to android development and learning it to my own. I have a very strange problem of ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Jun 14, 2022 in Education by JackTerrance
0 votes
    This question already has answers here: RxJava delay for each item of list emitted (17 answers) Closed 5 ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Jun 8, 2022 in Education by JackTerrance
0 votes
    I've got an extremely long XML file, like context1 test1 context1 context2 test2 context2 ........ ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Mar 4, 2022 in Education by JackTerrance
0 votes
    How do you remove Duplicate records in Informatica? And how many ways are there to do it?...
asked Mar 27, 2021 in Technology by JackTerrance
0 votes
    I'd like to make my yearless dates class play nice with Range#include?, according to the docs, all ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 7, 2022 in Education by JackTerrance
0 votes
    Which forms simplifies and ensures that there are minimal data aggregates and repetitive groups: (a) 1NF (b ... from Normal Forms in portion Normalization of Database Management...
asked Oct 10, 2021 in Education by JackTerrance
0 votes
    What is the minimal set of information for matching dependency reference against a dependencyManagement section?...
asked May 4, 2021 in Technology by Editorial Staff
0 votes
    Differentiate between the verbose and minimal monitoring in Azure?...
asked Dec 27, 2020 in Technology by JackTerrance
0 votes
    I have this JSON tree view that represents a menu : var menus = [ { label : "1", items: [ ... , JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Jun 3, 2022 in Education by JackTerrance
0 votes
    I have hours trying to figure out how to avoid this bullet points at the bottom being cutted when the ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 2, 2022 in Education by JackTerrance
...