I have 2 large files (each about 500k lines or 85 MB) containing the checksum of a file and the filepath itself. What is the best way to get the differences between the files based on the checksum? I can write a Java program, script, etc., but the goal is that it has to be efficient.

For example, I have

FileA:

    ec7a063d3990cf7d8481952ffb45f1d8b490b1b5 /home/user/first.txt
    e0f886f2124804b87a81defdc38ad2b492458f34 /home/user/second.txt

FileB:

    650bc1eb1b24604819eb342f2ebc1bab464d9210 /home/user/third.txt
    ec7a063d3990cf7d8481952ffb45f1d8b490b1b5 /home/user/blah/dup.txt

I want to output two files containing the unique files in File A and B.

UniqueA:

    e0f886f2124804b87a81defdc38ad2b492458f34 /home/user/second.txt

UniqueB:

    650bc1eb1b24604819eb342f2ebc1bab464d9210 /home/user/third.txt

In this case, "first.txt" and "dup.txt" are the same since their checksum is the same, so I exclude them as not being unique. What is the most efficient way to do this? The files aren't sorted in any way.

1 Answer

So here's a quick answer, but it's not so efficient:

    $ join -v1 <(sort FileA) <(sort FileB) | tee UniqueA
    e0f886f2124804b87a81defdc38ad2b492458f34 /home/user/second.txt
    $ join -v2 <(sort FileA) <(sort FileB) | tee UniqueB
    650bc1eb1b24604819eb342f2ebc1bab464d9210 /home/user/third.txt

The join command matches lines from two sorted files by a key (by default the first field, with blanks as the default delimiter). The commands above are not so efficient, though, because we are sorting each file twice: once to get the values unique to the first file (-v1) and then again to get the values unique to the second (-v2). I'll post some improvements shortly.

You can get the unique values from both files in a single invocation, but the information about which file each line came from is lost:

    $ join -v1 -v2 <(sort FileA) <(sort FileB)
    650bc1eb1b24604819eb342f2ebc1bab464d9210 /home/user/third.txt
    e0f886f2124804b87a81defdc38ad2b492458f34 /home/user/second.txt

At this point, we almost have our answer. We have all of the unmatched entries from both files. Moreover, we've only sorted each file once. I believe this is efficient. However, we have lost the "origin" information. We can tag the rows with sed in this iteration of the code:

    $ join -v1 -v2 <(sort FileA | sed 's/$/ A/') <(sort FileB | sed 's/$/ B/')
    650bc1eb1b24604819eb342f2ebc1bab464d9210 /home/user/third.txt B
    e0f886f2124804b87a81defdc38ad2b492458f34 /home/user/second.txt A

At this point, we have our unique entries and we know the file they came from. If you must have the results in separate files, I imagine that you can accomplish this with awk (or just more bash). Here's one more iteration of the code with awk included:

    $ join -v1 -v2 <(sort FileA | sed 's/$/ A/') <(sort FileB | sed 's/$/ B/') | awk '{ file = "Unique" $3; print $1, $2 > file }'

This relies on the paths containing no spaces, since awk reads the A/B tag from the third field.
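As an alternative to the join pipeline above, and not part of the original answer, here is a minimal sketch that skips sorting entirely: awk loads the checksums of one file into an in-memory array and then prints the lines of the other file whose checksum was never seen. The helper name `unique_to` and the temporary directory are illustrative assumptions; the sample data is copied from the question.

```shell
#!/bin/sh
# Sketch: hash-based comparison with awk, no sorting needed.
set -eu

workdir=$(mktemp -d)   # scratch directory for this demo (assumption)
cd "$workdir"

# Sample data copied from the question.
cat > FileA <<'EOF'
ec7a063d3990cf7d8481952ffb45f1d8b490b1b5 /home/user/first.txt
e0f886f2124804b87a81defdc38ad2b492458f34 /home/user/second.txt
EOF
cat > FileB <<'EOF'
650bc1eb1b24604819eb342f2ebc1bab464d9210 /home/user/third.txt
ec7a063d3990cf7d8481952ffb45f1d8b490b1b5 /home/user/blah/dup.txt
EOF

# unique_to OTHER THIS (hypothetical helper): print every line of THIS
# whose checksum (field 1) does not occur anywhere in OTHER.
unique_to() {
    awk '
        NR == FNR { seen[$1] = 1; next }   # first file: record each checksum
        !($1 in seen)                      # second file: print unseen lines
    ' "$1" "$2"
}

unique_to FileB FileA > UniqueA
unique_to FileA FileB > UniqueB

cat UniqueA UniqueB
```

Each file is read twice, but every lookup is a hash access, so no sort is involved, and whole lines are printed, so paths containing spaces survive intact. The sketch assumes the checksums of one file fit in memory, which should hold for roughly 500k entries, and that neither input is empty: the `NR == FNR` idiom treats the second file as the first when the first has no lines.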
