I am trying to write code that converts the data in a Java RDD into a histogram, so that I can bin the data in a particular way. For example, I want a histogram of entry sizes that shows how many entries fall into each size range:

0 - 1 GB: 2 entries
1 - 5 GB: 4 entries
and so on

I am able to get the values in different RDDs, but I am not sure what I am missing here. Is there an easier way to do this?

```java
import java.io.Serializable;
import lombok.AllArgsConstructor;
import lombok.Data;
import org.apache.spark.api.java.JavaRDD;

// Getters (getSize(), getGroupId()) are assumed to exist, e.g. via Lombok.
class EntryWithSize {
    long size;
    String entryId;
    String groupId;
}

JavaRDD<EntryWithSize> entryJavaRDD = getEntries();

// keyBy attaches groupId as the key, combineByKey builds and merges one
// HistoSize per key, and values() drops the keys again.
JavaRDD<HistoSize> histoSizeJavaRDD = entryJavaRDD
        .keyBy(EntryWithSize::getGroupId)
        .combineByKey(
                HistoSize::new,                                                   // first entry of a group
                (HistoSize h, EntryWithSize y) -> h.mergeWith(new HistoSize(y)),  // add another entry
                HistoSize::mergeWith)                                             // merge partial results
        .values();

@Data
@AllArgsConstructor
static class HistoSize implements Serializable {
    int oneGB;   // entries <= ONE_GB
    int fiveGB;  // every entry > ONE_GB (no upper limit yet)

    public HistoSize(EntryWithSize entry) {
        addSize(entry);
    }

    private void addSize(EntryWithSize entry) {
        long size = entry.getSize();
        if (size <= ONE_GB) {   // ONE_GB is a long constant defined elsewhere
            oneGB++;
        } else {
            fiveGB++;
        }
    }

    public HistoSize mergeWith(HistoSize other) {
        oneGB += other.oneGB;
        fiveGB += other.fiveGB;
        return this;
    }
}
```
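The HistoSize class in the question hard-codes two bins, and its else branch puts every entry above 1 GB into fiveGB. One way to cover the "and so on" case is to drive the bins from an array of boundaries. This is a minimal sketch in plain Java with no Spark dependency; `SizeHistogram`, its `add`/`merge`/`counts` methods and the assumption that 1 GB = 2^30 bytes are hypothetical choices for illustration. `add` plays the role of combineByKey's mergeValue (and, on an empty histogram, createCombiner), while `merge` plays the role of mergeCombiners or a final reduce.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical generalization of HistoSize: bins are defined by ascending,
// inclusive upper bounds (e.g. {1 GB, 5 GB}); anything above the last bound
// lands in a final overflow bin.
class SizeHistogram implements Serializable {
    static final long ONE_GB = 1L << 30; // assumption: 1 GB = 2^30 bytes

    private final long[] upperBounds; // inclusive upper bound of each bin
    private final long[] counts;      // one count per bin, plus overflow

    SizeHistogram(long[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    // Like mergeValue: put one size into the first bin whose bound covers it.
    SizeHistogram add(long size) {
        int i = 0;
        while (i < upperBounds.length && size > upperBounds[i]) {
            i++;
        }
        counts[i]++;
        return this;
    }

    // Like mergeCombiners / reduce: add another histogram's counts bin by bin.
    SizeHistogram merge(SizeHistogram other) {
        if (!Arrays.equals(upperBounds, other.upperBounds)) {
            throw new IllegalArgumentException("bin boundaries differ");
        }
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other.counts[i];
        }
        return this;
    }

    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        long[] bounds = {ONE_GB, 5 * ONE_GB}; // bins: <= 1 GB, <= 5 GB, > 5 GB
        SizeHistogram a = new SizeHistogram(bounds)
                .add(ONE_GB / 2).add(ONE_GB).add(2 * ONE_GB);
        SizeHistogram b = new SizeHistogram(bounds)
                .add(3 * ONE_GB).add(10 * ONE_GB);
        System.out.println(Arrays.toString(a.merge(b).counts())); // [2, 2, 1]
    }
}
```

Because `merge` checks that both histograms share the same boundaries, combining histograms built with different bins fails fast instead of silently adding unrelated counts. In the question's pipeline it could be plugged into combineByKey as `e -> new SizeHistogram(BOUNDS).add(e.getSize())`, `(h, e) -> h.add(e.getSize())` and `SizeHistogram::merge`, with `BOUNDS` a hypothetical shared array; that wiring is untested against Spark.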

1 Answer

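I was able to get it working by using a reduce on the final RDD of per-group histograms. My test data was wrong, which was producing misleading output.

```java
import org.apache.spark.api.java.function.Function2;

// Add two histograms bin by bin; reducing over all groups gives the overall histogram.
Function2<HistoSize, HistoSize, HistoSize> reduceSumFunc =
        (a, b) -> new HistoSize(a.oneGB + b.oneGB, a.fiveGB + b.fiveGB);

HistoSize finalSize = histoSizeJavaRDD.reduce(reduceSumFunc);
```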
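This fix works because RDD.reduce only needs a function that is commutative and associative, and bin-by-bin addition is both, so the per-group HistoSize values can be combined in any order across partitions. The plain-Java sketch below has no Spark dependency: it uses `Stream.reduce` as a stand-in for the RDD reduce, and the class name `HistoSizeReduceDemo` and the sample counts are made up for illustration. Its `HistoSize` is a Lombok-free copy that keeps only the two fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical demo: folds per-group histograms (the elements of
// histoSizeJavaRDD in the question) into one overall histogram.
class HistoSizeReduceDemo {
    static final class HistoSize {
        final int oneGB;   // entries of at most 1 GB
        final int fiveGB;  // entries above 1 GB (the question's second bin)

        HistoSize(int oneGB, int fiveGB) {
            this.oneGB = oneGB;
            this.fiveGB = fiveGB;
        }

        @Override
        public String toString() {
            return "HistoSize(oneGB=" + oneGB + ", fiveGB=" + fiveGB + ")";
        }
    }

    // Same addition as reduceSumFunc: associative and commutative.
    static final BinaryOperator<HistoSize> SUM =
            (a, b) -> new HistoSize(a.oneGB + b.oneGB, a.fiveGB + b.fiveGB);

    // An empty histogram is the identity, so an empty list yields (0, 0).
    static HistoSize total(List<HistoSize> perGroup) {
        return perGroup.stream().reduce(new HistoSize(0, 0), SUM);
    }

    public static void main(String[] args) {
        List<HistoSize> perGroup = Arrays.asList(
                new HistoSize(2, 1), new HistoSize(0, 3), new HistoSize(1, 0));
        System.out.println(total(perGroup)); // HistoSize(oneGB=3, fiveGB=4)
    }
}
```

If the per-group breakdown is not needed at all, JavaRDD's `aggregate(zeroValue, seqOp, combOp)` could build the overall histogram in a single pass without keyBy/combineByKey; that is a suggested alternative, not something this answer tested.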
