I am trying to write code that converts the data in a Spark JavaRDD into a histogram so that I can bin the entries by size. For example, I want a histogram of sizes that tells me how many entries fall into each size range. I am able to get the values in different RDDs, but I am not sure what I am missing here.
Is there an easier way to do this?
0 - 1 GB: 2 entries
1 - 5 GB: 4 entries
and so on
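To make the target output concrete, here is a minimal plain-Java sketch (no Spark involved) of the binning I have in mind. The class name `SizeHistogram`, the helper `binLabel`, the bin edges, and the extra "> 5 GB" overflow bin are all hypothetical names and values chosen for illustration; sizes are assumed to be in bytes, with upper bounds treated as inclusive.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SizeHistogram {
    // Assumption: sizes are in bytes and 1 GB means 1024^3 bytes.
    static final long GB = 1024L * 1024 * 1024;

    // Hypothetical bin edges: inclusive upper bounds in ascending order.
    static final long[] UPPER_BOUNDS = {1 * GB, 5 * GB};
    // One label per bound, plus a final overflow bin.
    static final String[] LABELS = {"0 - 1 GB", "1 - 5 GB", "> 5 GB"};

    // Returns the label of the first bin whose upper bound is >= sizeBytes.
    static String binLabel(long sizeBytes) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (sizeBytes <= UPPER_BOUNDS[i]) {
                return LABELS[i];
            }
        }
        return LABELS[LABELS.length - 1];
    }

    // Counts how many sizes fall into each bin; empty bins are kept with a count of 0.
    static Map<String, Long> histogram(List<Long> sizes) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String label : LABELS) {
            counts.put(label, 0L);
        }
        for (long size : sizes) {
            counts.merge(binLabel(size), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(GB / 2, GB, 2 * GB, 3 * GB, 4 * GB, 5 * GB, 8 * GB);
        // prints {0 - 1 GB=2, 1 - 5 GB=4, > 5 GB=1}
        System.out.println(histogram(sizes));
    }
}
```

If this shape is right, I assume the same `binLabel` helper could be applied per entry in Spark, for instance with `map` followed by `countByValue` on the JavaRDD, but I have not verified that this is the best approach.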
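// Lombok @Data is assumed here; it generates the getSize()/getGroupId() accessors used below.
@Data
class EntryWithSize implements Serializable {
    long size;
    String entryId;
    String groupId;
}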
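JavaRDD<EntryWithSize> entries = getEntries();
// Key each entry by its group, then fold every group's entries into one HistoSize.
// The result therefore holds one histogram per groupId, not a single overall one.
JavaRDD<HistoSize> histoSizeJavaRDD = entries.keyBy(EntryWithSize::getGroupId)
        .combineByKey(
                HistoSize::new,
                (HistoSize h, EntryWithSize y) -> h.mergeWith(new HistoSize(y)),
                HistoSize::mergeWith
        ).values();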
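@Data
@AllArgsConstructor
static class HistoSize implements Serializable {
    // Assumed value: ONE_GB is not defined in this snippet; 1024^3 bytes is used here.
    static final long ONE_GB = 1024L * 1024 * 1024;

    int oneGB;   // entries with size <= 1 GB
    int fiveGB;  // all entries larger than 1 GB (no upper bound is checked yet)

    // Starts a histogram that contains a single entry.
    public HistoSize(EntryWithSize entry) {
        addSize(entry);
    }

    // Increments the counter of the bin that the entry's size falls into.
    private void addSize(EntryWithSize entry) {
        long size = entry.getSize();
        if (size <= ONE_GB) {
            oneGB++;
        } else {
            fiveGB++;
        }
    }

    // Adds the other histogram's counts to this one and returns this instance.
    public HistoSize mergeWith(HistoSize other) {
        oneGB += other.oneGB;
        fiveGB += other.fiveGB;
        return this;
    }
}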