Channel: Simplilearn Community

How does Hadoop process records split across block boundaries?

Splits are computed on the client side by InputFormat.getSplits, so a look at FileInputFormat gives the following info:

  • For each input file, get the file length and the block size, and calculate the split size as max(minSize, min(maxSize, blockSize)), where maxSize corresponds to mapred.max.split.size and minSize to mapred.min.split.size.
  • Divide the file into different FileSplits based on the split size calculated above. What's important here is that each FileSplit is...
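The two steps above can be sketched in standalone Java (a hypothetical simplification; the real FileInputFormat reads minSize/maxSize from the job configuration and also tracks block locations per split):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Step 1: split size = max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Step 2: divide a file of fileLength bytes into (offset, length) splits.
    // The last split may be shorter than splitSize.
    static List<long[]> getSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long len = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, len});
            offset += len;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 128 * mb;  // typical HDFS block size
        // With no min/max constraints the split size equals the block size.
        long splitSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // A 300 MB file yields three splits: 128 MB, 128 MB, 44 MB.
        for (long[] s : getSplits(300 * mb, splitSize)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Note that these splits are byte ranges, not record boundaries: a split may begin or end in the middle of a record, which is why the RecordReader, not the split calculation, is what ultimately decides which records each mapper processes.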
