Hi,
I need to process 1 TB of data (assume the data is completely unstructured). The data is stored in HDFS and the map phase has started. Assume the mapper output is the same size as the input (1 TB).
1. Is the 1 TB of data processed from memory in the reduce phase? In other words, where is the data stored after the shuffle phase?
2. If I am not sure about the size of the map output, how do I choose the RAM of the data nodes that run the reducers?