Apache Spark Shuffles Explained In Depth

I originally intended this to be a much longer post about memory in Spark, but I figured it would be more useful to cover shuffles on their own first. That way I can skim over them in the memory discussion and keep both posts more digestible. Shuffles are among the most memory- and network-intensive parts of most Spark jobs, so when you’re trying to improve performance it’s important to understand when they occur and what happens while they run.
