Tag Archives: pushdown

Writing a Spark Data Source

Note: This is being written as of early December of 2015 and currently assumes Spark 1.5.2 API. The data sources API has been out for a few versions now but it’s still stabilizing so some of this might change to be out of date.

I’m writing this guide as part of my own exploration on how to write a data source using the Spark SQL Data Sources API. We’ll start with exploration of the interfaces and features, then dive into 2 examples: Parquet and Spark JDBC. Finally we will cover an end to end example of writing your own simple data source.

Continue reading