Getting Started with Auron for Apache Spark

Build from source

To build Auron, please follow the steps below:

Install Rust

The native execution library is written in Rust, so a nightly Rust toolchain must be installed before compiling. We recommend installing it with rustup.
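As a rough sketch of the Rust step, the helpers below check whether a nightly toolchain is already installed and install one only if it is missing. The function names `has_nightly` and `ensure_nightly` are hypothetical; they assume rustup is already on the PATH and rely only on the standard `rustup toolchain list` and `rustup toolchain install` subcommands.

```shell
# Succeeds if the `rustup toolchain list` output read from stdin
# contains a nightly toolchain (lines start with the channel name).
has_nightly() {
  grep -q '^nightly'
}

# Installs the nightly toolchain only when it is not present yet.
# Needs rustup on PATH and network access; not invoked automatically.
ensure_nightly() {
  if ! rustup toolchain list | has_nightly; then
    rustup toolchain install nightly
  fi
}
```

Calling `ensure_nightly` once before the build should be enough on a machine where rustup is already set up.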

Install JDK

Auron has been well tested on JDK 8, 11, and 17, and should also work with newer versions.
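To confirm which JDK is active before building, a small helper like the following may be useful. It is only a sketch: `java_major` and `installed_java_version` are hypothetical names, and the parsing assumes the usual `java -version` output, where the version string appears in double quotes.

```shell
# Extracts the Java major version from a version string
# such as "1.8.0_292" (Java 8) or "17.0.2" (Java 17).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;  # legacy scheme: 1.8.x means Java 8
    *)   echo "$1" | cut -d. -f1 ;;  # modern scheme: 11.x, 17.x, ...
  esac
}

# Prints the quoted version string reported by the java binary on PATH.
# `java -version` writes to stderr, hence the redirection.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}
```

For example, `java_major "$(installed_java_version)"` prints `8` on a JDK 8 installation, which can then be compared against the tested versions listed above.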

Check out the source code.

git clone git@github.com:kwai/auron.git
cd auron

Build the project.

Specify the shims package for the Spark version you want to run on.

The following shims are currently supported:

spark-3.0 - for spark3.0.x
spark-3.1 - for spark3.1.x
spark-3.2 - for spark3.2.x
spark-3.3 - for spark3.3.x
spark-3.4 - for spark3.4.x
spark-3.5 - for spark3.5.x
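The mapping in the list above can be sketched as a small shell function that turns a full Spark version into the shim name. The function name `shim_for_spark` is hypothetical and not part of Auron; it only encodes the six shims listed here.

```shell
# Maps a Spark version such as 3.5.1 to the matching shim name.
# Prints an error and returns non-zero for unsupported versions.
shim_for_spark() {
  series=$(echo "$1" | cut -d. -f1,2)  # keep major.minor, e.g. 3.5
  case "$series" in
    3.0|3.1|3.2|3.3|3.4|3.5) echo "spark-$series" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}
```

With this in place, `SHIM=$(shim_for_spark 3.5.1)` sets `SHIM` to `spark-3.5`.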

You can build Auron either in pre mode for debugging or in release mode to unlock its full potential.

SHIM=spark-3.5 # or spark-3.0/spark-3.1/spark-3.2/spark-3.3/spark-3.4
MODE=release # or pre
JDK=jdk-8
./build/mvn package -P"${SHIM}" -P"${MODE}" -P"${JDK}"

After the build finishes, a fat JAR containing all dependencies is generated in the target directory.
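Because the exact name and module of the generated JAR are not stated here, one way to locate it is to list every JAR that sits under a Maven target directory. This is a minimal sketch; `list_built_jars` is a hypothetical helper name.

```shell
# Lists all .jar files located under any target/ directory
# below the given root (defaults to the current directory).
list_built_jars() {
  find "${1:-.}" -type f -path '*/target/*.jar' | sort
}
```

Running `list_built_jars .` from the repository root after the build should show the candidates; the fat JAR is typically the largest of them.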

Build with Docker

You can use the following command to build a CentOS 7-compatible release:

SHIM=spark-3.5 MODE=release ./release-docker.sh

Run Spark Job with Auron Accelerator

This section describes how to configure and submit a Spark job with Auron support.

Move the Auron JAR package to the Spark client classpath (normally spark-xx.xx.xx/jars/).
Add the following settings to the Spark configuration in spark-xx.xx.xx/conf/spark-defaults.conf:

spark.auron.enable true
spark.sql.extensions org.apache.spark.sql.auron.AuronSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager
spark.memory.offHeap.enabled false

# suggested executor memory configuration
spark.executor.memory 4g
spark.executor.memoryOverhead 4096

Submit a query with spark-sql or another tool such as the Spark Thrift Server:

spark-sql -f tpcds/q01.sql
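If editing the shared configuration file is not an option, the same settings can presumably be passed per job through the standard `--conf key=value` option of spark-sql and spark-submit. The sketch below prints those flags; `auron_conf_flags` is a hypothetical helper name, and the values are copied from the configuration shown above.

```shell
# Prints one --conf flag per Auron-related setting,
# using the same keys and values as the configuration file.
auron_conf_flags() {
  cat <<'EOF'
--conf spark.auron.enable=true
--conf spark.sql.extensions=org.apache.spark.sql.auron.AuronSparkSessionExtension
--conf spark.shuffle.manager=org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager
--conf spark.memory.offHeap.enabled=false
--conf spark.executor.memory=4g
--conf spark.executor.memoryOverhead=4096
EOF
}
```

A job could then be submitted with `spark-sql $(auron_conf_flags) -f tpcds/q01.sql`. The command substitution is left unquoted on purpose so that each flag and value becomes a separate argument; this works because none of the values contain spaces.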