Getting Started with Auron for Apache Spark
Build from source
To build Auron, please follow the steps below:
- Install Rust
The native execution library is written in Rust, so you need to install Rust (nightly) before compiling. We recommend using rustup.
- Install JDK
Auron has been well tested on JDK 8/11/17 and should work fine with higher versions.
- Check out the source code.
git clone git@github.com:kwai/auron.git
cd auron
- Build the project.
Specify the shims package for the Spark version you want to run on.
The following shims are currently supported:
- spark-3.0 - for spark3.0.x
- spark-3.1 - for spark3.1.x
- spark-3.2 - for spark3.2.x
- spark-3.3 - for spark3.3.x
- spark-3.4 - for spark3.4.x
- spark-3.5 - for spark3.5.x
You can build Auron either in pre mode for debugging or in release mode to unlock its full potential.
SHIM=spark-3.5 # or spark-3.0/spark-3.1/spark-3.2/spark-3.3/spark-3.4/spark-3.5
MODE=release # or pre
JDK=jdk-8
./build/mvn package -P"${SHIM}" -P"${MODE}" -P"${JDK}"
After the build finishes, a fat JAR package containing all the dependencies is generated in the target directory.
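As a rough sketch of the build step above, the following hypothetical POSIX shell helpers (`is_supported_shim`, `is_supported_mode`, and `build_command` are illustrative names, not part of Auron) check the SHIM and MODE values against the ones listed in this section and print the Maven command instead of running it. The JDK profile is passed through unchecked, since this section only shows jdk-8 as an example value.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with Auron.

# Succeeds only for the shim names listed in this section.
is_supported_shim() {
  case "$1" in
    spark-3.0|spark-3.1|spark-3.2|spark-3.3|spark-3.4|spark-3.5) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only for the two build modes named in this section.
is_supported_mode() {
  case "$1" in
    pre|release) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the Maven command for the given shim, mode, and JDK profile.
# Returns non-zero (and prints nothing on stdout) if shim or mode is unknown.
build_command() {
  shim="$1"; mode="$2"; jdk="$3"
  is_supported_shim "$shim" || { echo "unsupported shim: $shim" >&2; return 1; }
  is_supported_mode "$mode" || { echo "unsupported mode: $mode" >&2; return 1; }
  printf './build/mvn package -P%s -P%s -P%s\n' "$shim" "$mode" "$jdk"
}
```

Calling `build_command spark-3.5 release jdk-8` prints the command so it can be reviewed before being run from the repository root.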
Build with docker
You can use the following command to build a CentOS 7-compatible release:
SHIM=spark-3.5 MODE=release ./release-docker.sh
Run Spark Job with Auron Accelerator
This section describes how to submit and configure a Spark Job with Auron support.
- Move the Auron JAR package to the Spark client classpath (normally spark-xx.xx.xx/jars/).
- Add the following confs to the Spark configuration in spark-xx.xx.xx/conf/spark-defaults.conf:
spark.auron.enable true
spark.sql.extensions org.apache.spark.sql.auron.AuronSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager
spark.memory.offHeap.enabled false
# suggested executor memory configuration
spark.executor.memory 4g
spark.executor.memoryOverhead 4096
- Submit a query with spark-sql, or with other tools such as spark-thriftserver:
spark-sql -f tpcds/q01.sql
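If editing the shared configuration file is not an option, the same settings can also be supplied per invocation through the standard `--conf key=value` flag of spark-sql. The sketch below is a hypothetical POSIX shell helper (`conf_to_flags` is an illustrative name, not part of Auron or Spark) that turns "key value" lines in the format shown above into such flags, skipping blank lines and comments.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Auron or Spark.
# Reads "key value" lines from stdin and prints one "--conf key=value" per line.
conf_to_flags() {
  while read -r key value; do
    case "$key" in
      ''|'#'*) continue ;;  # skip blank lines and comment lines
    esac
    printf '%s %s=%s\n' --conf "$key" "$value"
  done
}
```

Assuming the settings are saved in a file named auron.conf (a hypothetical name) and contain no spaces in their values, a query could then be submitted with `spark-sql $(conf_to_flags < auron.conf) -f tpcds/q01.sql`.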