Tuesday, January 12, 2016

Debugging a local Spark job using IntelliJ

A coworker was working on a local Spark job and shared how he set up his environment for debugging the job (which is basically the same as debugging any other remote process). These are the instructions I followed:

1. Create a remote debug configuration.

Go to IntelliJ's "Run | Edit Configurations" screen
Click on the "+" to "Add New Configuration"
Select "Remote"

2. Copy the command line argument to use and modify it however you see fit.

I'm using Java 8, so I used the example command line arguments from the top edit box. The only change I made was to set "suspend=y" so the Spark job would stop and wait for me to start my "Remote Debug" process.

This is what I used:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

3. Export the command line arg as SPARK_JAVA_OPTS (Spark uses this value when you submit a Spark job).

I set SPARK_JAVA_OPTS like this:

export SPARK_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
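If you switch ports or toggle the suspend flag often, it can help to build the option string from a couple of shell variables. This is only a sketch: DEBUG_PORT and DEBUG_SUSPEND are names I made up for illustration, and the port has to match whatever is set in the IntelliJ "Remote" configuration.

```shell
# Hypothetical helper variables (not part of Spark or IntelliJ).
DEBUG_PORT=5005     # must match the port in the IntelliJ "Remote" configuration
DEBUG_SUSPEND=y     # y = JVM waits for the debugger to attach, n = JVM starts right away

# Assemble the JDWP agent string from the pieces above and export it.
SPARK_JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=${DEBUG_SUSPEND},address=${DEBUG_PORT}"
export SPARK_JAVA_OPTS

# Show the value that was exported.
echo "$SPARK_JAVA_OPTS"
```

When you are done debugging, run "unset SPARK_JAVA_OPTS" in the same shell so later jobs do not pause waiting for a debugger.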

4. Start the Spark job.

You should see your Spark job start up, and then pause with the following line printed on the console:

Listening for transport dt_socket at address: 5005
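As an alternative to the environment variable, spark-submit has a --driver-java-options flag that passes extra JVM options to the driver, which for a local job is the process you want to debug. The sketch below is not what I originally used; it assumes the job is launched with spark-submit, and the class name, jar path, and "local[*]" master are placeholders. It prints the command so you can check it before running it.

```shell
# Hypothetical application details; replace with your own.
APP_CLASS=com.example.MyJob
APP_JAR=target/my-job.jar

# Same JDWP agent string as in step 3.
JDWP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"

# Print the spark-submit command instead of running it, so it can be inspected first.
submit_cmd() {
  echo spark-submit --master "local[*]" --class "$APP_CLASS" --driver-java-options "$JDWP_OPTS" "$APP_JAR"
}

submit_cmd
```

Remove the "echo" inside submit_cmd to actually launch the job once the printed command looks right.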

5. In IntelliJ, create whatever breakpoints you want to use and start the remote debug configuration.