Combining Apache Spark and IPython Notebook

IPython 3, renamed Jupyter by its creators, introduces the possibility of using different kernels, and thus programming languages other than Python 2 and 3. So the first thing to do is install it along with its notebook. The easiest way is to use the pip installer.

For Arch Linux:

sudo pacman -S python-pip
(sudo) pip install "ipython[notebook]"

The second thing to do is to get a kernel that will allow us to use Scala with Jupyter/IPython. The wiki lists quite a few, and while there is a Spark Kernel, I believe the Scala kernel IScala is much easier to use with dependencies. (There is also a link to the Jove project, which seems interesting, but I haven't had time to check it out.) So head over there and download the IScala.jar or build it from source.

Now that you have your jar, we need to create a custom kernel for IPython. This is extremely simple: find your IPython config directory, which should be ~/.ipython or ~/.config/ipython (for Ubuntu users), and create a kernels directory. Inside it, create an iscala directory and copy the following text into a kernel.json file.

{
  "argv": ["java", "-jar", "/path/to/IScala.jar", "--connection-file", "{connection_file}"],
  "display_name": "IScala",
  "language": "scala"
}

If you like, you can also add a logo-64x64.png file next to it; it will be displayed in the top right corner of the notebook when using the kernel. (I downloaded the Scala logo from the Scala homepage.) To make it all easier, here is a copy of my kernels.

Once you have this set up, using Spark in an IPython notebook is as easy as launching it:

ipython notebook

then creating a new notebook with the IScala kernel and adding these two lines, in separate cells, to your notebook:

%libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"

%update

This is actually the way to add all of your dependencies to your project. Once this is done, you can import and create a Spark context just like you would in a standalone Scala application, and use it just as you would the spark-shell.
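For example, a cell along these lines would do it (a minimal sketch, assuming a local master and the spark-core dependency resolved above; the app name and sample data are made up):

import org.apache.spark.{SparkConf, SparkContext}

// A local SparkContext, configured just as in a standalone Scala application.
val conf = new SparkConf()
  .setAppName("notebook-sandbox")   // hypothetical application name
  .setMaster("local[2]")            // run locally with two worker threads
val sc = new SparkContext(conf)

// Use it just as you would in the spark-shell, e.g. a quick word count.
val counts = sc.parallelize(Seq("spark in a notebook", "spark and scala"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .collect()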

Here is my first Spark notebook as an example, which you can also fork from my repo.

Voilà! You can now easily sandbox your Spark development.
