{"id":2325,"date":"2016-08-08T08:06:24","date_gmt":"2016-08-08T08:06:24","guid":{"rendered":"http:\/\/www.doanduyhai.com\/blog\/?p=2325"},"modified":"2018-07-28T07:35:56","modified_gmt":"2018-07-28T07:35:56","slug":"zeppelinsparkcassandra-integration-tutorial","status":"publish","type":"post","link":"https:\/\/www.doanduyhai.com\/blog\/?p=2325","title":{"rendered":"Zeppelin\/Spark\/Cassandra integration tutorial"},"content":{"rendered":"<p> In this post, I&#8217;ll cover in detail all the steps necessary to integrate <strong><a href=\"http:\/\/zeppelin.apache.org\" target=\"_blank\">Apache Zeppelin<\/a><\/strong>, <strong><a href=\"http:\/\/spark.apache.org\" target=\"_blank\">Apache Spark<\/a><\/strong> and <strong><a href=\"http:\/\/cassandra.apache.org\" target=\"_blank\">Apache Cassandra<\/a><\/strong>.<\/p>\n<blockquote><p>For the remaining of this post, Zeppelin == Apache Zeppelin\u2122, Spark == Apache Spark\u2122 and Cassandra == Apache Cassandra\u2122<\/p><\/blockquote>\n<p> If you are not familiar with Zeppelin, I recommend reading my introduction slides <strong><a href=\"http:\/\/www.slideshare.net\/doanduyhai\/apache-zeppelin-the-missing-component-for-the-big-data-ecosystem-58863557\" target=\"_blank\">here<\/a><\/strong><\/p>\n<p> The integration between <strong>Spark<\/strong> and <strong>Cassandra<\/strong> is achieved using the <strong><a href=\"https:\/\/github.com\/datastax\/spark-cassandra-connector\/blob\/master\/doc\/0_quick_start.md\" target=\"_blank\">Spark-Cassandra connector<\/a><\/strong>. <\/p>\n<p> Natively <strong>Zeppelin<\/strong> does support <strong>Spark<\/strong> out of the box. 
But making <strong>Zeppelin<\/strong> support the Spark-Cassandra integration requires some extra work.<\/p>\n<h1 id=\"zeppelin-spark-workflow\">Zeppelin &#8211; Spark workflow<\/h1>\n<p> With Zeppelin, every interpreter is executed in a separate JVM, and this applies to the Spark interpreter too.<\/p>\n<p> The interpreter is first launched in the class <code><strong>RemoteInterpreterProcess<\/strong><\/code>:<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\n public int reference(InterpreterGroup interpreterGroup) {\r\n        ...\r\n        if (!isInterpreterAlreadyExecuting) {\r\n          try {\r\n            port = RemoteInterpreterUtils.findRandomAvailablePortOnAllLocalInterfaces();\r\n          } catch (IOException e1) {\r\n            throw new InterpreterException(e1);\r\n          }\r\n          CommandLine cmdLine = CommandLine.parse(interpreterRunner);\r\n          cmdLine.addArgument(&quot;-d&quot;, false);\r\n          cmdLine.addArgument(interpreterDir, false);\r\n          cmdLine.addArgument(&quot;-p&quot;, false);\r\n          cmdLine.addArgument(Integer.toString(port), false);\r\n          cmdLine.addArgument(&quot;-l&quot;, false);\r\n          cmdLine.addArgument(localRepoDir, false);\r\n\r\n          executor = new DefaultExecutor();\r\n\r\n          watchdog = new ExecuteWatchdog(ExecuteWatchdog.INFINITE_TIMEOUT);\r\n          executor.setWatchdog(watchdog);\r\n\r\n          running = true;\r\n          ...\r\n<\/pre>\n<p> Indeed, each interpreter is bootstrapped using the <code><strong>interpreterRunner<\/strong><\/code>, which is the shell script <code><strong>$ZEPPELIN_HOME\/bin\/interpreter.sh<\/strong><\/code>.<\/p>\n<p> Depending on the interpreter type and run mode, the execution is launched with a different set of environment variables. 
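<\/p>
<p> Concretely, the command assembled by <code><strong>RemoteInterpreterProcess<\/strong><\/code> above looks something like the following. This is only an illustration: the port is random, and the installation path, interpreter directory and repository id are made-up example values.<\/p>

```shell
# Hypothetical reconstruction of the interpreter launch command built by
# RemoteInterpreterProcess: -d <interpreter dir>, -p <random port>,
# -l <local repository dir>.
# The path (/opt/zeppelin), port (58692) and id (2BTPVTBVH) are examples only.
ZEPPELIN_HOME=/opt/zeppelin
echo "$ZEPPELIN_HOME/bin/interpreter.sh -d $ZEPPELIN_HOME/interpreter/spark -p 58692 -l $ZEPPELIN_HOME/local-repo/2BTPVTBVH"
```

<p> 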
Extract of the <code><strong>$ZEPPELIN_HOME\/bin\/interpreter.sh<\/strong><\/code> script:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nif [[ -n &quot;${SPARK_SUBMIT}&quot; ]]; then\r\n    ${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path &quot;${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}&quot; --driver-java-options &quot;${JAVA_INTP_OPTS}&quot; ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &amp;\r\nelse\r\n    ${ZEPPELIN_RUNNER} ${JAVA_INTP_OPTS} ${ZEPPELIN_INTP_MEM} -cp ${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH} ${ZEPPELIN_SERVER} ${PORT} &amp;\r\nfi\r\n<\/pre>\n<p>One small detail here is critical for the integration of the Spark-Cassandra connector: the <strong>classpath<\/strong> used to launch the interpreter process. The idea is to include the Spark-Cassandra connector dependencies in this classpath so that <strong>Zeppelin<\/strong> can access <strong>Cassandra<\/strong> through <strong>Spark<\/strong>.<\/p>\n<h1 id=\"configuration_matrix\">Configuration matrix<\/h1>\n<p> There are many possible combinations for running <strong>Zeppelin<\/strong> with <strong>Spark<\/strong> and <strong>Cassandra<\/strong>:<\/p>\n<ol>\n<li>Standard Zeppelin binaries<\/li>\n<li>Custom Zeppelin build with the Spark-Cassandra connector<\/li>\n<li>Zeppelin connecting to the local Spark runner<\/li>\n<li>Zeppelin connecting to a stand-alone Spark cluster<\/li>\n<li>Using Zeppelin with OSS Spark<\/li>\n<li>Using Zeppelin with DSE (<strong><a href=\"http:\/\/docs.datastax.com\/en\/latest-dse\/index.html\" target=\"_blank\">Datastax Enterprise<\/a><\/strong>)<\/li>\n<\/ol>\n<h1 id=\"standard_zeppelin\">Standard Zeppelin build with local Spark<\/h1>\n<p>If you are using the default <strong>Zeppelin<\/strong> binaries (downloaded from the official repo), to make the Spark-Cassandra integration work, you would have to:<\/p>\n<ol>\n<li>In the <strong>Interpreter<\/strong> menu, add the property 
<em>spark.cassandra.connection.host<\/em> to the Spark interpreter. The value should be a single IP address or a comma-separated list of IP addresses of your Cassandra cluster.<br \/>\n  <div id=\"attachment_2334\" style=\"width: 955px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2334\" loading=\"lazy\" src=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_cassandra_connection_host.png\" alt=\"spark_cassandra_connection_host\" width=\"945\" height=\"101\" class=\"size-full wp-image-2334\" srcset=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_cassandra_connection_host.png 945w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_cassandra_connection_host-300x32.png 300w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_cassandra_connection_host-768x82.png 768w\" sizes=\"(max-width: 945px) 100vw, 945px\" \/><p id=\"caption-attachment-2334\" class=\"wp-caption-text\">spark_cassandra_connection_host<\/p><\/div>\n <\/li>\n<li>Last but not least, you also have to add the Spark-Cassandra connector as a dependency to the interpreter.<br \/>\n <div id=\"attachment_2336\" style=\"width: 955px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2336\" loading=\"lazy\" src=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_dependencies.png\" alt=\"spark_cassandra_dependencies\" width=\"945\" height=\"372\" class=\"size-full wp-image-2336\" srcset=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_dependencies.png 945w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_dependencies-300x118.png 300w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_dependencies-768x302.png 768w\" sizes=\"(max-width: 945px) 100vw, 945px\" \/><p id=\"caption-attachment-2336\" class=\"wp-caption-text\">spark_cassandra_dependencies<\/p><\/div><\/p>\n<blockquote><p>  
When adding the dependency and the property, <strong>do not forget to click on the + icon<\/strong> to make Zeppelin save your change, otherwise it will be lost.<\/p><\/blockquote>\n<\/li>\n<\/ol>\n<p> At runtime, Zeppelin will download the declared dependencies and all their transitive dependencies from Maven Central and\/or from your local Maven repository (if any).<\/p>\n<p> Those dependencies will then be stored inside the local repository folder defined by the property <em>zeppelin.dep.localrepo<\/em>.<\/p>\n<p> Also, if you go back to the interpreter configuration menu (after a successful run), you&#8217;ll see a new property added by Zeppelin: <code><strong>zeppelin.interpreter.localRepo<\/strong><\/code>.<\/p>\n<div id=\"attachment_2341\" style=\"width: 954px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2341\" loading=\"lazy\" src=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/interpreter_repo.png\" alt=\"interpreter_repo\" width=\"944\" height=\"67\" class=\"size-full wp-image-2341\" srcset=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/interpreter_repo.png 944w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/interpreter_repo-300x21.png 300w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/interpreter_repo-768x55.png 768w\" sizes=\"(max-width: 944px) 100vw, 944px\" \/><p id=\"caption-attachment-2341\" class=\"wp-caption-text\">interpreter_repo<\/p><\/div>\n<p> The last string in the folder (<strong>2BTPVTBVH<\/strong> in the example) is the id of the interpreter instance. 
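<\/p>
<p> As a reminder, the dependency declared in the interpreter settings is simply a Maven coordinate of the form <em>groupId:artifactId:version<\/em>. For the Spark-Cassandra connector it looks like the following (the Scala build and the connector version are illustrative, pick the ones matching your Spark version):<\/p>

```
com.datastax.spark:spark-cassandra-connector_2.10:1.6.0
```

<p> 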
All transitive dependencies are downloaded and stored as jar files inside <code><em>$ZEPPELIN_HOME\/local-repo\/&lt;INTERPRETER_ID&gt;<\/em><\/code> and their content (.class files) is extracted into <code><em>$ZEPPELIN_HOME\/local-repo<\/em><\/code>.<\/p>\n<p><strong>If your Zeppelin server is behind a <strong>corporate firewall<\/strong>, the download will fail, so Spark won&#8217;t be able to connect to Cassandra (you&#8217;ll get a <code>ClassNotFoundException<\/code> in the Spark interpreter logs).<\/strong><\/p>\n<p> The solution in this case is:<\/p>\n<ol>\n<li>either manually download all the dependencies and put them into the folder <em>zeppelin.dep.localrepo<\/em><\/li>\n<li>or build Zeppelin with the Spark-Cassandra connector integrated (see the next section)<\/li>\n<\/ol>\n<h1 id=\"custom_zeppelin\">Custom Zeppelin build with local Spark<\/h1>\n<p> You&#8217;ll need to build Zeppelin yourself, using one of the available <strong>cassandra-spark-1.x<\/strong> Maven profiles to get the correct Spark version.<\/p>\n<p> Those profiles are defined in the <code><strong>$ZEPPELIN_HOME\/spark-dependencies\/pom.xml<\/strong><\/code> file.<\/p>\n<p> For each <strong>cassandra-spark-1.x<\/strong> profile, you can override the defined Spark version using the <code><em>-Dspark.version=x.y.z<\/em><\/code> flag for the build. To change the Spark-Cassandra connector version, you&#8217;ll need to edit the <code><strong>$ZEPPELIN_HOME\/spark-dependencies\/pom.xml<\/strong><\/code> file yourself. 
Similarly, if you want to use the latest version of the Spark-Cassandra connector and a profile does not exist, just edit the file and add your own profile.<\/p>\n<p> In a nutshell, the build command is:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n&gt;mvn clean package -Pbuild-distr -Pcassandra-spark-1.x -DskipTests \r\n<\/pre>\n<p>or with the Spark version manually forced:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n&gt;mvn clean package -Pbuild-distr -Pcassandra-spark-1.x -Dspark.version=x.y.z -DskipTests \r\n<\/pre>\n<p> This forces Zeppelin to add all transitive dependencies of the Spark-Cassandra connector into the big fat jar file located at <code><em>$ZEPPELIN_HOME\/interpreter\/spark\/dep\/zeppelin-spark-dependencies-&lt;ZEPPELIN_VERSION&gt;.jar<\/em><\/code>.<\/p>\n<p> One easy way to verify that the Spark-Cassandra connector has been correctly embedded into this file is to copy it somewhere and extract its content with the command <code><em>jar -xvf zeppelin-spark-dependencies-&lt;ZEPPELIN_VERSION&gt;.jar<\/em><\/code>.<\/p>\n<p> Once built, you can use this special version of Zeppelin without declaring any dependency on the Spark-Cassandra connector. You still have to set the <code><strong>spark.cassandra.connection.host<\/strong><\/code> property on the Spark interpreter.<\/p>\n<h1 id=\"standalone_oss_spark\">Zeppelin connecting to a stand-alone OSS Spark cluster<\/h1>\n<p> Until now, we have assumed that you were using the local Spark mode of Zeppelin (<strong><em>master = local[*]<\/em><\/strong>). In this section, we want Zeppelin to connect to an existing stand-alone Spark cluster (running Spark on YARN or Mesos is not covered here because it is recommended to run Spark in stand-alone mode with Cassandra to benefit from <strong>data-locality<\/strong>).<\/p>\n<p> First, you&#8217;ll need to set the Spark master property for the Spark interpreter. 
Instead of <strong>local[*]<\/strong>, put a real address like <strong>spark:\/\/x.y.z:7077<\/strong>.<\/p>\n<div id=\"attachment_2345\" style=\"width: 954px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2345\" loading=\"lazy\" src=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_master_url.png\" alt=\"spark_master_url\" width=\"944\" height=\"103\" class=\"size-full wp-image-2345\" srcset=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_master_url.png 944w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_master_url-300x33.png 300w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_master_url-768x84.png 768w\" sizes=\"(max-width: 944px) 100vw, 944px\" \/><p id=\"caption-attachment-2345\" class=\"wp-caption-text\">spark_master_url<\/p><\/div>\n<p> The extract of the shell script from the first section showed that Zeppelin will invoke the <code><strong>spark-submit<\/strong><\/code> command, passing its own Spark jar with all the transitive dependencies using the <code><em>--driver-class-path<\/em><\/code> parameter.<\/p>\n<p> But where does Zeppelin fetch all the dependency jars? From the local repository seen earlier!<\/p>\n<blockquote><p><strong>As a consequence, if you add the Spark-Cassandra connector as a dependency (standard Zeppelin build) and run against a stand-alone Spark cluster, it will fail because the local repository will be empty! 
First run a simple Spark job in local Spark mode to give Zeppelin a chance to download the dependencies before switching to the stand-alone Spark cluster<\/strong><\/p><\/blockquote>\n<p> But that&#8217;s not sufficient: on your stand-alone Spark cluster, you must also add the Spark-Cassandra connector dependencies to the Spark classpath so that the workers can connect to Cassandra.<\/p>\n<p> How to do that?<\/p>\n<ol>\n<li>Edit the <code><em>$SPARK_HOME\/conf\/spark-env.sh<\/em><\/code> file and add the Spark-Cassandra connector dependencies to the <strong>SPARK_CLASSPATH<\/strong> variable.<br \/>\n  <div id=\"attachment_2348\" style=\"width: 955px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2348\" loading=\"lazy\" src=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_classpath.png\" alt=\"spark_classpath\" width=\"945\" height=\"98\" class=\"size-full wp-image-2348\" srcset=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_classpath.png 945w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_classpath-300x31.png 300w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_classpath-768x80.png 768w\" sizes=\"(max-width: 945px) 100vw, 945px\" \/><p id=\"caption-attachment-2348\" class=\"wp-caption-text\">spark_classpath<\/p><\/div><\/p>\n<p>  As you can see, it&#8217;s not the plain spark-cassandra-connector jar we need but the <strong>assembly jar<\/strong>, i.e. 
the <strong>fat jar which includes all transitive dependencies<\/strong>.<\/p>\n<p>  To get this jar, you&#8217;ll have to build it yourself:<\/p>\n<ul>\n<li><code>git clone https:\/\/github.com\/datastax\/spark-cassandra-connector\/<\/code><\/li>\n<li><code>cd spark-cassandra-connector<\/code><\/li>\n<li><code>sbt assembly<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Another alternative is to execute the <code><strong>spark-submit<\/strong><\/code> command with the <code><em>--packages com.datastax.spark:spark-cassandra-connector_2.10:&lt;connector_version&gt;<\/em><\/code> flag. In this case, Spark is clever enough to fetch all the transitive dependencies for you from a remote repository.<br \/>\n<\/p>\n<blockquote><p>The same warning about corporate firewalls applies here.<\/p><\/blockquote>\n<p>How would you add this extra <code><em>--packages<\/em><\/code> flag to the Zeppelin <code><strong>spark-submit<\/strong><\/code>? By exporting the <strong>SPARK_SUBMIT_OPTIONS<\/strong> environment variable in <code><em>$ZEPPELIN_HOME\/conf\/zeppelin-env.sh<\/em><\/code>.<\/p>\n<p> <div id=\"attachment_2349\" style=\"width: 954px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2349\" loading=\"lazy\" src=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_submit_options.png\" alt=\"spark_submit_options\" width=\"944\" height=\"67\" class=\"size-full wp-image-2349\" srcset=\"https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_submit_options.png 944w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_submit_options-300x21.png 300w, https:\/\/www.doanduyhai.com\/blog\/wp-content\/uploads\/2016\/07\/spark_submit_options-768x55.png 768w\" sizes=\"(max-width: 944px) 100vw, 944px\" \/><p id=\"caption-attachment-2349\" class=\"wp-caption-text\">spark_submit_options<\/p><\/div>\n<\/li>\n<\/ol>\n<p> The solution of using the <code><em>--packages<\/em><\/code> flag seems easy but is not suitable for a recurring Spark job because it will force Spark to 
download all the dependencies.<\/p>\n<p> If your Spark job is not a one-shot job, I would recommend building the assembly jar for the Spark-Cassandra connector and setting it in the <strong>SPARK_CLASSPATH<\/strong> variable so that it is available for all of your Spark jobs.<\/p>\n<p> I have pre-built some assembly jars (using <strong>Scala 2.10<\/strong>) that you can download <strong><a href=\"https:\/\/github.com\/doanduyhai\/zeppelin_custom_builds\/tree\/master\/spark-cassandra-connector-assembly\" target=\"_blank\">here<\/a><\/strong>.<\/p>\n<h1 id=\"standalone_dse_spark\">Zeppelin connecting to a stand-alone Datastax Enterprise cluster<\/h1>\n<p> Instead of using open-source Spark, using <a href=\"http:\/\/docs.datastax.com\/en\/latest-dse\/index.html\" target=\"_blank\">Datastax Enterprise<\/a> (DSE) makes your life easier because all the Spark-Cassandra connector dependencies are included by default in its build of Spark. So there is neither a <strong>SPARK_CLASSPATH<\/strong> variable to set nor a <code><em>--packages<\/em><\/code> flag to manage on the Zeppelin side.<\/p>\n<p> But you&#8217;ll still need to either declare the Spark-Cassandra connector dependency on the Zeppelin side or build Zeppelin with the connector embedded.<\/p>\n<blockquote><p> Pay attention if you want to build Zeppelin for DSE, because each version of <strong>DSE 4.8.x<\/strong> uses a custom Spark version and Hadoop 1 dependencies.<\/p><\/blockquote>\n<h1 id=\"zeppelin_custom_builds\">Zeppelin custom builds<\/h1>\n<p> To make your life easier, I have created a list of custom Zeppelin builds for each version of OSS Spark\/DSE. 
All the Zeppelin custom builds are located in the shared <strong><a href=\"https:\/\/drive.google.com\/folderview?id=0B6wR2aj4Cb6wQ01aR3ItR0xUNms\" target=\"_blank\">Google Drive folder<\/a><\/strong>.<\/p>\n<p>The custom Maven pom file <code><a href=\"https:\/\/github.com\/doanduyhai\/zeppelin_custom_builds\/blob\/master\/spark-dependencies-pom.xml#L949-L1293\" target=\"_blank\">spark-dependencies-pom.xml<\/a><\/code> used for building those versions is provided as a reference.<\/p>\n<table>\n<thead>\n<tr>\n<th>Zeppelin version<\/th>\n<th>Spark version\/DSE version<\/th>\n<th>Spark-Cassandra connector version<\/th>\n<th>Tarball<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.4.0<\/td>\n<td>1.4.4<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.4.0.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.4.1<\/td>\n<td>1.4.4<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.4.1.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.5.0<\/td>\n<td>1.5.1<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.5.0.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.5.1<\/td>\n<td>1.5.1<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.5.1.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.5.2<\/td>\n<td>1.5.1<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.5.2.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.6.0<\/td>\n<td>1.6.0<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.6.0.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.6.1<\/td>\n<td>1.6.0<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.6.1.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>Spark 1.6.2<\/td>\n<td>1.6.0<\/td>\n<td>zeppelin-0.6.0-cassandra-spark-1.6.2.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>DSE 4.8.3, DSE 4.8.4 (Spark 1.4.1)<\/td>\n<td>1.4.1<\/td>\n<td>zeppelin-0.6.0-dse-4.8.3-4.8.4.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>DSE 4.8.5, DSE 4.8.6 (Spark 1.4.1)<\/td>\n<td>1.4.2<\/td>\n<td>zeppelin-0.6.0-dse-4.8.5-4.8.6.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>DSE 
4.8.7 (Spark 1.4.1)<\/td>\n<td>1.4.3<\/td>\n<td>zeppelin-0.6.0-dse-4.8.7.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>DSE 4.8.8, DSE 4.8.9 (Spark 1.4.1)<\/td>\n<td>1.4.4<\/td>\n<td>zeppelin-0.6.0-dse-4.8.8-4.8.9.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.0<\/td>\n<td>DSE 5.0.0, DSE 5.0.1 (Spark 1.6.1)<\/td>\n<td>1.6.0<\/td>\n<td>zeppelin-0.6.0-dse-5.0.0-5.0.1.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.6.1<\/td>\n<td>DSE 5.0.2, DSE 5.0.3 (Spark 1.6.2)<\/td>\n<td>1.6.0<\/td>\n<td>zeppelin-0.6.1-dse-5.0.2-5.0.3.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.7.0<\/td>\n<td>DSE 5.0.4 (Spark 1.6.2)<\/td>\n<td>1.6.2<\/td>\n<td>zeppelin-0.7.0-DSE-5.0.4.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.7.0<\/td>\n<td>DSE 5.0.5 (Spark 1.6.2)<\/td>\n<td>1.6.3<\/td>\n<td>zeppelin-0.7.0-DSE-5.0.4.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.7.0<\/td>\n<td>DSE 5.0.6 (Spark 1.6.3)<\/td>\n<td>1.6.4<\/td>\n<td>zeppelin-0.7.0-DSE-5.0.6.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.7.1<\/td>\n<td>DSE 5.1.0 (Spark 2.0.2)<\/td>\n<td>2.0.1<\/td>\n<td>zeppelin-0.7.1-dse-5.1.0.tar.gz<\/td>\n<\/tr>\n<tr>\n<td>0.7.1<\/td>\n<td>DSE 5.1.1 (Spark 2.0.2)<\/td>\n<td>2.0.2<\/td>\n<td>zeppelin-0.7.1-dse-5.1.1.tar.gz<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>In this post, I&#8217;ll cover in detail all the steps necessary to integrate Apache Zeppelin, Apache Spark and Apache Cassandra. 
For the remaining of this post, Zeppelin == Apache Zeppelin\u2122, Spark == Apache Spark\u2122 and Cassandra == Apache Cassandra\u2122 If&#8230;<br \/><a class=\"read-more-button\" href=\"https:\/\/www.doanduyhai.com\/blog\/?p=2325\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[57,10],"tags":[],"_links":{"self":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2325"}],"collection":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2325"}],"version-history":[{"count":53,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2325\/revisions"}],"predecessor-version":[{"id":2369,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2325\/revisions\/2369"}],"wp:attachment":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}