{"id":13404,"date":"2017-08-10T16:33:27","date_gmt":"2017-08-10T16:33:27","guid":{"rendered":"http:\/\/www.doanduyhai.com\/blog\/?p=13404"},"modified":"2017-08-10T16:44:43","modified_gmt":"2017-08-10T16:44:43","slug":"gremlin-recipes-8-sack-operator","status":"publish","type":"post","link":"https:\/\/www.doanduyhai.com\/blog\/?p=13404","title":{"rendered":"Gremlin recipes: 8 &#8211; sack() operator"},"content":{"rendered":"<p>This blog post is the 8<sup>th<\/sup> from the series <strong>Gremlin Recipes<\/strong>. It is recommended to read the previous blog posts first: <\/p>\n<ol>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13224\" target=\"_blank\"><strong>Gremlin as a Stream<\/strong><\/a><\/li>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13260\" target=\"_blank\"><strong>SQL to Gremlin<\/strong><\/a><\/li>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13285\" target=\"_blank\"><strong>Recommendation Engine traversal<\/strong><\/a><\/li>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13301\" target=\"_blank\"><strong>Recursive traversals<\/strong><\/a><\/li>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13320\" target=\"_blank\"><strong>Path object<\/strong><\/a><\/li>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13352\" target=\"_blank\"><strong>Projection and selection<\/strong><\/a><\/li>\n<li><a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13374\" target=\"_blank\"><strong>Variable handling<\/strong><\/a><\/li>\n<\/ol>\n<p><!--more--><\/p>\n<h1>I KillrVideo dataset<\/h1>\n<p>To illustrate this series of recipes, you need first to create the schema for <strong>KillrVideo<\/strong> and import the data. See <a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13224#killrvideo_dataset\" target=\"_blank\"><strong>here<\/strong><\/a> for more details.<\/p>\n<p>The graph schema of this dataset is :<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/s3.amazonaws.com\/datastax-graph-schema-viewer\/index.html#\/?schema=killr_video_small.json\" height=\"600px\" width=\"100%\"><\/iframe><\/p>\n<h1>II Introduction to <code><strong>sack()<\/strong><\/code><\/h1>\n<p>The <code><strong>sack()<\/strong><\/code> step was introduced recently to help people solving some problem more efficiently.<br \/>\nIn some case people are writing <strong>Gremlin<\/strong> traversals that make use of data aggregation using path information. Typically, people will use <code><strong>path()<\/strong><\/code> and then perform &#8220;reduction&#8221; of the data in the path to get the specific result. <\/p>\n<p>Unfortunately, this is inefficient as path calculations are expensive, not mergable. The same ca be now achieved using <code><strong>sack()<\/strong><\/code>.<\/p>\n<p>A sack is a <strong>local datastructure<\/strong> relative to each traverser, unlike global datastructure as when using <code><strong>aggregate()\/store()\/group(x)\/subgraph(x)...<\/strong><\/code>. It mean that each traverser is now equipped with his own sack. A sack can contain any type of data structure:<\/p>\n<ul>\n<li><strong>primitives<\/strong>: Int, Double, Long, &#8230;<\/li>\n<li><strong>collections<\/strong>: List\/Set\/Map<\/li>\n<\/ul>\n<h3>A Defining the sack<\/h3>\n<p>The definition of this data structure should be placed right after the traversal object:<\/p>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\ngremlin&gt;g.withSack(1.0). \/\/ Define a sack containing an initial double value        \r\n  V().\r\n  ...\r\n\r\ngremlin&gt;g.withSack([] as Set). \/\/ Define a sack containing an initial empty Set        \r\n  V().\r\n  ...\r\n\r\ngremlin&gt;g.withSack([1,2,3] as Set). \/\/ Define a sack containing a pre-initialized Set        \r\n  V().\r\n  ...\r\n\r\ngremlin&gt;g.withSack([] as List). \/\/ Define a sack containing an initial empty List        \r\n  V().\r\n  ...\r\n\r\ngremlin&gt;g.withSack([:]). \/\/ Define a sack containing an initial empty Map        \r\n  V().\r\n  ...\r\n\r\ngremlin&gt;g.withSack([:].              \/\/ Define a sack containing an initial empty Map  \r\n    withDefault{key -&gt; [] as List}). \/\/ with default values being an empty list        \r\n  V().\r\n  ...\r\n\r\n<\/pre>\n<p>The complete signature of the <code><strong>withSack()<\/strong><\/code> step is  <code><strong>withSack(default_data_structure, split_operator, merge_operator)<\/strong><\/code><\/p>\n<ul>\n<li><strong>split operator<\/strong> is used to split the data structure whenever traverser is facing a branch and needs to duplicate itself. Example: <code><strong>V().has(\"user\",\"userId\", \"u861\").out(\"rated\")<\/strong><\/code>. From the User vertex now there are as many traversers as there are ratings to different movie. Thus the sack structure at User vertex get duplicated<\/li>\n<li><strong>merge operator<\/strong> is used to merge different sack data structures together whenever all traversers are converging to a single point. Example: <code><strong>g.V().hasLabel(\"movie\").out(\"director\").has(\"name\", \"Woody Allen\")<\/strong><\/code>. The traversers from many Movie vertices are converging to the single Person vertex which is <strong>Woody Allen<\/strong>. The sack at vertex <strong>Woody Allen<\/strong> is the merge of all sack data structures coming from his movies. If you do not define any merge operator, the sacks cannot be merged<\/li>\n<\/ul>\n<blockquote><p><strong>Note: for primitive data type, split operator == value copy. For other data types like collection, if you don&#8217;t define any split operator, the same object reference will be copied to all branching traversers. Consequently all those traversers will shared the same data structure. Be very wary of this<\/strong><\/p><\/blockquote>\n<p>Some example of defining collection as data structure for sack:<\/p>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\n\/\/ Define a sack with sum() as merge operator\r\n\/\/ The split operator is implicitly copy-by-value for primitives\r\ngremlin&gt;g.withSack(1.0, sum). \r\n  V().\r\n  ...\r\n\r\n\/\/ Define a sack with initial empty Set\r\n\/\/ Split function == Set::clone()\r\n\/\/ Merge function == Set::appendAll()\r\ng.withSack([] as Set){set-&gt; set.clone()}{setA,setB -&gt; setA.appendAll(setB); setA}.\r\n  V()  ...\r\n\r\n\/\/ Define a sack with initial empty Map\r\n\/\/ Split function == Map::clone()\r\n\/\/ Merge function == Map::putAll()\r\ngremlin&gt;g.withSack([:]){it.clone}{mapA,mapB -&gt; mapA.putAll(mapB); mapA}.\r\n  V().\r\n  ...\r\n<\/pre>\n<h3>B. Mutating the sack<\/h3>\n<p>To mutate the content of the sack, you can use the step <code><strong>sack(bi-function).by(projection_axis)<\/strong><\/code>.<\/p>\n<ul>\n<li><strong>bi-function<\/strong> is a function with signature: <code><strong>DATA_STRUCTURE apply(DATA_STRUCTURE current, X projected_value)<\/strong><\/code>\n<ul>\n<li><strong>DATA_STRUCTURE<\/strong> is the type of the data structure inside the sack (primitives, collections, &#8230;) initially defined with <code><strong>g.withSack(DATA_STRUCTURE)<\/strong><\/code><\/li>\n<li><strong>X<\/strong> is the type of the projected value on the current vertex\/edge using the modulator <code><strong>by()<\/strong><br \/>\n<\/code><\/li>\n<\/ul>\n<\/li>\n<li><strong>projection_axis<\/strong> is the property\/inner traversal result applied on a vertex\/edge<\/li>\n<\/ul>\n<h3>C. Accessing the sack<\/h3>\n<p>The sack data structure can be accessed at any time by calling the step <code><strong>sack()<\/strong><\/code>. The data type returned by this step is <code><strong>Iterator&lt;DATA_STRUCTURE&gt;<\/strong><\/code><\/p>\n<h1>III Usage of <code><strong>sack()<\/strong><\/code><\/h1>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\ngremlin&gt;g.withSack(0).V().\r\n  has(&quot;movie&quot;, &quot;title&quot;, &quot;Blade Runner&quot;). \/\/ Iterator&lt;Movie&gt;\r\n  inE(&quot;rated&quot;).                          \/\/ Iterator&lt;Rated_Edge&gt;  \r\n    sack(sum).by(&quot;rating&quot;).              \/\/ iterator.sideEffect(sack -&gt; sack.get() + values(&quot;rating&quot;))\r\n  outV().                                \/\/ Iterator&lt;User&gt;\r\n  sack()                                 \/\/ Iterator&lt;Int&gt;\r\n==&gt;6\r\n==&gt;5\r\n==&gt;8\r\n==&gt;8\r\n==&gt;8\r\n==&gt;8\r\n==&gt;9\r\n...\r\n<\/pre>\n<p>In the above traversal the interesting step is <code><strong>inE(\"rated\").sack(sum).by(\"rating\").outV()<\/strong><\/code>. On each edge <em>&#8220;rated&#8221;<\/em>, we add the value <em>&#8220;rating&#8221;<\/em> into the sack value. Please note the usage of <code><strong>by(\"rating\")<\/strong><\/code> acting as projection for the <code><strong>sack()<\/strong><\/code>. Sum is the bi-function <code><strong>Int apply(Int current, Int projected_value)<\/strong><\/code><\/p>\n<p>When the traversal reaches step <code><strong>outV()<\/strong><\/code> there are as many traversers as there are users rating <strong>Blade Runner<\/strong> and each of those traversers carries a sack containing the value of the rating.<\/p>\n<p>When calling <code><strong>sack()<\/strong><\/code> the content of each sack is extracted and therefor we have an <code><strong>Iterator&lt;Int&gt;<\/strong><\/code><\/p>\n<p>Computing Blade Runner average rating with <code><strong>sack()<\/strong><\/code> is really easy:<\/p>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\ngremlin&gt;g.withSack(0).V().\r\n  has(&quot;movie&quot;, &quot;title&quot;, &quot;Blade Runner&quot;). \/\/ Iterator&lt;Movie&gt;\r\n  inE(&quot;rated&quot;).                          \/\/ Iterator&lt;Rated_Edge&gt;  \r\n    sack(sum).by(&quot;rating&quot;).              \/\/ iterator.sideEffect(sack -&gt; sack.get() + values(&quot;rating&quot;))\r\n  outV().                                \/\/ Iterator&lt;User&gt;\r\n  sack().                                \/\/ Iterator&lt;Int&gt;\r\n  mean()\r\n==&gt;8.20353982300885\r\n<\/pre>\n<p>Now let&#8217;s see how we can use <code><strong>sack()<\/strong><\/code> for useful purposes. First you need to import the schema of London Tube as described in this <a href=\"https:\/\/www.doanduyhai.com\/blog\/?p=13320\" target=\"_blank\"><strong>previous post<\/strong><\/a>.<\/p>\n<p>Then switch g alias to London_Tube<\/p>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\ngremlin&gt;:remote config alias g London_Tube.g\r\n==&gt;g=London_Tube.g\r\n<\/pre>\n<p>One of the goal in this blog post was to find the shortest path between <strong>South Kensington<\/strong> and <strong>Marble Arch<\/strong> tube stations. Shortest path can mean either <em>least stations count<\/em> or <em>least line changes<\/em>.<\/p>\n<h4>Shortest path by least stations count<\/h4>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\ngremlin&gt;g.withSack(0).V().\r\n  has(&quot;station&quot;, &quot;name&quot;, &quot;South Kensington&quot;). \/\/ Iterator&lt;Station&gt;\r\n  emit().\r\n  repeat(timeLimit(400).\r\n         bothE(&quot;connectedTo&quot;).\r\n         otherV().\r\n         sack(sum).by(constant(1)).           \/\/ +1  \r\n         simplePath()).  \r\n  has(&quot;name&quot;, &quot;Marble Arch&quot;).\r\n  order().                                    \/\/ Order each journey\r\n    by(sack(),incr).                          \/\/  by the sack value, ASCENDING\r\n  limit(1).\r\n  path().by(\r\n    choose(label).\r\n    option(&quot;station&quot;, values(&quot;name&quot;)).\r\n    option(&quot;connectedTo&quot;, constant(&quot;--&gt;&quot;)))\r\n==&gt;[South Kensington, --&gt;, Knightsbridge, --&gt;, Hyde Park Corner, --&gt;, Green Park, --&gt;, Bond Street, --&gt;, Marble Arch]\r\n<\/pre>\n<p>The interesting pieces are:<\/p>\n<ul>\n<li><code><strong>sack(sum).by(constant(1))<\/strong><\/code>: for each traversed station, we increment the counter by +1<\/li>\n<li><code><strong>order().by(sack(),incr)<\/strong><\/code>: order each journey by the counter in each sack, ASCENDING order<\/li>\n<\/ul>\n<h4>Shortest path by least line changes<\/h4>\n<pre class=\"brush: java; title: ; wrap-lines: false; notranslate\" title=\"\">\r\ngremlin&gt;g.withSack([] as Set){it.clone()}{a,b -&gt; a.addAll(b); a}.\r\n  V().\r\n  has(&quot;station&quot;, &quot;name&quot;, &quot;South Kensington&quot;). \/\/ Iterator&lt;Station&gt;\r\n  emit().\r\n  repeat(timeLimit(400).\r\n         bothE(&quot;connectedTo&quot;).\r\n         sack{set,val -&gt; set.add(val); set}.by(&quot;line&quot;).\r\n         otherV().  \r\n         simplePath()).  \r\n  has(&quot;name&quot;, &quot;Marble Arch&quot;).\r\n  order().                                    \/\/ Order each journey\r\n    by(sack().count(local),incr).             \/\/  by the sack number of element, ASCENDING\r\n  limit(1).\r\n  path().by(\r\n    choose(label).\r\n    option(&quot;station&quot;, values(&quot;name&quot;)).\r\n    option(&quot;connectedTo&quot;, constant(&quot;--&gt;&quot;)))\r\n==&gt;[South Kensington, --&gt;, Gloucester Road, --&gt;, High Street Kensington, --&gt;, Notting Hill Gate, --&gt;, Queensway, --&gt;, Lancaster Gate, --&gt;, Marble Arch]\r\n<\/pre>\n<p>Let&#8217;s dig into the traversal:<\/p>\n<ul>\n<li><code><strong>withSack([] as Set)<\/strong><\/code>: defines a sack with an empty Set<\/li>\n<li><code><strong>{it.clone()}<\/strong><\/code>: defines the split operator for the set, which is just cloning the set itself<\/li>\n<li><code><strong>{a,b -> a.addAll(b); a}<\/strong><\/code>: defines the merge operator for the set, which is just appending all contents of the sets together<\/li>\n<li>Please notice the Groovy closure syntax to pass lambdas as arguments to the <code><strong>withSack()<\/strong><\/code> method. More details about this syntax is given <a href=\"http:\/\/groovy-lang.org\/semantics.html#_calling_a_method_accepting_a_sam_type_with_a_closure\" target=\"_blank\"><strong>here<\/strong><\/a><\/li>\n<li><code><strong>sack{set,val -> set.add(val); set}.by(\"line\")<\/strong><\/code>: on each traversed <em>&#8220;connectedTo&#8221;<\/em> edge, we add the <em>&#8220;line&#8221;<\/em> property to our set inside the sack<\/li>\n<li><code><strong>order().by(sack().count(local),incr)<\/strong><\/code>: we order the journeys by the number of lines inside each sack. Please notice the usage of <code><strong>sack().count(local)<\/strong><\/code>. Since the <code><strong>sack()<\/strong><\/code> type is now an <code><strong>Iterator&lt;Set&lt;String&gt;&gt;<\/strong><\/code> we need the <strong>local<\/strong> keyword to count the number of element in the nested set<\/li>\n<\/ul>\n<p>And that&#8217;s all folks! <strong>Do not miss the other Gremlin recipes in this series<\/strong>.<\/p>\n<p>If you have any question about <strong>Gremlin<\/strong>, find me on the <strong><a href=\"http:\/\/datastaxacademy.slack.com\" target=\"_blank\">datastaxacademy.slack.com<\/a><\/strong>, channel <strong>dse-graph<\/strong>. My id is <em>@doanduyhai<\/em>   <\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post is the 8th from the series Gremlin Recipes. It is recommended to read the previous blog posts first: Gremlin as a Stream SQL to Gremlin Recommendation Engine traversal Recursive traversals Path object Projection and selection Variable handling<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[58,10],"tags":[],"_links":{"self":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/13404"}],"collection":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13404"}],"version-history":[{"count":16,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/13404\/revisions"}],"predecessor-version":[{"id":13420,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/13404\/revisions\/13420"}],"wp:attachment":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13404"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}