This blog post is the 8^{th} from the series **Gremlin Recipes**. It is recommended to read the previous blog posts first:

**Gremlin as a Stream****SQL to Gremlin****Recommendation Engine traversal****Recursive traversals****Path object****Projection and selection****Variable handling**

# I KillrVideo dataset

To illustrate this series of recipes, you need first to create the schema for **KillrVideo** and import the data. See **here** for more details.

The graph schema of this dataset is :

# II Introduction to **sack()**

**sack()**

The

step was introduced recently to help people solving some problem more efficiently.**sack()**

In some case people are writing **Gremlin** traversals that make use of data aggregation using path information. Typically, people will use

and then perform “reduction” of the data in the path to get the specific result. **path()**

Unfortunately, this is inefficient as path calculations are expensive, not mergable. The same ca be now achieved using

.**sack()**

A sack is a **local datastructure** relative to each traverser, unlike global datastructure as when using

. It mean that each traverser is now equipped with his own sack. A sack can contain any type of data structure:**aggregate()/store()/group(x)/subgraph(x)...**

**primitives**: Int, Double, Long, …**collections**: List/Set/Map

### A Defining the sack

The definition of this data structure should be placed right after the traversal object:

gremlin>g.withSack(1.0). // Define a sack containing an initial double value V(). ... gremlin>g.withSack([] as Set). // Define a sack containing an initial empty Set V(). ... gremlin>g.withSack([1,2,3] as Set). // Define a sack containing a pre-initialized Set V(). ... gremlin>g.withSack([] as List). // Define a sack containing an initial empty List V(). ... gremlin>g.withSack([:]). // Define a sack containing an initial empty Map V(). ... gremlin>g.withSack([:]. // Define a sack containing an initial empty Map withDefault{key -> [] as List}). // with default values being an empty list V(). ...

The complete signature of the

step is **withSack()****withSack(default_data_structure, split_operator, merge_operator)**

**split operator**is used to split the data structure whenever traverser is facing a branch and needs to duplicate itself. Example:

. From the User vertex now there are as many traversers as there are ratings to different movie. Thus the sack structure at User vertex get duplicated**V().has("user","userId", "u861").out("rated")****merge operator**is used to merge different sack data structures together whenever all traversers are converging to a single point. Example:

. The traversers from many Movie vertices are converging to the single Person vertex which is**g.V().hasLabel("movie").out("director").has("name", "Woody Allen")****Woody Allen**. The sack at vertex**Woody Allen**is the merge of all sack data structures coming from his movies. If you do not define any merge operator, the sacks cannot be merged

Note: for primitive data type, split operator == value copy. For other data types like collection, if you don’t define any split operator, the same object reference will be copied to all branching traversers. Consequently all those traversers will shared the same data structure. Be very wary of this

Some example of defining collection as data structure for sack:

// Define a sack with sum() as merge operator // The split operator is implicitly copy-by-value for primitives gremlin>g.withSack(1.0, sum). V(). ... // Define a sack with initial empty Set // Split function == Set::clone() // Merge function == Set::appendAll() g.withSack([] as Set){set-> set.clone()}{setA,setB -> setA.appendAll(setB); setA}. V() ... // Define a sack with initial empty Map // Split function == Map::clone() // Merge function == Map::putAll() gremlin>g.withSack([:]){it.clone}{mapA,mapB -> mapA.putAll(mapB); mapA}. V(). ...

### B. Mutating the sack

To mutate the content of the sack, you can use the step

.**sack(bi-function).by(projection_axis)**

**bi-function**is a function with signature:**DATA_STRUCTURE apply(DATA_STRUCTURE current, X projected_value)****DATA_STRUCTURE**is the type of the data structure inside the sack (primitives, collections, …) initially defined with**g.withSack(DATA_STRUCTURE)****X**is the type of the projected value on the current vertex/edge using the modulator**by()**

**projection_axis**is the property/inner traversal result applied on a vertex/edge

### C. Accessing the sack

The sack data structure can be accessed at any time by calling the step

. The data type returned by this step is **sack()****Iterator<DATA_STRUCTURE>**

# III Usage of **sack()**

**sack()**

gremlin>g.withSack(0).V(). has("movie", "title", "Blade Runner"). // Iterator<Movie> inE("rated"). // Iterator<Rated_Edge> sack(sum).by("rating"). // iterator.sideEffect(sack -> sack.get() + values("rating")) outV(). // Iterator<User> sack() // Iterator<Int> ==>6 ==>5 ==>8 ==>8 ==>8 ==>8 ==>9 ...

In the above traversal the interesting step is

. On each edge **inE("rated").sack(sum).by("rating").outV()***“rated”*, we add the value *“rating”* into the sack value. Please note the usage of

acting as projection for the **by("rating")**

. Sum is the bi-function **sack()****Int apply(Int current, Int projected_value)**

When the traversal reaches step

there are as many traversers as there are users rating **outV()****Blade Runner** and each of those traversers carries a sack containing the value of the rating.

When calling

the content of each sack is extracted and therefor we have an **sack()****Iterator<Int>**

Computing Blade Runner average rating with

is really easy:**sack()**

gremlin>g.withSack(0).V(). has("movie", "title", "Blade Runner"). // Iterator<Movie> inE("rated"). // Iterator<Rated_Edge> sack(sum).by("rating"). // iterator.sideEffect(sack -> sack.get() + values("rating")) outV(). // Iterator<User> sack(). // Iterator<Int> mean() ==>8.20353982300885

Now let’s see how we can use

for useful purposes. First you need to import the schema of London Tube as described in this **sack()****previous post**.

Then switch g alias to London_Tube

gremlin>:remote config alias g London_Tube.g ==>g=London_Tube.g

One of the goal in this blog post was to find the shortest path between **South Kensington** and **Marble Arch** tube stations. Shortest path can mean either *least stations count* or *least line changes*.

#### Shortest path by least stations count

gremlin>g.withSack(0).V(). has("station", "name", "South Kensington"). // Iterator<Station> emit(). repeat(timeLimit(400). bothE("connectedTo"). otherV(). sack(sum).by(constant(1)). // +1 simplePath()). has("name", "Marble Arch"). order(). // Order each journey by(sack(),incr). // by the sack value, ASCENDING limit(1). path().by( choose(label). option("station", values("name")). option("connectedTo", constant("-->"))) ==>[South Kensington, -->, Knightsbridge, -->, Hyde Park Corner, -->, Green Park, -->, Bond Street, -->, Marble Arch]

The interesting pieces are:

: for each traversed station, we increment the counter by +1**sack(sum).by(constant(1))**

: order each journey by the counter in each sack, ASCENDING order**order().by(sack(),incr)**

#### Shortest path by least line changes

gremlin>g.withSack([] as Set){it.clone()}{a,b -> a.addAll(b); a}. V(). has("station", "name", "South Kensington"). // Iterator<Station> emit(). repeat(timeLimit(400). bothE("connectedTo"). sack{set,val -> set.add(val); set}.by("line"). otherV(). simplePath()). has("name", "Marble Arch"). order(). // Order each journey by(sack().count(local),incr). // by the sack number of element, ASCENDING limit(1). path().by( choose(label). option("station", values("name")). option("connectedTo", constant("-->"))) ==>[South Kensington, -->, Gloucester Road, -->, High Street Kensington, -->, Notting Hill Gate, -->, Queensway, -->, Lancaster Gate, -->, Marble Arch]

Let’s dig into the traversal:

: defines a sack with an empty Set**withSack([] as Set)**

: defines the split operator for the set, which is just cloning the set itself**{it.clone()}**

: defines the merge operator for the set, which is just appending all contents of the sets together**{a,b -> a.addAll(b); a}**- Please notice the Groovy closure syntax to pass lambdas as arguments to the

method. More details about this syntax is given**withSack()****here**

: on each traversed**sack{set,val -> set.add(val); set}.by("line")***“connectedTo”*edge, we add the*“line”*property to our set inside the sack

: we order the journeys by the number of lines inside each sack. Please notice the usage of**order().by(sack().count(local),incr)**

. Since the**sack().count(local)**

type is now an**sack()**

we need the**Iterator<Set<String>>****local**keyword to count the number of element in the nested set

And that’s all folks! **Do not miss the other Gremlin recipes in this series**.

If you have any question about **Gremlin**, find me on the **datastaxacademy.slack.com**, channel **dse-graph**. My id is *@doanduyhai*