Distributed graphs processing with Spark GraphX – Hacker Noon

Degree of separation

I am sure that many readers of this article have heard about the theory of six degrees of separation, it’s an unproven theory that any person on the planet can be connected to any other person on the Earth through a chain of acquaintances that has no more than five intermediaries. Concerning graph theory, this sounds like this: the diameter of the graph does not exceed 6 for any two persons on Earth.

The Degree of separation between Mr. Trump and me:

I was curious about the degree of separation between me and some famous people like Donald Trump or Mark Zuckerberg. After thinking, I recollect the next: my fellow worked on Facebook and personally know Mark, as result degree of separation between me and Mark is two: me — my mate, my mate — Mark. Of course, Mark and Mr. Trump know each other, as a result, +1: degree of separation between Mr. Trump and me is three. If you know me, it means, that you, at least you, have 3 and 4 degrees of separation to these people. Curious is not it?

An example of the degree of separation graph

Let’s start writing the code for this task, we need to traverse a graph in breadth so find contacts of the specified vertex, for this we need to modify the code of the Dijkstra algorithm:

Everything is very similar to what was above with Dijkstra algorithm, but we have to specify the number of iterations — value 10 for my graph I received empirically, for your graph, this may be a different number. Next, we execute join with usernames and take the first 100 values for an arbitrary user:

It’s also feasible to find a degree of separation for two arbitrary vertices:

Spark GraphX from the box makes it possible to get much information about the graph, for example, to get the connected component of the graph:

Get this metric as the number of triangles in the graph (triangle count):

PageRank

read original article here