Effective Tools To Make A Great Relationships Analysis in Game of Thrones [Part 2] | Hacker Noon

In the last post, we showed the character relationship for the Game of Thrones by using NetworkX and Gephi. In this post, we will show you how to access data in Nebula Graph by using NetworkX.

NetworkX is a modeling tool for the graph theory and complex networks written by Python. With its rich, easy-to-use built-in graphs and analysis algorithms, it’s easy to perform complex network analysis and simulation modeling.

In NetworkX, a graph (network) is a collection of nodes together with a collection of edges. Attributes are often associated with nodes and/or edges and are optional. Nodes represent data. Edges are uniquely determined by two nodes, representing the relationship between the two nodes. Nodes and edges can also have many attributes so that more information can be stored.

Graph: undirected graph.DiGraph: directed graph.MultiGraph: A flexible graph class that allows multiple undirected edges between pairs of nodes.MultiDiGraph: A directed version of a MultiGraph.

import networkx as nx
G = nx.Graph()
G.add_node(1)
G.add_nodes_from([2,3,4])
G.add_node(2,name='Tom',age=23)
G.add_edge(2,3)
G.add_edges_from([(1,2),(1,3)])
g.add_edge(1, 2, start_year=1996, end_year=2019)

In the last post (part one of this series), we have displayed the community detection algorithm Girvan-Newman provided by NetworkX.

NetworkX usually uses local files as the data source, which is totally okay for static network researches. But when the graph network changes a lot, for example, some central nodes are deleted or important network topology changes are introduced, it is a little troublesome to generate, load, and analyze the new static files. At this time, we’d better persist the entire change process in a database, and load subgraphs or whole graphs directly from the database for real-time analysis. This post uses Nebula Graph [3] as the graph database for storing the graph data.

The first method is suitable for obtaining certain nodes and edges that meet the requirements through fine filtering and pruning in a large-scale graph network. Whereas the second method is more suitable for the whole graph analysis. It is usually used in the early stage of the project when conducting heuristic exploration. When you further understand the graph, you can use the first method for fine pruning analysis.

 file are the APIs that interact with the underlying storage. These files contain a series of interfaces to scan nodes, scan edges, read columns, and so on. The following two interfaces can be used to scan all nodes and edges:

def scanVertex(self, space, returnCols, allCols, limit, startTime, endTime)
def scanEdge(self, space, returnCols, allCols, limit, startTime, endTime)

1. Initialize a client and a ScanEdgeProcessor. The ScanEdgeProcessor decodes the edge data read from Nebula Graph:

metaClient = MetaClient([('192.168.8.16', 45500)])
metaClient.connect()
storageClient = StorageClient(metaClient)
scanEdgeProcessor = ScanEdgeProcessor(metaClient)

2. Initialize the parameters for the scanEdge interface:

spaceName = 'nba' # The name of the graph space to be read.
returnCols = {} # The edges (or nodes) and their property columns to be returned.
returnCols['serve'] = ['start_year', 'end_year']
returnCols['follow'] = ['degree']
allCols = False # Return all the property columns or not. When set to False, only the specified property columns in the returnCols are returned. When set to True, all property columns are returned.
limit = 100 # Maximum data entry number returned.
startTime = 0
endTime = sys.maxsize

3. Call the scanPartEdge interface to return the iterator for the canEdgeResponse object:

scanEdgeResponseIterator = storageClient.scanEdge(spaceName, returnCols, allCols, limit, startTime, endTime)

4. Keep reading the data in the ScanEdgeResponse object which the iterator points to till all the data are read:

while scanEdgeResponseIterator.hasNext():
    scanEdgeResponse = scanEdgeResponseIterator.next()
    if scanEdgeResponse is None:
        print("Error occurs while scaning edge")
        break
    processEdge(space, scanEdgeResponse)

In the preceding snippet, the process_edge is a custom function for processing the scanned edge data. This function first uses the scan_edge_processor to decode the data in the scan_edge_response. The decoded data can either be directly printed out or processed for other purposes such as importing these data into the calculation framework NetworkX.

5. Process data. Here we import all the scanned data into Graph G in the NetworkX:

def processEdge(space, scanEdgeResponse):
    result = scanEdgeProcessor.process(space, scanEdgeResponse)
    # Get the corresponding rows by edge_name
    for edgeName, edgeRows in result.rows.items():
        for row in edgeRows:
            srcId = row.defaultProperties[0].getValue()
            dstId = row.defaultProperties[2].getValue()
            print('%d -> %d' % (srcId, dstId))
            props = {}
            for prop in row.properties:
                propName = prop.getName()
                propValue = prop.getValue()
                props[propName] = propValue
            G.add_edges_from([(srcId, dstId, props)]) # Add edges to Graph G in the NetworkX

Nodes scanning is similar to the preceding process.

In addition, for some of the distributed graph calculation frameworks [4], Nebula Graph provides batch reading and storing based on the partitions. We will show you in the later post.

When importing all the nodes and edges into the NetworkX according to the preceding process, we can do some basic graph analysis and calculations:

1. Draw a plot:

nx.draw(G, with_labels=True, font_weight='bold')
import matplotlib.pyplot as plt
plt.show()
plt.savefig('./test.png')
print('nodes: ', list(G.nodes))
print('edges: ', list(G.edges))
nodes:  [109, 119, 129, 139, 149, 209, 219, 229, 108, 118, 128, 138, 148, 208, 218, 228, 107, 117, 127, 137, 147, 207, 217, 227, 106, 116, 126, 136, 146, 206, 216, 226, 101, 111, 121, 131, 141, 201, 211, 221, 100, 110, 120, 130, 140, 150, 200, 210, 220, 102, 112, 122, 132, 142, 202, 212, 222, 103, 113, 123, 133, 143, 203, 213, 223, 104, 114, 124, 134, 144, 204, 214, 224, 105, 115, 125, 135, 145, 205, 215, 225]
edges:  [(109, 100), (109, 125), (109, 204), (109, 219), (109, 222), (119, 200), (119, 205), (119, 113), (129, 116), (129, 121), (129, 128), (129, 216), (129, 221), (129, 229), (129, 137), (139, 138), (139, 212), (139, 218), (149, 130), (149, 219), (209, 123), (219, 130), (219, 112), (219, 104), (229, 147), (229, 116), (229, 141), (229, 144), (108, 100), (108, 101), (108, 204), (108, 206), (108, 214), (108, 215), (108, 222), (118, 120), (118, 131), (118, 205), (118, 113), (128, 116), (128, 121), (128, 201), (128, 202), (128, 205), (128, 223), (138, 115), (138, 204), (138, 210), (138, 212), (138, 221), (138, 225), (148, 127), (148, 136), (148, 137), (148, 214), (148, 223), (148, 227), (148, 213), (208, 127), (208, 103), (208, 104), (208, 124), (218, 127), (218, 110), (218, 103), (218, 104), (218, 114), (218, 105), (228, 146), (228, 145), (107, 100), (107, 204), (107, 217), (107, 224), (117, 200), (117, 136), (117, 142), (127, 114), (127, 212), (127, 213), (127, 214), (127, 222), (127, 226), (127, 227), (137, 136), (137, 213), (137, 150), (147, 136), (147, 214), (147, 223), (207, 121), (207, 140), (207, 122), (207, 134), (217, 126), (217, 141), (217, 124), (217, 144), (106, 204), (106, 212), (106, 113), (116, 141), (116, 126), (116, 210), (116, 216), (116, 121), (116, 113), (116, 105), (126, 216), (136, 210), (136, 213), (136, 214), (146, 202), (146, 210), (146, 215), (146, 222), (146, 226), (206, 123), (216, 144), (216, 105), (226, 140), (226, 112), (226, 114), (226, 144), (101, 100), (101, 102), (101, 125), (101, 204), (101, 215), (101, 113), (101, 104), (111, 200), (111, 204), (111, 215), (111, 220), (121, 202), (121, 215), (121, 113), (121, 134), (131, 205), (131, 220), (141, 124), (141, 205), (141, 225), (201, 145), (211, 124), (221, 104), (221, 124), (100, 125), (100, 204), (100, 102), (100, 113), (100, 104), (100, 144), (100, 105), (110, 204), (110, 220), (120, 150), (120, 202), (120, 205), (120, 113), (140, 114), (140, 214), (140, 224), (150, 143), (150, 213), (200, 142), (200, 104), (200, 145), (210, 124), (210, 144), (210, 115), (210, 145), (102, 203), (102, 204), (102, 103), (102, 135), (112, 204), (122, 213), (122, 223), (132, 225), (202, 133), (202, 114), (212, 103), (222, 104), (103, 204), (103, 114), (113, 104), (113, 105), (113, 125), (113, 204), (133, 114), (133, 144), (143, 213), (143, 223), (203, 135), (213, 124), (213, 145), (104, 105), (104, 204), (104, 215), (114, 115), (114, 204), (134, 224), (144, 145), (144, 214), (204, 105), (204, 125)]

3. One of the common application is to calculate the shortest path between two nodes:

4. Calculate the pagerank value for each node in the graph to see their influences:

In addition, similar to the first post in the series, you can access to Gephi to make the visualization look better.

read original article here