Introduction to Data-Driven Network Analytics

Workshop on Big Social Data Analytics, Copenhagen 2015

Jukka Huhtamäki

Based on a SPEED network analysis workshop held in Tampere in February 2015.

Principle

Anatomy of a social network (Gray, 2012)

Image Source: Anatomy of a social network (Gray 2012)

History: Six Handshakes

Milgram's (1967) experiments gave firm evidence for the existence of a small world: "the diameter of the world" is roughly six handshakes.

Excerpt from Milgram (1967)

Key result: Scale-free networks

Barabási and Bonabeau (2003) presented the principle of scale-free networks and the reason for their existence: the preferential attachment process, in which new nodes tend to connect to nodes that are already well connected.

Random vs. scale-free network (Barabási and Bonabeau, 2003)

Image Source: Anatomy of a social network (Gray 2012)
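Preferential attachment can be tried out in a few lines. The following is a minimal sketch, not the procedure used by Barabási and Bonabeau: the function name `preferential_attachment`, the parameter `m` (links per new node) and the network size are assumptions made for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow an undirected network in which each new node links to m existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per edge end, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


edges = preferential_attachment(2000, m=2, seed=1)
degree = Counter(node for edge in edges for node in edge)
print("mean degree:", sum(degree.values()) / len(degree))
print("five largest degrees:", [d for _, d in degree.most_common(5)])
```

Most nodes keep a degree close to `m`, while a few early nodes grow into hubs, which is the signature of a scale-free network.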

Example: Finnish Innovation Ecosystem

Finnish Innovation Ecosystem (Still et al., 2013)

Part 1: Collecting data

The objective is to put together a sociomatrix

Sociomatrix example

Image source: Hoffman (2000): Introduction to Sociometry

A sociomatrix is the matrix representation of a sociogram. (Moreno (1934) may have used the latter term in a slightly different sense.)

A sociomatrix enumerates the individual connections between actors. The matrix representation allows for different kinds of computations, cf. Miilumäki (2011).
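As a small illustration of such computations, the sketch below builds a sociomatrix for four hypothetical actors and derives degrees and two-step connections from it. The actor names and ties are invented for the example.

```python
# Hypothetical actors and directed ties (who names whom).
actors = ["Ann", "Bob", "Cid", "Dee"]
ties = [("Ann", "Bob"), ("Ann", "Cid"), ("Bob", "Cid"), ("Dee", "Ann")]

index = {name: i for i, name in enumerate(actors)}
n = len(actors)

# Sociomatrix: row = source actor, column = target actor, 1 = tie exists.
matrix = [[0] * n for _ in range(n)]
for source, target in ties:
    matrix[index[source]][index[target]] = 1

# Row sums give outdegree, column sums give indegree.
outdegree = {a: sum(matrix[index[a]]) for a in actors}
indegree = {a: sum(row[index[a]] for row in matrix) for a in actors}

# Multiplying the matrix by itself counts directed walks of length two.
walks2 = [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
           for j in range(n)] for i in range(n)]

for name, row in zip(actors, matrix):
    print(name, row)
print("outdegree:", outdegree)
print("indegree:", indegree)
```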

In practice, connections are simply enumerated one by one:

Source  Target  Type      Id   Label  Weight
26      11      Directed  55          31
55      26      Directed  131         21
55      11      Directed  132         19
27      11      Directed  58          17
62      58      Directed  161         17
59      58      Directed  147         15
25      24      Directed  51          13
62      59      Directed  162         13
25      11      Directed  53          12
55      49      Directed  128         12
64      62      Directed  177         12

. . .
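An edge table like this is easy to read into a program. The sketch below assumes the table is saved as comma-separated text with the columns Source, Target, Type, Id, Label and Weight (Label left empty), and uses the first four rows as sample data. The variable names are illustrative only.

```python
import csv
import io
from collections import defaultdict

# First rows of the edge table, written as CSV (assumed file layout).
EDGE_TABLE = """Source,Target,Type,Id,Label,Weight
26,11,Directed,55,,31
55,26,Directed,131,,21
55,11,Directed,132,,19
27,11,Directed,58,,17
"""

# adjacency[source][target] = weight
adjacency = defaultdict(dict)
for row in csv.DictReader(io.StringIO(EDGE_TABLE)):
    source, target = int(row["Source"]), int(row["Target"])
    adjacency[source][target] = int(row["Weight"])

# Weighted indegree: total weight of the connections pointing at each node.
weight_in = defaultdict(int)
for source, targets in adjacency.items():
    for target, weight in targets.items():
        weight_in[target] += weight

print(dict(adjacency))
print(dict(weight_in))
```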

Network representation of the edge list above: Les Misérables

Characters in Les Miserables

https://gephi.org/datasets/lesmiserables.gml.zip

Another example: government networks

Olli Parviainen introduces a straightforward way to conduct network analysis (if you first learn Finnish ;).

Upstream to the data source: Twitter

Bibliographic data

For a more complex example, we can use bibliographic data.

Scopus, for instance, allows exporting the results of a search as CSV.

Exporting bibliometric data from Scopus

A Python-based process is available for scraping the data and converting it into a network format.
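The conversion step can be sketched as follows. This is not the process referred to above but a minimal illustration of turning a bibliographic CSV into a co-authorship network. The column name `Authors`, the `;` separator and the sample rows are assumptions, so check them against an actual export.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for an exported CSV file.
SAMPLE = '''Authors,Title,Year
"Author A.; Author B.; Author C.",Paper one,2013
"Author A.; Author B.",Paper two,2014
"Author D.",Paper three,2015
'''


def coauthor_edges(csv_file, column="Authors", separator=";"):
    """Return {(author_1, author_2): number of shared papers}."""
    weights = Counter()
    for row in csv.DictReader(csv_file):
        names = {name.strip() for name in row[column].split(separator)}
        authors = sorted(name for name in names if name)
        # Every pair of authors on the same paper gets one connection.
        for pair in combinations(authors, 2):
            weights[pair] += 1
    return weights


edges = coauthor_edges(io.StringIO(SAMPLE))
for (a, b), weight in sorted(edges.items()):
    print(a, "-", b, weight)
```

The result is an undirected, weighted, one-mode network of authors. Single-author papers contribute no connections.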

In practice

Interpreting your Facebook friendship network is an educational exercise.

Kari A. Hintikka gives instructions (these, too, are in Finnish); Jukka will show an example if time allows.

Part 2: From data to network

Choices on network structure

  1. Which entities do the nodes represent?
  2. On what basis are nodes connected to each other?
  3. One-mode, two-mode or multimode network?
  4. Directed or undirected?
  5. Dichotomous or weighted connections?
  6. Static or dynamic (temporal)?

Examples help here.

From tweets to a network

Let's decompose an example tweet into network data:

  1. Nodes? Connections?
  2. One-mode, two-mode or multimode?
  3. Directed or undirected?
  4. Dichotomous or weighted?
  5. Which criteria should one apply to make these decisions?
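One possible set of answers, sketched in code: users and hashtags become nodes, and the author of a tweet is connected to every user mentioned and every hashtag used, giving a directed, multimode network. The tweet, the account names and the connection labels are hypothetical, and the regular expressions are a simplification of how Twitter actually tokenizes text.

```python
import re

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def tweet_to_edges(author, text):
    """Decompose one tweet into directed (source, target, type) edges."""
    source = "@" + author.lower()
    edges = [(source, "@" + name.lower(), "mentions")
             for name in MENTION.findall(text)]
    edges += [(source, "#" + tag.lower(), "uses_hashtag")
              for tag in HASHTAG.findall(text)]
    return edges


# Hypothetical example tweet.
tweet = {"author": "alice",
         "text": "Great #network talk by @bob and @carol at #BigSocialData"}

edges = tweet_to_edges(tweet["author"], tweet["text"])
for edge in edges:
    print(edge)
```

Counting how often the same edge recurs across many tweets would turn the dichotomous connections into weighted ones.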

Problem: identifying the nodes

Twitter provides natural identifiers for nodes.

Using bibliographic data, for example, is more problematic, because the same author or organization may appear under several name variants.

One approach to finding unique names for nodes: OpenRefine
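The idea behind this kind of clean-up can be sketched in plain Python. The function below is loosely modelled on the fingerprint keying used in OpenRefine's key collision clustering, but it is an approximation written for this example, not OpenRefine's own code. The name variants are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Reduce a name to a key that ignores case, accents, punctuation
    and word order."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    tokens = re.sub(r"[^\w\s]", " ", ascii_name.lower()).split()
    return " ".join(sorted(set(tokens)))


# Illustrative variants of one name, plus an abbreviated form.
names = ["Huhtamäki, Jukka", "Jukka Huhtamaki", " huhtamäki  jukka",
         "J. Huhtamäki"]

clusters = defaultdict(list)
for name in names:
    clusters[fingerprint(name)].append(name)

for key, members in clusters.items():
    print(repr(key), members)
```

The first three variants collapse into one node, while the abbreviated form stays separate. Cases like that still need manual review or a fuzzier matching method.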

Part 3: Network layout

Force-driven layout

Layout refers to the act of placing the nodes on a canvas.

A force-driven (force-directed) layout is a straightforward option:

  1. Nodes repel each other
  2. Connections act as springs pulling the nodes back together
  3. The center of a gravitational field is placed in the middle of the canvas
  4. The process is run and reconfigured iteratively until the person doing the visualization is happy with the result
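The four steps can be sketched in plain Python. This is a toy implementation for illustration, not the algorithm of any particular tool. The function name, the parameter values and the cap on movement per round (`max_step`) are assumptions chosen to keep the example stable.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, repulsion=0.02,
                 spring=0.05, gravity=0.01, max_step=0.1, seed=0):
    """Return {node: [x, y]} after a simple force-directed simulation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Step 1: every pair of nodes repels, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist2 = dx * dx + dy * dy + 1e-9
                force[a][0] += repulsion * dx / dist2
                force[a][1] += repulsion * dy / dist2
                force[b][0] -= repulsion * dx / dist2
                force[b][1] -= repulsion * dy / dist2
        # Step 2: each connection pulls its endpoints together like a spring.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += spring * dx
            force[a][1] += spring * dy
            force[b][0] -= spring * dx
            force[b][1] -= spring * dy
        # Step 3: gravity pulls every node toward the canvas center (0, 0).
        for n in nodes:
            force[n][0] -= gravity * pos[n][0]
            force[n][1] -= gravity * pos[n][1]
        # Move each node along its net force, at most max_step per round.
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy)
            scale = min(1.0, max_step / length) if length > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return pos


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Two triangles joined by a single bridge (hypothetical example network).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
pos = force_layout(nodes, edges)
for n in nodes:
    print(n, [round(v, 2) for v in pos[n]])
```

Step 4 corresponds to rerunning `force_layout` with different `repulsion`, `spring` and `gravity` values until the two clusters are clearly readable.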

Network metrics

  • Degree: number of connections
  • Outdegree: number of connections away from a node
  • Indegree: number of connections toward a node
  • Betweenness centrality: number of shortest paths between other nodes that pass through a node
  • Authority
  • Clustering coefficient, closure, ...
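The first four metrics can be computed by hand for a small network. The sketch below uses a hypothetical directed network and an unnormalized variant of Brandes' algorithm for betweenness. Analysis tools may normalize or scale the values differently.

```python
from collections import defaultdict, deque

# Hypothetical directed network.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def degrees(edges):
    """Return {node: (indegree, outdegree, degree)}."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n], indeg[n] + outdeg[n]) for n in nodes}


def betweenness(edges):
    """Unnormalized betweenness for an unweighted directed network."""
    successors = defaultdict(list)
    nodes = set()
    for source, target in edges:
        successors[source].append(target)
        nodes.update((source, target))
    score = dict.fromkeys(nodes, 0.0)
    for start in nodes:
        # Breadth-first search counting shortest paths from `start`.
        order, dist = [], {start: 0}
        paths = dict.fromkeys(nodes, 0)
        paths[start] = 1
        preds = {n: [] for n in nodes}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != start:
                score[w] += delta[w]
    return score


print(degrees(edges))
print(betweenness(edges))
```

In this example only node c lies on shortest paths between other nodes (a to d and b to d).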

The Ostinato Model

Diagram: Ostinato Model

Huhtamäki, J., Russell, M. G., Rubens, N., & Still, K. (2015). Ostinato: The exploration-automation cycle of user-centric, process-automated data-driven visual network analytics. In E. Bertino, S. Matei, & M. G. Russell (Eds.), Transparency in Social Media: Tools, Methods and Algorithms for Mediating Online Interactions. Springer.