### Graphs and data science

Data scientists deal a lot with graphs. For instance, in:

- Social graphs
- Brain networks
- Protein-protein interactions
- Graph-based machine learning
- Graph data bases

In a Wired article, Roy Marsten, Chief Scientist at Emcien Corporation in Atlanta, GA, writes:

“Understanding these relationships between data, whether it is Web pages, products purchased, features on a vehicle, words in a message, or symptoms, treatments, and outcomes from sick patients, is the first step to accepting that graphs will be the way we think about data in the future.

The data that we have today, and in often the ways we look at data, are already steeped in the theory of graphs. In the future, our ability to understand data with graphs will take us beyond searching for an answer. Creating and analyzing graphs will bring us to answers automatically. When we let data connect itself, meaning will begin to emerge automatically.”

### Graphs and matrices

Looking at the graph above we can see that this has nothing to do with linear algebra, right?

Well, actually, it has *everything* to do with linear algebra!

That’s because the complete graph information can be coded in a matrix: in this case by an matrix, with one row, and one column for each of the 18 vertices of the graph.

This matrix is called the *adjacency matrix* of the graph and it’s constructed like this:

We put a:

- “1” in row 1, column 2, because vertex 1 is connected by an edge to vertex 2;
- “1” in row 1, column 3, because vertex 1 is connected by an edge to vertex 3;
- “1” in row 1, column 4, because vertex 1 is connected by an edge to vertex 4;
- “1” in row 1, column 5, because vertex 1 is connected by an edge to vertex 5 ;
- “0” in row 1 and all other columns, because vertex 1 is not connected by an edge to any other vertices.

Then we repeat this for the rows corresponding to vertex 2, vertex 3, …, through vertex 18.

Here’s the result:

### And your point is?

Good question!

##### Talk to a computer

First, the matrix can be coded in most programming languages, so we can store the structure of the graph, via the adjacency matrix, in computer memory. How else were we going to do this?

##### Counting paths

Second, the adjacency matrix can be used to tell us some interesting things about the graph. For instance, how many paths of length 2, length 3, length 4, and so on, are there from one vertex to another?

We can try enumerating these, case by case, by hand, but a far niftier, and computationally easier and more efficient, way is to use the idea of a matrix power. Yes, it turns out there’s a reasonably natural way to multiply matrices, and if we multiply a square matrix by itself we get the square of the matrix, and then another multiplication gives us the cube of the matrix, and so on.

So let’s take the adjacency matrix above, use some software to calculate its power (I used *Mathematica*® for this purpose: R or Python will do this just fine, as, of course, will MATLAB):

This tells us, for example that there are:

- 7 paths of length 3 from vertex 1 to vertex 2
- 10 paths of length 3 from vertex 4 to vertex 1
- 14 paths of length 3 from vertex 5 to vertex 16

and much more!

### Paths in big social graphs

So, now imagine you’re interested in a social graph, such as connections between Facebook, Instagram, or Twitter users.

The graphs you get will be HUGE.

If you’ve coded part of them as a matrix you can use the powers of the matrix to compute how many paths there are from one person to another though that part of the social graph.

### So now do you want to learn about matrix multiplication?

Maybe it’s not quite so abstract any more.

Maybe you can see there might be a use for it in data science.

Maybe, linear algebra could turn out to be useful in data science?

Maybe, just maybe, the mathematician’s apparently abstract way of thinking could be useful – if we could just see the motivation first it would be a big help.

## Leave a Reply