Vectors are numeric lists
Vectors, from a data analysis perspective, are just lists.
The sort of lists we are thinking of are lists of numbers, written either horizontally – a row vector:
or vertically – a column vector:
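As a rough illustration, here is one way a row vector and a column vector might be written down in plain Python. The numbers are made up for this sketch and are not the vectors from the original post, and the `transpose` helper is a hypothetical name introduced here.

```python
# Hypothetical 10-entry vector; the values are invented for illustration.
row_vector = [[3, 1, 4, 1, 5, 9, 2, 6, 5, 3]]    # 1 row, 10 columns


def transpose(matrix):
    """Swap the rows and columns of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


column_vector = transpose(row_vector)             # 10 rows, 1 column

print(len(row_vector), len(row_vector[0]))        # shape of the row vector
print(len(column_vector), len(column_vector[0]))  # shape of the column vector
```

Both objects hold the same ten numbers; only the layout differs.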
And now we complicate matters by bringing in geometry
A perspective that linear algebra brings to these numeric vectors is that we can interpret them geometrically as points in a space of the appropriate dimension.
The two vectors above, for example, can be interpreted as points in 10-dimensional space.
“So what?” you may ask, “how does it help to think of something as concrete as a list of numbers as a point in 10-dimensional space – something I can’t even begin to visualize.”
The first video at the bottom of this post, by Grant Sanderson, explains this geometric interpretation beautifully for 2 dimensions, so watch that when you are ready, and we’ll continue with the advantages of thinking geometrically.
What’s the point?
Matrices as transformations of space
The main point of viewing vectors as points in some higher dimensional space is that matrices of data can then be viewed as transformations of these higher dimensional spaces: matrices move vectors around in a fairly natural way. The second video by Grant Sanderson, below, explains this idea in a very intuitive way in dimension 2.
Here’s how the matrix of data:
On the other hand, if we choose the vector whose entries are all equal to 1/n, where n is the number of columns of the data matrix, then the result of transforming this vector by the matrix is a vector whose entries are the averages of the data matrix rows.
So even in this very simple example you can see that thinking of matrices as transforming vectors can give us some meaningful data transformations.
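The row-averaging idea can be sketched in a few lines of plain Python. The data matrix below is hypothetical, and `mat_vec` is a helper name introduced only for this example; the point is simply that multiplying a matrix by a vector whose entries all equal 1/n (n being the number of columns) yields the mean of each row.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical data matrix: 3 rows, 4 columns.
data = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 10.0, 0.0, 10.0],
]

n = len(data[0])                   # number of columns
averaging_vector = [1.0 / n] * n   # every entry equals 1/n

row_means = mat_vec(data, averaging_vector)
print(row_means)                   # [5.0, 1.0, 5.0]
```

Replacing the averaging vector with a vector of ones would give the row sums instead, so the choice of vector decides which summary of the data comes out.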
How linear algebra helps in understanding the structure of data
What linear algebra is concerned with is gaining understanding of how matrices transform vectors by decomposing the matrices into simpler bits. These decompositions are guided by geometry.
Of course no one “sees” pictures in 10-dimensional space, but many of our geometric ideas from 2- and 3-dimensional space, such as rotations, stretches, shifts and reflections, have exact analogues in higher dimensional space.
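To make "simpler bits" a little more concrete, here is a small 2-dimensional sketch in plain Python. It runs in the opposite direction from a decomposition: it builds a matrix out of a stretch and a rotation, then checks that applying the combined matrix gives the same result as applying the two simple pieces one after the other. The helper names and the particular angle and stretch factors are assumptions chosen for illustration; a real decomposition, such as the singular value decomposition, starts from a given matrix and finds pieces of this kind.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def mat_mul(a, b):
    """Multiply two matrices, each stored as a list of rows."""
    columns_of_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
            for row in a]


theta = math.pi / 2                       # rotate by 90 degrees
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
stretch = [[3.0, 0.0],                    # stretch the first coordinate by 3
           [0.0, 0.5]]                    # shrink the second coordinate by half

combined = mat_mul(rotation, stretch)     # stretch first, then rotate

v = [1.0, 2.0]
one_step = mat_vec(combined, v)
two_steps = mat_vec(rotation, mat_vec(stretch, v))

# The two results agree up to floating-point rounding.
print(one_step)
print(two_steps)
```

Reading a matrix as "stretch, then rotate" is often easier than staring at its four entries, and the same reading carries over to matrices that act on spaces with many more dimensions.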
So, odd as it may seem at first glance, the idea of thinking of numeric vectors as points in some space, and matrices of data as transformations of these vectors, is an extremely fruitful route to obtaining useful decompositions of matrices of stored data.
When these matrices are very big, as happens with huge data sets with many variables, these decompositions into simpler pieces are super important.
That, in a nutshell, is the main reason linear algebra is important for data scientists.
Take a course, but keep your data perspective and motivation
If you take a course in linear algebra you will usually study these ideas in relatively low-dimensional examples. Some linear algebra courses will get as far as important topics like singular value decomposition. Yet almost nowhere will you find the powerful ideas of linear algebra building on the motivations that matter to a data analyst: how to better understand data.
In our view motivation matters. Studying mathematics for its own sake is great for people who love to do that. For others the motivation and understanding of why certain ideas matter is of the utmost importance. So just taking a course in linear algebra or reading a book is not necessarily going to help much unless and until you get the basic connection between thinking of vectors and matrices in data terms – which comes naturally to data analysts – and in geometric terms, which comes more naturally to a mathematician.
Grant Sanderson’s superb videos