by James Bailey

Now we're getting deeper. To succeed in Week Three's programming assignment, we will need to be facile with programming and facile with linear algebra. That second facility needs to be discussed a bit more, because Deep Learning and linear algebra actually have little if anything to say to each other. Their current flirtation is an accident of history.

Deep Learning involves doing scads of little tiny independent computations. Every node of every layer of a neural network is busy all the time, just as every neuron in our brain is busy all the time. The important thing about these little computations is that they are independent. The internal steps that one node, or one of our neurons, goes through are totally independent of those going on elsewhere. The computer, or our noggins, can do them all in any order or even all at once in parallel. Notice that none of this independence is true of, for example, a geometric proof. Each step of a proof depends on the one before it, which needs to be completed first. No shuffle mode in Euclid.
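That independence is easy to see in code. Here is a minimal sketch (the layer sizes, weights, and tanh activation are made up for illustration): each node's output depends only on the inputs and its own weights, so computing the nodes in a shuffled order gives the same answer as computing them all at once.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # inputs to a toy layer
W = rng.standard_normal((3, 4))   # one row of weights per node

# Compute the three node outputs in a deliberately shuffled order...
order = [2, 0, 1]
out = np.empty(3)
for i in order:
    out[i] = np.tanh(W[i] @ x)    # node i never looks at out[j]

# ...and it matches the all-at-once, shuffle-mode-friendly computation.
assert np.allclose(out, np.tanh(W @ x))
```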

The one area where legacy mathematics exhibits the same all-at-once independence is in matrix operations. Multiplying two matrices may involve hundreds of individual multiplications, but none of them depend on any other. Operations on vectors and matrices were long ago formalized in the field of “linear algebra” and computer hardware designers responded by offering special hardware to bang through matrix operations lickety-split.
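To see that no entry of a matrix product depends on any other, here is a small sketch (sizes chosen arbitrarily): each entry of the product is its own little dot product, and computing the entries in random order still matches what the fast built-in operation produces.

```python
import random

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# A 3x4 times a 4x2 means 3*2 = 6 independent dot products.
C = np.empty((3, 2))
cells = [(i, j) for i in range(3) for j in range(2)]
random.shuffle(cells)              # any order works; nothing waits on anything
for i, j in cells:
    C[i, j] = A[i, :] @ B[:, j]    # entry (i, j) is computed on its own

assert np.allclose(C, A @ B)       # same answer as the hardware-fast version
```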

So the way to get current computer hardware to whip through Deep Learning operations is to make that hardware think it is actually doing vector and matrix operations. Most of Week Two and all of Week Three are dedicated to the masquerade that Deep Learning professionals have adopted to make the hardware think that neural networks are matrices. This subterfuge can get tedious. For example, linear algebra says you cannot multiply two vectors if they are both horizontal rows of values. The second one needs to be a vertical column. (Don't ask.) So sometimes, but not always, we have to "transpose" that second one from a row to a column. Boring.
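In NumPy terms (the numbers here are made up), the tedium looks like this: two row vectors refuse to multiply until the second is transposed into a column.

```python
import numpy as np

# Two "row" vectors, written as 1x3 matrices.
a = np.array([[1.0, 2.0, 3.0]])    # shape (1, 3)
b = np.array([[4.0, 5.0, 6.0]])    # shape (1, 3)

# a @ b fails: the inner dimensions (3 and 1) don't line up.
try:
    a @ b
except ValueError:
    pass                           # linear algebra says no

# Transpose the second one from a row to a column and it works.
inner = a @ b.T                    # shape (1, 1): the dot product, 32.0
assert inner.shape == (1, 1) and inner[0, 0] == 32.0
```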

The originators of the Deep Learning course have worked hard to minimize this mickey mouse work (we were transposing much more often in Prof. Ng's earlier Machine Learning course), but it is still there for now. Hopefully this masquerade will be gone completely in a few more years.

It is easy as we dip into Deep Learning for the first time to lurch all the way from believing that computers can only do what we tell them to do (as a clueless keynote speaker insisted at a recent mathematics teacher symposium) to believing that a single piece of code can learn anything at all. Not so fast. Every problem has its own little nuances, and so every Deep Learning algorithm needs to be tweaked accordingly. It is not obvious in advance which sub-algorithms will fill the bill.
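The "try them all and see" attitude can be sketched in a few lines. Everything here is hypothetical: the candidate activation functions are real formulas, but the evaluate() function is a stand-in for actually training a small network and reporting its validation score.

```python
import numpy as np

# Candidate sub-algorithms to try: three common activation functions.
candidates = {
    "relu":    lambda z: np.maximum(0.0, z),
    "tanh":    np.tanh,
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
}

def evaluate(activation):
    # Placeholder metric: in practice, train a network with this activation
    # and return its validation accuracy. Here, a fixed toy score.
    z = np.linspace(-2.0, 2.0, 5)
    return float(activation(z).mean())

# Try them all, see which one works better, and go with that.
scores = {name: evaluate(fn) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
```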

"

It's actually very difficult to know in advance exactly what will work best [so] try them all and see which ones work better and go with that.

"

John Dewey was telling us exactly this a century ago. There are no guarantees in life. You cannot know for sure what the best course will be. You just have to try. When you find something that works, remember it. Not because it is somehow "true," but because it works. The field of philosophy that recognizes that "it's true because it works" is called pragmatism. Dewey was one of its founders. He would have loved this Deep Learning course.
