Stepping into your first calculus class can be an overwhelming experience. On some level, it feels like everything you spent the last decade of your life learning has been tossed out the window. Learning derivatives might make your head spin, and it can be easy to throw your hands up and say: “who needs calculus anyway?!” It may remind you of learning cursive in third grade and insisting that you’ll never use it in the real world. While it’s true that most of us won’t become rocket scientists (and learning cursive was almost certainly useless), calculus is the bedrock of a rapidly growing field: machine learning. In fact, nearly every model in machine learning relies on derivatives for optimization.
Gaining an intuitive understanding of how these concepts connect can help you not only ace your next calculus test but also open the door to mastering machine learning.
What is machine learning anyway?
Machine learning is full of buzzwords: artificial intelligence, deep learning, ChatGPT, and much more. But, at its core, machine learning is often defined as fitting a function to data. Imagine you’re a realtor with data on 70 houses you’ve sold, each varying in size. You could plot the size of each house against its sale price and might notice a trendline, which you could describe with the equation y = wx + b.
Here, y is the price, x is the size, b is a bias term, and w is the slope. Adjusting our slope w changes the trendline, and our goal is to find the value of w that best fits the data. Once the line is set, if you get data on 30 more houses, you could predict their prices based on their sizes. After selling those houses, you could evaluate how well your trendline performed by comparing predicted prices to actual prices.
However, in the real world, things aren’t that simple. House prices aren’t determined by size alone – a house’s location, the year it was built, and extra amenities like a pool are all features that can influence the price. While we can’t plot all of these features on a two-dimensional graph, we can expand our equation to y = w·x + b, where w and x are now both vectors and w·x is their dot product. Machine learning, in essence, involves adjusting w to minimize the difference between predicted and actual values. We are refining our function to better fit the data.
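To make that concrete, here’s a minimal sketch in Python of what a single prediction looks like once w and x are vectors. The feature values, weights, and bias below are made up purely for illustration; in a real model, w and b are exactly the numbers we would learn from the data.

```python
import numpy as np

# Hypothetical features for one house: [size in sq ft, year built, has a pool]
x = np.array([1500.0, 1995.0, 1.0])

# Made-up weights and bias -- in practice these are what we learn from the data
w = np.array([150.0, 40.0, 20000.0])
b = 5000.0

# The prediction is the dot product of the weights and the features, plus the bias
predicted_price = np.dot(w, x) + b
print(predicted_price)
```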
Taking a derivative and setting it equal to zero
In calculus, the derivative of a function gives you the slope at any given point. To find the minimum or maximum of a function, you take its derivative and set it equal to zero. For example, consider the function f(x)=x^2−5. Its derivative is f′(x)=2x. By setting 2x=0, you find the minimum of the original function at x=0 (put this into your graphing calculator if you don’t believe me).
Now, imagine you’re standing at x=4 on this curve and you want to find the bottom. The derivative at this point is f′(4)=8, meaning the slope is positive, and you're on an upward incline. If you move in the direction of the derivative, you’ll keep walking up the hill into infinity. So, to reach the minimum, you need to move in the opposite direction of the slope—downhill. In gradient descent, this means taking a step of some size (let's call it α) down that hill. If your step size is 0.1, your next position would be x_new = 4 - 0.1*8 = 4 - 0.8 = 3.2. To get to the bottom of the hill, you would iterate through this process over and over again until you reached the bottom.
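If you’d like to see that loop in action, here’s a short sketch that repeats the same update on f(x) = x^2 − 5, starting from x = 4 with a step size of 0.1. The number of steps is just a choice made for illustration.

```python
# Walking downhill on f(x) = x**2 - 5, starting from x = 4
def grad(x):
    return 2 * x  # the derivative f'(x) = 2x

x = 4.0
alpha = 0.1  # step size

for step in range(50):
    x = x - alpha * grad(x)  # move opposite the slope

print(x)  # ends up very close to 0, the minimum we found by hand
```

After a few dozen steps, x has shrunk to nearly 0 – the same minimum we found by setting the derivative to zero.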
Gradient descent
Gradient descent is like navigating a hilly terrain where the goal is to find the lowest point. To get there, you walk in the opposite direction of the gradient. The general formula for gradient descent is:
x_(n+1) = x_n − α·∇f(x_n)
Here x_n is your current step, x_(n+1) is your next step, ∇f(x_n) represents your gradient (or derivative), and α determines how large of a step you take. If you repeat this over many iterations, eventually, you’ll get to the bottom of the hill.
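Written as code, the whole recipe fits in a few lines. Here’s a rough sketch of that update rule as a reusable function – the name, default step size, and iteration count are my own choices, and the same loop works whether x is a single number or a vector.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, n_steps=1000):
    """Repeatedly step opposite the gradient: x_(n+1) = x_n - alpha * grad_f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - alpha * grad_f(x)
    return x

# Same bowl-shaped example as before, f(x) = x**2 - 5
print(gradient_descent(lambda x: 2 * x, x0=4.0))
```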
This is what our computer is doing when optimizing w in a machine learning model! It calculates the gradients of the function and updates w iteratively until the difference between predicted and actual values is minimized. Whether it’s a simple logistic regression or a complex neural network, the principle remains the same. However, instead of a nice 2-D example, the model adjusts its parameters through thousands (or even millions) of tiny steps in multi-dimensional space to find the best fit.
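Here’s a hedged sketch of what that might look like for the house-price example: gradient descent on a tiny, made-up dataset, minimizing the mean squared error between predicted and actual prices. The data, step size, and iteration count are all invented for illustration.

```python
import numpy as np

# Toy housing data: rows are houses, columns are features (size, year built, pool)
X = np.array([[1500.0, 1990.0, 0.0],
              [2200.0, 2005.0, 1.0],
              [1100.0, 1975.0, 0.0],
              [1800.0, 2010.0, 1.0]])
y = np.array([250_000.0, 420_000.0, 180_000.0, 380_000.0])  # actual sale prices

# Scale features so a single step size works for all of them (a common practical trick)
X = (X - X.mean(axis=0)) / X.std(axis=0)

w = np.zeros(X.shape[1])  # start with all weights at zero
b = 0.0
alpha = 0.01

for _ in range(5000):
    predictions = X @ w + b
    error = predictions - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    w = w - alpha * grad_w
    b = b - alpha * grad_b

print(w, b)  # the weights and bias that best fit this tiny dataset
```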
Putting the pieces together
In under a thousand words, you’ve just read how the derivatives you learned in your calculus class apply to almost all machine learning problems through gradient descent. While machine learning can certainly get more complicated, understanding this fundamental connection makes the field seem a bit less intimidating. So, the next time you’re procrastinating on your calculus homework, remember how cool its applications can be. Who knows? Maybe mastering those derivatives today will lead you to create the next groundbreaking machine learning model tomorrow.