Gradient descent & derivatives: how your introduction to calculus is the key to unlocking machine learning


Stepping into your first calculus class can be an overwhelming experience. On some level, it feels like everything you spent the last decade of your life learning has been tossed out the window. Learning derivatives might make your head spin, and it can be easy to throw your hands up and say: “who needs calculus anyways?!” It may remind you of learning cursive in third grade and insisting that you’ll never use this in the real world. While it’s true that most of us won’t become rocket scientists (and learning cursive was almost certainly useless), calculus is the bedrock of a rapidly growing field: machine learning. In fact, nearly every model in machine learning relies on derivatives for optimization.

Gaining an intuitive understanding of how these concepts connect can help you not only ace your next calculus test but also open the door to mastering machine learning. 

What is machine learning anyway? 

Machine learning is full of buzzwords: artificial intelligence, deep learning, ChatGPT, and much more. But, at its core, machine learning is often defined as fitting a function to data. Imagine you’re a realtor with data on 70 houses you’ve sold, each varying in size. You could make a plot of the size of each house versus its cost and might notice a trendline, which you could describe with the equation y = wx + b.

Here, y is the price, x is the size, b is a bias term, and w is the slope. Adjusting our slope w changes the trendline, and our goal is to find the value of w that best fits the data. Once the line is set, if you get data on 30 more houses, you could predict their prices based on their sizes. After selling those houses, you could evaluate how well your trendline performed by comparing predicted prices to actual prices. 

However, in the real world, things aren’t that simple. House prices aren’t determined by size alone – a house’s location, the year it was built, and any extra amenities like a pool are all features that can influence the price. While we can’t plot all of these features on a two-dimensional graph, we can keep the same equation, y = wx + b, where w and x are now both vectors and wx is their dot product. Machine learning, in essence, involves adjusting w to minimize the difference between predicted and actual values. We are refining our function to better fit the data.
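To make this concrete, here’s a minimal Python sketch of what that prediction looks like. The features, weights, and bias below are made-up numbers for illustration, not values learned from real data:

```python
# A sketch of the prediction y = w·x + b, where w and x are vectors.
# All numbers here are invented for illustration only.
import numpy as np

x = np.array([2100.0, 3.0, 1985.0, 1.0])       # size (sq ft), location score, year built, has pool
w = np.array([150.0, 20000.0, 50.0, 15000.0])  # one weight per feature (hypothetical)
b = 10000.0                                    # bias term

predicted_price = np.dot(w, x) + b
print(predicted_price)
```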

Taking a derivative and setting it equal to zero

In calculus, the derivative of a function gives you the slope at any given point. To find the minimum or maximum of a function, you take its derivative and set it equal to zero. For example, consider the function f(x)=x^2−5. Its derivative is f′(x)=2x. By setting 2x=0, you find the minimum of the original function at x=0 (put this into your graphing calculator if you don’t believe me). 
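If you’d rather let a computer do the algebra than a graphing calculator, a symbolic math library like SymPy can take the derivative and solve for where it equals zero:

```python
# Take the derivative of f(x) = x^2 - 5 and solve f'(x) = 0 symbolically.
import sympy

x = sympy.symbols('x')
f = x**2 - 5
f_prime = sympy.diff(f, x)      # 2*x
print(sympy.solve(f_prime, x))  # [0] -- the minimum sits at x = 0
```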

Now, imagine you’re standing at x=4 on this curve and you want to find the bottom. The derivative at this point is f′(4)=8, meaning the slope is positive, and you're on an upward incline. If you move in the direction of the derivative, you’ll keep walking up the hill into infinity. So, to reach the minimum, you need to move in the opposite direction of the slope: downhill. In gradient descent, this means taking a step of some size (let's call it α) down that hill. If your step size is 0.1, your next position would be x_new = 4 − 0.1·8 = 4 − 0.8 = 3.2. To get to the bottom of the hill, you would repeat this process over and over until you reach the bottom.
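Here’s that loop written out as a short Python sketch, using the same starting point (x = 4) and step size (0.1) from above:

```python
# Gradient descent on f(x) = x^2 - 5, starting at x = 4 with step size 0.1.
# Each step moves opposite the derivative f'(x) = 2x, walking downhill.
x = 4.0
alpha = 0.1

for step in range(50):
    gradient = 2 * x          # f'(x) = 2x
    x = x - alpha * gradient  # step in the opposite direction of the slope

print(x)  # very close to 0, the bottom of the hill
```

The first iteration lands at exactly 3.2, just like the hand calculation; each later step shrinks the remaining distance to the minimum by the same factor.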

Gradient descent

Gradient descent is like navigating a hilly terrain where the goal is to find the lowest point. To get there, you walk in the opposite direction of the gradient. The general formula for gradient descent is: 

x_(n+1) = x_n − α·∇f(x_n)

Here x_n is your current position, x_(n+1) is your next position, ∇f(x_n) represents your gradient (or derivative), and α determines how large of a step you take. If you repeat this over many iterations, eventually, you’ll get to the bottom of the hill.

This is what our computer is doing when optimizing w in a machine learning model! It calculates the gradients of the function and updates w iteratively until the difference between predicted and actual values is minimized. Whether it’s a simple logistic regression or a complex neural network, the principle remains the same. However, instead of a nice 2-D example, the model adjusts its parameters through thousands (or even millions) of tiny steps in multi-dimensional space to find the best fit. 
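To tie it back to our realtor example, here’s a minimal sketch of fitting w and b by gradient descent, minimizing the average squared difference between predicted and actual prices. The house sizes, prices, and step size below are all made-up illustrative assumptions:

```python
# A sketch of gradient descent for linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
sizes = rng.uniform(500, 3500, size=(70, 1))                    # 70 made-up house sizes (sq ft)
prices = 150 * sizes[:, 0] + 10000 + rng.normal(0, 5000, 70)    # invented "true" prices plus noise

w = np.zeros(1)  # slope (a vector, to match the multi-feature case)
b = 0.0          # bias term
alpha = 1e-7     # step size; kept tiny because the features aren't scaled

for step in range(10000):
    predicted = sizes @ w + b
    error = predicted - prices
    grad_w = (2 / len(prices)) * (sizes.T @ error)  # derivative of squared error w.r.t. w
    grad_b = 2 * error.mean()                       # derivative of squared error w.r.t. b
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)  # w lands near the underlying slope of ~150
```

Real pipelines usually scale the features first so a larger step size works, but the update rule is exactly the one above: step opposite the gradient, over and over.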

Putting the pieces together 

In under a thousand words, you’ve just read how the derivatives you learned in your calculus class apply to almost all machine learning problems through gradient descent. While machine learning can certainly get more complicated, understanding this fundamental connection makes the field seem a bit less intimidating. So, the next time you’re procrastinating on your calculus homework, remember how cool its applications can be. Who knows? Maybe mastering those derivatives today will lead you to create the next groundbreaking machine learning model tomorrow.

Cassie graduated with a degree in Computer Science and Biomedical Engineering from Johns Hopkins University. She is now pursuing a PhD in Medical Engineering and Medical Physics at MIT and Harvard Medical School as part of the Health Sciences and Technology (HST) program. Her research focuses on fair machine learning for underserved health conditions.
