Why Neural Networks are Universal Function Approximators


You may have heard that artificial neural networks are "universal function approximators," and that this allows them to do many different things, enabling diverse technologies from ChatGPT to self-driving cars. You may also have a good intuition for what universal function approximation means: for any curve, there is some setting of a neural network's parameters that fits it arbitrarily closely. For example, there are neural networks that can map inputs x to outputs y like any of the continuous functions below:

[Figure: examples of continuous functions mapping inputs x to outputs y]

In this post, we’ll provide a visual explanation for why neural networks are universal function approximators.

We’ll focus on the case where the neural network takes 1-dimensional input and returns 1-dimensional output, but a similar argument can be used for multiple dimensions. 

Linear Splines

We’ll show that a neural network computes a ‘linear spline’, which is just a sequence of line segments connected end to end. Neural networks can be fit to any curve because these linear splines can be fit to any curve, and the fit of a linear spline gets better the more line segments it has.

Here’s an easy way to fit a curve with line segments:

  1. Place a number of points on the curve you want to approximate, evenly spaced along the x-axis.
  2. Then connect each of the adjacent points with line segments.


That’s it! These line segments, taken together, are a linear spline: they trace out a new curve. Observe that as the number of points (and line segments between them) increases, the linear spline approximates the original curve better and better.

[Figure: linear spline approximations of a curve, improving as more segments are used]
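To make the two-step recipe above concrete, here is a minimal sketch in Python. It is only an illustration: the target curve, the interval, and the segment counts are arbitrary choices of mine, and np.interp is used because connecting evenly spaced points with straight lines is exactly linear interpolation.

    import numpy as np

    # An arbitrary continuous curve to approximate.
    def f(x):
        return np.sin(x) + 0.5 * x

    # A dense grid on which to measure how close the spline is to the curve.
    x_dense = np.linspace(-3, 3, 1001)

    for n_segments in [2, 4, 8, 16]:
        # 1. Place evenly spaced points on the curve.
        knots_x = np.linspace(-3, 3, n_segments + 1)
        knots_y = f(knots_x)
        # 2. Connect adjacent points with line segments (linear interpolation).
        spline_y = np.interp(x_dense, knots_x, knots_y)
        print(n_segments, "segments -> max error", np.max(np.abs(spline_y - f(x_dense))))

The printed error shrinks as the number of segments grows, which is the point the figure above makes visually.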

A Neural Network with One Hidden Layer 

Hopefully it’s clear how a linear spline can approximate any function. Now we’ll show that for any linear spline, there is an equivalent neural network.

Let’s consider a neural network with 1 hidden layer of 3 neurons. It maps an input, x, to a "hidden state" of 3 numbers, h = [h1, h2, h3], and then to a final output, y. It would typically be diagrammed something like this:

[Figure: diagram of the 1-hidden-layer, 3-neuron network, with its weights and biases labeled]

I’ve labeled each neuron and connection with its associated weight (w) or bias (b). These are tunable parameters that affect the computation the network performs. A neuron outputs a weighted sum of the inputs to its left, plus a constant bias. For example, the output neuron in the above network computes the following:

y = w4·h1 + w5·h2 + w6·h3 + b4

A dashed circle, like those in the hidden layer, denotes that the neuron applies a ‘relu’ (rectified linear unit) to its output. This means that if the neuron would output a value less than zero, it ‘turns off’ and outputs zero instead. For example, the h1 neuron in our network computes:

h1 = relu(w1·x + b1) = max(0, w1·x + b1)

Graphically, the relu takes a line with slope w and intercept b and keeps only the part of it above the x-axis; wherever the line would dip below zero, the output is flattened to zero instead.

[Figure: a line with slope w and intercept b, before and after the relu is applied]
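Putting the two equations above together, here is a small Python sketch of the forward pass of this 3-neuron network. The parameter names follow the post (w1–w3 and b1–b3 for the hidden layer, w4–w6 for the output connections); calling the output neuron's bias b4 is my own labeling, since the diagram isn't reproduced here.

    import numpy as np

    def relu(z):
        # A 'dashed' neuron: outputs zero whenever its weighted sum would be negative.
        return np.maximum(0, z)

    def network(x, w1, w2, w3, b1, b2, b3, w4, w5, w6, b4):
        # Hidden layer: each neuron is a weighted sum of x plus a bias, passed through the relu.
        h1 = relu(w1 * x + b1)
        h2 = relu(w2 * x + b2)
        h3 = relu(w3 * x + b3)
        # Output neuron: a weighted sum of the hidden neurons plus a bias (no relu).
        return w4 * h1 + w5 * h2 + w6 * h3 + b4

Every setting of these parameters is a particular curve from x to y; the rest of the post is about choosing the parameters so that this curve is a given linear spline.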

A Neural Network as a Linear Spline

Universal function approximation means there is some setting of weights and biases such that our network with 3 hidden neurons can match any linear spline with 3 line segments. To see how this works concretely, let’s match our neural network to the following 3-segment spline:

[Figure: the target spline, made of 3 line segments]

For our network to match this spline, it needs to obey the following equations:

y = .5x + .5   for -3 ≤ x ≤ -1
y = x + 1      for -1 ≤ x ≤ 1
y = -2x + 4    for x ≥ 1

Set Biases to Match the Start of Each Segment 

It turns out there are infinitely many parameter settings of a neural network with 3 hidden neurons that will match this (or any) 3-segment spline, so here we’ll focus on one particular solution. We’ll start by setting the first-layer weights, [w1, w2, w3], to 1; even with this constraint our network is still flexible enough to match any 3-segment spline.

With this constraint, we can think of each hidden neuron’s bias as uniquely determining when that neuron "turns on." If wi = 1, then hi = relu(x + bi), so hi rises above 0 and turns on as soon as x exceeds -bi.

We want each hidden neuron to "turn on" at the start of each line segment in our spline, which will allow the network to start behaving like different functions at these points. The first, second, and third segments start at -3, -1, and 1 respectively, so we can set the hidden neuron biases to [b1, b2, b3] = [3, 1, -1].

h1 = max(0, x + 3)   (turns on at x = -3)
h2 = max(0, x + 1)   (turns on at x = -1)
h3 = max(0, x - 1)   (turns on at x = 1)
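As a quick sanity check, here is a small Python snippet (my own illustration, not from the post) that evaluates the three hidden neurons with these weights and biases at a few x values, so you can see each one turn on at the start of its segment.

    import numpy as np

    relu = lambda z: np.maximum(0, z)

    # First-layer weights are all 1; biases chosen so the neurons turn on at x = -3, -1, 1.
    b1, b2, b3 = 3, 1, -1

    for x in [-4, -3, -2, -1, 0, 1, 2]:
        h1, h2, h3 = relu(x + b1), relu(x + b2), relu(x + b3)
        print(f"x = {x:2d}:  h1 = {h1}, h2 = {h2}, h3 = {h3}")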

Fit the First Segment

Here’s the critical part: observe that with these biases, when x < -1, both h2 and h3 are fixed at 0, and the overall network obeys the equation:

y = w4·h1 + b4 = w4·(x + 3) + b4

The interval -3 < x < -1 corresponds to the first spline segment, which is fit by the function y = .5x + .5, so we know that our network fits the spline when:

w4·(x + 3) + b4 = .5x + .5, which holds when w4 = .5 and b4 = -1

Another way to think about this: the function for h1 needs to be moved down by 1 and have its slope reduced by .5 to match the function for y on the first line segment.

[Figure: h1 shifted down by 1 and with its slope reduced, lining up with the first spline segment]
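Here is that first-segment fit worked through in Python. It is a sketch under the conventions above (first-layer weight 1, hidden bias 3, output bias labeled b4): on -3 < x < -1 the network reduces to w4·(x + 3) + b4, and matching slope and intercept against y = .5x + .5 pins down both parameters.

    # On the first segment only h1 is active, so y = w4 * (x + 3) + b4.
    # Matching this against y = .5x + .5 term by term:
    target_slope, target_intercept = 0.5, 0.5

    w4 = target_slope                  # slopes must agree:      w4        = .5
    b4 = target_intercept - 3 * w4     # intercepts must agree:  3*w4 + b4 = .5  ->  b4 = -1

    # Sanity check at a point inside the first segment.
    x = -2.0
    h1 = max(0.0, x + 3)
    assert abs((w4 * h1 + b4) - (0.5 * x + 0.5)) < 1e-12
    print(w4, b4)  # 0.5 -1.0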

Fitting Additional Segments 

At this point there are 2 parameters left to fit, w5 and w6, which must make the network match the remaining two line segments. The procedure for finding these weights is the same for both, and for any number of additional segments.

Observe that at the start of the second segment, h2 "turns on," so our network equation changes; the first-segment equation (y = .5x + .5) becomes:

y = .5·h1 + w5·h2 - 1 = .5(x + 3) + w5(x + 1) - 1 = (.5 + w5)x + (.5 + w5)

The second line segment obeys the equation y = x + 1, so we want

(.5 + w5)x + (.5 + w5) = x + 1, which holds when w5 = .5

The same logic applies to finding the last parameter w6, which must change the equation y = x + 1 to y = -2x + 4:

y = x + 1 + w6·h3 = (1 + w6)x + (1 - w6) = -2x + 4, which holds when w6 = -3

A simple way to think about this is to recognize that each additional neuron simply changes the slope of the line. In our spline the 3 line segments have slopes .5, 1, and -2. When each additional neuron turns on, its weight is added to the overall slope.

segment 1 slope:  w4           = .5
segment 2 slope:  w4 + w5      = .5 + .5     = 1
segment 3 slope:  w4 + w5 + w6 = .5 + .5 - 3 = -2
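This slope bookkeeping is easy to express in code. The following sketch (my own, using the segment slopes stated above) recovers each output weight as the change in slope at the point where its neuron turns on.

    # Segment slopes of the target spline, in order.
    slopes = [0.5, 1.0, -2.0]

    # w4 sets the first slope; each later weight adds the *change* in slope
    # at the moment its neuron turns on.
    w4 = slopes[0]
    w5 = slopes[1] - slopes[0]   #  1 - .5  =  .5
    w6 = slopes[2] - slopes[1]   # -2 - 1   = -3

    print(w4, w5, w6)  # 0.5 0.5 -3.0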

Graphically the progression looks like this: 

[Figure: the network's output as each weight is set, matching the first, then the second, then the third segment]

Conclusion  

Hopefully it’s clear from this example why neural networks can be fit to any continuous curve. In our example we had 3 hidden neurons, which allowed us to draw 3 line segments. But with more hidden neurons we could add more segments, updating the slope of our line with each one.
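To tie the whole recipe together, here is a short Python sketch that turns any linear spline, given by its knot points, into the weights of a one-hidden-layer relu network and checks that the two agree. The helper names (spline_to_relu_network, network) are my own, and the right-hand end of the example spline (x = 3) is an assumed value, since the post only specifies where the third segment starts.

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def spline_to_relu_network(knots_x, knots_y):
        # One hidden neuron per line segment, following the recipe in this post.
        knots_x = np.asarray(knots_x, dtype=float)
        knots_y = np.asarray(knots_y, dtype=float)
        slopes = np.diff(knots_y) / np.diff(knots_x)

        first_weights = np.ones(len(slopes))        # first-layer weights all set to 1
        first_biases = -knots_x[:-1]                # neuron i turns on where segment i starts
        out_weights = np.diff(slopes, prepend=0.0)  # each neuron adds the change in slope
        out_bias = knots_y[0]                       # the spline's value at the first knot
        return first_weights, first_biases, out_weights, out_bias

    def network(x, w_in, b_in, w_out, b_out):
        h = relu(np.outer(x, w_in) + b_in)          # hidden activations, one column per neuron
        return h @ w_out + b_out

    # The spline from this post: segment starts at x = -3, -1, 1 (end at x = 3 assumed).
    knots_x = [-3, -1, 1, 3]
    knots_y = [-1, 0, 2, -2]
    w_in, b_in, w_out, b_out = spline_to_relu_network(knots_x, knots_y)

    x = np.linspace(-3, 3, 601)
    assert np.allclose(network(x, w_in, b_in, w_out, b_out), np.interp(x, knots_x, knots_y))
    print("relu network reproduces the spline on [-3, 3]")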

Chris attended Tufts University, where he double majored in Math and Philosophy. After working as a computational research assistant at the Stanford Cognitive Systems Neuroscience Lab and completing a PhD at the Harvard Vision Sciences Lab, Chris now works as an AI researcher at FAR labs.
