This applet explores least squares linear regression fitting of polynomials to data. The user can see the computed least squares fit or guess a polynomial fit and compare it to the least squares fit. There are several methods for inputting or generating new data.

The data model
This applet uses a model that assumes that \(x\) is somehow given and that

\[y = P(x) + e,\]

where \(P(x)\) is a polynomial of a specified order and \(e\) is a random deviation from \(P(x)\) which has mean \(0\) and is independent of \(x\). We also assume that for each data point \(e\) was independently drawn from the same distribution.
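
The data model can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the applet: the helper names `evaluate` and `generate_data`, the example coefficients, the grid of \(x\) values, and the choice of a normal distribution with standard deviation `sigma` for \(e\).

```python
import random


def evaluate(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule.

    coeffs are listed in increasing order of power: [c0, c1, c2, ...].
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def generate_data(coeffs, xs, sigma, rng):
    """Return y_i = P(x_i) + e_i, with each e_i drawn independently.

    Assumption: e_i is normal with mean 0 and standard deviation sigma.
    """
    return [evaluate(coeffs, x) + rng.gauss(0.0, sigma) for x in xs]


rng = random.Random(0)                  # seeded so the sketch is repeatable
true_coeffs = [1.0, -2.0, 0.5]          # hypothetical P(x) = 1 - 2x + 0.5x^2
xs = [i / 10 for i in range(-20, 21)]   # x is simply given
ys = generate_data(true_coeffs, xs, sigma=0.3, rng=rng)
```

Because the same distribution is used for every point and each draw is independent of \(x\), the sketch matches the two assumptions stated above.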

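The goal of the least squares fit is to find the polynomial \(P_b(x)\) of the specified order that best fits the data, in the sense that it minimizes the sum of the squared deviations \(\sum (y_i - P_b(x_i))^2\).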
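
One way to compute a least squares polynomial is to solve the normal equations. The sketch below does this with plain Gaussian elimination. It is an illustration under stated assumptions, not the applet's own code: the function name `fit_polynomial` is hypothetical, and a numerical library routine would normally be preferred for real work.

```python
def fit_polynomial(xs, ys, degree):
    """Return least squares coefficients [c0, c1, ...] in increasing power.

    Solves the normal equations (A^T A) c = A^T y, where A[i][k] = xs[i]**k.
    Assumes at least degree + 1 distinct x values, so the system is solvable.
    """
    m = degree + 1
    ata = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]

    # Augmented matrix [A^T A | A^T y], reduced with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution on the upper triangular system.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (aug[r][m] - tail) / aug[r][r]
    return coeffs


# Noise-free data from the hypothetical polynomial 1 - 2x + 0.5x^2.
sample_xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
sample_ys = [1 - 2 * x + 0.5 * x * x for x in sample_xs]
best = fit_polynomial(sample_xs, sample_ys, degree=2)
```

With noise-free data the fit recovers the generating coefficients up to rounding. With noisy data it returns the coefficients that minimize the sum of squared deviations.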

The applet uses three polynomials:

Some elements are always available and their behavior is independent of the state of the applet.

The other controls change the state of the applet. They determine what is shown in the graphs, which coefficients the b-sliders control and how new data is added.

Visibility of graphics and error readouts

Data generation

The applet allows new data to be created in several ways. Existing data can also be removed.

The average squared error is indicated by \(\sum \frac{e_i^2}{n}\). That is, \(\sum \frac{e_i^2}{n} = \sum\frac{(y_i - P(x_i))^2}{n}\), where \(P(x)\) is either \(P_b\) or \(P_g\) and \(n\) is the number of data points.
If the best fit checkbox is checked, then the average squared error for the best fit is shown in the same color as the text in that checkbox. Likewise, if the guessed fit checkbox is checked, then the average squared error for the guessed fit is shown in the same color as that checkbox.
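
The average squared error readout can be reproduced by hand. The following sketch uses hypothetical helper names (`evaluate`, `average_squared_error`) and made-up data. It compares two candidate polynomials on the same points, in the way the applet compares \(P_b\) and \(P_g\).

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial with coefficients in increasing order of power."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def average_squared_error(coeffs, xs, ys):
    """Return the sum of (y_i - P(x_i))^2 over the data, divided by n."""
    n = len(xs)
    return sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys)) / n


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

exact = average_squared_error([1.0, 2.0], xs, ys)   # 0.0, the line fits exactly
guess = average_squared_error([0.0, 2.0], xs, ys)   # 1.0, every residual is 1
```

A guessed fit can never have a smaller average squared error than the least squares fit of the same order, because the least squares fit minimizes this quantity.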

*Technically this is not exactly true. If a randomly generated point is outside the range of the plot, then it is rejected and another point is generated. The polynomials and range of \(\sigma\) are chosen so that this is extremely rare. Thus, \(e\) is very close to normally distributed.
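
The rejection step described in this note can be sketched as follows. The details are assumptions made for illustration: the function name `sample_point`, the limits `y_min` and `y_max`, the `max_tries` safeguard, and the choice to redraw only \(y\) while keeping \(x\) fixed are not taken from the applet.

```python
import random


def sample_point(p_of_x, sigma, y_min, y_max, rng, max_tries=1000):
    """Draw y = P(x) + e and redraw whenever y falls outside the plot range.

    p_of_x is the value P(x) at the chosen x. Assumption: e is normal with
    mean 0 and standard deviation sigma before any rejection takes place.
    """
    for _ in range(max_tries):
        y = p_of_x + rng.gauss(0.0, sigma)
        if y_min <= y <= y_max:
            return y
    raise RuntimeError("no point inside the plot range was generated")


rng = random.Random(0)
samples = [sample_point(0.0, 1.0, -1.0, 1.0, rng) for _ in range(500)]
```

Rejection turns the normal distribution into a truncated normal. When the plot range is wide compared with `sigma`, rejections are rare and the two distributions are nearly identical, which is the point made in the note.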

© 2014 J. Orloff, H. Miller, J. Claus, H. Petrow