clear
echo on
%In this example we demonstrate the
%techniques of undersampling and the
%bootstrap to calculate the error in
%fitted regression parameters.  We shall
%use a linear regression model in order
%to compare the error with the exact
%formula derived in class, however these
%techniques work equally well on non-linear
%regression problems.
pause
%We shall use our ball-in-the air example
%again.  We set up an array of 100 times:
t=[0:.025:2.5]';
%and the corresponding "data set":
b=(-0.5*t.^2+1.25*t)*9.8+randn(size(t));
%and we solve the regression problem:
a=[ones(size(t)),t,t.^2];
k=inv(a'*a)*a';
x=k*b
pause
%We can estimate the regression error using
%our standard formulae:
[n m]=size(a);
varb=norm(a*x-b)^2/(n-m);
%and thus:
varx=varb*k*k'
pause
%Now we shall try the undersampling approach.
%Let's undersample the data set 10 times.  Thus
%we have to solve the regression problem (a smaller
%one anyway) p=10 times.  Thus:
p=10;
for i=1:p
 bus=b(i:p:n);
 aus=a(i:p:n,:);
 xus(:,i)=aus\bus;
 echo off
end
echo on
pause
%We now have a matrix of values of the fitted
%parameters xus.  Each column of xus is the 
%parameter values corresponding to one of the
%samples of the data set.  We may calculate the
%averages:
avexus=(sum(xus')')/p
%which is close to the value which we had before:
ratio=avexus./x
%particularly for the value of the acceleration.
%The parameter x(1) is a bit off, because it has
%an expectation value of zero for this particular
%problem.
pause
%Now we are ready to compute the error in the
%fitted parameters using our matrix xus:
for i=1:m
 for j=1:m
  varxus(i,j)=(xus(i,:)*xus(j,:)'/p-avexus(i)*avexus(j))/(p-1);
  echo off
 end
end
echo on
varxus
pause
%Finally, we compare this to varx:
ratiovar=varxus./varx
%which closely matches to within about a factor
%of two.  The number of data points would have to be
%much larger for a more accurate estimate.
pause
%Now let's look at the technique of bootstrapping.
%Here we want to periodically replicate our data
%an infinite number of times and then extract N
%data points from this infinite sample.  We calculate
%the regression parameters from this new set, and
%then repeat the operation a total of p times.  We
%calculate the matrix of covariance for the parameters
%by comparing these values.
pause
%Rather than actually generating an infinite array
%of times and positions, we simply generate a vector
%of random indices pointing to the times and positions.
%Thus:
pb=1000;
for i=1:pb
 index=ceil(rand(n,1)*n);
 bboot=b(index);
 aboot=a(index,:);
 xboot(:,i)=aboot\bboot;
 echo off
end
echo on
%Note that this took longer.  That is because we
%are solving for pb times as many points as in the
%undersampling approach!
pause
%Now we evaluate these pb sets of fitting parameters
%as before.  First we compute the averages:
avexboot=(sum(xboot')')/pb
%which is close to the value which we had before:
ratio=avexboot./x
pause
%Now we are ready to compute the error in the
%fitted parameters using our matrix xboot:
for i=1:m
 for j=1:m
  varxboot(i,j)=(xboot(i,:)*xboot(j,:)'/pb- ...
  avexboot(i)*avexboot(j))*pb/(pb-1);
  echo off
 end
end
echo on
varxboot
pause
%Finally, we compare this to varx:
ratiovar=varxboot./varx
%which matches again to within about a factor
%of two.  The number of samples would have to be
%much larger for a more accurate estimate.  You
%will also get a different answer every time you
%run this!