clear
echo on
%This program illustrates the calculation of error
%in linear regression.  In my laboratory we are
%developing a separation process in which the
%transport of solute molecules is selectively
%enhanced by oscillating fluid along the length of
%a pore or a fiber in a membrane.  The paper describing
%these experiments recently appeared in AIChE Journal.
%We obtained the following data for enhancement
%in the transport of 1-butanol and t-butanol as a
%function of tidal displacement (the amplitude of
%oscillation):
pause
tidal=[.0544;.102;.131;.276;.132;.131;.161;.163;.182;.181];
enhancetbut=[.287;1.15;1.45;8.17;1.44;1.58;2.29;2.19;2.88;2.04]*10^4;
enhance1but=[.407;1.72;2.18;10.06;1.93;2.18;3.17;3.08;4.04;2.66]*10^4;
pause
%Let's display this in matrix form:
format short e
[tidal,enhance1but,enhancetbut]
format short
pause
%Let's plot this up:
loglog(tidal,[enhance1but,enhancetbut],'o')
xlabel('tidal displacement')
ylabel('enhancement in transport')
%As you can see from this plot, we have a power law
%dependence of the enhancement on the tidal displacement.
%Also, the enhancement for 1-butanol is greater than
%that of t-butanol, forming the basis for separation.
pause
%From theory we expect the enhancement to go as the
%square of the tidal displacement.  We wish to determine
%if this is satisfied by the data to within error, and
%to determine the ratio of the enhancements, again with
%error limits.  Let's do this using linear regression.
pause
%We expect a power law dependence on tidal displacement,
%thus we must first convert the model from this non-linear
%form to a linear form:
%
%   enhancement = c1 * tidal ^ c2
%
%becomes:
%
%   log(enhancement) = log(c1) + c2 * log(tidal)
%
%The modeling parameters are thus the log of the amplitude
%and the exponent.
pause
%We can use this to set up the matrix A (it will be the
%same for both 1 and t butanol):
n=length(tidal);
a=[ones(n,1),log(tidal)]
pause
%and the two right hand sides:
b1=log(enhance1but);
bt=log(enhancetbut);
%or for display we can put these together:
bcomb=[b1,bt]
pause
%Using the normal equations we can solve for the best
%fit parameters.  We shall solve the problem via the
%relation x = inv(A'*A)*A' * b as described in class.
%Thus:
k=inv(a'*a)*a';
%Let's print out the transpose of this matrix:
k'
pause
%We can actually solve for both of the right hand side
%vectors at the same time:
x=k*bcomb;
x1=x(:,1)
xt=x(:,2)
pause
%As can be seen, the exponent for both sets of data is
%similar, and close to the expected value of 2.0.  We can
%compare this to the data:
hold on
plot(tidal,exp(a*x))
hold off
%which fits the data pretty well.
pause
%Now we calculate the error in the measurements and the error
%in the slope.  First we get the error in the measurements.
%This is just the square of the 2-norm of the residual,
%divided by n-2 because we have two fitting parameters:
r=a*x-bcomb
pause
%Note that this got the residual for both data sets at the
%same time!.  The variance is:
var=sum(r.*r)/(n-2);
std1=var(1)^.5
stdt=var(2)^.5
%thus the error in both 1 and t butanol measurements are
%pretty similar.  Note that the error given here is actually
%the error in the logarithm of the enhancement, since that
%is what we are fitting the model to.
pause
%Now we calculate the error this causes in the calculated
%exponent.  We get the exponent using the second column of
%the matrix K, thus:
varexp=k(2,:)*k(2,:)'*var;
stdexp1=varexp(1)^.5
stdexpt=varexp(2)^.5
%Thus both lie within about one standard deviation of the
%expected value of 2.0.
pause
%Finally, we want to look at the ratio of the transport rates
%since this is what determines the selectivity.  Because the
%exponents are slightly different for the two enhancements, it
%is reasonable to compare the enhancements in the middle of the
%range over which measurements are made.  Thus, we compare the
%enhancement at a tidal displacement of 0.15.  We now calculate
%both the enhancement and the error in the enhancement for each
%species investigated at that tidal displacement.  Thus:
pause
enhance=exp([1,log(.15)]*x);
enhance1=enhance(1)
enhancet=enhance(2)
ratio=enhance1/enhancet
%so there is some selectivity provided by the oscillation scheme.
pause
%Now for the error in each enhancement at a tidal displacement
%of 0.15.  We can get at this by using the vector formula described
%in class:
avec=[1;log(0.15)];
prod=(k'*avec)'*(k'*avec);
predvar=prod*var
pause
%This is the variance in the predicted values - the logarithm
%of the predicted enhancements for 1 butanol and t butanol.  If
%we want the error in the actual enhancement itself, we have to
%use the formula for error propagation for a function of a 
%random variable.  In this case the enhancement is just the
%exponential of the point predicted by the fitted curve, so the
%error calculation is extremely simple.  The error in the -logarithm-
%of a random variable is approximately equal (for small variations)
%to the relative error of the variable itself!  Thus:
pause
relpredstd1=predvar(1)^.5
relpredstdt=predvar(2)^.5
%where these quantities represent the fractional standard deviation
%in the enhancement (the standard deviation by the enhancement).
%The deviation in each of the predicted enhancements is rather
%small, only a bit more than 5% of the predicted enhancement.
pause
%If we wanted to find out the error in the ratio, we could use the
%standard formula for the error in a ratio assuming that the
%variability in the two enhancements was independent.  This would
%give for the error:
pause
ratio
relratiostd=sum(predvar)^.5
%which is pretty large considering that the deviation of the ratio
%in enhancement from unity (which provides the selectivity of the
%process) is only 0.38.  This error, however,is actually incorrect!
%This is because there is a significant degree of covariance between
%the scatter in the observed enhancements of 1 and t butanol.  You
%can see this from the original plot of the data, noticing that while
%the enhancement about each of the fitted lines has a fair amount of
%scatter, the separation between the 1 and t butanol data is pretty
%consistent, indicating a constant ratio.  How do we deal with this
%covariance?
pause
%The simplest way of doing this is to go back to the original data
%and fit a power law model directly to the ratio, rather than the
%individual enhancements.  In this case, the expected power law
%exponent should be zero (the dependence on tidal displacement cancels
%out) and the modeling function, evaluated at a tidal displacement
%of 0.15 should yield both the correct value and the correct error.
%Let's do this:
pause
%First we set up the new right hand side for the regression problem:
bratio=b1-bt
%Note that subtraction of logarithms is equivalent to division in
%the original enhancement.
pause
%The regression matrix remains unchanged, so the solution to the
%regression problem is just:
xratio=k*bratio
pause
%Let's plot this up:
loglog(tidal,exp(bratio),'o')
xlabel('tidal displacement')
ylabel('selectivity')
hold on
loglog(tidal,exp(a*xratio))
hold off
%So from the plot we see that the selectivity is a slightly decreasing
%function of the tidal displacement, in disagreement with the theory.
%Let's see if this discrepancy is statistically significant.
pause
%To do this we calculate the variability in the ratio (actually the
%log of the ratio, remember):
ratiovar=(bratio-a*xratio)'*(bratio-a*xratio)/(n-2);
%And thus we get the error in the two coefficients:
xratiovar=sum(k'.^2)'*ratiovar;
xratio(2)
expstd=xratiovar(2)^.5
pause
%So the decrease in the selectivity with tidal displacement is
%significant at the 97.5% confidence level.  This is because the
%probability of the true exponent being more than two standard
%deviations -above- the observed value is about 2.5%.  Note that
%there is also a 2.5% probability that the exponent is more than
%2 sigma -below- the observed mean as well, so the 2 sigma confidence
%-interval- is the 95% confidence interval.
pause
%OK, now let's get the selectivity ratio at the intermediate tidal
%displacement:
exactratio=exp(avec'*xratio)
prod=(k'*avec)'*(k'*avec);
exactvar=prod*ratiovar;
exactratiorelstd=exactvar^.5
%So the true error in the selectivity ratio is five times smaller
%than what we calculated by ignoring the covariance!!
pause
%The conclusion from all of this is that you have to be very careful
%in analyzing your data.  If we had just looked at the original slopes
%of the enhancement vs. tidal displacement plots, with error, we would
%conclude that both of the enhancements were well within statistical
%uncertainty of the theoretically predicted slope of two, and even
%more so close to each other.  This last part would have led us to
%the conclusion that selectivity is not affected by the tidal
%displacement.  A more precise statistical analysis shows just the
%opposite - that there is a small but statistically significant
%decrease in selectivity with stroke volume!
pause
%In conclusion, statistics is a rather slippery subject, and you
%must understand -exactly- what you are trying to calculate to do it
%right.  In particular, you must understand how all of the calculated
%variables relate to each other, and whether there is any covariance.
%Finally, remember that all of this deals with random error only - 
%if the number of data points is large, the overall error will almost
%certainly be dominated by the systematic error in the experiment, 
%about which these statistical studies can provide no information.
pause
echo off