Statistic Library / Reference

Omikron Basic on the Internet: http://www.berkhan.de


2. Statistic Library Reference
 
2.1 Log in and Log off
2.2 Calculating Basic Statistics

2.3 Distribution and Test Functions
2.4 Some Random Number Generators
2.5 Confidence Intervals
2.6 Testing Agreement with Specified Nominal Values
2.7 Comparing Two Samples
2.8 Goodness of Fit Tests
2.9 Multifold Tables
2.10 Analysis of Variance
2.11 Regression Problems
2.12 Time Series Analysis
2.13 Numerical Functions
2.14 Input/Output Procedures

 
This section explains the procedures and functions of the Statistic Library. It cannot, however, cover the theoretical background of each individual command. If you work with the library frequently, you will probably need a textbook on statistical methods. We recommend the book by SACHS (see bibliography), which explains a wide variety of statistical methods in a very practical way.





2.1 Log in and Log off
 
Stat_Init
Call this procedure once at the beginning of your program. The Statistic Library cannot be used before this call.
Important: This procedure changes the DATA pointer. If you want to read your own DATA after calling Stat_Init, you first have to RESTORE the DATA pointer to the desired DATA.
 
 
Stat_Exit
Call this procedure one time at the end of your program. After that you cannot use the Statistic Library anymore.
 
 
Statistic
A copyright message of the Statistic Library is displayed.
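
Example:
A minimal program frame, given as a sketch only. The label MyData and the three numbers are invented for illustration. RESTORE is called after Stat_Init because Stat_Init changes the DATA pointer.

Stat_Init
-MyData:DATA 12,15,11
REM Stat_Init has moved the DATA pointer, so set it explicitly
RESTORE MyData
READ A,B,C
PRINT A,B,C
INPUT "End with [Return]";Dummy
Stat_Exit
END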

 
 



2.2 Calculating Basic Statistics
 
FN Mean#(&X#(),N)
X#(1:N) Individual values of sample.
N Number of values.
Calculates the mean of the sample X#().
 
 
FN Variance#(&X#(),N)
X#(1:N) Individual values of sample.
N Number of values.
Calculates the variance of the sample X#().
 
 
FN St_Dev#(&X#(),N)
X#(1:N) Individual values of sample.
N Number of values.
Calculates the standard deviation of the sample X#().
 
 
Mean_Variance &X#(),N,R Mean#,R Var#
X#(1:N) Individual values of sample.
N Number of values.
Mean# Mean of the sample.
Var# Variance of the sample.
Calculates the mean and the variance of the individual values contained in X#(1) to X#(N).
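
Example:
A short sketch of the functions above. The five sample values and the label Sample are invented for illustration; M# and V# are freely chosen variables that receive the results of Mean_Variance.

Stat_Init
-Sample:DATA 4.1,3.9,4.4,4.0,4.2
DIM X#(5)
RESTORE Sample
FOR I=1 TO 5
 READ X#(I)
NEXT I
REM Mean and variance in a single call
Mean_Variance &X#(),5,M#,V#
PRINT M#,V#
REM The same statistics from the individual functions
PRINT FN Mean#(&X#(),5),FN Variance#(&X#(),5)
PRINT FN St_Dev#(&X#(),5)
Stat_Exit
END

The arithmetic mean of these five values is 4.12, so both calls should report that value for the mean.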
 
 
FN Mean_Sample#(&X#(,),N)
X#(1:N,0:1) Individual values in X#(1:N,0) and respective frequency in X#(1:N,1).
N Number of values.
Calculates the weighted mean of the N sample values.
 
 
FN Variance_Sample#(&X#(,),N)
X#(1:N,0:1) Individual values in X#(1:N,0) and respective frequency in X#(1:N,1).
N Number of values.
Calculates the weighted variance of the N sample values.
 
 
FN St_Dev_Sample#(&X#(,),N)
X#(1:N,0:1) Individual values in X#(1:N,0) and respective frequency in X#(1:N,1).
N Number of values.
Calculates the weighted standard deviation of the N sample values.
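
Example:
A sketch of how a frequency table might be passed to the weighted functions. The data are invented: the value 1 occurs 3 times, the value 2 occurs 5 times, and the value 3 occurs 2 times.

Stat_Init
-Freq:DATA 1,3,2,5,3,2
DIM X#(3,1)
RESTORE Freq
FOR I=1 TO 3
 REM Column 0 holds the value, column 1 its frequency
 READ X#(I,0),X#(I,1)
NEXT I
PRINT FN Mean_Sample#(&X#(,),3)
PRINT FN Variance_Sample#(&X#(,),3)
PRINT FN St_Dev_Sample#(&X#(,),3)
Stat_Exit
END

Assuming the weighted mean is the sum of value times frequency divided by the sum of the frequencies, the first PRINT should show (1*3+2*5+3*2)/10=1.9.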
 
 
FN Sigma_Approx#(Stdev#,N)
Stdev# Standard deviation.
N Number of values.
The empirical standard deviation calculated with the functions above is a biased estimate of the true standard deviation of a normally distributed population. For N > 10, this function corrects the bias.
 
 
FN Variation_Coeff#(Stdev#,Mean#)
Stdev# Standard deviation.
Mean# Mean.
Calculates the coefficient of variation, i.e., the standard deviation in units of the arithmetic mean.
 
 
FN Variation_Coeff_Rel#(Stdev#,Mean#,N)
Stdev# Standard deviation.
Mean# Mean.
N Number of values.
Calculates the relative coefficient of variation, i.e., the coefficient of variation in percent.
 
 
FN Mean_Geo#(&X#(),N)
X#(1:N) Individual values of sample.
N Number of values.
Calculates the geometric mean of the N individual values.
 
 
FN Mean_Harm#(&X#(),N)
X#(1:N) Individual values of sample.
N Number of values.
Calculates the harmonic mean of the N individual values.
 
 
FN Mean_Harm_Sample#(&X#(,),N)
X#(1:N,0:1) Individual values in X#(1:N,0) and respective frequency in X#(1:N,1).
N Number of values.
Calculates the weighted harmonic mean of the N individual values.
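
Example:
A two-value sketch that makes the difference between the means easy to check by hand. The values 2 and 8 are invented for illustration.

Stat_Init
DIM X#(2)
X#(1)=2
X#(2)=8
REM Arithmetic, geometric and harmonic mean of the same sample
PRINT FN Mean#(&X#(),2)
PRINT FN Mean_Geo#(&X#(),2)
PRINT FN Mean_Harm#(&X#(),2)
Stat_Exit
END

By the usual definitions, the arithmetic mean is 5, the geometric mean is the square root of 2*8, i.e. 4, and the harmonic mean is 2/(1/2+1/8)=3.2.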
 
 




2.3 Distribution and Test Functions
 
Please note: Not all distribution and test functions can be calculated exactly. The inverse functions in particular, which are important for tests, often have to be determined with root-finding algorithms. Within the range of the usual tables, however, the functions can be used without any problems and return values that are accurate to more than 3 decimal places, which suffices for normal applications.
 
 
FN Standard#(X#)
X# Variable.
Calculates the value of the distribution function (cumulative probability) of the standard normal distribution at X#.
 
 
FN Standard_D#(X#)
X# Variable.
Calculates the probability density of the standard normal distribution.
 
 
FN Standard_Inv#(P#)
P# Variable (0<P#<1).
Calculates the P# quantile of the standard normal distribution, i.e., its inverse function. Zero is returned if it was not possible to perform the calculation.
 
 
FN Normal#(X#,Mu#,Var#)
X# Variable.
Mu# Mean.
Var# Variance.
Calculates the value of the distribution function (cumulative probability) of the general normal distribution at X#.
 
 
FN Normal_D#(X#,Mu#,Var#)
X# Variable.
Mu# Mean.
Var# Variance.
Calculates the probability density of the general normal distribution.
 
 
FN Normal_Inv#(P#,Mu#,Var#)
P# Variable (0<P#<1).
Mu# Mean.
Var# Variance.
Calculates the P# quantile of the general normal distribution, i.e., its inverse function.
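
Example:
A small sketch of how the normal distribution functions fit together. It assumes that FN Standard# returns the cumulative probability belonging to X#, which is what makes FN Standard_Inv# its inverse. Note that the general normal functions take the variance, not the standard deviation, as their third parameter.

Stat_Init
REM 97.5% quantile of the standard normal distribution (table value 1.96)
Z#=FN Standard_Inv#(0.975)
PRINT Z#
REM Feeding the quantile back in should reproduce about 0.975
PRINT FN Standard#(Z#)
REM The same quantile for mean 39.55 and variance 9
PRINT FN Normal_Inv#(0.975,39.55,9)
Stat_Exit
END

With a variance of 9 the standard deviation is 3, so the last value should be close to 39.55+1.96*3=45.43.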
 
 
FN Student#(I,X#)
I Number of degrees of freedom.
X# Variable.
Calculates the value of the distribution function (cumulative probability) of the Student distribution (t-distribution) at X#.
 
 
FN Student_D#(I,X#)
I Number of degrees of freedom.
X# Variable.
Calculates the probability density of the Student distribution (t-distribution).
 
 
FN Student_Inv#(I,P#)
I Number of degrees of freedom.
P# Variable (0<P#<1).
Calculates the P# quantile of the Student distribution (t-distribution), i.e., its inverse function. Zero is returned, if it was not possible to perform the calculation.
 
 
FN Chi2#(I,X#)
I Number of degrees of freedom(I>=1).
X# Variable (X#>=1).
Calculates the value of the distribution function (cumulative probability) of the chi square distribution at X#. Zero is returned if it was not possible to perform the calculation.
 
 
FN Chi2_D#(I,X#)
I Number of degrees of freedom.
X# Variable.
Calculates the probability density of the chi square distribution.
 
 
FN Chi2_Inv#(I,P#)
I Number of degrees of freedom (I>=1).
P# Variable (0<P#<1).
Calculates the P# quantile of the chi square distribution, i.e., its inverse function. Zero is returned, if it was not possible to perform the calculation.
 
 
FN Fisher#(I1,I2,X#)
I1 Number of degrees of freedom in numerator (I1>=1).
I2 Number of degrees of freedom in denominator (I2>=1).
X# Variable.
Calculates the value of the distribution function (cumulative probability) of the Fisher distribution (F-distribution) at X#. If it was not possible to perform the calculation, zero is returned.
 
 
FN Fisher_D#(I1,I2,X#)
I1 Number of degrees of freedom in numerator.
I2 Number of degrees of freedom in denominator.
X# Variable.
Calculates the probability density of the Fisher distribution (F-distribution).
 
 
FN Fisher_Inv#(I1,I2,P#)
I1 Number of degrees of freedom in numerator (I1>=1).
I2 Number of degrees of freedom in denominator (I2>=1).
P# Variable (0<P#<1).
Calculates the P# quantile of the Fisher distribution (F-distribution), meaning its inverse function. If it was not possible to perform the calculation, zero is returned.
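
Example:
A sketch that looks up three quantiles commonly needed for tests and checks the zero return value described above. It assumes that P# is the cumulative probability of the quantile; the variable names T#, C# and F# are freely chosen.

Stat_Init
REM Usual table values for comparison: 2.262, 11.07 and 3.71
T#=FN Student_Inv#(9,0.975)
C#=FN Chi2_Inv#(5,0.95)
F#=FN Fisher_Inv#(3,10,0.95)
IF T#=0
 THEN PRINT "Student quantile could not be calculated."
 ELSE PRINT T#
ENDIF
PRINT C#,F#
Stat_Exit
END

Because a t quantile of 0 is also the legitimate result for P#=0.5, the zero check is only meaningful for other values of P#.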
 
 
FN Expo#(X#,Mu#)
X# Variable.
Mu# Mean.
Calculates the value of the distribution function (cumulative probability) of the exponential distribution at X#.
 
 
FN Expo_Inv#(P#,Mu#)
P# Variable (0<P#<1).
Mu# Mean.
Calculates the P# quantile of the exponential distribution, i.e., its inverse function.
 
 
FN Binomial#(X,N,P#)
X Desired number of elements with character A.
N Total number of elements.
P# Constant probability of success.
Calculates the value of the distribution function (cumulative probability) of the binomial distribution.
 
 
FN Binomial_D#(X,N,P#)
X Desired number of elements with character A.
N Total number of elements.
P# Constant probability of success.
Calculates the probability density of the binomial distribution.
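
Example:
A sketch for the binomial functions, assuming that FN Binomial_D# returns the probability of exactly X elements with character A. The numbers (2 successes in 10 trials with P#=0.5) are invented for illustration.

Stat_Init
REM Probability of exactly 2 successes in 10 trials
PRINT FN Binomial_D#(2,10,0.5)
REM FN Binomial# with the same arguments, for comparison
PRINT FN Binomial#(2,10,0.5)
Stat_Exit
END

By the binomial formula, the first value should be 45/1024, i.e. about 0.0439.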
 
 
FN Hypergeo#(X,N,Sx,Sn)
X Desired number of elements with character A.
N Number of elements with character A.
Sx Sum of desired elements with character A and character B.
Sn Sum of all existing elements with character A and character B.
Calculates the value of the distribution function (cumulative probability) of the hypergeometric distribution.
 
 
FN Hypergeo_D#(X,N,Sx,Sn)
X Desired number of elements with character A.
N Number of elements with character A.
Sx Sum of desired elements with character A and character B.
Sn Sum of all existing elements with character A and character B.
Calculates the probability density of the hyper-geometric distribution.
 
 
FN Poisson#(X,Lambda#)
X Variable.
Lambda# Mean (is identical to variance in case of Poisson distribution).
Calculates the value of the distribution function (cumulative probability) of the Poisson distribution.
 
 
FN Poisson_D#(X,Lambda#)
X Variable.
Lambda# Mean (is identical to variance in case of Poisson distribution).
Calculates the probability density of the Poisson distribution.
 
 




2.4 Some Random Number Generators
 
Random number generators are important aids for simulations. The Statistic Library offers the following random number generators: standard normal, normal, chi square, and Fisher distributed.
If you need other random number generators, just call the inverse of the required distribution with an RND(0) instruction as its probability argument. For example, the function defined with FN Rnd_Expo#=FN Expo_Inv#(RND(0),5) yields exponentially distributed random numbers with a mean of 5.
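
Example:
A sketch of the inverse method described above, calling the inverse function directly instead of defining a new function. It assumes that RND(0) supplies uniformly distributed values between 0 and 1, as the text implies.

Stat_Init
FOR I=1 TO 5
 REM RND(0) supplies the probability, Expo_Inv# turns it into a quantile
 PRINT FN Expo_Inv#(RND(0),5)
NEXT I
Stat_Exit
END

The same pattern should work with any other inverse function of 2.3, for example FN Student_Inv#(I,RND(0)).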
 
FN Rnd_Standard#(Dummy)
Dummy This parameter has no significance but has to be indicated nevertheless.
Calculates a standard normal distributed random number.
 
 
FN Rnd_Normal#(Mu#,Var#)
Mu# Mean.
Var# Variance.
Calculates a normal distributed random number.
 
 
FN Rnd_Chi2#(I)
I Number of degrees of freedom.
Calculates a chi square distributed random number.
 
 
FN Rnd_Fisher#(I1,I2)
I1 Number of degrees of freedom in numerator.
I2 Number of degrees of freedom in denominator.
Calculates a Fisher distributed random number.
 
 


2.5 Confidence Intervals
 
The functions in 2.2 only ever supply estimates of the mean and the variance. Using the method of maximum likelihood, it can be shown that these values are the best estimates for the parameters of the distribution function of the population from which the data stem, but they remain estimates. Therefore, confidence limits (confidence intervals) should be stated for all basic statistics. The following procedures may be used for the most frequent distribution functions:
 
Conf_Mean_Normal_One Mean#,Var#,N,Alp#,R L#,R R#,Flag
Mean# Mean estimated from the measured values.
Var# Variance of the population.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# With the probability Alp#, the true mean is larger than this value. L#=0, if the calculation could not be performed.
R# With the probability Alp#, the true mean is smaller than this value. R#=0, if the calculation could not be performed.
Flag In this parameter, a zero (0) has to be passed if the variance was estimated from the sample. One (1) has to be passed if it is known from other information.
This procedure calculates the confidence interval for the mean with a normally distributed population and one-sided delimitation. This means that with a probability of Alp#, the mean is larger than L# or smaller than R#, respectively.

Example:
The example is a normally distributed sample with the mean Mean#=39.55, a previously known variance Var#=9, a size of N=10, and a confidence probability of Alp#=0.95. Since the variance is already known, we pass Flag=1 and obtain the result L#=37.99.
Thus, with a probability of 95%, the mean is above 37.99.


Stat_Init
Conf_Mean_Normal_One 39.55,9,10,0.95,L#,R#,1
PRINT L#
INPUT "End with [Return]";Dummy
Stat_Exit
END
 
 
Conf_Mean_Normal_Two Mean#,Var#,N,Alp#,R L#,R R#,Flag
Mean# Mean estimated from the measured values.
Var# Variance of population.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# Left limit of interval, encompassing the true mean with the probability of Alp#. L#=0, if the calculation could not be performed.
R# Right limit of interval, encompassing the true mean with the probability of Alp#. R#=0, if the calculation could not be performed.
Flag In this parameter, a zero (0) has to be passed, if the variance was estimated from the sample. One (1) has to be passed if known from other information.
This procedure calculates the confidence interval for the mean with a normal distributed population and two-sided delimitation. This means that with a probability of Alp# , the mean is located in the interval delimited by L# and R#.

Example:
As an example, we chose a normally distributed sample with the mean Mean#=39.55, a previously known variance of Var#=9, a size of N=10, and a confidence probability of Alp#=0.95. Since the variance is already known, we pass Flag=1 and obtain the results L#=37.69 and R#=41.41.
Thus, with a probability of 95%, the mean is between 37.69 and 41.41.


Stat_Init
Conf_Mean_Normal_Two 39.55,9,10,0.95,L#,R#,1
PRINT L#,R#
INPUT "End with [Return]";Dummy
Stat_Exit
END
 
 
Conf_Var_Normal_One Var#,N,Alp#,R L#,R R#
Var# Variance estimated from the measured values.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# With a probability of Alp#, the true variance is larger than this value. L#=0, if the calculation could not be performed.
R# With a probability of Alp#, the true variance is smaller than this value. R#=0, if the calculation could not be performed.
This procedure calculates the confidence interval for the variance with a normal distributed population and one-sided delimitation. This means that with a probability of Alp# , the variance is larger than L# or smaller than R#, respectively.
 
 
Conf_Var_Normal_Two Var#,N,Alp#,R L#,R R#
Var# Variance estimated from the measured values.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# Left limit of interval, encompassing the true variance with the probability of Alp#. L#=0, if the calculation could not be performed.
R# Right limit of interval, encompassing the true variance with the probability of Alp#. R#=0, if the calculation could not be performed.
This procedure calculates the confidence interval for the variance with a normally distributed population and two-sided delimitation. This means that with a probability of Alp#, the variance is located in the interval delimited by L# and R#.
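
Example:
A sketch of a call that also checks the error return described above. The input values (estimated variance 9 from a sample of size 10, confidence probability 0.95) are invented for illustration.

Stat_Init
Conf_Var_Normal_Two 9,10,0.95,L#,R#
REM L#=0 signals that the calculation could not be performed
IF L#=0
 THEN PRINT "Calculation could not be performed."
 ELSE PRINT L#,R#
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END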
 
 
Conf_Sigma_Normal_One S#,N,Alp#,R L#,R R#
S# Standard deviation estimated from the measured values.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# With a probability of Alp#, the true standard deviation is larger than this value. L#=0, if the calculation could not be performed.
R# With a probability of Alp#, the true standard deviation is smaller than this value. R#=0, if the calculation could not be performed.
This procedure calculates the confidence interval for the standard deviation with a normal distributed population and one-sided delimitation. This means that the standard deviation has the probability Alp# to be larger than L# or smaller than R#, respectively.
 
 
Conf_Sigma_Normal_Two S#,N,Alp#,R L#,R R#
S# Standard deviation estimated from the measured values.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# Left limit of interval, encompassing the true standard deviation with the probability of Alp#. L#=0, if the calculation could not be performed.
R# Right limit of interval, encompassing the true standard deviation with the probability of Alp#. R#=0, if the calculation could not be performed.
This procedure calculates the confidence interval for the standard deviation with a normal distributed population and two-sided delimitation. This means that with a probability of Alp# , the standard deviation is located in the interval delimited by L# and R#.
 
 
Conf_Bin_P_Two X,N,Alp#,R L#,R R#
X Number of elements with character A estimated from the measured values.
N Size of sample.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# Left limit of interval, encompassing the true number of elements with character A with the probability of Alp#. L#=0, if the calculation could not be performed.
R# Right limit of interval, encompassing the true number of elements with character A with the probability of Alp#. R#=0, if the calculation could not be performed.
This procedure calculates the confidence interval for the number of elements with character A with a binomially distributed population and two-sided delimitation. This means that with a probability of Alp#, the number of elements with character A is located in the interval delimited by L# and R#. The probability of success P# is linked with X through the formula P#=X/N.
 
 
Conf_Poisson_Lambda_Two X#,Alp#,R L#,R R#
X# Mean (=variance) estimated from the measured values.
Alp# Accuracy of estimate (0.8<=Alp#<=1).
L# Left limit of interval, encompassing the true mean (=variance) with the probability of Alp#. L#=0, if the calculation could not be performed.
R# Right limit of interval, encompassing the true mean (=variance) with the probability of Alp#. R#=0, if the calculation could not be performed.
This procedure calculates the confidence interval for the mean with a Poisson distributed population and two-sided delimitation. This means that the mean (=variance) has the probability Alp# to be located in the interval delimited by L# and R#.

 
 



2.6 Testing Agreement with Specified Nominal Values
 
The procedures described in this chapter serve to check whether the nominal values are met.
 
Test_Normal_Mu0_One Mu0#,Mean#,Var#,N,Alp#,R Res1,R Res2,Flag
Mu0# Nominal value for mean.
Mean# Mean of the sample.
Var# Variance of the sample.
N Size of sample.
Alp# Confidence probability.
Res1 Results in 1, if the nominal value has been exceeded; otherwise 0.
Res2 Results in 1, if below nominal value; otherwise 0.
Flag If the variance was estimated from the sample, 0 has to be passed. If known from other information, 1 has to be passed.
Under the assumption of a normally distributed population, this procedure tests whether the mean exceeds or falls below the nominal value within the framework of the confidence probability.
 
 
Test_Normal_Mu0_Two Mu0#,Mean#,Var#,N,Alp#,R Res,Flag
Mu0# Nominal value for the mean.
Mean# Mean of the sample.
Var# Variance of the sample.
N Size of sample.
Alp# Confidence probability.
Res Results in 1, if the null hypothesis is rejected, i.e., if the mean deviates statistically significantly from Mu0#; otherwise 0.
Flag If the variance was estimated from the sample, 0 has to be passed. If known from other information, 1 has to be passed.
Under the assumption of a normal distributed population, this procedure tests whether the nominal value for the mean is adhered to within the framework of the confidence probability.
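
Example:
A sketch that reuses the sample from the confidence interval examples in 2.5 (Mean#=39.55, known variance 9, N=10). The nominal value 38 is invented for illustration.

Stat_Init
REM Flag=1 because the variance is known from other information
Test_Normal_Mu0_Two 38,39.55,9,10,0.95,Res,1
IF Res=1
 THEN PRINT "Mean deviates significantly from 38."
 ELSE PRINT "No significant deviation from 38."
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END

Since 38 lies inside the two-sided 95% interval from 37.69 to 41.41 calculated in 2.5, Res=0 would be expected here.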
 
 
Test_Normal_Var_One Var0#,Var#,N,Alp#,R Res1,R Res2
Var0# Nominal value for the variance.
Var# Variance of the sample.
N Size of sample.
Alp# Confidence probability.
Res1 Results in 1, if the variance falls below the nominal value; otherwise 0.
Res2 Results in 1, if the nominal value is exceeded; otherwise 0.
Under the assumption of a normally distributed population, this procedure tests whether the variance falls below or exceeds the nominal value within the framework of the confidence probability.
 
 
Test_Normal_Var_Two Var0#,Var#,N,Alp#,R Res
Var0# Nominal value for the variance.
Var# Variance of the sample.
N Size of sample.
Alp# Confidence probability.
Res Results in 1, if the null hypothesis is rejected, i.e., if the variance is not equal to the nominal value; otherwise 0.
Under the assumption of a normally distributed population, this procedure tests whether the variance of the sample equals the nominal value of the variance within the framework of the confidence probability.
 
 
Test_Bin_P0_Two P0#,X,N,Alp#,R Res
P0# Nominal value for the probability.
X Number of elements with character A.
N Size of sample.
Alp# Confidence probability (0.8<=Alp#<=1).
Res Results in 1, if the null hypothesis is rejected, i.e., if the probability is not equal to the nominal value; otherwise 0. If the test could not be performed, then Res=-1.
Under the assumption of a binomial distributed population, this procedure tests whether the probability of the sample equals the nominal value for the probability within the framework of the confidence probability.
 
 




2.7 Comparing Two Samples
 
When two samples are compared and the distribution law is known, it is better to compare the parameters of the distributions than the measured values themselves. We have implemented comparison procedures for normal and binomial populations.
 
 
Cmp_Normal_Mean_Two Mean1#,Var1#,N1,Mean2#,Var2#,N2,Alp#,Flag,R Res
Mean1# Mean of the first sample.
Var1# Variance of the first sample.
N1 Size of the first sample.
Mean2# Mean of the second sample.
Var2# Variance of the second sample.
N2 Size of the second sample.
Alp# Confidence probability.
Flag You may pass 3 different values in Flag with the following significance:
Flag=1 : The variances are known from other sources.
Flag=2 : The variances are unknown but equal.
Flag=3 : The variances are unknown and different.
Res If the means of the two normally distributed populations match, Res=0; otherwise Res=1.
This procedure checks whether the means of two normally distributed populations match.
 
 
Cmp_Normal_Var Var1#,N1,Var2#,N2,Alp#,R Res
Var1# Variance of the first sample.
N1 Size of the first sample.
Var2# Variance of the second sample.
N2 Size of the second sample.
Alp# Confidence probability.
Res If the variances of both of the normally distributed populations match, then Res=0; otherwise Res=1.
This procedure checks whether the variances of two normally distributed populations match.
 
 
Cmp_Binomial_P P1#,N1,P2#,N2,Alp#,R Res
P1# Probability of the first sample.
N1 Size of the first sample.
P2# Probability of the second sample.
N2 Size of the second sample.
Alp# Confidence probability.
Res If the relative frequencies of both of the binomially distributed populations match, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1.
Comparison of the relative frequencies of two binomially distributed populations with the fourfold test.
The comparison yields a meaningful result only if the sample sizes are sufficiently large, i.e., if the following holds:
N1+N2>=20 AND N1*(P1#*N1+P2#*N2)/(N1+N2)>=5 AND N2*(P1#*N1+P2#*N2)/(N1+N2)>=5
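
Example:
A sketch that checks the size condition above before calling the procedure. The relative frequencies and sample sizes are invented, and H# is a freely chosen helper variable for the pooled relative frequency.

Stat_Init
P1#=0.3
N1=50
P2#=0.5
N2=40
REM Pooled relative frequency used in the size condition
H#=(P1#*N1+P2#*N2)/(N1+N2)
IF N1+N2>=20 AND N1*H#>=5 AND N2*H#>=5
 THEN Cmp_Binomial_P(P1#,N1,P2#,N2,0.95,Res)
  PRINT "Res=";Res
 ELSE PRINT "Samples too small for the fourfold test."
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END

With these numbers H# is 35/90, about 0.39, so all three conditions are fulfilled and the test is carried out.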
 
 
U_Test &X#(),M,&Y#(),N,Alp#,Flag,R Res
X#(1:M) First sample.
M Size of the first sample (M>=8).
Y#(1:N) Second sample.
N Size of the second sample (N>=8).
Alp# Confidence probability (0.8<=Alp#<=1).
Flag You may pass 2 different values in Flag with the following significance:
Flag=1 : One-sided test (Mean_X#<=Mean_Y# versus Mean_X#>Mean_Y#).
Flag=2 : Two-sided test.
Res If both samples originate from the same population, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1.
If no information about the population exists, the so-called U-test can still give a reasonably reliable answer to the question of whether the two samples originate from the same population.
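
Example:
A sketch of a two-sided U-test with the minimum sample size of 8 values each. The measured values and the label Samples are invented for illustration.

Stat_Init
-Samples:DATA 12,14,15,11,13,16,12,14,17,19,15,18,20,16,21,18
DIM X#(8)
DIM Y#(8)
RESTORE Samples
REM The first 8 numbers form sample X#(), the last 8 sample Y#()
FOR I=1 TO 8
 READ X#(I)
NEXT I
FOR I=1 TO 8
 READ Y#(I)
NEXT I
U_Test(&X#(),8,&Y#(),8,0.95,2,Result)
IF Result=1
 THEN PRINT "Samples differ."
 ELSE PRINT "Result code: ";Result
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END

A result code of 0 means that no difference was found; -1 means that the test could not be performed.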
 
 




2.8 Goodness of Fit Tests
 
So far, it has always been assumed that some information about the population from which the sample stems was available. If the population is not known, goodness of fit tests can be used. The supposed probability function is passed to the following procedures, which then check whether the sample can stem from this function. The Statistic Library contains the most important tests for discrete and continuous distribution functions, which are sufficient in most cases. These procedures require classified data in the field X#(,). If only unclassified data exist, they have to be classified first. We deliberately did not implement a procedure for this, because the way data should be classified depends too much on the individual case.

 
 
Fit_Uniform &X#(,),G,M,Alp#,R P#,R Res
X#(1:G,0:1) Contains the classified measured values.
X#(1:G,0)= Class centers.
X#(1:G,1)= Class frequencies.
G Number of groups.
M Total number of measured values.
Alp# Confidence probability (0.8<=Alp#<=1).
P# The probability that a group will occur.
Res Res=1, if a goodness of fit to the uniform distribution is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1.
This procedure performs the chi square goodness of fit test for an important special case, the uniform distribution.

Example:
Imagine a die being rolled 840 times. While doing this, the die shows the following points:

1 = 188 times
2 = 142 times
3 = 114 times
4 = 101 times
5 = 134 times
6 = 161 times

A test is to determine whether one can assume that each number of points occurs with the same probability P#=1/6. A simple BASIC program to test this hypothesis might be as follows:
Stat_Init
-Die:DATA 1,188,2,142,3,114,4,101,5,134,6,161
DIM A#(6,1)
RESTORE Die
FOR I=1 TO 6
 READ A#(I,0),A#(I,1)
NEXT I
Fit_Uniform(&A#(,),6,840,0.95,P#,Result)
IF Result=1
 THEN PRINT "Goodness of fit NOT possible."
 ELSE PRINT "Goodness of fit using uniform"
  PRINT "distribution with P=";P#;" is possible."
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END
 
 
Fit_Binomial &X#(,),G,M,N,Mean#,Alp#,R P#,R Res
X#(1:G,0:1) Contains the classified measured values.
X#(1:G,0)= Class centers.
X#(1:G,1)= Class frequencies.
G Number of groups.
M Total number of measured values.
N Parameter of the binomial distribution (number of independent repetitions).
Mean# Mean of the binomial distribution.
Alp# Confidence probability (0.8<=Alp#<=1).
P# The probability that a group will occur.
Res Res=1, if a goodness of fit to the binomial distribution is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1.
Performs a chi square goodness of fit test for a binomial distribution defined by N and Mean#.
 
 
Fit_General &X#(,),G,M,Alp#,&FN Prob#(0),R Res
X#(1:G,0:1) Contains the classified measured values.
X#(1:G,0)= Class centers.
X#(1:G,1)= Class frequencies.
G Number of groups.
M Total number of measured values.
Alp# Confidence probability (0.8<=Alp#<=1).
FN Prob#(0) This is a probability function, which you have to define yourself and whose address has to be passed to the procedure. The name of the function may of course be changed.
Caution: It is absolutely necessary that a valid function pointer is passed; otherwise the system may crash.
Res Res=1, if a goodness of fit to the function defined by you is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1.
This procedure performs the chi square goodness of fit test for a general discrete probability function defined by you, which has to possess the following characteristics:
The function has to be of double float type and accept one parameter, in which the class centers are passed. It must return the corresponding probability density and therefore has to be defined at least for the sample range X#(1:G,0).
 
 
Kolmo_Smir_Normal &X#(,),Mean#,Var#,G,M,Alp#,Flag,R Res
X#(1:G,0:1) Contains the classified measured values.
X#(1:G,0)= Class centers.
X#(1:G,1)= Class frequencies.
Mean# Mean of the normal distribution.
Var# Variance of the normal distribution.
G Number of groups.
M Total number of measured values (M>=33).
Alp# Confidence probability (0.8<=Alp#<=1).
Flag If Flag=1, then the test will be significantly more accurate. However, only the values 0.8, 0.85, 0.9, 0.95, and 0.99 are then permitted for Alp#.
Res Res=1, if a goodness of fit to the normal distribution is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1.
If it is possible to assume that a continuous probability function exists, then the Kolmogoroff-Smirnoff goodness of fit test is used. This procedure tests the goodness of fit to a normal distribution.
 
 
Kolmo_Smir_General &X#(,),G,M,Alp#,Flag,&FN Prob#(0),R Res
X#(1:G,0:1) Contains the classified measured values.
X#(1:G,0)= Class centers.
X#(1:G,1)= Class frequencies.
G Number of groups.
M Total number of measured values (M>=33).
Alp# Confidence probability (0.8<=Alp#<=1).
Flag If Flag=1, then the test will be significantly more accurate. However, only the values 0.8, 0.85, 0.9, 0.95, and 0.99 are then permitted for Alp#.
FN Prob#(0) This is a probability function, which you have to define yourself and whose address has to be passed to the procedure. The name of the function may of course be changed.
Caution: It is absolutely necessary that a valid function pointer is passed; otherwise the system may crash.
Res Res=1, if a goodness of fit to the function defined by you is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1.
This procedure performs the Kolmogoroff-Smirnoff goodness of fit test for a general continuous probability function defined by you, which has to have the following characteristics:
The function has to be of double float type and accept one parameter, in which the class centers are passed. It must return the corresponding probability density and therefore has to be defined at least for the sample range X#(1:G,0).
 
 


2.9 Multifold Tables
 
Unfortunately, the exact use of fourfold, K*2, or RC fold tables cannot be explained here. For further details, please consult the literature referenced in the bibliography in the appendix.
 
Fourfold A,B,C,D,N,Alp#,Flag,R Res
A Number of elements with character (+) from first sample.
B Number of elements with alternate character (-) from first sample.
C Number of elements with character (+) from second sample.
D Number of elements with alternate character (-) from second sample.
N Size of both combined samples.
Alp# Confidence probability (0.8<=Alp#<=1).
Flag You may pass 2 different values in Flag with the following significance:
Flag=1 : One-sided test.
Flag=2 : Two-sided test.
Res If both samples stem from the same population, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1.
Performs the fourfold chi square test to check whether the samples in the fourfold schematic stem from the same population.

The schematic looks as follows:

I \ II       (+)    (-)    Total
1. Sample    A      B      A+B
2. Sample    C      D      C+D
Total        A+C    B+D    N
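
Example:
A sketch of a two-sided fourfold test. The counts are chosen for illustration only: 15 of 100 elements in the first sample and 4 of 81 elements in the second sample show the character (+).

Stat_Init
REM A,B: first sample, C,D: second sample, N=A+B+C+D
A=15
B=85
C=4
D=77
N=A+B+C+D
Fourfold(A,B,C,D,N,0.95,2,Result)
IF Result=1
 THEN PRINT "Samples differ."
 ELSE PRINT "Result code: ";Result
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END

For these counts, the textbook statistic N*(A*D-B*C)^2/((A+B)*(C+D)*(A+C)*(B+D)) is about 4.82, which is above the tabulated chi square value of 3.84 for one degree of freedom. Assuming the procedure uses this statistic without a continuity correction, Result=1 would be expected.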

 
 
Brandt_Snedecor &X#(,),K,Alp#,R Res
X#(1:K,0:1) X#(1:K,0)= Corresponding number of elements with character (+).
X#(1:K,1)= Corresponding number of elements with character (-).
K Number of samples.
Alp# Confidence probability (0.8<=Alp#<=1).
Res If the samples stem from the same population, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1.
This procedure performs the K*2 fold chi square test according to BRANDT and SNEDECOR. It extends the fourfold test to the case in which not just two but K samples, each featuring the two characters (+) and (-), have to be examined for homogeneity. The K*2 fold schematic is represented in this procedure by the field X#(,).

Example from SACHS:

Assume that a total of 80 patients have been treated in a course of therapy. Of these, 40 patients were treated only symptomatically (i.e., only the symptoms but none of the causes were treated). The other group of 40 patients received a standard dose of a new medication. The result of the treatment is expressed as cell frequencies (occupation numbers) in the following 3*2 schematic (the inner cells, without the totals):

Therapeutical    Therapy                           Total
Success          Symptomatic    Specific
                                (normal dosage)
Cured Quickly    14             22                 36
Cured Slowly     18             16                 34
Deceased         8              2                  10
Total            40             40                 80


The objective is now to analyze at the 95% level whether the therapeutical results of the two therapies are equal or differ. For this evaluation, we only need the inner cells of the schematic (the part without the totals), since the rest can be obtained through simple summation:

Stat_Init
-Patients:DATA 14,22,18,16,8,2
DIM X#(3,1)
RESTORE Patients
FOR I=1 TO 3
 READ X#(I,0),X#(I,1)
NEXT I
Brandt_Snedecor(&X#(,),3,0.95,Result)
IF Result
 THEN PRINT "Different results"
 ELSE PRINT "Equal results"
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END
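
To check such a result by hand, the K*2 test statistic can be sketched in a few lines of Python. This is only an illustration of the Brandt-Snedecor formula as found in standard textbooks, not the code of the library; the function name brandt_snedecor_chi2 is hypothetical. For two degrees of freedom the chi square quantile has the closed form -2*ln(1-Alp), which the sketch uses instead of a table lookup.

```python
import math

def brandt_snedecor_chi2(table):
    """Chi-square statistic for a K*2 table (hypothetical helper).

    table: list of K pairs (plus, minus), laid out like X#(1:K,0:1).
    Returns (chi2, degrees_of_freedom).
    """
    x_total = sum(plus for plus, _ in table)               # all (+) counts
    n_total = sum(plus + minus for plus, minus in table)   # all elements
    # Sum of x_j^2 / n_j over the K samples.
    s = sum(plus * plus / (plus + minus) for plus, minus in table)
    factor = n_total ** 2 / (x_total * (n_total - x_total))
    chi2 = factor * (s - x_total ** 2 / n_total)
    return chi2, len(table) - 1

patients = [(14, 22), (18, 16), (8, 2)]
chi2, df = brandt_snedecor_chi2(patients)
critical = -2.0 * math.log(1.0 - 0.95)   # exact 95% quantile, valid for df = 2 only
result = 1 if chi2 > critical else 0
print(round(chi2, 3), df, round(critical, 3), result)
```

With the patient data the statistic is about 5.50 against a critical value of about 5.99, so this sketch also arrives at "Equal results".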
 
 
Twoway_R_C &X#(,),R,C,Alp#,R Res
X#(1:R,1:C) X#(1:R,1)= Corresponding number of elements with character (1).
X#(1:R,2)= Corresponding number of elements with character (2).
........
X#(1:R,C)= Corresponding number of elements with character (C).
R Number of samples.
C Number of characters.
Alp# Confidence probability (0.8<=Alp#<=1).
Res If the samples stem from the same population, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1.
The R*C schematic is the expansion of the K*2 schematic to include R rows and C columns. Thus, we have not only two characters (+ and -) but C different characters. Otherwise, the calculation does not undergo any significant changes.

Example:
To illustrate the point, we take the example from the BRANDT-SNEDECOR test, with the only difference that a third group now enters the picture, which has received twice the normal dosage (this example is also from the book by SACHS):

Therapeutical Success   Therapy                                            Total
                        Symptomatic   Specific,         Specific,
                                      Normal Dosage     2x Normal Dosage
Cured Quickly           14            22                32                 68
Cured Slowly            18            16                8                  42
Deceased                8             2                 0                  10
Total                   40            40                40                 120



The objective is now to analyze at the 95% level whether the therapeutical results for all three therapies were equal or whether they differed. For this evaluation, we only need the inner section of the schematic (the nine cell counts), since the totals can be obtained through a simple summation:
Stat_Init
-Patients:DATA 14,22,32,18,16,8,8,2,0
DIM X#(3,3)
RESTORE Patients
FOR I=1 TO 3
 FOR J=1 TO 3
  READ X#(I,J)
 NEXT J
NEXT I
Twoway_R_C(&X#(,),3,3,0.95,Result)
IF Result
 THEN PRINT "Different results"
 ELSE PRINT "Equal results"
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END
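
The calculation behind an R*C test can be sketched in Python as follows. This is the usual textbook chi square statistic (observed against expected frequencies); whether Twoway_R_C computes it in exactly this form is an assumption, and the function name chi_square_r_by_c is hypothetical. The critical value 9.488 is the tabulated 95% quantile for four degrees of freedom. The sketch would fail with a division by zero if a whole row or column were empty.

```python
def chi_square_r_by_c(table):
    """Chi-square homogeneity statistic for an R*C frequency table.

    table: list of R rows, each a list of C counts.
    Returns (chi2, degrees_of_freedom).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_sums) - 1)
    return chi2, df

patients = [[14, 22, 32], [18, 16, 8], [8, 2, 0]]
chi2, df = chi_square_r_by_c(patients)
CRITICAL_95_DF4 = 9.488   # tabulated 95% chi-square quantile for df = 4
result = 1 if chi2 > CRITICAL_95_DF4 else 0
print(round(chi2, 3), df, result)
```

For the three therapies the statistic is about 21.58, far above 9.488, so the sketch reports different results.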

 
 



2.10 Analysis of Variance
 
The analysis of variance is one of the most difficult areas of statistics. Many variance-analytical methods could not be integrated because they can hardly be generalized within this framework. However, so that the topic is not left out entirely, we have implemented at least one rather useful procedure, which calculates the data required for a twofold analysis of variance. The procedure is a BASIC version of the FORTRAN program by BRANDT, taken from his book "Datenanalyse" (Data Analysis).
 
Variance_Analysis &X#(,,),Ni,Nj,Nk,&Xbi#(),&Xbj#(),
&Xbij#(,),&Q#(),&Df#(),&F#()
X#(1:Ni,1:Nj,1:Nk) This field contains the measured values. The individual field element X#(I,J,K) indicates how many elements of the Kth sample in the Ith group carry the Jth character.
Ni Number of groups.
Nj Number of characters per sample.
Nk Number of samples per group.
Xbi#(1:Ni) This field has to be dimensioned to Ni. It will return the mean of the Ith group over all Nj characters and all Nk samples.
Xbj#(1:Nj) This field has to be dimensioned to Nj. It will return the mean of the Jth character over all Ni groups and all Nk samples.
Xbij#(1:Ni,1:Nj) This field has to be dimensioned to Ni,Nj. It will return the mean of the Ith group with the Jth character over all Nk samples.



Q#(1:6) Q#(1)= Sum of the deviation squares of all Ni groups from the corresponding mean of the Ith group.
Q#(2)= Sum of the deviation squares of all Nj characters from the corresponding mean of the Jth characters.
Q#(3)= Sum of the deviation squares of all Ni groups from the corresponding mean of the Ith group and all Nj characters from the corresponding mean of the Jth characters.
Q#(4)=Q#(2)+Q#(3).
Q#(5)= Sum of the deviation squares of all Ni*Nj*Nk measured values from the corresponding mean of the Ith group with the Jth character over all Nk samples.
Q#(6)= Sum of the deviation squares of all Ni*Nj*Nk measured values from the mean of all measured values.
Df#(1:6) This field contains the degrees of freedom of the sum in Q#(1:6).
Df#(1)=Ni-1
Df#(2)=Nj-1
Df#(3)=(Ni-1)*(Nj-1)
Df#(4)=(Ni-1)*(Nj-1)*(Nk-1)
Df#(5)=Ni*Nj*(Nk-1)
Df#(6)=Ni*Nj*Nk-1
F#(1:4) Contains the F-quotients:
F#(I)=(Q#(I)/Df#(I))/(Q#(5)/Df#(5))
This procedure calculates the data required for a twofold analysis of variance from the measured values passed in X#(,,).
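
The following Python sketch shows the textbook decomposition for a balanced twofold classification with Nk repetitions per cell. It is not a port of the BRANDT program: the function name two_way_anova and the dictionary keys are hypothetical, and the sketch does not claim to reproduce the exact ordering of Q#(1:6). It only illustrates how group means, sums of squares, degrees of freedom, and F-quotients relate to one another.

```python
def two_way_anova(x):
    """x[i][j][k]: k-th measurement in group i with character j (balanced)."""
    ni, nj, nk = len(x), len(x[0]), len(x[0][0])
    grand = sum(v for gi in x for cell in gi for v in cell) / (ni * nj * nk)
    xbij = [[sum(cell) / nk for cell in gi] for gi in x]        # cell means
    xbi = [sum(row) / nj for row in xbij]                       # group means
    xbj = [sum(xbij[i][j] for i in range(ni)) / ni for j in range(nj)]

    q = {
        "A": nj * nk * sum((m - grand) ** 2 for m in xbi),      # between groups
        "B": ni * nk * sum((m - grand) ** 2 for m in xbj),      # between characters
        "INT": nk * sum((xbij[i][j] - xbi[i] - xbj[j] + grand) ** 2
                        for i in range(ni) for j in range(nj)), # interaction
        "W": sum((v - xbij[i][j]) ** 2
                 for i in range(ni) for j in range(nj) for v in x[i][j]),
        "T": sum((v - grand) ** 2 for gi in x for cell in gi for v in cell),
    }
    df = {"A": ni - 1, "B": nj - 1, "INT": (ni - 1) * (nj - 1),
          "W": ni * nj * (nk - 1), "T": ni * nj * nk - 1}
    # F-quotient: mean square of the effect over the mean square within cells.
    f = {key: (q[key] / df[key]) / (q["W"] / df["W"])
         for key in ("A", "B", "INT")}
    return q, df, f

# Made-up illustration data: 2 groups, 3 characters, 2 samples per cell.
data = [[[4.0, 5.0], [6.0, 8.0], [5.0, 7.0]],
        [[7.0, 9.0], [9.0, 12.0], [8.0, 8.0]]]
q, df, f = two_way_anova(data)
print(q, df, f)
```

In a balanced design the total sum of squares equals the sum of the four components, which makes a convenient plausibility check for any implementation.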

Example:
An example for this procedure can be found in the DEMO folder (VarianceAnalysis.BAS).

 
 



2.11 Regression Problems
 
Up to now we have dealt only with the comparison of measured values or with tests. Now we add a new factor: the dependence of measured values on one another. The procedures described below approximate dependent data by a line, a polynomial, a hyperplane, and the like. We begin with the simplest case, using a line to describe pairs of measured values.
 


Lin_Reg &X#(,),N,R R#,R Ax#,R Bx#,R Ay#,R By#,R Sx#,R Sy#,
R Sxy#,R Syx2#,R Sxy2#
X#(1:N,0:1) Measured values:
X values in X#(1:N,0).
Y values in X#(1:N,1).
N Number of measurement points.
R# Correlation coefficient (the closer ABS(R#) is to 1, the better the correlation).
Ax# Slope of the first line.
Bx# Point of intersection of the first line with the X axis.
Ay# Slope of the second line.
By# Point of intersection of the second line with the Y axis.
Sx# Variance of the X values.
Sy# Variance of the Y values.
Sxy# Covariance.
Syx2# Residual variance concerning the Y line.
Sxy2# Residual variance concerning the X line.
This procedure calculates both of the regression lines X=Ax*Y+Bx and Y=Ay*X+By.

Example:
An example for the application of this procedure can be found in the DEMO folder (LinearRegression.BAS).
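
As a rough illustration of what the two regression lines mean, here is a minimal Python sketch. It assumes that the variances and the covariance use the divisor N-1, which the text does not state; the function name lin_reg and the dictionary keys are hypothetical and merely echo the parameter names above. The residual variances are left out.

```python
import math

def lin_reg(points):
    """points: list of (x, y) pairs. Returns a dict keyed like the parameters."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Assumption: sample variances and covariance with divisor n - 1.
    sx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    sy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    r = sxy / math.sqrt(sx * sy)
    ay = sxy / sx                 # slope of Y = Ay*X + By
    by = mean_y - ay * mean_x
    ax = sxy / sy                 # slope of X = Ax*Y + Bx
    bx = mean_x - ax * mean_y
    return {"R": r, "Ax": ax, "Bx": bx, "Ay": ay, "By": by,
            "Sx": sx, "Sy": sy, "Sxy": sxy}

fit = lin_reg([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
print(fit)
```

For points lying exactly on Y=2*X+1 both lines coincide and the correlation coefficient is 1; the more the points scatter, the more the two lines diverge.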
 
 
Test_Corr R#,N,Alp#,R Res
R# Correlation coefficient such as, e.g., returned from Lin_Reg.
N Number of measurement points.
Alp# Confidence probability.
Res Res=1, if no correlation exists; otherwise Res=0.
This procedure may be used to check whether a correlation between the two variables of the linear regression truly exists.
 
 
Cmp_Rho Rho#,R#,N,Alp#,R Res
Rho# Specified correlation coefficient to compare against.
R# Correlation coefficient such as, e.g., returned from Lin_Reg.
N Number of measurement points.
Alp# Confidence probability.
Res Res=0, if both of the correlation coefficients match within the framework of the confidence probability; otherwise Res=1.
This useful procedure compares the correlation coefficient of a sample with N points with a specified correlation coefficient.
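
One common way to carry out such a comparison is Fisher's z transformation, sketched below in Python. Whether Cmp_Rho uses this method internally is not stated here, so this is an assumption; the function name cmp_rho merely echoes the procedure name, and the two-sided test is likewise assumed.

```python
import math
from statistics import NormalDist

def cmp_rho(rho, r, n, alp):
    """Return 0 if r is compatible with rho at confidence alp, else 1."""
    if n <= 3 or abs(rho) >= 1 or abs(r) >= 1 or not 0 < alp < 1:
        raise ValueError("need n > 3, |rho| < 1, |r| < 1 and 0 < alp < 1")
    # Fisher z: atanh(r) is approximately normal with variance 1/(n - 3).
    z = (math.atanh(r) - math.atanh(rho)) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf((1.0 + alp) / 2.0)   # two-sided bound
    return 1 if abs(z) > critical else 0

same = cmp_rho(0.5, 0.6, 20, 0.95)
different = cmp_rho(0.0, 0.9, 50, 0.95)
print(same, different)
```

With only 20 points, a sample coefficient of 0.6 is still compatible with a specified value of 0.5, whereas 0.9 from 50 points clearly differs from 0.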
 
 
Mult_Reg &X#(,),N,M,&Y#(),&Z#()
X#(1:N,0:M) N measured values of M independent variables:
Constant term in X#(1:N,0).
Measured values of the first independent variable in X#(1:N,1).
.......
Measured values of the Mth independent variable in X#(1:N,M).
N Number of measurement points.
M Number of independent variables.
Y#(1:N) N measured values of the dependent variable.
Z#(1:M,0) Contains the coefficients of the hyperplane after the regression has been performed.
This procedure calculates the multiple regression of N measured values of M independent variables, using the method of least squares according to Gauss, which satisfies the "maximum likelihood" requirement.
We would like to take this opportunity to recommend the excellent book "Applied Regression Analysis," which also explains more refined procedures that take the correlation coefficients between the independent variables into account. You should always have these coefficients returned by the procedure Correlation_Coeffs in order to check whether it is necessary to carry along all of the variables.

Example:
An example for the application of this procedure can be found in the DEMO folder (MultiLinearRegression.BAS). At this point, we would like to emphasize that the procedure cannot directly fit a third degree polynomial, for example, to the measured values. In that case you have to treat the squared and cubed terms of the variable X as new independent variables Y=X^2 and Z=X^3. The demo program explains this as well.
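
The substitution trick can be sketched in Python as follows. The sketch solves the normal equations with a small Gaussian elimination; it illustrates the least squares idea and is not the code of Mult_Reg. All function names are hypothetical, and the data are generated from a known cubic so that the result can be checked.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def least_squares(rows, y):
    """rows[i] = [1, x1, ..., xM]; returns the hyperplane coefficients."""
    p = len(rows[0])
    ata = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    aty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve_linear(ata, aty)

def cubic_rows(xs):
    """Substitute X^2 and X^3 as second and third independent variable."""
    return [[1.0, x, x ** 2, x ** 3] for x in xs]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x ** 2 - x ** 3 for x in xs]   # known cubic
coeffs = least_squares(cubic_rows(xs), ys)
print([round(c, 6) for c in coeffs])
```

The regression itself stays linear in the coefficients; only the columns of the data field change. The first column holds the constant term 1, as in X#(1:N,0).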
 
 
Correlation_Coeffs &X#(,),N,M,&R#(,)
X#(1:N,0:M) N measured values of M independent variables:
Constant term in X#(1:N,0).
Measured values of the first independent variable in X#(1:N,1).
.......
Measured values of the Mth independent variable in X#(1:N,M).
N Number of measurement points.
M Number of independent variables.
R#(0:M,0:M) This field returns the correlation coefficients between the variables.
This procedure calculates the correlation coefficients between the M independent variables.
 
 

2.12 Time Series Analysis
 
The library currently offers only one efficient procedure for time series analysis: it calculates the autocorrelation function of a series of equidistant measured values. For example, assume that the oxygen content in a hothouse is measured hourly for one year, and that the objective is to find out whether the oxygen production is time dependent (e.g., more at night than during the day). Of course, you could draw a curve of your measured values for this analysis; however, this is rather cumbersome if many points exist. It is simpler to use a mathematical method that shifts the measurement curve against itself and calculates how well the shifted curve matches the original. The Statistic Library will take over this task for you.


Autocorr &X#(),N,M,&R#(),Flag
X#(1:N) Contains the measured values.
N Number of measured values.
M Maximum counter shift of X#(1:N).
R#(1:M) This field returns the calculated correlation coefficients for each of the M shifts.
Flag If Flag=1, then X#(1:N) will be normalized. It is recommended to perform this normalization whenever the mean of X#(1:N) is not equal to zero.
This procedure calculates the autocorrelation coefficients for the N measured values in X#(1:N). If many measured values exist, this procedure may take some time, so do not assume right away that your computer has crashed.

Example:
An example for the application of this procedure can be found in the DEMO folder (Autocorrelation.BAS). You should have a look at that program before working with real data.
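
To give an impression of what is being calculated, here is a minimal Python sketch of a common definition of the autocorrelation coefficient. It assumes that "normalizing" means subtracting the mean, which the text does not spell out; the function name autocorr and its keyword argument are hypothetical.

```python
def autocorr(x, m, normalize=True):
    """Return coefficients for shifts 1..m (index 0 holds shift 1)."""
    n = len(x)
    if not 0 < m < n:
        raise ValueError("need 0 < m < len(x)")
    # Assumption: normalization removes the mean of the series.
    mean = sum(x) / n if normalize else 0.0
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    if denom == 0:
        raise ValueError("series is constant")
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / denom
            for lag in range(1, m + 1)]

# Artificial series with a period of 4 measurements.
series = [1.0, 0.0, -1.0, 0.0] * 6
r = autocorr(series, 4)
print([round(v, 3) for v in r])
```

For this artificial series the coefficient is strongly negative at a shift of 2 and strongly positive at a shift of 4, which is how a periodicity shows up in the autocorrelation function.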





2.13 Numerical Functions
 
The functions and procedures described below have proven rather useful when programming statistical problems. They all stem from the area of numerical analysis or serve to depict measured values.
 
FN Binomial#(N,K)
N Upper part of the binomial coefficient.
K Lower part of the binomial coefficient.
Calculates the binomial coefficient even if N and K are very large, without producing an overflow, by continually cancelling terms of the numerator against terms of the denominator during the calculation.
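
The cancelling idea can be sketched in Python as follows. This is an illustration of the technique with floating point numbers, not the code of FN Binomial#; the lowercase function name binomial is hypothetical.

```python
def binomial(n, k):
    """Binomial coefficient as a float, multiplying and dividing in turn."""
    if k < 0 or k > n:
        return 0.0
    k = min(k, n - k)          # use the symmetry C(n, k) = C(n, n - k)
    result = 1.0
    for i in range(1, k + 1):
        # One numerator factor and one denominator factor per step keeps
        # the intermediate value close to the final result.
        result = result * (n - k + i) / i
    return result

small = binomial(10, 3)
large = binomial(1000, 500)
print(small, large)
```

Computing 1000! directly would overflow a double precision number, whereas the alternating product stays within range all the way.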
 
 
Normalize &X#(,),N,R Sum#
X#(1:N,1) This field has to be used to pass the values to be normalized. After the return, this field will then contain the relative frequencies.
N Number of values.
Sum# Sum of all values.
This procedure normalizes the grouped data in X#(1:N,1) so that, after the call, the field contains the relative frequencies instead of the group frequencies. This procedure is very useful when calculating percentages.
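
In Python the same operation might look like this; the function name normalize is hypothetical, and the sketch returns a new list instead of overwriting the field.

```python
def normalize(values):
    """Return (relative frequencies, sum of all values)."""
    total = sum(values)
    if total == 0:
        raise ValueError("sum of values is zero")
    return [v / total for v in values], total

relative, total = normalize([2.0, 3.0, 5.0])
percentages = [100.0 * v for v in relative]
print(relative, total, percentages)
```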





2.14 Input and Output Procedures
 
Finally, here are a few procedures for the simple input of data and for its display and depiction.
 
Input_1 &X#(),N,Flag
X#(1:N) This is the field receiving the read data.
N Number of measured values.
Flag If Flag=1, then the data are read in from a file using INPUT #1. Of course, this file has to have been opened first using OPEN "I",1,Fsspec$.
If Flag=0, then the data will be fetched from the keyboard, which means they have to be typed in individually.
This procedure serves to enter a one-dimensional field with N elements.
 
 
Input_2 &X#(,),N,M,Flag
X#(1:N,1:M) Data is entered into this field in such a way that, for each fixed first index, all elements of the second index (1:M) are read in.
N Maximum first index.
M Maximum second index.
Flag If Flag=1, then the data are read in from a file using INPUT #1. Of course, this file has to have been opened first using OPEN "I",1,Fsspec$.
If Flag=0, then the data will be fetched from the keyboard, which means they have to be typed in individually.
This procedure serves to enter a two-dimensional field with N columns and M rows.
 
 
Block_Chart &X#(,),N,X,Y,B,H[,Flag]
X#(1:N,1) Contains the values to be depicted.
N Number of values.
X X value of upper left corner of diagram.
Y Y value of upper left corner of diagram.
B Width of diagram.
H Height of diagram.
Flag If Flag=0 or is omitted, the type 2 fill styles are used for the individual segments (FILL STYLE= 2,1:24). This has its advantages when display is in black and white (e.g., on printer).
If Flag=1, then the colors 16 to 255 (FILL COLOR= 16:255) are used in a solid fill (FILL STYLE= 1,1).
If Flag=3, then the colors 16 to 255 (FILL COLOR= 16:255) as well as the fill styles (FILL STYLE= 2,1:24) will be varied.
If all fill styles or colors have been exhausted, the cycle starts again from the beginning.

This procedure draws a block diagram into the area defined by X,Y,B,H. The lower 10% of the box remains free for labeling by the user. All other drawing attributes (e.g., line thickness, outer line, etc.) can be set by the user. It is also possible to reroute output with the BASIC command GRAF_PORT.

Example:
An example for the application of this procedure can be found in the DEMO folder (Chart.BAS).
 
 
Pie_Chart &X#(,),X,Y,R[,Flag]
X#(1:N,1) Contains the values to be depicted.
X X value of the center of the circular chart.
Y Y value of the center of the circular chart.
R Radius of the circular chart.
Flag If Flag=0 or is omitted, the type 2 fill styles are used for the individual segments (FILL STYLE= 2,1:24). This has its advantages when display is in black and white (e.g., on printer).
If Flag=1, then the colors 16 to 255 (FILL COLOR= 16:255) are used in a solid fill (FILL STYLE= 1,1).
If Flag=3, then the colors 16 to 255 (FILL COLOR= 16:255) as well as the fill styles (FILL STYLE= 2,1:24) will be varied.
If all fill styles or colors have been exhausted, the cycle starts again from the beginning.
This procedure draws a circular chart into the area defined by X,Y,R. The procedure only sets the fill style and/or the fill color. All other drawing attributes (e.g., line thickness, outer line, etc.) can be set by the user. It is also possible to reroute output with the BASIC command GRAF_PORT.

Example:
An example for the application of this procedure can be found in the DEMO folder (Chart.BAS).
 
 
Plot_2d_1 &X#(,),N,X,Y,B,H,R X_Max,R X_Min,R Y_Max,R Y_Min
X#(1:N,0:1) Contains the function values:
X values in X#(1:N,0).
Y values in X#(1:N,1).
N Number of value pairs.
X X value of upper left corner of diagram.
Y Y value of upper left corner of diagram.
B Width of diagram.
H Height of diagram.
X_Max Maximum value of X coordinates in X#(1:N,0).
X_Min Minimum value of X coordinates in X#(1:N,0).
Y_Max Maximum value of Y coordinates in X#(1:N,1).
Y_Min Minimum value of Y coordinates in X#(1:N,1).
This procedure draws the value pairs contained in X#(1:N,0:1) into the box defined by X,Y,B,H. The function values are represented by little crosses. Five percent of the size of each box is available for your labels.
 
 
Plot_2d_2 &X#(,),N,X,Y,B,H
X#(1:N,0:1) Contains the function values:
X values in X#(1:N,0).
Y values in X#(1:N,1).
N Number of value pairs.
X X value of upper left corner of diagram.
Y Y value of upper left corner of diagram.
B Width of diagram.
H Height of diagram.
This procedure draws a curve through the points in X#(1:N,0:1). Therefore, the points have to be sorted.
 
 
Plot_2d_3 &X#(,),N,X,Y,B,H,Ax#,Bx#
X#(1:N,0:1) Contains the function values:
X values in X#(1:N,0).
Y values in X#(1:N,1).
N Number of value pairs.
X X value of upper left corner of diagram.
Y Y value of upper left corner of diagram.
B Width of diagram.
H Height of diagram.
Ax# Slope of the line.
Bx# Point of intersection of the line with the Y axis.
This procedure draws the points in X#(1:N,0:1) and the line Y=Ax*X+Bx into the same diagram.
 
 
Plot_2d_4 &X#(,),N,X,Y,B,H,X_Max#,X_Min#,Y_Max#,Y_Min#
X#(1:N,0:1) Contains the function values:
X values in X#(1:N,0).
Y values in X#(1:N,1).
N Number of value pairs.
X X value of upper left corner of diagram.
Y Y value of upper left corner of diagram.
B Width of diagram.
H Height of diagram.
X_Max# Maximum value of X coordinates.
X_Min# Minimum value of X coordinates.
Y_Max# Maximum value of Y coordinates.
Y_Min# Minimum value of Y coordinates.
This procedure draws a curve through the points in X#(1:N,0:1) into the system of coordinates described by the remaining parameters. The points have to be sorted.
 
 
 

Copyright 1998 by Berkhan-Software