|
Statistic Library / Reference |
|
Omikron Basic on the Internet: http://www.berkhan.de |
|
|
| This section explains the procedures and functions of the Statistic Library. The theoretical background of each individual command cannot be covered here. If you work with the Library frequently, you will probably need a book on statistical methods. We recommend the book by SACHS (see bibliography), which explains a wide variety of statistical methods in a practice-oriented way. |
| 2.1 Log in and Log off |
| Stat_Exit | |
| Call this procedure once at the end of your program. After that, the Statistic Library can no longer be used. |
| Statistic | |
| A copyright message of the Statistic Library is displayed. |
| 2.2 Calculating Basic Statistics |
| FN Variance#(&X#(),N) | |
| X#(1:N) | Individual values of sample. |
| N | Number of values. |
| Calculates the variance of the sample X#(). | |
| FN St_Dev#(&X#(),N) | |
| X#(1:N) | Individual values of sample. |
| N | Number of values. |
| Calculates the standard deviation of the sample X#(). | |
| Mean_Variance &X#(),N,R Mean#,R Var# | |
| X#(1:N) | Individual values of sample. |
| N | Number of values. |
| Mean# | Mean of the sample. |
| Var# | Variance of the sample. |
| Calculates the mean and the variance of the individual values contained in X#(1) to X#(N). | |
| FN Mean_Sample#(&X#(,),N) | |
| X#(1:N,0:1) | Individual values in X#(1:N,0) and their respective frequencies in X#(1:N,1). |
| N | Number of values. |
| Calculates the weighted mean of the N sample values. | |
| FN Variance_Sample#(&X#(,),N) | |
| X#(1:N,0:1) | Individual values in X#(1:N,0) and their respective frequencies in X#(1:N,1). |
| N | Number of values. |
| Calculates the weighted variance of the N sample values. | |
| FN St_Dev_Sample#(&X#(,),N) | |
| X#(1:N,0:1) | Individual values in X#(1:N,0) and their respective frequencies in X#(1:N,1). |
| N | Number of values. |
| Calculates the weighted standard deviation of the N sample values. | |
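The frequency-table layout used by the three weighted functions can be illustrated with a short Python sketch. It is not the library's code: the function name `weighted_stats` is hypothetical, and the sketch assumes the usual sample variance with the denominator (sum of frequencies minus 1), which the reference does not state.

```python
import math


def weighted_stats(table):
    """table: list of (value, frequency) pairs, like X#(i,0) and X#(i,1)."""
    total = sum(f for _, f in table)
    mean = sum(x * f for x, f in table) / total
    # Assumption: sample variance with denominator (total - 1).
    var = sum(f * (x - mean) ** 2 for x, f in table) / (total - 1)
    return mean, var, math.sqrt(var)


# Values 1, 2, 3 observed 1, 2, and 1 times.
mean, var, sd = weighted_stats([(1, 1), (2, 2), (3, 1)])
print(mean, var, sd)
```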
| FN Sigma_Approx#(Stdev#,N) | |
| Stdev# | Standard deviation. |
| N | Number of values. |
| The empirical standard deviation calculated with the above functions is a biased estimate of the true standard deviation of a normally distributed population. For N > 10, this function corrects the bias. | |
| FN Variation_Coeff#(Stdev#,Mean#) | |
| Stdev# | Standard deviation. |
| Mean# | Mean. |
| Calculates the coefficient of variation, i.e., the standard deviation in units of the arithmetic mean. | |
| FN Variation_Coeff_Rel#(Stdev#,Mean#,N) | |
| Stdev# | Standard deviation. |
| Mean# | Mean. |
| N | Number of values. |
| Calculates the relative coefficient of variation, i.e., the coefficient of variation in percent. | |
| FN Mean_Geo#(&X#(),N) | |
| X#(1:N) | Individual values of sample. |
| N | Number of values. |
| Calculates the geometric mean of the N individual values. | |
| FN Mean_Harm#(&X#(),N) | |
| X#(1:N) | Individual values of sample. |
| N | Number of values. |
| Calculates the harmonic mean of the N individual values. | |
| FN Mean_Harm_Sample#(&X#(,),N) | |
| X#(1:N,0:1) | Individual values in X#(1:N,0) and their respective frequencies in X#(1:N,1). |
| N | Number of values. |
| Calculates the weighted harmonic mean of the N individual values. | |
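For readers who want to check results by hand, the textbook definitions of the geometric and harmonic means can be sketched in Python as follows. The function names are hypothetical and only mirror the library names; all values are assumed to be positive.

```python
import math


def mean_geo(xs):
    # n-th root of the product, computed via logarithms (requires all x > 0).
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def mean_harm(xs):
    # Number of values divided by the sum of their reciprocals.
    return len(xs) / sum(1.0 / x for x in xs)


def mean_harm_weighted(table):
    # table: list of (value, frequency) pairs.
    total = sum(f for _, f in table)
    return total / sum(f / x for x, f in table)


print(mean_geo([1, 2, 4]), mean_harm([1, 2, 4]))
```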
| 2.3 Distribution and Test Functions |
| Please Note: Not all distribution and test functions can be calculated exactly. The inverse functions in particular, which are important for tests, often have to be determined by root-finding algorithms. Within the range of the usual tables, however, the functions can be used without problems and return values that are accurate to more than 3 decimal places, which suffices for normal applications. |
| FN Standard#(X#) | |
| X# | Variable. |
| Calculates the distribution function (cumulative probability) of the standard normal distribution at X#. | |
| FN Standard_D#(X#) | |
| X# | Variable. |
| Calculates the probability density of the standard normal distribution. | |
| FN Standard_Inv#(P#) | |
| P# | Variable (0<P#<1). |
| Calculates the P# quantile of the standard normal distribution, i.e., its inverse function. Zero is returned if it was not possible to perform the calculation. | |
| FN Normal#(X#,Mu#,Var#) | |
| X# | Variable. |
| Mu# | Mean. |
| Var# | Variance. |
| Calculates the distribution function (cumulative probability) of the general normal distribution at X#. | |
| FN Normal_D#(X#,Mu#,Var#) | |
| X# | Variable. |
| Mu# | Mean. |
| Var# | Variance. |
| Calculates the probability density of the general normal distribution. | |
| FN Normal_Inv#(P#,Mu#,Var#) | |
| P# | Variable (0<P#<1). |
| Mu# | Mean. |
| Var# | Variance. |
| Calculates the P# quantile of the general normal distribution, i.e., its inverse function. | |
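The relationship between the three kinds of normal-distribution functions (distribution function, density, and quantile) can be cross-checked with the Python standard library. This is an independent sketch with hypothetical function names, not the library's algorithm. Note that the reference passes the variance, whereas Python's `NormalDist` expects the standard deviation.

```python
import math
from statistics import NormalDist


def standard_cdf(x):
    # Cumulative probability of the standard normal distribution.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def standard_density(x):
    # Probability density of the standard normal distribution.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_quantile(p, mu, var):
    # NormalDist takes the standard deviation, so convert from the variance.
    return NormalDist(mu, math.sqrt(var)).inv_cdf(p)


# The quantile is the inverse of the distribution function.
x = normal_quantile(0.975, 0.0, 1.0)
print(x, standard_cdf(x))
```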
| FN Student#(I,X#) | |
| I | Number of degrees of freedom. |
| X# | Variable. |
| Calculates the distribution function (cumulative probability) of the Student distribution (t-distribution) at X#. | |
| FN Student_D#(I,X#) | |
| I | Number of degrees of freedom. |
| X# | Variable. |
| Calculates the probability density of the Student distribution (t-distribution). | |
| FN Student_Inv#(I,P#) | |
| I | Number of degrees of freedom. |
| P# | Variable (0<P#<1). |
| Calculates the P# quantile of the Student distribution (t-distribution), i.e., its inverse function. Zero is returned if it was not possible to perform the calculation. | |
| FN Chi2#(I,X#) | |
| I | Number of degrees of freedom(I>=1). |
| X# | Variable (X#>=1). |
| Calculates the distribution function (cumulative probability) of the chi square distribution at X#. Zero is returned if it was not possible to perform the calculation. | |
| FN Chi2_D#(I,X#) | |
| I | Number of degrees of freedom. |
| X# | Variable. |
| Calculates the probability density of the chi square distribution. | |
| FN Chi2_Inv#(I,P#) | |
| I | Number of degrees of freedom (I>=1). |
| P# | Variable (0<P#<1). |
| Calculates the P# quantile of the chi square distribution, i.e., its inverse function. Zero is returned if it was not possible to perform the calculation. | |
| FN Fisher#(I1,I2,X#) | |
| I1 | Number of degrees of freedom in numerator (I1>=1). |
| I2 | Number of degrees of freedom in denominator (I2>=1). |
| X# | Variable. |
| Calculates the distribution function (cumulative probability) of the Fisher distribution (F-distribution) at X#. If it was not possible to perform the calculation, zero is returned. | |
| FN Fisher_D#(I1,I2,X#) | |
| I1 | Number of degrees of freedom in numerator. |
| I2 | Number of degrees of freedom in denominator. |
| X# | Variable. |
| Calculates the probability density of the Fisher distribution (F-distribution). | |
| FN Fisher_Inv#(I1,I2,P#) | |
| I1 | Number of degrees of freedom in numerator (I1>=1). |
| I2 | Number of degrees of freedom in denominator (I2>=1). |
| P# | Variable (0<P#<1). |
| Calculates the P# quantile of the Fisher distribution (F-distribution), meaning its inverse function. If it was not possible to perform the calculation, zero is returned. | |
| FN Expo#(X#,Mu#) | |
| X# | Variable. |
| Mu# | Mean. |
| Calculates the distribution function (cumulative probability) of the exponential distribution at X#. | |
| FN Expo_Inv#(P#,Mu#) | |
| P# | Variable (0<P#<1). |
| Mu# | Mean. |
| Calculates the P# quantile of the exponential distribution, i.e., its inverse function. | |
| FN Binomial#(X,N,P#) | |
| X | Desired number of elements with character A. |
| N | Total number of elements. |
| P# | Constant probability of success. |
| Calculates the distribution function (cumulative probability) of the binomial distribution at X. | |
| FN Binomial_D#(X,N,P#) | |
| X | Desired number of elements with character A. |
| N | Total number of elements. |
| P# | Constant probability of success. |
| Calculates the probability density of the binomial distribution. | |
| FN Hypergeo#(X,N,Sx,Sn) | |
| X | Desired number of elements with character A. |
| N | Number of elements with character A. |
| Sx | Sum of desired elements with character A and character B. |
| Sn | Sum of all existing elements with character A and character B. |
| Calculates the distribution function (cumulative probability) of the hypergeometric distribution at X. | |
| FN Hypergeo_D#(X,N,Sx,Sn) | |
| X | Desired number of elements with character A. |
| N | Number of elements with character A. |
| Sx | Sum of desired elements with character A and character B. |
| Sn | Sum of all existing elements with character A and character B. |
| Calculates the probability density of the hyper-geometric distribution. | |
| FN Poisson#(X,Lambda#) | |
| X | Variable. |
| Lambda# | Mean (is identical to variance in case of Poisson distribution). |
| Calculates the distribution function (cumulative probability) of the Poisson distribution at X. | |
| FN Poisson_D#(X,Lambda#) | |
| X | Variable. |
| Lambda# | Mean (is identical to variance in case of Poisson distribution). |
| Calculates the probability density of the Poisson distribution. | |
| 2.4 Some Random Number Generators |
| Random number generators are important aids for simulations. The Statistic Library offers generators for the following distributions: standard normal, normal, chi square, and Fisher.
If you need other random number generators, call the inverse of the required distribution function with RND(0) as its argument. For example, the function defined by FN Rnd_Expo#=FN Expo_Inv#(RND(0),5) yields exponentially distributed random numbers with a mean of 5. |
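The inversion technique described here can be sketched in Python for the exponential case, where the quantile function has the closed form -Mu*ln(1-P). The names `expo_inv` and `rnd_expo` are hypothetical stand-ins for the library functions, and Python's uniform generator takes the place of RND(0).

```python
import math
import random


def expo_inv(p, mu):
    # P quantile of the exponential distribution with mean mu (0 <= p < 1).
    return -mu * math.log(1.0 - p)


def rnd_expo(mu, rng):
    # Feeding a uniform random number into the quantile function yields
    # an exponentially distributed random number.
    return expo_inv(rng.random(), mu)


rng = random.Random(1)
samples = [rnd_expo(5.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to the requested mean of 5
```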
| FN Rnd_Standard#(Dummy) | |
| Dummy | This parameter has no significance but has to be indicated nevertheless. |
| Calculates a standard normal distributed random number. | |
| FN Rnd_Normal#(Mu#,Var#) | |
| Mu# | Mean. |
| Var# | Variance. |
| Calculates a normal distributed random number. | |
| FN Rnd_Chi2#(I) | |
| I | Number of degrees of freedom. |
| Calculates a chi square distributed random number. | |
| FN Rnd_Fisher#(I1,I2) | |
| I1 | Number of degrees of freedom in numerator. |
| I2 | Number of degrees of freedom in denominator. |
| Calculates a Fisher distributed random number. | |
| 2.5 Confidence Intervals |
| As already discussed above, the functions in 2.2 supply only estimates of the mean and the variance. Using the method of "maximum likelihood," it can be shown that these values are the best estimates for the parameters of the distribution function of the population from which the data were drawn, but they remain estimates. Therefore, confidence limits (confidence intervals) should be stated for all basic statistics. The following procedures may be used for the most common distribution functions: |
| Conf_Mean_Normal_One Mean#,Var#,N,Alp#,R L#,R R#,Flag | |
| Mean# | Mean estimated from the measured values. |
| Var# | Variance of the population. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | With the probability Alp#, the true mean is larger than this value. L#=0, if the calculation could not be performed. |
| R# | With the probability Alp#, the true mean is smaller than this value. R#=0, if the calculation could not be performed. |
| Flag | In this parameter, a zero (0) has to be passed, if the variance was estimated from the sample. One (1) has to be passed if known from other information. |
| This procedure calculates the confidence interval for the mean of a normally distributed population with one-sided delimitation. This means that with a probability of Alp#, the mean is larger than L# or smaller than R#, respectively. | |
| Example: Consider a normally distributed sample with the mean Mean#=39.55, a previously known variance Var#=9, a size of N=10, and a confidence probability of Alp#=0.95. Since the variance is already known, we use Flag=1 and obtain the result L#=37.99. Thus, with a probability of 95%, the mean is above 37.99.
    Stat_Init
    REM Flag=1 because the variance is known
    Conf_Mean_Normal_One 39.55,9,10,0.95,L#,R#,1
    PRINT L#
    INPUT "End with [Return]";Dummy
    Stat_Exit
    END
|
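The value L#=37.99 can be reproduced by hand for the known-variance case (Flag=1): the limit is the mean minus the Alp# quantile of the standard normal distribution times the standard error. The following Python sketch assumes this standard formula and uses a hypothetical function name; it does not cover Flag=0, where a t quantile would be needed.

```python
import math
from statistics import NormalDist


def conf_mean_one_sided(mean, var, n, alp):
    # Known variance: one-sided limits from the standard normal quantile.
    z = NormalDist().inv_cdf(alp)
    half_width = z * math.sqrt(var / n)
    return mean - half_width, mean + half_width


lower, upper = conf_mean_one_sided(39.55, 9, 10, 0.95)
print(round(lower, 2))  # 37.99, as in the example above
```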
| Conf_Mean_Normal_Two Mean#,Var#,N,Alp#,R L#,R R#,Flag | |
| Mean# | Mean estimated from the measured values. |
| Var# | Variance of population. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | Left limit of interval, encompassing the true mean with the probability of Alp#. L#=0, if the calculation could not be performed. |
| R# | Right limit of interval, encompassing the true mean with the probability of Alp#. R#=0, if the calculation could not be performed. |
| Flag | In this parameter, a zero (0) has to be passed, if the variance was estimated from the sample. One (1) has to be passed if known from other information. |
| This procedure calculates the confidence interval for the mean of a normally distributed population with two-sided delimitation. This means that with a probability of Alp#, the mean is located in the interval delimited by L# and R#. | |
| Example: Consider a normally distributed sample with the mean Mean#=39.55, a previously known variance of Var#=9, a size of N=10, and a confidence probability of Alp#=0.95. Since the variance is already known, we use Flag=1 and obtain the results L#=37.69 and R#=41.41. Thus, with a probability of 95%, the mean is between 37.69 and 41.41.
    Stat_Init
    REM Flag=1 because the variance is known
    Conf_Mean_Normal_Two 39.55,9,10,0.95,L#,R#,1
    PRINT L#,R#
    INPUT "End with [Return]";Dummy
    Stat_Exit
    END
|
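For the two-sided interval, the error probability 1-Alp# is split evenly between both tails, so the quantile at (1+Alp#)/2 is used. A minimal Python cross-check of the example for the known-variance case, with a hypothetical function name:

```python
import math
from statistics import NormalDist


def conf_mean_two_sided(mean, var, n, alp):
    # Two-sided: half of the error probability lies in each tail.
    z = NormalDist().inv_cdf((1.0 + alp) / 2.0)
    half_width = z * math.sqrt(var / n)
    return mean - half_width, mean + half_width


left, right = conf_mean_two_sided(39.55, 9, 10, 0.95)
print(round(left, 2), round(right, 2))  # 37.69 41.41, as in the example above
```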
| Conf_Var_Normal_One Var#,N,Alp#,R L#,R R# | |
| Var# | Variance estimated from the measured values. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | With a probability of Alp#, the true variance is larger than this value. L#=0, if the calculation could not be performed. |
| R# | With a probability of Alp#, the true variance is smaller than this value. R#=0, if the calculation could not be performed. |
| This procedure calculates the confidence interval for the variance of a normally distributed population with one-sided delimitation. This means that with a probability of Alp#, the variance is larger than L# or smaller than R#, respectively. | |
| Conf_Var_Normal_Two Var#,N,Alp#,R L#,R R# | |
| Var# | Variance estimated from the measured values. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | Left limit of interval, encompassing the true variance with the probability of Alp#. L#=0, if the calculation could not be performed. |
| R# | Right limit of interval, encompassing the true variance with the probability of Alp#. R#=0, if the calculation could not be performed. |
| This procedure calculates the confidence interval for the variance of a normally distributed population with two-sided delimitation. This means that with a probability of Alp#, the variance is located in the interval delimited by L# and R#. | |
| Conf_Sigma_Normal_One S#,N,Alp#,R L#,R R# | |
| S# | Standard deviation estimated from the measured values. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | With a probability of Alp#, the true standard deviation is larger than this value. L#=0, if the calculation could not be performed. |
| R# | With a probability of Alp#, the true standard deviation is smaller than this value. R#=0, if the calculation could not be performed. |
| This procedure calculates the confidence interval for the standard deviation with a normal distributed population and one-sided delimitation. This means that the standard deviation has the probability Alp# to be larger than L# or smaller than R#, respectively. | |
| Conf_Sigma_Normal_Two S#,N,Alp#,R L#,R R# | |
| S# | Standard deviation estimated from the measured values. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | Left limit of interval, encompassing the true standard deviation with the probability of Alp#. L#=0, if the calculation could not be performed. |
| R# | Right limit of interval, encompassing the true standard deviation with the probability of Alp#. R#=0, if the calculation could not be performed. |
| This procedure calculates the confidence interval for the standard deviation of a normally distributed population with two-sided delimitation. This means that with a probability of Alp#, the standard deviation is located in the interval delimited by L# and R#. | |
| Conf_Bin_P_Two X,N,Alp#,R L#,R R# | |
| X | Number of elements with character A estimated from the measured values. |
| N | Size of sample. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | Left limit of interval, encompassing the true number of elements with character A with the probability of Alp#. L#=0, if the calculation could not be performed. |
| R# | Right limit of interval, encompassing the true number of elements with character A with the probability of Alp#. R#=0, if the calculation could not be performed. |
| This procedure calculates the confidence interval for the number of elements with character A in a binomially distributed population with two-sided delimitation. This means that with a probability of Alp#, the number of elements with character A is located in the interval delimited by L# and R#. The probability of success P# is linked with X through the formula P#=X/N. | |
| Conf_Poisson_Lambda_Two X#,Alp#,R L#,R R# | |
| X# | Mean (=variance) estimated from the measured values. |
| Alp# | Accuracy of estimate (0.8<=Alp#<=1). |
| L# | Left limit of interval, encompassing the true mean (=variance) with the probability of Alp#. L#=0, if the calculation could not be performed. |
| R# | Right limit of interval, encompassing the true mean (=variance) with the probability of Alp#. R#=0, if the calculation could not be performed. |
| This procedure calculates the confidence interval for the mean with a Poisson distributed population and two-sided delimitation. This means that the mean (=variance) has the probability Alp# to be located in the interval delimited by L# and R#. | |
| 2.6 Testing Agreement with Specified Nominal Values |
| The procedures described in this chapter serve to check whether specified nominal values are met. |
| Test_Normal_Mu0_Two Mu0#,Mean#,Var#,N,Alp#,R Res,Flag | |
| Mu0# | Nominal value for the mean. |
| Mean# | Mean of the sample. |
| Var# | Variance of the sample. |
| N | Size of sample. |
| Alp# | Confidence probability. |
| Res | Results in 1 if the null hypothesis is rejected, i.e., if the mean deviates statistically significantly from Mu0#; otherwise 0. |
| Flag | If the variance was estimated from the sample, 0 has to be passed. If known from other information, 1 has to be passed. |
| Under the assumption of a normal distributed population, this procedure tests whether the nominal value for the mean is adhered to within the framework of the confidence probability. | |
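For the case of a known variance (Flag=1), the decision can be sketched as a standard two-sided z-test. This is an assumption about the method, since the reference does not state the formula, and the function name `mean_deviates` is hypothetical. The return value follows the Res convention: 1 for a significant deviation, 0 otherwise.

```python
import math
from statistics import NormalDist


def mean_deviates(mu0, mean, var, n, alp):
    # Standardized distance between sample mean and nominal value.
    z = abs(mean - mu0) / math.sqrt(var / n)
    critical = NormalDist().inv_cdf((1.0 + alp) / 2.0)
    return 1 if z > critical else 0


print(mean_deviates(38.0, 39.55, 9, 10, 0.95))  # nominal value inside the 95% interval
print(mean_deviates(37.0, 39.55, 9, 10, 0.95))  # nominal value outside the 95% interval
```

The two calls use the sample from the confidence interval examples in 2.5: a nominal value within the interval 37.69 to 41.41 is not rejected, one outside it is.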
| Test_Normal_Var_One Var0#,Var#,N,Alp#,R Res1,R Res2 | |
| Var0# | Nominal value for the variance. |
| Var# | Variance of the sample. |
| N | Size of sample. |
| Alp# | Confidence probability. |
| Res1 | Results in 1 if the variance falls below the nominal value; otherwise 0. |
| Res2 | Results in 1 if the variance exceeds the nominal value; otherwise 0. |
| Under the assumption of a normally distributed population, this procedure tests whether the variance falls below or exceeds the nominal value within the framework of the confidence probability. | |
| Test_Normal_Var_Two Var0#,Var#,N,Alp#,R Res | |
| Var0# | Nominal value for the variance. |
| Var# | Variance of the sample. |
| N | Size of sample. |
| Alp# | Confidence probability. |
| Res | Results in 1 if the null hypothesis is rejected, i.e., if the variance differs from the nominal value; otherwise 0. |
| Under the assumption of a normally distributed population, this procedure tests whether the variance of the sample equals the nominal value of the variance within the framework of the confidence probability. | |
| Test_Bin_P0_Two P0#,X,N,Alp#,R Res | |
| P0# | Nominal value for the probability. |
| X | Number of elements with character A. |
| N | Size of sample. |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| Res | Results in 1 if the null hypothesis is rejected, i.e., if the probability differs from the nominal value; otherwise 0. If the test could not be performed, then Res=-1. |
| Under the assumption of a binomial distributed population, this procedure tests whether the probability of the sample equals the nominal value for the probability within the framework of the confidence probability. | |
| 2.7 Comparing Two Samples |
| When comparing two samples whose distribution law is known, it is best not to compare the measured values directly but rather to compare the parameters of the distributions. We have implemented comparison procedures for normal and binomial populations. |
| Cmp_Normal_Mean_Two Mean1#,Var1#,N1,Mean2#,Var2#,N2,Alp#,Flag,R Res | |
| Mean1# | Mean of the first sample. |
| Var1# | Variance of the first sample. |
| N1 | Size of the first sample. |
| Mean2# | Mean of the second sample. |
| Var2# | Variance of the second sample. |
| N2 | Size of the second sample. |
| Alp# | Confidence probability. |
| Flag | Flag accepts three values: Flag=1: The variances are known from other sources. Flag=2: The variances are unknown but equal. Flag=3: The variances are unknown and different. |
| Res | If the means of the two normally distributed populations match, Res=0; otherwise Res=1. |
| This procedure checks whether the means of two normally distributed populations match. | |
| Cmp_Normal_Var Var1#,N1,Var2#,N2,Alp#,R Res | |
| Var1# | Variance of the first sample. |
| N1 | Size of the first sample. |
| Var2# | Variance of the second sample. |
| N2 | Size of the second sample. |
| Alp# | Confidence probability. |
| Res | If the variances of both of the normally distributed populations match, then Res=0; otherwise Res=1. |
| This procedure checks whether the variances of two normally distributed populations match. | |
| Cmp_Binomial_P P1#,N1,P2#,N2,Alp#,R Res | |
| P1# | Probability of the first sample. |
| N1 | Size of the first sample. |
| P2# | Probability of the second sample. |
| N2 | Size of the second sample. |
| Alp# | Confidence probability. |
| Res | If the relative frequencies of both of the binomially distributed populations match, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1. |
| Compares the relative frequencies of two binomially distributed populations using the fourfold test. The comparison yields meaningful results only if the samples are sufficiently large, i.e., if the following holds: N1+N2>=20 AND N1*(P1#*N1+P2#*N2)/(N1+N2)>=5 AND N2*(P1#*N1+P2#*N2)/(N1+N2)>=5 |
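The sample-size condition can be checked before calling the procedure. The sketch below transcribes the condition into Python; the function name is hypothetical.

```python
def fourfold_sample_ok(p1, n1, p2, n2):
    # Pooled relative frequency of both samples.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # All three parts of the condition must hold.
    return n1 + n2 >= 20 and n1 * pooled >= 5 and n2 * pooled >= 5


print(fourfold_sample_ok(0.5, 10, 0.5, 10))  # large enough
print(fourfold_sample_ok(0.1, 10, 0.1, 10))  # expected counts too small
```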
|
| U_Test &X#(),M,&Y#(),N,Alp#,Flag,R Res | |
| X#(1:M) | First sample. |
| M | Size of the first sample (M>=8). |
| Y#(1:N) | Second sample. |
| N | Size of the second sample (N>=8). |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| Flag | Flag accepts two values: Flag=1: One-sided test (Mean_X#<=Mean_Y# versus Mean_X#>Mean_Y#). Flag=2: Two-sided test. |
| Res | If both samples originate from the same population, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1. |
| If no information about the population is available, the so-called U-test can still give an acceptable answer to the question of whether the two samples originate from the same population. | |
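The U-test is based on ranks rather than on the measured values themselves. The following Python sketch shows only how the U statistic of the Mann-Whitney form is commonly computed; the function name is hypothetical. How the library derives Res from the statistic is not documented here and is not reproduced.

```python
def u_statistic(xs, ys):
    # Counts the pairs (x, y) in which x exceeds y; ties count one half.
    total = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total


xs = [1, 2, 3]
ys = [4, 5, 6]
# The two statistics always add up to M*N.
print(u_statistic(xs, ys), u_statistic(ys, xs))
```

A U value near 0 or near M*N indicates that one sample lies almost entirely above the other. The tiny samples here only illustrate the counting; the procedure itself requires M>=8 and N>=8.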
| 2.8 Goodness of Fit Tests |
| The previous sections always assumed that information about the population from which the sample stems was available. If the population is not known, goodness of fit tests can be used. The presumed probability function is passed to the following procedures, which then check whether the sample could stem from this function. The Statistic Library contains the most important tests for discrete and continuous distribution functions, which are sufficient in most cases. These procedures require classified data in the field X#(,). If only unclassified data exist, they have to be classified first. It was a conscious decision not to implement a procedure for this, because the classification of data depends too much on the procedure. |
| Fit_Uniform &X#(,),G,M,Alp#,R P#,R Res | |
| X#(1:G,0:1) | Contains the classified measured values. X#(1:G,0)= Class centers. X#(1:G,1)= Class frequencies. |
| G | Number of groups. |
| M | Total number of measured values. |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| P# | The probability that a group will occur. |
| Res | Res=1 if a fit to the uniform distribution is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1. |
| This procedure performs the chi square goodness of fit test for an important special case, the uniform distribution. | |
| Example: Imagine a die being rolled 840 times, showing the following points: 1 = 188 times, 2 = 142 times, 3 = 114 times, 4 = 101 times, 5 = 134 times, 6 = 161 times. A test is to determine whether one can assume that each number of points occurs with the same probability P#=1/6. A simple BASIC program to test this hypothesis might be as follows: |
|
    Stat_Init
    -Die:DATA 1,188,2,142,3,114,4,101,5,134,6,161
    DIM A#(6,1)
    RESTORE Die
    REM Read class centers and class frequencies
    FOR I=1 TO 6
      READ A#(I,0),A#(I,1)
    NEXT I
    Fit_Uniform(&A#(,),6,840,0.95,P#,Result)
    IF Result=1 THEN
      PRINT "Goodness of fit NOT possible."
    ELSE
      PRINT "Goodness of fit using uniform"
      PRINT "distribution with P=";P#;" is possible."
    ENDIF
    INPUT "End with [Return]";Dummy
    Stat_Exit
    END
|
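The die example can be checked by hand with the standard chi square goodness of fit statistic, i.e., the sum of (observed - expected)^2 / expected over all classes. The Python sketch below is an independent check, not the library's code. The critical value 11.07 is the tabulated 95% quantile of the chi square distribution with 5 degrees of freedom (6 classes minus 1).

```python
observed = [188, 142, 114, 101, 134, 161]
total = sum(observed)             # 840 rolls
expected = total / len(observed)  # 140 per face under the uniform hypothesis

# Chi square statistic: squared deviations relative to the expectation.
chi2 = sum((o - expected) ** 2 / expected for o in observed)

CHI2_CRIT_5_DF_95 = 11.07  # table value, 5 degrees of freedom, 95%
fit_rejected = chi2 > CHI2_CRIT_5_DF_95
print(round(chi2, 2), fit_rejected)
```

Under these assumptions the statistic lies far above the critical value, so the hypothesis of a fair die is rejected at the 95% level.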
| Fit_Binomial &X#(,),G,M,N,Mean#,Alp#,R P#,R Res | |
| X#(1:G,0:1) | Contains the classified measured values. X#(1:G,0)= Class centers. X#(1:G,1)= Class frequencies. |
| G | Number of groups. |
| M | Total number of measured values. |
| N | Parameters of the binomial distribution (number of independent repetitions). |
| Mean# | Mean of the binomial distribution. |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| P# | The probability that a group will occur. |
| Res | Res=1 if a fit to the binomial distribution is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1. |
| Performs a chi square goodness of fit test for a binomial distribution defined by N and Mean#. | |
| Fit_General &X#(,),G,M,Alp#,&FN Prob#(0),R Res | |
| X#(1:G,0:1) | Contains the classified measured values. X#(1:G,0)= Class centers. X#(1:G,1)= Class frequencies. |
| G | Number of groups. |
| M | Total number of measured values. |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| FN Prob#(0) | A probability function that you have to define yourself and whose address has to be passed to the procedure. The name of the function may of course be changed. Caution: A valid function pointer must be passed; otherwise the system may crash. |
| Res | Res=1 if a fit to the function defined by you is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1. |
| This procedure performs the chi square goodness of fit test for a general discrete probability function defined by you. The function must be of double float type and take one parameter, in which the class centers are passed. It must return the corresponding probability density, i.e., it has to be defined at least for the sample range X#(1:G,0). |
|
| Kolmo_Smir_Normal &X#(,),Mean#,Var#,G,M,Alp#,Flag,R Res | |
| X#(1:G,0:1) | Contains the classified measured values. X#(1:G,0)= Class centers. X#(1:G,1)= Class frequencies. |
| Mean# | Mean of the normal distribution. |
| Var# | Variance of the normal distribution. |
| G | Number of groups. |
| M | Total number of measured values (M>=33). |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| Flag | If Flag=1, then the test will be significantly more accurate. However, only the values 0.8,0.85,0.9,0.95,0.99 are still permitted for Alp#. |
| Res | Res=1 if a fit to the normal distribution is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1. |
| If a continuous probability function can be assumed, the Kolmogoroff-Smirnoff goodness of fit test is used. This procedure tests the goodness of fit to a normal distribution. | |
| Kolmo_Smir_General &X#(,),G,M,Alp#,Flag,&FN Prob#(0),R Res | |
| X#(1:G,0:1) | Contains the classified measured values. X#(1:G,0)= Class centers. X#(1:G,1)= Class frequencies. |
| G | Number of groups. |
| M | Total number of measured values (M>=33). |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| Flag | If Flag=1, then the test will be significantly more accurate. However, only the values 0.8,0.85,0.9,0.95,0.99 are still permitted for Alp#. |
| FN Prob#(0) | A probability function that you have to define yourself and whose address has to be passed to the procedure. The name of the function may of course be changed. Caution: A valid function pointer must be passed; otherwise the system may crash. |
| Res | Res=1 if a fit to the function defined by you is not possible; otherwise Res=0. If an error has occurred (e.g., Alp#<0.8), then Res=-1. |
| This procedure performs the Kolmogoroff-Smirnoff goodness of fit test for a general continuous probability function defined by you. The function must be of double float type and take one parameter, in which the class centers are passed. It must return the corresponding probability density, i.e., it has to be defined at least for the sample range X#(1:G,0). |
|
| 2.9 Multifold Tables |
| Unfortunately, the exact use of fourfold, K*2, or R*C tables cannot be explained here. For further details, please consult the literature referenced in the bibliography in the appendix. |
| Brandt_Snedecor &X#(,),K,Alp#,R Res | |
| X#(1:K,0:1) | X#(1:K,0)= Corresponding number of elements with character (+). X#(1:K,1)= Corresponding number of elements with character (-). |
| K | Number of samples. |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| Res | If the samples stem from the same population, then Res=0; otherwise Res=1. If the test could not be performed, then Res=-1. |
| This procedure performs the K*2 chi square test according to BRANDT and SNEDECOR. It extends the fourfold test to the common situation in which not just two but K samples, each featuring the two characters (+) and (-), have to be examined for homogeneity. The K*2 schematic is represented in this procedure by the field X#(,). | |
| Example from SACHS: Assume that a total of 80 patients have been treated in a course of therapy. Of these, 40 patients were treated only symptomatically (i.e., only the symptoms but none of the causes were treated). The other group of 40 patients received a standard dose of a new medication. The result of the treatment is expressed as occupation numbers in the following 3*2 schematic, in which each column holds one therapy group: |
| Result class | Therapy group 1 | Therapy group 2 | Total |
| 1 | 14 | 22 | 36 |
| 2 | 18 | 16 | 34 |
| 3 | 8 | 2 | 10 |
| Total | 40 | 40 | 80 |
| The objective is now to analyze at the 95% level whether the results of the two therapies were equal or whether they differed. For this evaluation, only the inner part of the schematic (without the totals) is needed, since the totals can be obtained through simple summation: |
|
    Stat_Init
    -Patients:DATA 14,22,18,16,8,2
    DIM X#(3,1)
    RESTORE Patients
    REM Read the occupation numbers row by row
    FOR I=1 TO 3
      READ X#(I,0),X#(I,1)
    NEXT I
    Brandt_Snedecor(&X#(,),3,0.95,Result)
    IF Result THEN
      PRINT "Different results"
    ELSE
      PRINT "Equal results"
    ENDIF
    INPUT "End with [Return]";Dummy
    Stat_Exit
    END
|
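The statistic for this example can be verified by hand with the Brandt-Snedecor formula chi2 = n^2 / (x*(n-x)) * (sum(x_j^2 / n_j) - x^2 / n), where x_j is the first entry of row j, n_j the row total, x the total of the first column, and n the grand total. The Python sketch below is an independent check with a hypothetical function name. The critical value 5.991 is the tabulated 95% quantile of the chi square distribution with K-1 = 2 degrees of freedom.

```python
def brandt_snedecor_chi2(table):
    """table: one (first, second) pair of counts per row, as in X#(j,0), X#(j,1)."""
    n = sum(a + b for a, b in table)  # grand total
    x = sum(a for a, _ in table)      # total of the first column
    s = sum(a * a / (a + b) for a, b in table)
    return n * n / (x * (n - x)) * (s - x * x / n)


chi2 = brandt_snedecor_chi2([(14, 22), (18, 16), (8, 2)])
CHI2_CRIT_2_DF_95 = 5.991  # table value, 2 degrees of freedom, 95%
print(round(chi2, 3), chi2 > CHI2_CRIT_2_DF_95)
```

Under these assumptions the statistic stays below the critical value, so the hypothesis of equal results is not rejected at the 95% level.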
| Twoway_R_C &X#(,),R,C,Alp#,R Res | |
| X#(1:R,1:C) | X#(1:R,1)= Corresponding number of elements with character (1). X#(1:R,2)= Corresponding number of elements with character (2). ... X#(1:R,C)= Corresponding number of elements with character (C). |
| R | Number of samples. |
| C | Number of characters. |
| Alp# | Confidence probability (0.8<=Alp#<=1). |
| Res | If the samples stem from the same population, then Res=0, otherwise Res=1. If the test could not be performed, then Res=-1. |
| The R*C scheme is the extension of the K*2 scheme to R rows and C columns. Thus, there are not only two characters (+ and -) but C different characters. Otherwise, the calculation does not change significantly. | |
| Example: To illustrate this, we take the example from the BRANDT-SNEDECOR test, with the only difference that a third group now enters the picture, which has received twice the normal dosage (this example is also from the book by SACHS):
The objective is now to test at the 95% level whether the therapeutic results of all three therapies were equal or whether they differed. For this evaluation, only the inner part (red section) of the scheme is needed, since the marginal sums can be obtained through simple summation: |
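| [3*3 scheme: one row per treatment result, one column per therapy. Inner values (red section), row by row: 14, 22, 32; 18, 16, 8; 8, 2, 0. Each of the three columns sums to 40 patients; the total is 120.] |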
| Stat_Init
-Patients:DATA 14,22,32,18,16,8,8,2,0
DIM X#(3,3)
RESTORE Patients
FOR I=1 TO 3
  FOR J=1 TO 3
    READ X#(I,J)
  NEXT J
NEXT I
Twoway_R_C(&X#(,),3,3,0.95,Result)
IF Result THEN
  PRINT "Different results"
ELSE
  PRINT "Equal results"
ENDIF
INPUT "End with [Return]";Dummy
Stat_Exit
END |
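The following Python sketch shows one common way to evaluate an R*C scheme: the Pearson chi-square statistic with (R-1)*(C-1) degrees of freedom. It illustrates the general technique and is not the code of Twoway_R_C. The helper names `chi2_rc` and `chi2_sf_even` are hypothetical, and the closed-form tail probability works only for an even number of degrees of freedom, which happens to hold for the 3*3 example above.

```python
import math

def chi2_rc(table):
    """Pearson chi-square statistic and degrees of freedom of an R*C table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (len(col_sums) - 1)

def chi2_sf_even(x, df):
    """Upper tail probability of chi-square; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

patients = [[14, 22, 32], [18, 16, 8], [8, 2, 0]]
chi2, df = chi2_rc(patients)
res = 1 if chi2_sf_even(chi2, df) < 1.0 - 0.95 else 0
print(round(chi2, 3), df, res)
```

Here the tail probability falls far below 0.05, so the sketch reports that the three therapies differ at the 95% level.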
| 2.10 Analysis of Variance | ||||||||||||||||
| The analysis of variance is one of the most difficult areas of statistics. Many analysis of variance methods could not be integrated, because they can hardly be generalized within the framework of this manual. However, so that the topic is not omitted completely, we have implemented at least one rather useful procedure, which calculates the data required for a twofold analysis of variance. The procedure is a BASIC version of the FORTRAN program by BRANDT, taken from his book "Datenanalyse" (Data Analysis). | ||||||||||||||||
|
||||||||||||||||
| Q#(1:6) | Q#(1)= Sum of the squared deviations of all Ni groups from the corresponding mean of the Ith group.
Q#(2)= Sum of the squared deviations of all Nj characters from the corresponding mean of the Jth character.
Q#(3)= Sum of the squared deviations of all Ni groups from the corresponding mean of the Ith group and of all Nj characters from the corresponding mean of the Jth character.
Q#(4)=Q#(2)+Q#(3).
Q#(5)= Sum of the squared deviations of all Ni*Nj*Nk measured values from the corresponding mean of the Ith group with the Jth character, taken over all Nk samples.
Q#(6)= Sum of the squared deviations of all Ni*Nj*Nk measured values from the mean of all measured values. |
| Df#(1:6) | This field contains the degrees of freedom of the sums in Q#(1:6).
Df#(1)=Ni-1
Df#(2)=Nj-1
Df#(3)=(Ni-1)*(Nj-1)
Df#(4)=(Ni-1)*(Nj-1)*(Nk-1)
Df#(5)=Ni*Nj*(Nk-1)
Df#(6)=Ni*Nj*Nk-1 |
| F#(1:4) | Contains the F-quotients: F#(I)=(Q#(I)*Df#(5))/(Df#(I)*Q#(5)) |
| This procedure calculates the data required for a twofold analysis of variance from the measured values passed in X#(,,). | |
| Example: An example for this procedure can be found in the DEMO folder (VarianceAnalysis.BAS). |
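The F-quotient described above is the ratio of two mean squares, that is, of two sums of squares each divided by its degrees of freedom. The short Python sketch below only illustrates that arithmetic. The function name `f_quotient` and the numbers are hypothetical and do not come from the library or from the demo program.

```python
def f_quotient(q_i, df_i, q_within, df_within):
    """Ratio of two mean squares: (q_i/df_i) / (q_within/df_within)."""
    return (q_i * df_within) / (df_i * q_within)

# Hypothetical sums of squares, chosen only to show the arithmetic.
f = f_quotient(q_i=30.0, df_i=2, q_within=60.0, df_within=12)
print(f)
```

The resulting value would then be compared with the F-distribution quantile for (df_i, df_within) degrees of freedom, which has to be taken from a table or a statistics package.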
| 2.11 Regression Problems |
| Up to now we have dealt only with the comparison of measured values or with tests. Now we add a new factor: the dependence of measured values on one another. The procedures described below make it possible to approximate dependent data by a line, a polynomial, a hyperplane, and the like. We begin with the simplest case, using a line to describe pairs of measured values. |
| Lin_Reg &X#(,),N,R R#,R Ax#,R Bx#,R Ay#,R By#,R Sx#,R Sy#, R Sxy#,R Syx2#,R Sxy2# |
|
| X#(1:N,0:1) | Measured values: X values in X#(1:N,0). Y values in X#(1:N,1). |
| N | Number of measurement points. |
| R# | Correlation coefficient (the closer ABS(R#) is to 1, the better the correlation). |
| Ax# | Slope of the first line. |
| Bx# | Point of intersection of the first line with the X axis. |
| Ay# | Slope of the second line. |
| By# | Point of intersection of the second line with the Y axis. |
| Sx# | Variance of the X values. |
| Sy# | Variance of the Y values. |
| Sxy# | Covariance. |
| Syx2# | Residual variance concerning the Y line. |
| Sxy2# | Residual variance concerning the X line. |
| This procedure calculates both of the regression lines X=Ax*Y+Bx and Y=Ay*X+By. | |
| Example: An example for the application of this procedure can be found in the DEMO folder (LinearRegression.BAS). |
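As an illustration of what the two regression lines mean, the following Python sketch computes both lines and the correlation coefficient from the standard least-squares formulas. It is not the code of Lin_Reg. The function name `lin_reg`, the reduced set of return values, and the sample points are assumptions made for this example.

```python
def lin_reg(points):
    """Return (r, ax, bx, ay, by) for the lines X=ax*Y+bx and Y=ay*X+by."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    ay = sxy / sxx                 # slope of the regression of Y on X
    by = mean_y - ay * mean_x      # intersection with the Y axis
    ax = sxy / syy                 # slope of the regression of X on Y
    bx = mean_x - ax * mean_y      # intersection with the X axis
    r = sxy / (sxx * syy) ** 0.5   # correlation coefficient
    return r, ax, bx, ay, by

# Hypothetical measurement points lying close to a straight line.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
r, ax, bx, ay, by = lin_reg(data)
print(r, ay, by)
```

A useful check is that the product of the two slopes equals the squared correlation coefficient; the closer it is to 1, the closer the two lines lie to each other.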
| Test_Corr R#,N,Alp#,R Res | |
| R# | Correlation coefficient, e.g., as returned by Lin_Reg. |
| N | Number of measurement points. |
| Alp# | Confidence probability. |
| Res | Res=1, if no correlation exists; otherwise Res=0. |
| This procedure may be used to check whether a correlation between the two variables of the linear regression truly exists. | |
| Cmp_Rho Rho#,R#,N,Alp#,R Res | |
| Rho# | Specified correlation coefficient to compare against. |
| R# | Correlation coefficient, e.g., as returned by Lin_Reg. |
| N | Number of measurement points. |
| Alp# | Confidence probability. |
| Res | Res=0, if both of the correlation coefficients match within the framework of the confidence probability; otherwise Res=1. |
| This useful procedure compares the correlation coefficient of a sample with N points with a specified correlation coefficient. | |
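The manual does not state which test the library uses internally. One common approach for comparing a sample correlation coefficient with a specified value is Fisher's z transformation, sketched below in Python under that assumption. The function name `cmp_rho` is hypothetical and the return value follows the Cmp_Rho convention (0 = compatible, 1 = different). Testing against Rho=0 corresponds to asking whether any correlation exists.

```python
import math
from statistics import NormalDist

def cmp_rho(rho, r, n, alp):
    """Return 0 if r is compatible with rho at confidence alp, else 1.

    Uses Fisher's z transformation (needs n > 3, |r| < 1 and |rho| < 1).
    """
    z = (math.atanh(r) - math.atanh(rho)) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf((1.0 + alp) / 2.0)  # two-sided bound
    return 1 if abs(z) > critical else 0

print(cmp_rho(0.0, 0.9, 30, 0.95))   # compare with "no correlation"
print(cmp_rho(0.85, 0.9, 30, 0.95))  # compare with a specified value
```

The approximation improves with growing N; for very small samples an exact test would be preferable.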
| Mult_Reg &X#(,),N,M,&Y#(),&Z#() | |
| X#(1:N,0:M) | N measured values of M independent variables: Constant term in X#(1:N,0). Measured values of the first independent variable in X#(1:N,1). ....... ....... Measured values of the Mth independent variable in X#(1:N,M). |
| N | Number of measurement points. |
| M | Number of independent variables. |
| Y#(1:N) | N measured values of the dependent variable. |
| Z#(1:M,0) | Contains the coefficients of the hyperplane after the regression has been performed. |
| This procedure calculates the multiple regression of N measured values of M independent variables, using the method of least squares according to Gauss, which meets the "maximum likelihood" requirement. We would like to take this opportunity to recommend the excellent book "Applied Regression Analysis," which also explains more advanced procedures that take into account the correlation coefficients between the independent variables. You should always have these coefficients returned to you by the procedure Correlation_Coeffs, in order to check whether it is necessary to carry all of the variables along. |
|
| Example: An example for the application of this procedure can be found in the DEMO folder (MultiLinearRegression.BAS). Note that the procedure cannot directly fit a third degree polynomial, for example, to the measured values. In that case, the squared and cubed terms of the variable X have to be passed as additional independent variables, Y=X^2 and Z=X^3. The demo program explains this as well. |
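The substitution of powers of X by additional variables can be sketched as follows. The Python code below is an illustration of the general idea, not the library's implementation: it solves the least-squares normal equations with a small Gaussian elimination. The names `solve` and `mult_reg` as well as the test polynomial are hypothetical.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mult_reg(rows, y):
    """Least-squares coefficients via the normal equations (X'X)z = X'y."""
    cols = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(cols)]
           for i in range(cols)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(cols)]
    return solve(xtx, xty)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x - 0.5 * x**2 + 0.25 * x**3 for x in xs]
# Column 0 is the constant term; X^2 and X^3 enter as extra "variables".
design = [[1.0, x, x**2, x**3] for x in xs]
coeffs = mult_reg(design, ys)
print([round(c, 6) for c in coeffs])
```

Because the test data were generated from an exact cubic, the recovered coefficients should match the ones used to build `ys` up to rounding. Normal equations become ill-conditioned for high polynomial degrees, so this sketch is suitable only for small problems.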
| Correlation_Coeffs &X#(,),N,M,&R#(,) | |
| X#(1:N,0:M) | N measured values of M independent variables: Constant term in X#(1:N,0). Measured values of the first independent variable in X#(1:N,1). ....... ....... Measured values of the Mth independent variable in X#(1:N,M). |
| N | Number of measurement points. |
| M | Number of independent variables. |
| R#(0:M,0:M) | This field returns the correlation coefficients between the variables. |
| This procedure calculates the correlation coefficients between the M independent variables. | |
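For orientation, a matrix of pairwise correlation coefficients can be computed as in the following Python sketch. It uses the ordinary Pearson coefficient and is an illustration only. The function name `correlation_coeffs` and the data are hypothetical, and the constant-term column is left out here because a constant column has no defined correlation.

```python
def correlation_coeffs(rows):
    """Pearson correlation matrix of the columns of rows (list of lists)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(cols)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

# Hypothetical data: three variables, four measurement points.
data = [[1.0, 2.0, 5.0], [2.0, 4.1, 3.0], [3.0, 5.9, 4.0], [4.0, 8.2, 1.0]]
r = correlation_coeffs(data)
for row in r:
    print([round(v, 3) for v in row])
```

Two variables whose coefficient is close to 1 or -1 carry almost the same information, which is the situation in which one of them might be dropped from the regression.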
| 2.12 Time Series Analysis |
| The library currently offers only one procedure for time series analysis: it calculates the autocorrelation function of a series of equidistant measured values. For example: Let's assume that the oxygen content in a greenhouse is measured hourly for one year, and that the objective is to find out whether the oxygen production is time dependent (e.g., more at night than during the day). Of course, you could draw a curve of your measured values for this analysis; however, this is rather cumbersome if there are many points. It is simpler to use a mathematical method that shifts the measurement curve against itself and calculates how well the shifted curve matches the original. The Statistic Library takes over this task for you. |
| Autocorr &X#(),N,M,&R#(),Flag | |
| X#(1:N) | Contains the measured values. |
| N | Number of measured values. |
| M | Maximum shift (lag) applied to X#(1:N). |
| R#(1:M) | This field returns the calculated correlation coefficients for each of the M shifts. |
| Flag | If Flag=1, then X#(1:N) will be normalized. This normalization is recommended whenever the mean of X#(1:N) is not equal to zero. |
| This procedure calculates the autocorrelation coefficients for the N measured values in X#(1:N). If there are many measured values, this procedure may take some time, so a long wait does not necessarily mean that your computer has crashed. | |
| Example: An example for the application of this procedure can be found in the DEMO folder (Autocorrelation.BAS). You should have a look at that program before working with real data. |
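The idea of shifting a series against itself can be sketched in a few lines of Python. This is an illustration of the general technique, not the code of Autocorr. The function name `autocorr` is hypothetical, and it is an assumption that the normalization selected by Flag corresponds to subtracting the mean, as done here.

```python
def autocorr(x, m, subtract_mean=True):
    """Autocorrelation coefficients for lags 1..m of an equidistant series."""
    n = len(x)
    if subtract_mean:
        mean = sum(x) / n
        x = [v - mean for v in x]
    denom = sum(v * v for v in x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) / denom
            for lag in range(1, m + 1)]

# Hypothetical series with a period of 4 samples.
series = [1.0, 0.0, -1.0, 0.0] * 6
r = autocorr(series, 4)
print([round(v, 3) for v in r])
```

For a periodic series the coefficients rise again at lags that are multiples of the period, which is how a daily rhythm would show up in hourly measurements. The double loop costs on the order of N*M operations, which explains the long run time for large data sets.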
| 2.13 Numerical Functions | ||||||||
| The functions and procedures described below have proven rather useful when programming statistical problems. They all stem from the area of numerical analysis or serve to depict measured values. | ||||||||
|
||||||||
| Normalize &X#(,),N,R Sum# | |
| X#(1:N,1) | This field passes the values to be normalized. After the return, it contains the relative frequencies. |
| N | Number of values. |
| Sum# | Sum of all values. |
| This procedure normalizes the grouped data in X#(1:N,1) so that after the call the field contains the relative frequencies instead of the group frequencies. This is very useful when calculating percentages. | |
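The conversion from group frequencies to relative frequencies amounts to dividing each value by the sum of all values. The following Python sketch shows this under the assumption that the frequencies are passed as a simple list. The function name `normalize` is hypothetical, and unlike the library procedure it returns a new list instead of overwriting its input.

```python
def normalize(freqs):
    """Return (relative frequencies, sum) for a list of group frequencies."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("sum of frequencies is zero")
    return [f / total for f in freqs], total

rel, total = normalize([14, 18, 8])
percent = [100.0 * p for p in rel]
print(percent, total)
```

Multiplying the relative frequencies by 100 yields the percentages mentioned above.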
| 2.14 Input and Output Procedures | ||||||||||
| The following procedures support simple data entry as well as the display and depiction of the data. | ||||||||||
|
||||||||||
| Input_2 &X#(,),N,M,Flag | |
| X#(1:N,1:M) | Data is read into this field so that, for each fixed value of the first index, all elements of the second index (1:M) are read in turn. |
| N | Maximum first index. |
| M | Maximum second index. |
| Flag | If Flag=1, then the data are read in from a file using INPUT #1. Of course, this file must first have been opened using OPEN "I",1,Fsspec$. If Flag=0, then the data are fetched from the keyboard, which means the values have to be typed in individually. |
| This procedure serves to enter a two-dimensional field with N columns and M rows. | |
| Block_Chart &X#(,),N,X,Y,B,H[,Flag] | |
| X#(1:N,1) | Contains the values to be depicted. |
| N | Number of values. |
| X | X value of upper left corner of diagram. |
| Y | Y value of upper left corner of diagram. |
| B | Width of diagram. |
| H | Height of diagram. |
| Flag | If Flag=0 or is omitted, the type 2 fill styles are used for the individual segments (FILL STYLE= 2,1:24). This is advantageous for black-and-white output (e.g., on a printer). If Flag=1, then the colors 16 to 255 (FILL COLOR= 16:255) are used in a solid fill (FILL STYLE= 1,1). If Flag=3, then the colors 16 to 255 (FILL COLOR= 16:255) as well as the fill styles (FILL STYLE= 2,1:24) will be varied. If all fill styles or colors have been exhausted, the cycle starts again from the beginning. |
| This procedure draws a block diagram into the area defined by X,Y,B,H. The lower 10% of the box remains free for labeling by the user. All other drawing attributes (e.g., line thickness, outline, etc.) can be set by the user. It is also possible to redirect output with the BASIC command GRAF_PORT. |
| Example: An example for the application of this procedure can be found in the DEMO folder (Chart.BAS). |
| Pie_Chart &X#(,),X,Y,R[,Flag] | |
| X#(1:N,1) | Contains the values to be depicted. |
| X | X value of the center of the circular chart. |
| Y | Y value of the center of the circular chart. |
| R | Radius of the circular chart. |
| Flag | If Flag=0 or is omitted, the type 2 fill styles are used for the individual segments (FILL STYLE= 2,1:24). This is advantageous for black-and-white output (e.g., on a printer). If Flag=1, then the colors 16 to 255 (FILL COLOR= 16:255) are used in a solid fill (FILL STYLE= 1,1). If Flag=3, then the colors 16 to 255 (FILL COLOR= 16:255) as well as the fill styles (FILL STYLE= 2,1:24) will be varied. If all fill styles or colors have been exhausted, the cycle starts again from the beginning. |
| This procedure draws a circular chart into the area defined by X,Y,R. The procedure only sets the fill style and/or the fill color. All other drawing attributes (e.g., line thickness, outline, etc.) can be set by the user. It is also possible to redirect output with the BASIC command GRAF_PORT. | |
| Example: An example for the application of this procedure can be found in the DEMO folder (Chart.BAS). |
| Plot_2d_1 &X#(,),N,X,Y,B,H,R X_Max,R X_Min,R Y_Max,R Y_Min | |
| X#(1:N,0:1) | Contains the function values: X values in X#(1:N,0). Y values in X#(1:N,1). |
| N | Number of value pairs. |
| X | X value of upper left corner of diagram. |
| Y | Y value of upper left corner of diagram. |
| B | Width of diagram. |
| H | Height of diagram. |
| X_Max | Maximum value of X coordinates in X#(1:N,0). |
| X_Min | Minimum value of X coordinates in X#(1:N,0). |
| Y_Max | Maximum value of Y coordinates in X#(1:N,1). |
| Y_Min | Minimum value of Y coordinates in X#(1:N,1). |
| This procedure draws the value pairs contained in X#(1:N,0:1) into the box defined by X,Y,B,H. The function values are represented by little crosses. Five percent of the size of each box is available for your labels. | |
| Plot_2d_2 &X#(,),N,X,Y,B,H | |
| X#(1:N,0:1) | Contains the function values: X values in X#(1:N,0). Y values in X#(1:N,1). |
| N | Number of value pairs. |
| X | X value of upper left corner of diagram. |
| Y | Y value of upper left corner of diagram. |
| B | Width of diagram. |
| H | Height of diagram. |
| This procedure draws a curve through the points in X#(1:N,0:1). The points therefore have to be sorted. | |
| Plot_2d_3 &X#(,),N,X,Y,B,H,Ax#,Bx# | |
| X#(1:N,0:1) | Contains the function values: X values in X#(1:N,0). Y values in X#(1:N,1). |
| N | Number of value pairs. |
| X | X value of upper left corner of diagram. |
| Y | Y value of upper left corner of diagram. |
| B | Width of diagram. |
| H | Height of diagram. |
| Ax# | Slope of the line. |
| Bx# | Point of intersection of the line with the Y axis. |
| This procedure draws the points in X#(1:N,0:1) and the line Y=Ax*X+Bx into the same diagram. | |
| Plot_2d_4 &X#(,),N,X,Y,B,H,X_Max#,X_Min#,Y_Max#,Y_Min# | |
| X#(1:N,0:1) | Contains the function values: X values in X#(1:N,0). Y values in X#(1:N,1). |
| N | Number of value pairs. |
| X | X value of upper left corner of diagram. |
| Y | Y value of upper left corner of diagram. |
| B | Width of diagram. |
| H | Height of diagram. |
| X_Max# | Maximum value of X coordinates. |
| X_Min# | Minimum value of X coordinates. |
| Y_Max# | Maximum value of Y coordinates. |
| Y_Min# | Minimum value of Y coordinates. |
| This procedure draws a curve through the points in X#(1:N,0:1) into the system of coordinates described by the remaining parameters. The points have to be sorted. | |
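How a data point might be placed inside the box X,Y,B,H for a given coordinate range can be sketched as a simple linear mapping. The Python code below is a generic illustration and makes two assumptions that the manual does not confirm: screen Y coordinates grow downward from the upper left corner, and no margin is reserved for labels. The function name `to_pixel` is hypothetical.

```python
def to_pixel(xv, yv, x, y, b, h, x_min, x_max, y_min, y_max):
    """Map a data point (xv, yv) linearly into the box x, y, b, h."""
    px = x + (xv - x_min) / (x_max - x_min) * b
    # Assumption: screen Y grows downward, so larger values are drawn higher.
    py = y + h - (yv - y_min) / (y_max - y_min) * h
    return px, py

# Box with upper left corner (10, 20), width 200 and height 100.
print(to_pixel(0.0, 0.0, 10, 20, 200, 100, 0.0, 10.0, 0.0, 5.0))
print(to_pixel(10.0, 5.0, 10, 20, 200, 100, 0.0, 10.0, 0.0, 5.0))
```

Under these assumptions the minimum of the coordinate range lands in the lower left corner of the box and the maximum in the upper right corner.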
Copyright 1998 by Berkhan-Software