scribblings on the bedlam wall: C++: Dependent T and Regression.

A little something I wrote for my stats class.

//Runs Dependent T on a set of data.

This part is called a comment. It explains what a piece of code does to anyone reading the code. It does nothing else; the compiler/interpreter ignores it.

#include <iostream>
#include <sstream>
#include <math.h>

These are headers used by the program: pieces of code written by someone else. All three come with the compiler/IDE (Integrated Development Environment, a program that helps with writing code by catching mistakes as the programmer types) that I was using (Visual C++ Express, in case you were wondering).

IOStream handles the console, which looks a lot like the MS-DOS prompt (lay off. I'm old, so I'm allowed to go spouting jargon that you young whippersnappers have never seen).

SStream handles string streams, which let the program read numbers out of strings (strings are how programmers say text).

Math.h handles calculations more complicated than basic arithmetic.

using namespace std;

This is called a using directive. Names in the C++ standard library live in a namespace called "std" (short for standard), and this line lets the program write cout and string instead of std::cout and std::string. Most small C++ programs start this way.
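To make that concrete, here is a minimal sketch, separate from the program itself. The function names greetQualified and greetUnqualified are made up for illustration; the only point is where the std:: prefix is and isn't needed.

```cpp
#include <string>

// Without a using directive, every standard-library name
// needs its "std::" prefix.
std::string greetQualified(const std::string& name)
{
    return std::string("Hello, ") + name;
}

// From this line on, the prefix can be dropped.
using namespace std;

string greetUnqualified(const string& name)
{
    return string("Hello, ") + name;
}
```

Both functions do exactly the same thing; the second is just shorter to type.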

int main ()
{

This is called a function. It is a piece of code that can be run. The part inside the parentheses is where arguments (information that some functions need in order to run) would go, but this function does not require any information that it does not collect itself.

Incidentally, the main() function is the "starting point", or the part of the code that runs first.
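For contrast with main(), which takes no arguments, here is a sketch of a function that does take some. squaredDifference is a hypothetical helper, not something the program uses, although the program does the same arithmetic inline later on.

```cpp
// A hypothetical helper: takes two arguments and returns
// the square of their difference.
double squaredDifference(double a, double b)
{
    double diff = a - b;   // work out the difference once
    return diff * diff;    // the value handed back to the caller
}
```

Calling squaredDifference(5.0, 3.0) hands the two numbers to the function, which hands back 4.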

    bool collecting=true;
    double sum1, sum2, count, sumdiff, sumdiffsquare, sumproducts, sum1square, sum2square;
    double correllation;

These are called variable declarations. They tell the compiler to set aside part of the computer's memory to store information, and describe the type of information that it will need to store.

Bool is short for boolean. It describes a piece of information that is either true or false.

Double is a type of information that stores numbers, including ones with a fractional part (3.5, for instance).

As you can see, it is possible to create more than one variable at a time.

Also, these statements all end in semicolons. This tells the computer that the statement is over and that it should prepare to read the next one.
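The choice of type changes how arithmetic behaves, which may be why even the count is stored as a double in this program. A small sketch, using made-up function names (intMean, doubleMean, isEmptySample):

```cpp
// Dividing two ints throws away the fractional part...
int intMean(int sum, int count)
{
    return sum / count;
}

// ...while dividing two doubles keeps it.
double doubleMean(double sum, double count)
{
    return sum / count;
}

// A bool holds the answer to a yes/no question.
bool isEmptySample(double count)
{
    return count == 0;
}
```

With a sum of 7 and a count of 2, intMean gives 3 while doubleMean gives 3.5.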

    sum1=0;
    sum2=0;
    count=0;
    sumdiff=0;
    sumdiffsquare=0;
    sumproducts=0;
    sum1square=0;
    sum2square=0;
    string string1, string2 ="";

These are called variable assignments. They tell the computer to change the information stored in its memory to the values provided.

All of the assignments change the information in memory to what are called literals, or values written directly in the code. This is somewhat bad practice; if I needed to change the code, there's a chance I'd forget why I chose some numbers instead of others. To get around this, most of the time a programmer will use something like this:

const int initialValue=0;
int myvar=initialValue;

string input;

while(collecting)
{

Here, we will use what is called a while loop. A loop is a piece of code that is run repeatedly; a while loop keeps running for as long as its condition (here, the variable "collecting") is true. A loop might be used to keep a game running, or to add one to every item in a list; in this case, we're using it to gather information from the user.
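As a side example (not part of the program), here is a sketch of a while loop controlled by a flag in the same way. The function sumUntilNegative and its stopping rule are invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// Adds up values until it meets a negative one or runs out of values.
double sumUntilNegative(const std::vector<double>& values)
{
    bool collecting = true;
    std::size_t i = 0;
    double sum = 0;
    while (collecting)
    {
        if (i >= values.size() || values[i] < 0)
        {
            collecting = false;   // the condition is now false, so the loop ends
        }
        else
        {
            sum += values[i];
            i++;
        }
    }
    return sum;
}
```

Given the values 1, 2, -1, 5, the loop runs three times and the function returns 3.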

try
{

This is what is called a try-catch block. What it does is it runs this piece of code:

double in1;
cout << "Enter the first datum for this point. Press Enter to calculate.\n";

(this, by the way, is how C++ writes something to the console; the "\n" at the end is called an escape sequence, and it stands for a special character; in the case of \n, it ends the line and starts another)

getline(cin,input);

(this takes console input that ends with an Enter, and puts it into the variable "input")

stringstream(input) >> in1;

(this reads the text stored in the string variable "input", converts it to a number, and puts the result into the variable of type double called "in1")
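One thing worth knowing: the conversion does not complain if the text is not a number. The stream simply remembers that it failed, and the program can ask it. Here is a minimal sketch of that check, using a hypothetical helper called parseDouble that is not part of the program:

```cpp
#include <sstream>
#include <string>

// Returns true and stores the number in "out" only if "text"
// starts with something that can be read as a double.
// On failure, "out" is left untouched.
bool parseDouble(const std::string& text, double& out)
{
    std::stringstream stream(text);
    double value;
    if (stream >> value)   // the stream converts to true only if the read worked
    {
        out = value;
        return true;
    }
    return false;
}
```

Note that this sketch is lenient: text such as "3abc" still counts as the number 3, because the stream stops reading at the first character that does not fit.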

if(input=="")
{

(this is called an if block. It runs a piece of code if and only if a particular condition is met, in this case if "input" is an empty string, meaning the user pressed Enter without typing anything)

collecting=false;
break;

(a break statement causes the computer to leave the innermost loop it is running immediately)

}

(and this closing curly brace tells the computer that it has reached the end of a loop, block, or function)

            double in2;
            cout << "Enter the second datum for this point. Press Enter to calculate.\n";
            getline(cin,input);
            stringstream(input) >> in2;
            if(input=="")
            {
                collecting=false;
                break;
            }

(this is where a bit of calculation takes place)

sum1+= in1;

(this increases the value stored in sum1 by in1)

sum2+=in2;
count++;

(this increases the value stored in count by 1)

            sumdiff+=in1-in2;
            sumdiffsquare += (in1-in2) * (in1-in2);
            sumproducts += in1*in2;
            sum1square+=(in1*in1);
            sum2square += in2*in2;

}

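And then, if the code throws an exception (a signal that something has gone wrong) at any point in that, the computer instantly ends the try block and runs this catch block instead. Note that a stringstream which cannot convert what the user typed into a double does not throw an exception by default; it only sets a failure flag on the stream, so bad input will not actually reach this catch block: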

        catch (const exception& e)
        {
            collecting=false;
        }
    }

And here we end the while loop.
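For a try-catch block to do anything, something inside the try has to throw. As a sketch of a conversion that does throw on bad input, here is a hypothetical helper, tryParse, built on std::stod (which needs a C++11 or newer compiler). This is an alternative technique, not what the program uses.

```cpp
#include <stdexcept>
#include <string>

// Returns true and stores the number in "out" if "text" starts
// with a number; returns false if std::stod throws.
bool tryParse(const std::string& text, double& out)
{
    try
    {
        out = std::stod(text);   // throws std::invalid_argument if no number can be read
        return true;
    }
    catch (const std::exception&)
    {
        // Also covers std::out_of_range for numbers too large for a double.
        return false;
    }
}
```

Here an input like "abc" or an empty line really does land in the catch block.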

cout << "Dependent T:" << endl;

Here is the part where the computer handles output. endl, by the way, is another way of ending a line; it also makes sure the text is shown on screen right away.

cout << "Mean X:" << sum1/count << endl;

As you can see, it is possible for the computer to include things other than strings in text. This is because the << operator is overloaded: the standard library provides a separate version of it for each built-in type, and the version for doubles converts the number to text before writing it.
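The same mixing of text and numbers works on any output stream, not just the console. A small sketch that writes to an in-memory stream and hands back the resulting text (describeMean is a made-up name):

```cpp
#include <sstream>
#include <string>

// Builds the same kind of line the program prints,
// but returns it as a string instead of writing to the console.
std::string describeMean(double sum, double count)
{
    std::ostringstream out;
    out << "Mean X:" << sum / count;   // text first, then a double
    return out.str();
}
```

With a sum of 7 and a count of 2, the result is the text "Mean X:3.5".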

    cout << "Mean Y:" << sum2/count << endl;
    cout << "Sum of differences:" << sumdiff << endl;
    cout << "Sum of square differences:" << sumdiffsquare << endl;
    cout << "Count:" << count << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);

And here, we have the part of the code that tells it to wait until the user has pressed Enter.

    cout << "Difference of means:" << (sum1-sum2)/count << endl;
    cout << "Average square difference:" << sumdiffsquare/count << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Count * count-1:" << count*(count-1) << endl;
    cout << "Sum of square differences - square sum of differences / count:" << sumdiffsquare-(sumdiff*sumdiff/count) << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Standard error of the mean difference:" << sqrt((sumdiffsquare-(sumdiff*sumdiff/count))/(count*(count-1))) << endl;

And here we see an actual function in use! sqrt(number) returns (that is, when included in code, its value is equal to) the (positive) square root of its argument. In this case, we can include an expression (essentially, a calculation) as its argument, because that expression evaluates to a number.

    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "T value:" << ((sum1-sum2)/count)/sqrt((sumdiffsquare-(sumdiff*sumdiff/count))/(count*(count-1))) << endl;
    cout << "Degrees of freedom:" << count-1 << endl;
    cout << "--------------------------------------------" << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);

Dependent T, by the way, is a way of figuring out whether a difference between two numbers associated with the same experimental unit and measured on the same scale (for instance, the number of cigarettes a person smokes per hour while wearing a plain sticker and while wearing a nicotine-laced sticker) is statistically significant. The way it works is that the T-value (the difference between the averages of the two groups, scaled by how much the individual differences vary) is calculated, along with a number called the degrees of freedom. These are then looked up on a table, and if the t-value exceeds the critical value listed for a given probability at those degrees of freedom, then a difference this large would turn up purely by chance less often than that probability.
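The whole calculation fits in one short function. This is a sketch of the textbook paired-t formula, t = (mean difference) / sqrt(variance of the differences / n), built from the same running sums. The name pairedT and the use of vectors are my own choices for the example, and it assumes at least two pairs whose differences are not all identical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Paired (dependent) t statistic for two equally long samples.
// Assumes x and y have the same length, with at least two pairs.
double pairedT(const std::vector<double>& x, const std::vector<double>& y)
{
    double n = static_cast<double>(x.size());
    double sumDiff = 0;
    double sumDiffSquare = 0;
    for (std::size_t i = 0; i < x.size(); i++)
    {
        double d = x[i] - y[i];
        sumDiff += d;
        sumDiffSquare += d * d;
    }
    double meanDiff = sumDiff / n;
    // Sample variance of the differences, from the running sums.
    double variance = (sumDiffSquare - sumDiff * sumDiff / n) / (n - 1);
    // Standard error of the mean difference.
    double standardError = std::sqrt(variance / n);
    return meanDiff / standardError;
}
```

For the pairs (2, 1), (4, 2) and (6, 3), the differences are 1, 2 and 3, which gives a T-value of about 3.46 with 2 degrees of freedom.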

    cout << "Regression:" << endl;
    cout << "Count:" << count << endl;
    cout << "Sum of products:" << sumproducts << endl;
    cout << "Sum X:" << sum1 << endl;
    cout << "Sum Y:" << sum2 << endl;
    cout << "Sum X square:" << sum1square << endl;
    cout << "Sum Y square:" << sum2square << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Count * sum of products:" << count * sumproducts << endl;
    cout << "Product of sums:" << sum1*sum2 << endl;
    cout << "Count * sum of square X:" << count * sum1square << endl;
    cout << "Square sum of X:" << sum1*sum1 << endl;
    cout << "Count * sum of square Y:" << count * sum2square << endl;
    cout << "Square sum of Y:" << sum2*sum2 << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Count * sum of products - product of sums:" << count * sumproducts - sum1 * sum2 << endl;
    cout << "Count * sum of square X - square sum of X:" << count * sum1square - sum1*sum1 << endl;
    cout << "Count * sum of square Y - square sum of Y:" << count * sum2square - sum2*sum2 << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "X-part * Y-part:" <<(count * sum1square - sum1*sum1)*(count * sum2square - sum2*sum2) << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Square root X-part * Y-part:" << sqrt((count * sum1square - sum1*sum1)*(count * sum2square - sum2*sum2)) << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    correllation= (count * sumproducts - sum1 * sum2)/sqrt((count * sum1square - sum1*sum1)*(count * sum2square - sum2*sum2));
    cout << "Correlation coefficient:" << correllation << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Begin T value derivation:" << endl;
    cout << "Count:" << count << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Square of correlation coefficient:" << correllation*correllation << endl;
    cout << "Count - 2:" << count-2 << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "1-square of correlation coefficient:" << 1-correllation*correllation << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Deviance:" << (1-correllation*correllation)/(count-2) << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "Standard deviation:" << sqrt((1-correllation*correllation)/(count-2)) << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);
    cout << "T value:" << correllation/sqrt((1-correllation*correllation)/(count-2)) << endl;
    cout << "Degrees of freedom:" << count-2 << endl;
    cout << "--------------------------------------------" << endl;
    cout << "--------------------------------------------" << endl;
    getline(cin,input);

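Nothing new here. Regression, as used here, is a way of determining whether there is a correlation between two things, that is, whether they tend to rise and fall together, whether one tends to fall as the other rises, or whether they vary independently. The correlation coefficient runs from -1 to 1, with 0 meaning no linear relationship. It also produces a T-value, which is used the same way as the one above, except with count-2 degrees of freedom.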
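Stripped of the step-by-step printing, the correlation part comes down to two small functions. This is a sketch using the same formulas as the program; the names pearsonR and correlationT are invented for the example, and it assumes more than two points and that neither column is constant.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Correlation coefficient from the running sums,
// assuming x and y have the same length.
double pearsonR(const std::vector<double>& x, const std::vector<double>& y)
{
    double n = static_cast<double>(x.size());
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
    for (std::size_t i = 0; i < x.size(); i++)
    {
        sumX += x[i];
        sumY += y[i];
        sumXY += x[i] * y[i];
        sumXX += x[i] * x[i];
        sumYY += y[i] * y[i];
    }
    double numerator = n * sumXY - sumX * sumY;
    double denominator = std::sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
    return numerator / denominator;
}

// T-value for a correlation coefficient r measured on n points.
double correlationT(double r, double n)
{
    return r / std::sqrt((1 - r * r) / (n - 2));
}
```

For the points (1, 1), (2, 3) and (3, 2), the correlation coefficient is 0.5 and the T-value is about 0.58 with 1 degree of freedom.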

//reach this line last
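return 0;

And here is what the function returns. A function that does not need to return anything is declared void and returns nothing at all; main() is a special case, where returning 0 is the convention for telling the operating system that the program finished without errors.
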
}


Friday, March 9, 2012
