Internet SAS Course Tips

Here are some hints and tips that may help you get through the course. They don't tell you specifically how to solve the various programming assignments, but they suggest ways to accomplish specific programming tasks that are part of many problems.

1. Setting your linesize for Windows and your output.

The default linesize for SAS is often longer than most of us would like. You can set it for the output with the linesize option on the options statement that comes first in a SAS program, e.g.,

OPTIONS LINESIZE = 74;

or you can set the global LINESIZE option in your SAS session to a smaller size. You will need to do this to submit your assignments properly, so be sure to follow these instructions.
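Here is a minimal sketch of where the OPTIONS statement goes: it is the very first statement, before any data step or PROC. The data set name `demo` and the variables `id` and `score` are made up for illustration only.

```sas
options linesize=74;     /* first statement: output lines at most 74 characters wide */

data demo;               /* hypothetical data set name */
   input id score;       /* hypothetical variables */
   datalines;
1 85
2 92
3 78
;
run;

proc print data=demo;    /* this listing now uses the 74-character linesize */
run;
```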

2. Using built-in statistical routines or PROCs.

There can be two different components to a SAS program: the data step and the PROCs. The data step is the programming language of SAS. It is here that we input, manipulate, and output data. The PROCs are built-in statistical routines that are fed data from a data step and produce pre-programmed output. You can control that output to some extent with various options, but your control is limited compared to the data step, where the output produced is whatever you decide. A SAS program begins with a data step and may also include one or more PROCs. A PROC ends the data step. If you put a PROC in the middle of your program (data step), you will get an error. If you wish to continue the data step, you must begin a new one with a DATA … command.
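The sketch below illustrates the order described above: a data step, then a PROC that ends it, then a new data step that picks up where the first left off by reading the saved data set with SET. All data set and variable names (`scores`, `scores2`, `id`, `x`, `y`, `total`, `diff`) are hypothetical.

```sas
/* Data step 1: read the data and create a new variable */
data scores;
   input id x y;
   total = x + y;
   datalines;
1 3 4
2 5 6
3 2 8
;
run;

/* A PROC: its PROC statement marks the end of the data step above */
proc means data=scores;
   var total;
run;

/* To keep working on the data, begin a new data step that reads the old data set */
data scores2;
   set scores;
   diff = x - y;
run;
```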

3. Debugging with PROC PRINT (and other PROCs).

PROC PRINT is the most valuable tool for debugging. It shows you the value of every variable for every subject (if there's more than one subject). This includes every variable in your INPUT statement, plus every variable created throughout your program. The values shown are the final values once the data step has finished. When you are ready to debug a program, it is a good idea to put a PROC PRINT at the end. If you have more than a few subjects, insert the following statement into your data step:

IF _n_ < 6;

to limit your PROC PRINT to the first 5 cases or subjects. You can replace 6 with any number you like. Compare the results from the PROC PRINT against the expected results to be sure the program is working properly. As a last step in debugging, I recommend running the PROC PRINT without the IF statement to see the results for the final case. Sometimes results are correct for all but the last case, so it is important to verify this. Most of the time, if the first, second, and last cases are correct, all cases are correct. Of course, this isn't always true: special combinations of variable values, e.g., missing values, can produce odd results. It is important to check all such possibilities that you can anticipate as well.
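A minimal sketch of this debugging setup follows. The data set `work1` and the variables `id`, `a`, `b`, and `c` are hypothetical. The subsetting IF is placed after the INPUT and assignment statements, so the data set keeps only the first 5 cases and PROC PRINT lists just those.

```sas
data work1;
   input id a b;
   c = a * b;            /* a computed variable we want to check */
   if _n_ < 6;           /* debugging only: keep the first 5 cases */
   datalines;
1 2 3
2 4 5
3 1 1
4 6 2
5 3 3
6 7 7
7 8 1
;
run;

proc print data=work1;   /* compare these values with what you expect */
run;
```

When you are done, delete the IF statement and rerun to check the last case. If you would rather not change the data set itself, the OBS= data set option, as in `proc print data=work1 (obs=5);`, limits only the listing.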

4. Debugging with PUT statements.

The PUT statement is a good debugging tool. You can use it to output the value of variables almost anywhere within your data step to see what your program has done at that point. Placing a PUT statement before and after a statement shows you the effect that statement had on a variable. This can be particularly useful in conjunction with PROC PRINT to trace where a program has gone wrong.
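Here is a small sketch of the before-and-after technique. By default PUT writes to the SAS log; a FILE PRINT statement sends it to the output instead. Writing a variable name followed by `=` (e.g., `x=`) prints the name along with its value. The data set `check`, the variables `x` and `y`, and the statement being tested are all hypothetical.

```sas
data check;
   input x y;
   put 'Before: ' _n_= x= y=;     /* values just before the statement under test */
   y = y + 2 * x;                 /* the statement whose effect we want to see */
   put 'After:  ' _n_= x= y=;     /* values just after it; compare with the line above */
   datalines;
1 10
2 20
;
run;
```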

5. Limiting the number of cases processed.

Sometimes you wish to limit the number of cases processed, or to process only certain cases (see Tip #3). The _n_ variable is generated automatically by SAS and indicates the case number: the first case is given a value of 1, the second 2, and so on. You can tell SAS to process only certain cases by using this variable with an IF, as in

IF _n_ < 11;

This statement tells SAS to process only the first 10 cases. This handy variable can also be used in an IF-THEN statement to execute certain statements only for specific cases, such as the first one in a dataset.
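A minimal sketch of using _n_ with THEN: one statement runs only for the first case, and another marks the first three cases. The data set `firstcase` and the variables `x` and `sample` are hypothetical.

```sas
data firstcase;
   input x;
   if _n_ = 1 then put 'First case: ' x=;   /* executes for case 1 only */
   if _n_ < 4 then sample = 1;              /* cases 1 to 3 */
   else sample = 0;                         /* all later cases */
   datalines;
5
8
2
9
4
;
run;
```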

6. The use of flags.

A flag is a variable that tells SAS whether or not to execute a statement, depending upon some condition that exists at the time. Your text (pp. 99-103) shows how you can make a flag flip-flop between two values to split a sample into two parts, with even-numbered cases in one and odd-numbered cases in the other. The rules governing the flag can be changed to split samples into any number of parts; e.g., if the flag takes on three values, each value is associated with one of three groups corresponding to three subsamples. Flags can be used in many other ways, e.g., to keep track of what happened on a prior case.
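The sketch below shows one way to build a flip-flop flag. It is not necessarily the version in your text. It relies on RETAIN (Tip #7) so the flag keeps its value from one case to the next. The data set `split` and the variables `flag`, `score`, and `group` are hypothetical.

```sas
data split;
   retain flag 0;              /* keep the flag's value from case to case */
   input score;
   flag = 1 - flag;            /* flip-flop: 1, 0, 1, 0, ... */
   if flag = 1 then group = 1; /* odd-numbered cases */
   else group = 2;             /* even-numbered cases */
   datalines;
70
82
91
65
88
;
run;
```

For three subsamples, one option (again an illustrative assumption) is to replace the flip-flop line with `flag = mod(flag, 3) + 1;`. Starting from 0, the flag then cycles 1, 2, 3, 1, 2, 3, ..., and its value can serve directly as the group number.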

7. Keeping the values of a variable from case to case.

When SAS begins to read a new case, the variables you read with INPUT or create with assignment statements are set to missing, shown as '.' (period). Values for those variables from the prior case do not carry forward to the new case unless you specifically tell SAS to keep them. The RETAIN statement does this. Be very careful to use RETAIN on accumulators and counters that operate from case to case, e.g., when you are compiling sums across subjects.
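Here is a minimal sketch of a counter and an accumulator that carry across cases. The data set `totals` and the variables `x`, `count`, `sumx`, and `meanx` are hypothetical. Without the RETAIN, `count` and `sumx` would be missing at the start of every case, and so would every sum built from them.

```sas
data totals;
   retain sumx 0 count 0;   /* start both at 0 and keep them from case to case */
   input x;
   count = count + 1;       /* counter: number of cases so far */
   sumx = sumx + x;         /* accumulator: running sum of x */
   meanx = sumx / count;    /* running mean */
   datalines;
4
6
10
;
run;
```

SAS also has a sum statement, written `sumx + x;`, which retains the variable automatically and starts it at 0. It is a shortcut; the explicit RETAIN above shows what is happening.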

8. Figuring out what to put in the loop and what to put outside the loop.

Using nested loops is perhaps the toughest part of SAS that we will do in this course. Master this, and you are well on your way to being an accomplished SAS programmer. A single loop usually doesn't give much trouble, so my advice is to build your nested loop programs slowly, from the inside (innermost) to the outside (outermost). Think carefully about the steps involved and what gets done where. Get each part working before you move on to the next. With complex programs it is often easier to do a little at a time, debugging and making sure each part is correct before moving on, rather than trying to write an entire program and then debugging it all. Let's take a simple example involving two nested loops. The problem is to find the average of 10 random numbers, and then repeat this 5 times. What we do in the inner loop is generate the 10 numbers and sum them. This is the only function of this loop. When the loop finishes its tenth iteration, we exit the loop and take the average. Remember that we must initialize the accumulator.

sumx = 0;              /* initialize accumulator to 0 */
do i = 1 to 10;        /* begin loop */
   x = rannor(0);      /* generate random number (0 = seed from the clock) */
   sumx = sumx + x;    /* sum numbers */
end;                   /* end loop */
meanx = sumx/10;       /* compute mean */

Our task is to repeat this 5 times. To do so, we can put this entire program inside another outer loop that repeats the process five times.

do j = 1 to 5;
   /* insert the program above here */
end;

Suppose we wish to take the mean of the means. We must add an accumulator for use within the outer loop, an initializer for that accumulator (outside the loop), and a statement to compute the mean of means. Putting it all together, we have the following:

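data a;
   sumtot = 0;                 /* initialize outer accumulator */
   do j = 1 to 5;              /* outer loop: 5 repetitions */
      sumx = 0;                /* initialize inner accumulator */
      do i = 1 to 10;          /* inner loop: 10 random numbers */
         x = rannor(0);
         sumx = sumx + x;
      end;
      meanx = sumx/10;         /* mean of these 10 numbers */
      sumtot = sumtot + meanx; /* add this mean to the outer accumulator */
   end;
   meantot = sumtot/5;         /* mean of the 5 means */
   file print;                 /* send PUT output to the output instead of the log */
   put meantot;
run;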

I put a PUT statement at the end to output the results of each run. Of course, we could put the whole thing inside a third loop and have it repeat the entire process, outputting a mean of means each time, but I will leave it to you to try this.

Copyright Paul E. Spector, All rights reserved. Last modified April 16, 2004.