
 

 

 

 

 

UNDERSTANDING

 

SAS

   


PREFACE

 

SAS stands for Statistical Analysis System. It is widely used for data analysis in industry, government, banking, and academia.

    

There are many books from which one can learn how to write SAS programs, and there is a SAS online manual where one can find syntaxes, definitions, and examples. Basic programming skills can be quickly learned from any of the many available books, and in-depth knowledge can be acquired with a bit of extra effort by studying the SAS manual.

    

So what is the purpose of this book? Compared to typical introductory books on the subject, this one has much more content, not only in scope but in depth as well. This can be appreciated from the list of topics in the CONTENTS section as well as from the size of the book. In many places throughout the book, the reader is told not only what will happen but also why it happens. That is why the book is titled UNDERSTANDING SAS. After reading this book, a reader will have a considerable understanding of BASE SAS at the intermediate level and will be able to write SAS programs with great flexibility.

 

Although the SAS manual provides plenty of detail, it does not include everything, and this book can be used as a supplement to it. Furthermore, this book has been written more like a textbook, and it places emphasis on a number of key points.

 

In writing this book, I read the SAS manual and many other sources, including published books, some of which are listed in the REFERENCES section; papers posted on the Web, particularly from conferences such as SUGI and NESUG; and material from discussion forums. Moreover, plenty of content in this book comes from my own research. I tried to keep the book at the "state of the art" level for BASE SAS, and the reader will especially appreciate the inclusion of the Output Delivery System (ODS).

 

The book covers primarily BASE SAS software, and it does so comprehensively.

However, coverage of statistical procedures has been avoided here, for the inclusion of them at an intermediate level would have significantly increased the size of the book.

 

As for the style of the book, it combines simplicity and thoroughness. Some topics are discussed briefly and some in full detail. For instance, regarding the statistical functions SUM and MEAN, once the usage of the first is discussed, readers will know the usage of the second, for the argument types of both functions are exactly the same; the only thing readers need to do is replace the function name. However, the book discusses in detail the allowable types of arguments, and there are some tricky aspects of which readers may not be aware. Similarly, in the case of ODS, which has many different attributes, once an example is given for one of them, readers should simply play the "replace-and-run" game to learn the rest.
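The point about SUM and MEAN sharing argument types can be illustrated with a minimal sketch. The variables x1-x3 and their values are hypothetical and chosen only for illustration; the same OF variable list is passed to both functions, and only the function name changes.

```sas
DATA _NULL_;
  /* Hypothetical values; x3 is deliberately missing */
  x1=1; x2=2; x3=.;
  /* Same argument list, different function name */
  s=SUM(OF x1-x3);
  m=MEAN(OF x1-x3);
  PUT s= m=;
RUN;
```

Both functions ignore the missing value, so the log should show s=3 m=1.5.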

 

Comparison. Comparison. Comparison. Comparison is the soul of this book. There are more than eighty tables in the book, most of which I designed; they are the culmination of all my endeavors; and they constitute the main characteristic of the book. Tables have many advantages. From them it is easy to check, compare, and select their content.  However, they also have a shortcoming: a cell cannot contain too much detailed information.

 

Software programs differ from books in the sense that sometimes bugs and inconsistencies are incurable, while an error in a book can be corrected in the next printing.

   

Let's look at a famous example. As we know, 1900 is not a leap year, which means that February 1900 has 28 days. If you run the following program, you will run into a problem.

 

DATA _NULL_;

a='29Feb1900'd;

RUN;
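For comparison, the following is a minimal sketch (not taken from the book's chapters; the variable name d is chosen only for illustration) that adds one day to the last valid date in February 1900.

```sas
DATA _NULL_;
  /* 28 February 1900 is the last valid day of that month */
  d='28Feb1900'd+1;
  /* Write the resulting date to the log in DATE9. format */
  PUT d= DATE9.;
RUN;
```

The log should show d=01MAR1900, which is consistent with SAS treating 1900 as a non-leap year.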

 

In Excel, however, entering 2/29/1900 is accepted. In other words, Excel treats 1900 as a leap year. This is not really a "bug" in Excel; it is by design. Excel works this way because the behavior was a genuine bug in Lotus 1-2-3. When Excel was introduced, Lotus 1-2-3 had nearly the entire market for spreadsheet software, and Microsoft decided to reproduce Lotus's bug in order to be fully compatible.

 

Therefore, a software developer has to be extremely careful because the consistency between different versions of a product is of paramount importance. For example, the main differences between an "IF" statement and a "WHERE" statement should be kept about the same from version to version. Even though every SAS book talks about these differences, this book provides you with the most detailed discussion.

 

It is the author's hope that the book will be helpful in pointing out inconsistencies and that it can be used as a comprehensive reference book.

    

While writing this book, I encountered many problems, and I want to share some of them with you.

 

Problem 1. What is the printout of the following program?

 

DATA s;

INPUT a @@;

CARDS;

-3 0 . 2

;

DATA t;

SET s;

WHERE +a>0;

PROC PRINT;RUN;

 

Problem 2. What is the output of the following program, and how do you explain it?

 

 

PROC FORMAT;

INVALUE ss '10'-'6'=1 5-12=2;

INVALUE sp 1-5=3;

DATA s;

INPUT a ss. b sp.;

CARDS;

11 5

20 7

60 23

;

PROC PRINT;RUN;

 

      Problem 3. What is the printout of the following program? If you feel that something is wrong, how do you fix it?

 

DATA s;

INPUT a $ b p;

CARDS;

1 1 1

2 2 2

4 4 3

;

PROC REPORT NOWD ;

COLUMN p a,N a,PCTN b,N b,PCTN;

DEFINE p/GROUP;

RUN;

 

  /*

PROC FORMAT;

INVALUE s 20-HIGH=4;

INVALUE t '20'-HIGH=8;

INVALUE $s 20-HIGH=4;

INVALUE $t '20'-HIGH=8;

VALUE $s 20-HIGH=4;

VALUE $t '20'-HIGH=8;

DATA p;

INPUT @1 a $s. @1 b $t. @1 a1 s. @1 b1 t. @1 c $ @1 d $;

p=INPUT(6,s.);

q=INPUT(6,t.);

r=INPUT('135',t.);

FORMAT c $s. d $t.;

CARDS;

6

135

;

PROC PRINT;RUN;*/

 

Problem 4. In the following function calls, all variables and arrays are defined. Which function calls are OK, and which ones are not?

 

a=SUM(OF x1-x6 p,x1+x2,LOG(p), OF q y1-y5, OF z(*));

a=MEAN(OF x1-x6 test: y--a);

a=MIN(OF _NUMERIC_);

a=MEAN(OF x1-x6 p, OF q y1-y5, OF z1 z2);

A=MAX(OF x1-x6 y1+y2);

A=MEAN(x1 x2 x3);

A=MEAN(OF x1-x5, y2 y3);

A=MEAN(OF z(1) - z(3));

 

Problem 5. What data set is created by the following program? What happens if I change the keyword IF to the keyword WHERE?

 

 

DATA s;

INPUT a b c;

CARDS;

3 5 2

-3 5 2

2 5 2

-2 5 2

;

DATA t;

SET s;

IF a=-3 MAX b MIN c;

RUN;

PROC PRINT;RUN;

 

Problem 6. What data set is created by the following program, and how do you explain the error messages?

 

DATA student;

INPUT name $  gender $  height  weight test 

       dob MMDDYY8.  phone;

CARDS;

George   M 56 111  89

01/04/60 7345678765

Joe      M 57 115  87

01/05/60 8763459875

;

RUN;

 

Problem 7. Suppose we have a text file with the following contents:

 

12 12 1/1/60

12

11 13 11/11/60

11

14 15 12/3/76

20

 

Run the following program. What data set is created and how do you explain it?

 

DATA ss;

INFILE 'c:\sas\test2.txt';

INPUT a  b  c MMDDYY8. d;

RUN;

PROC PRINT;

FORMAT c MMDDYY8.;

RUN;

 

Problem 8. The following program creates two reports. One is a detailed report and the other is a summary report. Give the necessary and sufficient conditions for PROC SQL to create a summary report.

 

DATA s;

INPUT a b;

CARDS;

1 2

3 4

;

PROC SQL;

SELECT * FROM s;

SELECT SUM(a),N(b) FROM s;

QUIT;

 

Problem 9. If you think the following program is not OK, how would you explain the result?

 

%MACRO ss(company);

%IF &company=ge %THEN %PUT company is ge;

%MEND;

%ss(ge)

 

  /*Problem 9. What is the printout of the following program and how do you explain it?

 

DATA s;

a0=1;

PROC REPORT NOWD;

COLUMN a0 y z ;

COMPUTE y;

y=b+1;b=b+6;

ENDCOMP;

COMPUTE z;

z=c+1;

ENDCOMP;

RUN;*/


Problem 10. Write a program to create the following table (rtf file):

Problem 11. The following table was produced by using PROC FREQ on a data set.

 

                                        Cumulative    Cumulative

     branch    Frequency     Percent     Frequency      Percent

     ───────────────────────────────────────────────────────────

     A                1       33.33             1        33.33

     B                1       33.33             2        66.67

     C                1       33.33             3       100.00

 

After I copied it to a "doc" file and changed its font to Courier New, it became

 

 

                                        Cumulative    Cumulative

     branch    Frequency     Percent     Frequency      Percent

     ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

     A                1       33.33             1        33.33

     B                1       33.33             2        66.67

     C                1       33.33             3       100.00

 

Do you know what happened to the solid line?

 

      The book uses the following typographical conventions:

    

    Times New Roman                  basic type style used for most text

    UPPER CASE TIMES NEW ROMAN       for SAS keywords

    Italic Times New Roman           for definitions

    Courier New                      for SAS program code

    SAS Monospace                    for SAS printouts

    

      There is no need to have prior knowledge of SAS although some statistical background is helpful for Chapter 6. The following mathematical notation is used in this book, mainly in Chapters 3 and 8: Let A={1,2,3} and B={1,2,4} be two sets.

    

    A∪B={1,2,3,4}            union of A and B

    A∩B={1,2}                intersection of A and B

    A\B={3}                  difference of A and B

    A+B={1,2,3,1,2,4}        repeated union of A and B

    

    You can find SAS documentation online. The URL is

    

    http://support.sas.com/onlinedoc/913/docMainpage.jsp

 

    

SAS Global Forum (originally SUGI) offers material on many professional and advanced SAS topics. The following is its URL:

 

http://support.sas.com/events/sasglobalforum/

 

 

There are also online discussion forums where you can post your questions. The URLs are the following:

    

    http://www.listserv.uga.edu/archives/sas-l.html

 

    

    http://groups.google.com/group/comp.soft-sys.sas

 

    

You can also get help from SAS Institute. The URL is

 

  http://support.sas.com/ctx/supportform/index.jsp

 

    

Finally, any suggestions, comments, and criticisms are very welcome. Please send them to:

    

sasbook@beauthor.com

 

 

    

BACK COVER

    

    SAS Certified Base Programmer and SAS Certified Advanced Programmer.

    M.S. in Statistics, Rutgers University.

    Ph.D. in Operations Research, Rutgers University.

Co-author of the book Linear Programming (in Chinese, with Dr. Jianzhong Zhang) published by Science Publishing House, Beijing, China 1990.