Automated SAS Data Step from a Data Dictionary
Alan C. Elliott, Linda S. Hynan, Joan S. Reisch, Janet P. Smith
UT Southwestern Medical Center Dallas
ABSTRACT
In a medical research environment, researchers and statisticians collaborate on hundreds of projects each year. Much of the data gathered for these projects are first entered in Excel spreadsheets. In a previous paper (Elliott, 2006) we described techniques for double data entry of research data using an Excel spreadsheet. As a part of the best practices for data collection and entry, we also described how to create a data dictionary. This dictionary contains descriptions of all variables in a data set, including variable name, type, labels, information about range or list restrictions, and missing values codes. For many research data sets, this dictionary can become rather large and detailed. And although this process helps the researcher prepare their Excel spreadsheet for importation into SAS, the statistician must still hand-enter much of the information in a data define labels and perform data checking. The SAS macro described here, DDENTRY, reads the data dictionary and creates necessary PROC FORMAT statement, labels, and creates the code to check for outliers and to assign missing value codes.
Example 1 Excel File Data Dictionary (DDEXAMPLE.XLS) - contains data dictionary for example 1
Example 2 Excel File Data Dictionary (DDEXAMPLE2.XLS) - contains data dictionary for example 2
(READ_DD.SAS) - A SAS program that reads the DDEXAMPLE.XLS data dictionary and uses the DDENTRY macro to create SAS code.
(READ_DD2.SAS) -A SAS program that reads the DDEXAMPLE2.XLS data dictionary and uses the DDENTRY macro to create SAS code.
The DDENTRY SAS MACRO (DDENTRYMACRO.SAS) (required)
This file was updated on 12-1-2009 with the following changes:
* For Variables that end with a number, the format is appended with a _. For example, the format created for the variable TYPE1 will be named FMTTYPE1_;
* Quotes in labels are replaced with a _.
* Numeric and String are detected as well as Number and Text inData_type column.* Note that the sheet in the data dictionary Excel file MUST be named data_dictionary.
The Survey data file (SURVEY.XLS)
Additional information:
If your created code file includes PROC FORMAT, and you store your data in a permanent SAS data set, it is advisable to store FORMATS in a library using the code:
libname libref;
PROC FORMAT library=libref;
That is, change the line in the file created by DDENTRY to include the library option (and optionally include a libname statment in your code if needed) where libref is the SAS library name where you want the formats to be stored.
To subsequently use these formats, use the code
libname library libref;
for example
libname library 'C:\MYSASLIB';
before referring to the data set in a PROC or SET statement such as:
PROC MEANS DATA=LIBREF.MYDATA;RUN;
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Alan C. Elliott
Department of Clinical Sciences
Division of Biostatistics
5323 Harry Hines Blvd
Dallas, TX 75390
(214)648-2712
alan.elliott@utsouthwestern.edu
The completed macro (latest version) is available at the following web address
REFERENCE
Elliott AC, Hynan LS, Reisch JS, Smith JP. “Preparing Data for Analysis Using Microsoft Excel,” J Investig Med 54(6):334-41, 2006.