From: walt@mathcs.emory.edu (Walt Hultgren {rmy}) To: stetzer@convex.csd.uwm.edu Subject: dbftosas Date: Thu, 6 May 93 15:23:07 -0400 Frank, I saw a posting on Usenet that mentioned a perl program dbftosas that you have that will convert dBase files to an SAS format. Since the article said that dbftosas converts the DBF files to intermediate ASCII files, I thought that I might be able to convert your program to load Informix tables directly from DBF files. To that end, I'd like a copy of your program if you're letting it out. Thanks in advance, Walt. -- Walt Hultgren Internet: walt@rmy.emory.edu (IP 128.140.8.1) Emory University UUCP: {...,gatech,rutgers,uunet}!emory!rmy!walt 954 Gatewood Road, NE BITNET: walt@EMORY Atlanta, GA 30329 USA Voice: +1 404 727 0648 From: Frank Stetzer To: walt@mathcs.emory.edu Subject: Re: dbftosas Date: Thu, 6 May 93 14:26:36 -0500 Here is the program. I'm not a perl whiz (or a dbf whiz) but it has worked for me with dbf files from several sources. I have appended a "man page" to the end of the file, for reference. -- Frank Stetzer #!/usr/bin/perl # # This program reads a dbf file (somename.dbf) and outputs two ascii # files. The first (somename.sas) contains the "infile", "format" , # and "input" statements of a SAS data step. The second (somename.dat) # contains the raw data to be read by the SAS program. # -- # Frank Stetzer, CSD # August 12, 1992 # ------------------------------------------------------------------------ # # First check whether we are big or little endian # $a=ord(pack("I",0xdeadbeef)); if ($a == 0xef) {$endian="LITTLE";} elsif ($a == 0xde) {$endian="BIG";} else {die "Can't determine the byte order of this computer\n";} print "This is a $endian\-ENDIAN machine\n"; # if ($#ARGV == -1) {die "Usage: dbftosas file.dbf\n (outputs file.sas and file.dat)\n";} # $indbf=$ARGV[0]; if (substr($indbf,length($indbf)-4) ne '.dbf') { $froot=$indbf; $indbf=$froot.'.dbf'; } else {$froot=substr($indbf,0,length($indbf)-4);} open (IN,"$indbf")||die "file not found\n"; binmode(IN); # # these are the output files # $outvar=$froot.'.sas'; $outdat=$froot.'.dat'; open (VAR,">$outvar"); open (DAT,">$outdat"); # print VAR "data one;\n"; # # read the initial dbf header line and interpret # read(IN,$buf,32); ($vn,$year,$month,$day,$nr,$hs,$lr,$junk)= unpack("A1 c1 c1 c1 a4 a2 a2 A20",$buf); if ($endian eq "BIG") { $rn=reverse($nr); $sh=reverse($hs); $rl=reverse($lr); } elsif ($endian eq "LITTLE") { $rn=$nr; $sh=$hs; $rl=$lr; } $nrec=unpack("i",$rn); $heads=unpack("s",$sh); $lenrec=unpack("s",$rl); print VAR " infile \'$outdat\' lrecl=",$lenrec-1," pad;\n"; print "Date of dbf file=$month/$day/$year\n"; print "Number of records=$nrec\n"; print "Length of header=$heads\n"; print "Length of records=$lenrec\n"; $nfields=int($heads/32)-1; print "Number of fields=$nfields\n"; # # Read the header fields for each dbf variable # Store the info in arrays @name, @flen, @type, @fds, and @fdec # $nv=0; $nlong=0; for ($i=1;$i<=$nfields;$i++) { read (IN,$buf,32); ($name,$type,$fds,$flen,$fdec,$junk)=unpack("A11 A1 a4 c1 c1 A14",$buf); if ($endian eq "BIG") {$sdf=reverse($fds);} elsif ($endian eq "LITTLE") {$sdf=$fds;} $fds=unpack("i",$sdf); $name[$i]=$name; if (length($name) >8) {$nlong++;} $flen[$i]=$flen; $type[$i]=$type; $fds[$i]=$fds; $fdec[$i]=$fdec; } # # Check all variable names for length. If longer than 8 characters, # reduce length by (first), eliminating any of -, _, *, A, E, I, O, or U # from the end of the variable forward down to 8 characters. If the # resulting name is not unique, make a name "VDBn" for n=1,2,... # if ($nlong >0){ print "$nlong field names are longer than 8 characters:\n"; for ($i=1;$i<=$nfields;$i++) { if (length($name[$i]) > 8) { $name1=$name[$i]; $l=length($name1); $eman=reverse($name1); foreach (9..$l) {$eman =~ s/[\-\_\*AEIOU]//;} $name=reverse($eman); if (length($name) >8) {$name=substr($name,0,8);} if ($i > 1) { for ($j=$i-1;$j>=1;$j--) { if ($name[$j] eq $name) { $nv++; $lnv=5-length("$nv"); $name="VDB" . "0" x $lnv . $nv; $last; } } } print "Old name: $name1 New name: $name\n"; $name[$i]=$name; } } } # # Put out the SAS format statement # print VAR " format "; for ($i=1;$i<=$nfields;$i++) { $j=int(($i-1)/3)*3 - ($i-1); if ($j == 0) {printf VAR "%4s","\n ";} printf VAR "%10s",$name[$i]; $f=" %10s "; if ($type[$i] eq "C") {printf VAR $f,' $'.$flen[$i].".";} if ($type[$i] eq "N") {printf VAR $f,' '.$flen[$i].'.'.$fdec[$i];} if ($type[$i] eq "D") {printf VAR $f," MMDDYY8.";} } print VAR ";\n"; # # put out the SAS input statement # print VAR " input"; for ($i=1;$i<=$nfields;$i++){ $j=int(($i-1)/3)*3 - ($i-1); if ($j == 0) {printf VAR "%4s","\n ";} if ($type[$i] eq "D") { $range='@'.$fds[$i]." ".$name[$i]. " YYMMDD".$flen[$i].". "; printf VAR "%25s",$range; next; } printf VAR "%10s" ,$name[$i]; printf VAR " "; if ($type[$i] ne "N") {print VAR "\$ ";} else {print VAR " ";} $lb=$fds[$i]; $ub=$lb+$flen[$i]-1; if ($flen[$i] > 1) {$range=$lb.'-'.$ub;} else {$range=$lb;} printf VAR "%12s",$range; } print VAR ";\n"; close VAR; print "\nFile $outvar written\n"; # # Read the next character; should be a carriage return "\r" # read(IN,$del,1); if ($del ne "\r") {die "Out of Sync\n";} # # Put out the data file...a straight copy with line feeds added # for ($i=1;$i<=$nrec;$i++) { read(IN,$del,1); read(IN,$buf,$lenrec-1); if ($del ne '*') {print DAT "$buf\n";} else {print "Record $i deleted\n";} } close DAT; print "File $outdat written\n"; ##########################end of program dbftosas############################# DBFTOSAS(1) UWM Local Documentation DBFTOSAS(1) NAME dbftosas - Convert dbf files from a PC for input to SAS SYNOPSIS dbftosas somefile.dbf DESCRIPTION Dbftosas processes a "dbf" file (e.g. somefile.dbf), pro- duced by dBase or another PC-based database program, and produces two output files beginning with the same file name and ending in the extensions .dat and .sas (e.g. somefile.dat and somefile.sas). The latter contains a skeletal SAS data step to read the former, defining variable names, data types, and formats. This file can be enhanced by the user to include variable labels, missing value declara- tions, permanent SAS libraries, etc. Dbftosas properly han- dles numeric, string, and date data types and shortens long field names to eight unique characters. A summary of the number of fields and cases processed, field names changed, etc., is written to STDOUT. The SAS program file can be modified by the user into an SPSS program file since the data description and input operations have similar syntax. DIAGNOSTICS Dbf files contain few distinguishing markers. It is possi- ble to run dbftosas on a damaged or bad dbf file without detecting any problem. Be sure to examine the output sum- mary for any anomolies. If the program detects a problem, an error message "Out of Sync" will be printed and the pro- gram will stop. Dbf files contain binary fields; therefore they must be uploaded from the PC with the file transfer program (e.g. kermit, ftp) in binary mode. This is one possible source of difficulties. BUGS Some database programs allow strings to be entered into fields declared as numeric. Also, the numeric formats speci- fied in the data base may not be appropriate for the data. Dbftosas does not check for correspondence between data declaration and actual data; these inconsistencies will gen- erate errors when the SAS program is run. If problems are encountered, send mail to "help". AUTHOR Frank Stetzer CSD Printed 11/10/92 August 12, 1992 1