Newsgroups: comp.databases.informix
Subject: Splitting string on blanks in 4GL

From: shj@dknet.dk (Stig Jacobsen)
Date: 14 Aug 1995 17:12:48 GMT

Howdy,

I have a bunch of strings which look like "500 328 729", that I'd like
to separate by whitespace and store into integer variables - basically
like:

    define str char(100)
    define counts array[3] of integer

    select value from table into str
    let counts = split(str, ...)

.. after which counts[1] equals 500, counts[2] is 328, etc.

What is the best way to do this? I thought of unloading str to an ascii
file, using run to call a shell script or somesuch which would massage
the unload file and produce a load file, which is then loaded into a
suitable table created on the fly. Pretty horrible thought, yes? I
haven't tried it yet, but I expect it to be a lot slower than my users
would appreciate.

Any input is welcomed. I'm running INFORMIX-4GL Version 4.11.UC2 on
Solaris 2.3.

--
Stig Jacobsen / shj@dknet.dk
http://www.dknet.dk/~shj


From: Nils.Myklebust@CCMAIL.telemax.no
Date: 14 Aug 1995 14:28:03 -0400

I know of no way the above syntax can work in 4GL, but why not simply:

    let counts[1] = str[1,3]
    let counts[2] = str[5,7]
    let counts[3] = str[9,11]

if your input data is as well formed as you indicate. If not, a for loop
will do it, with variables for str[p_start,p_end], possibly with a
"whenever any error continue" and a test on status or, perhaps better,
tests inside the for loop on every character in str. If you don't have
millions of rows with these data, even the for loop approach should be
reasonably fast.

Nils.Myklebust@ccmail.telemax.no
NM-data, Dalsbergstien 7, N-0170 Oslo, Norway
My opinions are those of my company


From: perryd@fourgen.com (Perry Dillard)
Date: 14 Aug 1995 17:03:03 -0400

I must warn you that the Informix substring manipulation is _extremely_
expensive in the compiled version. I'd think about writing a routine in
C to split your data.
-perryd (Guru)
====================================================================
= Perry Dillard - CASE Tools Development Team Leader               =
= FourGen Software, Inc.                                           =
= FourGen Building                                                 =
= 115 NE 100th Street                                              =
= Seattle, W.A. 98125-8098                                         =
=                                                                  =
= e-mail: perryd@fourgen.com                                       =
====================================================================


From: steinar@balder.no (Steinar O. Cook)
Date: 14 Aug 1995 17:28:04 -0400

An even better idea would be to use strtok(3C):

1) Write a C function called c_strtok() which is basically only an
   interface to the strtok(3C) call.

2) Now you can do it like this:

    LET i = 1
    WHILE TRUE
        LET array[i] = c_strtok(str)
        IF (array[i] IS NULL) THEN
            EXIT WHILE
        END IF
        LET i = i + 1
        IF (i > max_no_of_elements) THEN
            MESSAGE "Too many numbers ...."
            EXIT WHILE
        END IF
    END WHILE

--
Steinar Overbeck Cook
Balder Programvare AS


From: Nils.Myklebust@CCMAIL.telemax.no
Date: 15 Aug 1995 11:10:26 -0400

A good point, but we use it a lot without any performance problems.
Even things that are "expensive" are often masked completely by
database access.

Most of the time we find 4GL programs easier to write than C code. Also,
we want as little C code as possible to avoid problems should we decide
to use RDS.

For future compiler technologies from Informix we do, however, see it as
important that they optimise better so these problems disappear. If you
look at the C code generated, it can't be fast doing anything.

An example of something that is too expensive is if you use a lot of
sum/group sum in a report. You may see runtimes where these calculations
take most of the time to finish the program.

Nils.Myklebust@ccmail.telemax.no
NM-data, Dalsbergstien 7, N-0170 Oslo, Norway
My opinions are those of my company


From: jparker@hpbs3645.boi.hp.com (Jack Parker)
Date: 15 Aug 1995 19:40:02 -0400

This is an obvious candidate for a 'c' function. Does one such exist
already? Anyone want to write one? I'm too busy and my 'c' code stinks,
but will do it if there are no other takers.
I can think of two approaches at the moment:

1 - chop up the entire string and then worry about popping off the
    stack upon return to 4gl.
2 - split on the first delimiter and return the first and remainder
    field; the caller then calls as many times as necessary.

cheers
j.
_____________________________________________________________________________
Jack Parker - Hewlett Packard, BSMC Boise, Idaho, USA
jparker@hpbs3645.boi.hp.com
_____________________________________________________________________________
Discover America, get lost on a rally.
_____________________________________________________________________________
Any opinions expressed herein are my own and not those of my employers.
_____________________________________________________________________________


From: marco greco
Date: 16 Aug 1995 09:54:09 GMT

My Lit.32 on the subject: one of the few c routines in 4glWorks

    #include <string.h>         /* for strspn() and strcspn() */

    int get_token();

    get_token()
    {
        int i;
        char *p, *q;
        char bf[132];

        popquote(bf, 132);
        p=bf;
        if (!*p)
        {
            retquote("");
            retquote("");
            return(2);
        }
        i=strspn(p, " \t");
        if (!p[i])              /* nothing but blanks and tabs left */
        {
            retquote("");
            retquote("");
            return(2);
        }
        p=p+i;
        i=strcspn(p, " \t");
        q=p+i;
        if (*q)                 /* do not step past the terminator */
        {
            *q=0;
            q++;
            i=strspn(q, " \t");
            q=q+i;
        }
        retquote(p);            /* first token */
        retquote(q);            /* remainder of the string */
        return(2);
    }

the following 4gl code demonstrates its use:

    define
        i   integer,
        val array[20] of char(20),
        str char(100)

    let i=1
    while (str is not null) and (i<21)
        call get_token(str) returning val[i], str
        let i=i+1
    end while

cheers,
marco
........................................................................
tear along dotted line!
marco greco                              mar.greco@agora.stm.it
Work   Achea Srl                         tel 39 95 503117 / 447828 Pbx
       Catania, Italy                    fax 446558


From: Mark.Denham@bbc.co.uk (Mark Denham)
Date: 16 Aug 1995 08:28:03 -0400

I have a function, called as follows:

    CALL split_string(str, delimstr, returns, strict)
        RETURNING stat, strpart1, strpart2....

WHERE:

    str      - String you want to break up
    delimstr - Delimiter string
    returns  - Integer indicating how many STRING variables you want
               returned.
    strict   - Boolean TRUE/FALSE. When true, the function returns a
               stat of -1 if the string to split does not contain
               EXACTLY returns sub-strings in it.

On the return side, stat is always returned along with the number of
strings requested. I will dig it out if anyone is interested.

Mark Denham
BBC London, UK
Mark.Denham@bbc.co.uk


From: Mark.Denham@bbc.co.uk (Mark Denham)
Date: 22 Aug 1995 18:13:03 -0400

By popular demand, here is the string splitting function that I use.

Usage

    call split_string(src, delim, return, strict)
        returning stat, str1..strn

    src    - Is string to split, max 5k
    delim  - Field separator
    return - No. of strings to return (max).
    strict - Set status to -ve value if the no. of strings in src is
             not the same as return.

Note: Get rid of any bugs yourself! Only kidding.

Mark Denham
BBC London, UK
Mark.Denham@bbc.co.uk

--------------------------- Cut here --------------------------------------
/* c source *}
{*************************************************************************
*
* $Author$
*
* $Date$
*
* $Revision$
*
* Doc Refs:
*
* Purpose: Allows an RDS program to split a string using a delimiter
*          given by the user.
*
* Usage:   split_string string     String to split.
*                       delimiter  Delimiter string.
*                       returns    Max no. of strings to return.
*                                  When there are fewer strings in
*                                  source than returns and strict is
*                                  FALSE, null strings are returned
*                                  for the remaining values.
*                       strict     If set causes the routine to return
*                                  a failure status if the number of
*                                  elements in string does not match
*                                  the number specified by returns.
*
* where:
*
*          string     char[5120]   Null terminated string to split.
*          delimiter  char[10]     Null terminated delimiter string.
*          returns    smallint     Max. no. of strings to return.
*          strict     integer      TRUE/FALSE. Causes function to
*                                  fail when no. of strings found
*                                  does not match returns.
*
* Returns: stat       integer      0 for ok, -ve as below:
*                                    -1 = Too few strings in source
*                                    -2 = retcnt > MAX_STRINGS
*                                    -3 = Source contains more strs
*          str[0..n]  string       As many strings as required.
*
* Library
* Functions:
*
* Notes:   WHEN USING RDS.
*          This function MUST be linked with the RDS runner and debugger
*          using the cfglgo and cfgldb commands. Refer to the Interactive
*          debugger manual for details.
*          If you have a NULL value separated by 2 delimiters,
*              fred||john|...
*          This routine will not produce the result you expect!
*
* Modification Log
*============================================================================
*
* $Log$
*
****************************************************************************
*}
/* */

int dummy_string_split()
{
    static char *rcsid = "@(#)$Header: $";
}

#include <stdio.h>      /* for NULL */
#include <string.h>     /* for strtok() */

#define MAX_STRINGS   320
#define MAX_DELIM_LEN 10
#define MAX_STR_SIZE  5120

int split_string(numargs)
int numargs;
{
    char *strtok();
    char src_string[MAX_STR_SIZE+1];
    char delim[MAX_DELIM_LEN+1];
    char *strlist[MAX_STRINGS];
    char *nullstr = "", *str;
    int strict = 0, retcnt = 0, rval = 0, stat = 0, curridx = 0, idx = 0;

    if( numargs != 4 )
    {                       /* This is a problem and cannot easily */
        rval=-1;            /* 4GL will most likely produce an error*/
        retint(rval);       /* indicating that the no. of returned */
        return(1);          /* values is incorrect....*/
    }

    /* Read parameter list (popped in reverse order of the 4GL call) */
    popint(&strict);
    popint(&retcnt);
    popvchar(delim, MAX_DELIM_LEN);
    popvchar(src_string, MAX_STR_SIZE);

    /* Check that the no.
       strings required does not exceed max allowed */
    if( retcnt <= MAX_STRINGS )
    {
        str = src_string;
        /* Split up string, fail if strict and end of source string reached */
        for(; curridx < retcnt; curridx++)
        {
            strlist[curridx] = strtok(str, delim);
            str = NULL;
            if( strict && strlist[curridx] == NULL )
            {
                rval = -1;
                break;
            }
        }
    }
    else
        rval = -2;

    if( rval == 0 )
    {
        if( strict )
        {
            /* See if the source string is empty, if not fail */
            if(strtok(str, delim) != NULL )
                rval = -3;
        }
    }

    /* Return extraction status */
    retint(rval);

    /* Return all filled in values, may help debugging when an error occurs.
       When not strict, missing strings are NULL here, so return "" for them. */
    for(idx=0; idx < curridx; idx++ )
    {
        retvchar(strlist[idx] != NULL ? strlist[idx] : nullstr);
    }

    /* Now do rest of strings, if any */
    while(idx < retcnt)
    {
        retvchar(nullstr);
        idx++;
    }
    return(idx+1);          /* status plus the strings returned */
}
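
The Notes in the header above warn that an empty value between two
delimiters (fred||john) does not come out as expected. That follows from
strtok(), which treats a run of delimiters as a single separator. Below
is a minimal sketch of a splitter that keeps empty fields, assuming plain
C strings rather than the 4GL argument stack. The name split_keep_empty
and its parameters are hypothetical and not part of the posted code; it
uses only strcspn() from the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * split_keep_empty (hypothetical helper, not from the thread).
 * Splits buf in place on any character found in delims and stores up to
 * max_fields pointers in fields[]. Unlike strtok(), two adjacent
 * delimiters produce an empty field between them.
 * Returns the number of fields present in buf, which may be larger than
 * max_fields; only the first max_fields are stored and terminated.
 */
size_t split_keep_empty(char *buf, const char *delims,
                        char **fields, size_t max_fields)
{
    size_t count = 0;
    char *p = buf;

    assert(buf != NULL && delims != NULL && fields != NULL);

    for (;;) {
        size_t len = strcspn(p, delims);  /* length of the current field */
        int last = (p[len] == '\0');      /* field runs to end of string? */

        if (count < max_fields) {
            fields[count] = p;
            p[len] = '\0';                /* terminate the field in place */
        }
        count++;
        if (last)
            break;
        p += len + 1;                     /* step past the one delimiter */
    }
    return count;
}
```

With the buffer "fred||john" and the delimiter "|", this reports three
fields and the middle one is the empty string. A 4GL wrapper in the
style of split_string above could then hand each field to retvchar(),
though that wiring is left out here.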