The NetRexx Tutorial
- Operations on Strings


Operations on Strings

Introduction

As we already said, in NetRexx there is only ONE native data type: the string. We have already seen how to define a string; now we will concentrate on how to operate on strings, starting with the simplest operations (such as concatenating two strings) and ending with one of the most powerful features of NetRexx: string parsing. Unfortunately, this chapter contains some long reference sections; I hope you will not get too tired going through them.

The string.

Remember that we defined a string as "a sequence of characters" of arbitrary length and content. Strings are defined like this:

 
string = 'This is a string'
string_new = 'and this is another one'
 

You can use ' or " quotation marks to delimit a string when you define it.
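As a quick sketch (the script name is hypothetical), the two kinds of quotes produce exactly the same string. To put the delimiter itself inside a string you can either double it, following the usual Rexx convention, or simply delimit the string with the other kind of quote:

```netrexx
-- quotes.nrx (hypothetical name): the two delimiters are equivalent
s1 = 'This is a string'
s2 = "This is a string"
say (s1 == s2)             -- 1: the two strings are strictly identical

-- a quote of the same kind as the delimiter is written twice...
say 'It''s a string'       -- It's a string
-- ...or you simply use the other kind of quote as delimiter
say "It's a string"        -- It's a string
```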

String Concatenation

The first operation you might want to perform on a string (or rather, on two or more strings) is concatenation, i.e. joining a set of strings into a single string. NetRexx provides three ways of doing this:

 
 
(blank)      Concatenate terms with one blank in between;
 
||           Concatenate without an intervening blank;
 
(abuttal)    Concatenate without an intervening blank;
 
 

Concatenation without a blank can be forced with the || operator. The same result is obtained when a literal string and a symbol are abutted, i.e. written next to each other with no blank in between: this is the abuttal operator. Suppose you have a variable p1 that contains the string 'my' and a variable p2 that contains the string 'simple test'. Look at the concatenations:

 
simple
  say p1 p2     -> 'my simple test'
 
no blanks
  say p1||p2     -> 'mysimple test'
 
abuttal
  say 'my'p2     -> 'mysimple test'
 

The following additional examples might better clarify how concatenation works:

 
/* setting */                     /* s values */
s1 = 'Tyranno'
s2 = '-'
s3 = 'Saurus'
 
s = s1 s3                         s = 'Tyranno Saurus'

-- notice I put MANY spaces between s1 and s3: they
-- have no effect
s = s1       s3                   s = 'Tyranno Saurus'

s = s1||s3                        s = 'TyrannoSaurus'

s = s1||' '||s3                   s = 'Tyranno Saurus'

-- Here spaces count!
s = s1||'    '||s3                s = 'Tyranno    Saurus'

s = s1 s2 s3                      s = 'Tyranno - Saurus'

s = s1||s2||s3                    s = 'Tyranno-Saurus'

s = s1      s2       s3           s = 'Tyranno - Saurus'

s = s1'-'s3                       s = 'Tyranno-Saurus'
 

Comparative operators.

The very same comparative operations that can be done with numbers can, of course, be done with strings. The comparative operators return:

 
 
   1 if the result of the comparison is true
 
   0 otherwise
 
 

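NetRexx has two sets of comparison operators: normal and strict. In a normal comparison, if both terms are numbers they are compared numerically; otherwise leading and trailing blanks are ignored and the shorter string is padded with blanks before the characters are compared. The strict comparison is exactly what its name suggests: two strings must be strictly identical, character by character, in order to pass the comparison.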

 
NORMAL comparative operators:
 
     =                 True if terms are equal;
     \= , ^=           Not equal;
     >                 Greater than;
     <                 Less than;
 
     >< , <>           Greater than or less than
                         (same as NOT EQUAL)
     >= , ^< , \<      Greater than or equal to,
                         not less than;
     <= , ^> , \>      Less than or equal to,
                         not greater than;
 
STRICT comparative operators:
 
     ==               True if the terms are strictly equal
                         (identical)
     \== , ^==        True if terms are strictly not
                         equal
     >>               strictly greater than;
     <<               strictly less than;
     >>= , ^<< , \<<  strictly greater than or equal to,
                         strictly not less than;
     <<= , ^>> , \>>  strictly less than or equal to,
                         strictly not greater than;
 
 
BOOLEAN operators:
 
     &                AND;
 
     |                Inclusive OR;
 
     &&               Exclusive OR;
 
     ^  , \           LOGICAL NOT
 
 

You will probably never need some of these operators, although it is good to know that they exist in order to avoid 'reinventing the wheel' when faced with a particular problem. The most important operators are definitely = , ^= , < , >; you will be using them for 99% of your comparisons.
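The following small sketch (script name hypothetical) shows the rules of the normal comparison in action: numbers are compared as numbers, and leading and trailing blanks are ignored, while the strict operators look at every character:

```netrexx
-- compare.nrx (hypothetical name): normal versus strict comparison
say ('1.0' = '1')        -- 1: both terms are numbers, compared numerically
say ('1.0' == '1')       -- 0: as strings they are not identical
say (' abc' = 'abc ')    -- 1: leading and trailing blanks are ignored
say (' abc' == 'abc ')   -- 0: in a strict comparison blanks count
say ('abc' < 'abd')      -- 1: compared character by character
say ('abc' \= 'abd' & 'abc' << 'abd')   -- 1: both sides are true
```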

A small program for checking comparisons.

We give a small example that shows the difference between the strict and the normal operators: the program we run is as follows:

 
-- strstrict.nrx: normal versus strict comparison
--
str    = 'test'
str[1] = 'test'
str[2] = ' test'
str[3] = 'test   '
say 'Comparing string "'str'".'
loop i = 1 to 3
  normal = (str = str[i])         -- leading/trailing blanks ignored
  strict = (str == str[i])        -- every character counts
  q = '"'str[i]'"'                -- the quoted string being compared
  say 'with' q.left(11) 'is normal:' normal '; strict:' strict'.'
end
exit 0
                                                           strstrict.nrx

and the result is:

 
....................................................................
rsl3pm1 (39) java strstrict
Comparing string "test".
with "test"      is normal: 1 ; strict: 1.
with " test"     is normal: 1 ; strict: 0.
with "test   "   is normal: 1 ; strict: 0.
rsl3pm1 (40) 
....................................................................
                                                           strc1.out

Miscellaneous functions on strings.

Although this book is not a true reference, I would like to present some of the many built-in functions available in NetRexx; for a complete list, consult the NetRexx Reference. The purpose of this list is to make sure that you at least know these functions exist. In fact, I have to admit that I once wrote my own function to find the last occurrence of a character in a string, until a colleague showed me that this function already existed (it is called lastpos()).

 
---------------------------------------------------------
Standard NetRexx functions
 
information.abbrev(info,length)
        Check if 'info' is a valid abbreviation for the
        string 'information';
 
string.center(length,pad)
        Centers a string;
 
string1.compare(string2,pad)
        Compares two strings: 0 is returned if the strings
        are identical; if they are not, the position of
        the first character that differs is returned (the
        shorter string is padded with 'pad');
 
string.copies(n)
        Makes 'n' copies of the given string 'string';
 
string.delstr(n,length)
        Deletes the sub-string of 'string' that begins at the
        n-th character, for 'length' characters;
 
string.delword(n,length)
        Same as above, but now the integers 'n' and 'length'
        indicate words instead of characters, i.e. space
        delimited sub-strings;
 
target.insert(new,n,length,pad)
        Inserts the string 'new' into a copy of the string
        'target', after its 'n'-th character;
 
haystack.lastpos(needle,start)
        Returns the position of the last occurrence of the
        string 'needle' into another, 'haystack'; if the
        string is NOT found, 0 is returned; see also pos();
 
string.left(length[,pad])
        Returns a string of 'length' characters made of the
        left-most characters of 'string' (padded with 'pad'
        if 'string' is too short);
 
string.length()
        Returns the 'string' length;
 
string.lower([n[,length]])
        Returns a lower case copy of the string.
        Lowering will be performed from character n
        for length characters. If nothing
        is specified, lower() will lowercase the
        whole string, from the 1st character.
 
target.overlay(new,n,length,pad)
        Overlays the string 'new' onto a copy of the string
        'target', starting at the 'n'-th character;
 
haystack.pos(needle,start)
        Returns the position of one string 'needle' inside
        another one (the 'haystack');
 
string.reverse()
        Returns 'string' reversed, from end to start;
 
string.right(length,pad)
        Returns a string of 'length' characters made of the
        right-most characters of 'string' (padded on the
        left with 'pad' if 'string' is too short);
 
start.sequence(end)
        Returns a string of all the characters, in
        ascending order of encoding, from the character
        'start' up to the character 'end';
        It replaces REXX's xrange() function;
 
string.space(n,pad)
        Formats the blank-delimited words in string 'string'
        with 'n' 'pad' characters between each pair of words;
 
string.strip(option,char)
        Removes Leading, Trailing, or Both (leading and
        trailing) 'char' characters (blanks by default)
        from string 'string';
 
string.substr(n,length,pad)
        Returns the substring of 'string' that begins at the
        'n'-th character, for 'length' characters (DEFAULT
        is up to the end of string);
 
string.subword(n,length)
        Returns the sub-string of string 'string' that starts
        at the 'n'-th word (for 'length' words: DEFAULT is
        up to the end of string);
 
string.translate(tableo,tablei,pad)
        Translates the characters in string 'string'; the
        characters to be translated are in 'tablei', the
        corresponding characters (into which the characters
        will be translated), are in 'tableo';
 
string.verify(reference,option,start)
        Verifies that the string 'string' is composed ONLY of
        characters from 'reference': 0 is returned if it is,
        otherwise the position of the first character that
        is not in 'reference';
 
string.word(n)
        Returns the 'n'-th blank delimited word in string
        'string';
 
string.wordindex(n)
        Returns the character position of the 'n'-th word
        in string 'string';
 
string.wordlength(n)
        As above, but returns the length of the 'n'-th word;
 
string.wordpos(phrase,start)
        Searches string 'string' for the first occurrence
        of the sequence of blank-delimited words in 'phrase';
 
string.words()
        Returns the number of words in string 'string';
 
string.upper()
        Returns the string converted to uppercase;
 
string.lower()
        Returns the string converted to lowercase;
 
---------------------------------------------------------
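To make the list a little more concrete, here is a short sketch (script name and sample data are hypothetical) that calls some of the functions above on small strings; the comments show the value each line prints:

```netrexx
-- strfuncs.nrx (hypothetical name): a few of the functions listed above
s = 'banana'
say s.pos('an')                     -- 2: first occurrence of 'an'
say s.lastpos('an')                 -- 4: last occurrence of 'an'
say s.reverse()                     -- ananab

w = 'one   two  three'
say w.words()                       -- 3
say w.word(2)                       -- two
say w.subword(2)                    -- two  three (inner blanks kept)
say w.space()                       -- one two three
say w.wordpos('three')              -- 3

say 'abcdef'.insert('123', 3)       -- abc123def: inserted after char 3
say 'abcdef'.overlay('XY', 3)       -- abXYef: overlaid from char 3
say '12'.right(5, '0')              -- 00012
say 'abc'.center(7, '*')            -- **abc**
say '12a4'.verify('0123456789')     -- 3: position of the first non-digit
say 'information'.abbrev('info')    -- 1: 'info' abbreviates it
```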
 

You might now say: "Thanks a lot for this list, but which functions are the most important, i.e. the most used ones I should remember?" To answer, I took a sample of REXX programs written by a group of people and counted how often each of the functions you just saw is used. This is the result:

 
----------------------------------------------------
substr......: 361  19%       length......:  252  13%
wordpos.....: 214  11%       upper.......:  164   8%
right.......: 152   8%       space.......:  147   7%
insert......: 110   5%       words.......:  109   5%
strip.......:  74   3%       translate...:   70   3%
abbrev......:  58   3%       lastpos.....:   48   2%
copies......:  31   1%       pos.........:   30   1%
overlay.....:  23   1%       delword.....:   14   0%
reverse.....:   5   0%       verify......:    4   0%
subword.....:   1   0%       xrange......:    1   0%
lower.......:   1   0%       center......:    0   0%
wordindex...:   0   0%       delstr......:    0   0%
compare.....:   0   0%
----------------------------------------------------
                          most used string functions
 

As you can see, substr() heads the list of the most used string functions, while functions such as compare() never appeared. For comparison, the parse instruction (described later in this chapter) received 567 hits, whilst the do instruction got 690. I have not included these instructions in the list simply because I wanted to look only at the string functions we have seen so far.

Some 'particular' string functions.

Some of the functions you have just seen require a bit more discussion. This will be taken care of in the section that follows.

translate().

The translate function is used, as the name suggests, to translate the characters that form a string, following a very simple rule: if a character is in a table (usually called TABLEI), it is translated into the corresponding character of another table (usually called TABLEO). If a given character is not in TABLEI, it remains unchanged. The syntax of the function is:

 
trans = str.translate(tableo,tablei)
 

Some examples will better clarify:

 

'TEST'.translate('O','E')      -> 'TOST'

'CAB'.translate('***','ABC')   -> '***'

'(INFO)'.translate('  ','()')  -> ' INFO '

 

An often-made mistake is to invert TABLEO and TABLEI: I do this myself, putting TABLEO where TABLEI should be and vice versa. To avoid this confusion, I suggest you always test the translation first, so that you can be sure your tables are in the right place. What's the use of translate()? A typical case is when you want to get rid of characters you do not wish to process: TABLEI will contain all the unwanted characters, and TABLEO will just be an empty string. Note that the unwanted characters are not deleted but turned into blanks (TABLEO is padded with blanks), so you may still need to squeeze them out, e.g. with space(). Another possible application is an ASCII to EBCDIC (or EBCDIC to ASCII) converter.
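A small sketch of both points (script name and sample data are hypothetical): the order of the two tables matters, and "removed" characters actually become blanks, which space(0) can then squeeze out. Keep in mind that space(0) removes every blank, including the ones that were already in the string:

```netrexx
-- translate.nrx (hypothetical name): TABLEO comes FIRST, TABLEI SECOND
lc = 'abcdefghijklmnopqrstuvwxyz'
uc = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
say 'hello'.translate(uc, lc)      -- HELLO
say 'hello'.translate(lc, uc)      -- hello: tables swapped, nothing matches

-- "removing" characters: translate them to blanks, then squeeze
-- all the blanks out with space(0)
phone = '(555) 123-4567'
say phone.translate('   ', '()-').space(0)   -- 5551234567
```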

Parsing.

The parsing feature of NetRexx is, in my opinion, one of the most useful and powerful features of the language, and it probably deserves a chapter of its own. By parsing we mean splitting up a string into variables, under the control of a template. The syntax of the instruction is the following:

 
  parse variable template
 

The variable is the original string you want to split-up, whilst the template is the set of rules to be used to do this split-up (together with the variables that will hold the result).

 
              original_string
                     |
               template
                     |
        +---------+--+-------+-----(...)---+     PARSING
        |         |          |             |
        v         v          v             v
     string1   string2    string3       stringN
 
 

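You might consider the template as a 'filter', or as a 'set of rules': NetRexx reads these rules and then uses them to split the original string into the target variables. There are several ways to parse a string. In brief, you can parse a string: into words; with literal patterns; using periods as place holders; using unsigned numbers (absolute column positions); using signed numbers (relative positions); with variable patterns.

We will now analyse each of these cases in turn, always parsing the value of a variable (the 'flavour' of the instruction that classic REXX writes as parse var).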

Parsing into words.

This is probably the simplest case: the string is split into blank-delimited words, which are assigned, in order, to the variables that follow the one we want to parse; the last variable receives whatever is left.

 
--------------------------------------------------------
 
string = 'Very Simple String'
 
parse string word1 word2 word3
         |
         +--->  word1 'Very'
         +--->  word2 'Simple'
         +--->  word3 'String'
 
str = 'This simple string, I hope, is parsed.'
 
parse str p1 p2 rest
       |
       +--->  p1    'This'
       +--->  p2    'simple'
       +--->  rest  'string, I hope, is parsed.'
 
str = 'Short string'
 
parse str p1 p2 rest
       |
       +--->  p1    'Short'
       +--->  p2    'string'
       +--->  rest  '' (null string)
 
--------------------------------------------------------
                                      parsing into words
 

As you can see, the template is simply a list of variables, which hold the result after the split by words has been performed. Each variable holds one word, except the last one, which holds the rest of the string (or the null string if nothing is left). A word is a sequence of non-blank characters delimited by blanks.
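You can check these rules yourself with a sketch like the following (script name hypothetical); the square brackets make it easy to see exactly what each variable holds, including the null string:

```netrexx
-- words.nrx (hypothetical name): parsing into words
str = 'This simple string, I hope, is parsed.'
parse str p1 p2 rest
say '['p1']['p2']['rest']'   -- [This][simple][string, I hope, is parsed.]

str = 'Short string'
parse str p1 p2 rest
say '['p1']['p2']['rest']'   -- [Short][string][]: no words left for rest
```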

Parsing with literal patterns.

In this case NetRexx scans the data string for a sequence that matches the value of the literal, which is written as a quoted string. The literal itself DOES NOT appear in the parsed results.

 
--------------------------------------------------------
 
str = 'Here I am.'
 
parse str p1 'I' p2
       |
       +--->  p1    'Here '
       +--->  p2    ' am.'
 
str = 'This simple string, I hope, is parsed.'
 
parse str p1 ',' p2 ',' p3
       |
       +--->  p1    'This simple string'
       +--->  p2    ' I hope'
       +--->  p3    ' is parsed.'
 
parse str p1 'simple' p2 ',' p3 'is' p4 '.'
       |
       +--->  p1    'This '
       +--->  p2    ' string'
       +--->  p3    ' I hope, '
       +--->  p4    ' parsed'
 
--------------------------------------------------------
                           parsing with literal patterns
 

I stress the fact that the characters (or strings) that you use to build your literal patterns DO NOT appear in the final parsed result.
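A short sketch (script name hypothetical) makes the blanks visible: when a single variable sits between two patterns it receives the exact substring, blanks included. It also shows what happens when a literal is not found: the variable before it takes the rest of the string, and the following variables are set to the null string:

```netrexx
-- literal.nrx (hypothetical name): brackets show exactly what is assigned
str = 'Here I am.'
parse str p1 'I' p2
say '['p1']['p2']'         -- [Here ][ am.]: the blanks are kept

-- the literal '#' does not occur in str
parse str head '#' tail
say '['head']['tail']'     -- [Here I am.][]
```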

Parsing using periods as place holder.

The symbol '.' (single dot) acts as a place holder in a template. It can be regarded as a "dummy variable", since its behaviour is exactly the same as a variable, except that the data is not stored anywhere. Use it when you 'really don't care' about some portions of a string.

 
--------------------------------------------------------
 
str = 'This simple string, I hope, is parsed.'
 
parse str . p1 . . p2 .
       |
       +--->  p1    'simple'
       +--->  p2    'hope,'
 
--------------------------------------------------------
                   parsing using periods as place holder
 

As you can see, the terms 'This', 'string,', 'I' and 'is parsed.' have simply disappeared. It is a common idiom to put a '.' at the end of a template, simply to stop the rest of the string from polluting the last real variable. You can think of the '.' as the /dev/null of parsing: it eats a word (if it is in the middle of a template) or even all the remaining part of the string (if it is the last term).
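The effect of the trailing '.' is easiest to see side by side (a minimal sketch, script name hypothetical): without it, the last variable collects everything that is left.

```netrexx
-- dots.nrx (hypothetical name): the period as a place holder
str = 'This simple string, I hope, is parsed.'
parse str w1 w2 .
say '['w2']'        -- [simple]: the trailing '.' swallows the rest

parse str w1 w2
say '['w2']'        -- [simple string, I hope, is parsed.]
```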

Parsing using unsigned numbers.

If you put unsigned numbers in a template, NetRexx treats them as references to particular character columns in the input.

 
--------------------------------------------------------
 
str = 'This simple string, I hope, is parsed.'
 
parse str p1 10 p2 20 p3
       |
       +--->  p1    'This simp'
       +--->  p2    'le string,'
       +--->  p3    ' I hope, is parsed.'
 
str = 'TEST'
 
parse str 1 p1 1 p2 1 p3
       |
       +--->  p1    'TEST'
       +--->  p2    'TEST'
       +--->  p3    'TEST'
 
 
 
--------------------------------------------------------
                          parsing using unsigned numbers
 

As you can see, the variable p1 holds the characters of str from the first to the ninth column, p2 holds those from the 10th column to the 19th, and p3 holds the rest of the input. Note that the blank is treated like any other character. The second example shows an interesting feature: since a number can point back to a column we have already passed, the same data can be parsed more than once.
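Column positions are the natural tool for fixed-format records. In this sketch the record layout (surname in columns 1-10, first name in 11-20, year from 21) and the script name are hypothetical; the record is built with left() so the columns are easy to check. Note that the fields keep their padding blanks, so you will usually strip() them:

```netrexx
-- columns.nrx (hypothetical name): fixed-column records
record = 'SMITH'.left(10) || 'JOHN'.left(10) || '1962'
parse record surname 11 name 21 year
say '['surname']'                                -- the padding blanks are kept
say surname.strip() || '/' || name.strip() || '/' || year   -- SMITH/JOHN/1962

-- the same column can be used more than once
parse record 1 initial +1 .
say initial                                      -- S
```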

Parsing using signed numbers.

Signed numbers can be used in a template to indicate a displacement relative to the character position at which the last match occurred.

 
--------------------------------------------------------
 
str = 'ABCDEFGHILM'
 
parse str 3 p1 +4 p2
       |
       +--->  p1    'CDEF'
       +--->  p2    'GHILM'
 
parse str 3 p1 +4 p2 6 p3
       |
       +--->  p1    'CDEF'
       +--->  p2    'GHILM'
       +--->  p3    'FGHILM'
 
--------------------------------------------------------
                            parsing using signed numbers
 
 

Let us look at the first example: the '3' tells NetRexx 'Position yourself at the 3rd character of str' (this is "C"). Then 'p1 +4' instructs it to 'Put in p1 the characters that follow, up to 4 characters from where you were' (this builds "CDEF"). Then the lone p2 tells it to 'Put all the rest in p2', so that p2 becomes "GHILM". In the second example, the absolute position 6 lies to the left of the current position (7): p2 still receives the rest of the string, and parsing then restarts from column 6, so that p3 becomes "FGHILM".
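Relative positions are handy when fields of known width follow one another. A minimal sketch (script name and the yyyymmdd date are hypothetical) splits a compact date; the second parse uses a negative displacement to move back and read the month after the day:

```netrexx
-- datefields.nrx (hypothetical name): fixed-width fields, one after another
d = '20240131'
parse d yyyy +4 mm +2 dd +2
say dd'/'mm'/'yyyy            -- 31/01/2024

-- a negative displacement moves back: read the day first, then the month
parse d 7 day +2 -4 month +2
say day month                 -- 31 01
```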

Parsing with variable patterns.

(Don't worry, this is the last case!) Using '(' ')' to delimit a variable in a template will instruct NetRexx to use the value of that variable as a pattern.

 
--------------------------------------------------------
 
delim = ','
str = 'This simple string, I hope, is parsed.'
 
parse str p1 (delim) p2 (delim) p3
       |
       +--->  p1    'This simple string'
       +--->  p2    ' I hope'
       +--->  p3    ' is parsed.'
 
--------------------------------------------------------
                          parsing with variable patterns
 
 

This is probably the most flexible case, since the pattern is known only at run time.
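A typical use is a separator that is only decided at run time (for example read from a configuration file). In this sketch the script name, the data and the ';' separator are hypothetical:

```netrexx
-- varpat.nrx (hypothetical name): the separator is only known at run time
line = 'name=John;age=42'
sep = ';'                          -- could come from a configuration file
parse line field1 (sep) field2
parse field1 key1 '=' val1
parse field2 key2 '=' val2
say key1 '->' val1                 -- name -> John
say key2 '->' val2                 -- age -> 42
```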

Parsing with ALL methods intermixed.

Of course you will ask yourself: "I've seen all those methods for parsing a string, but can I intermix them?". The answer is (as you can imagine, since I asked the question rhetorically) "Yes!". Your template can intermix all the methods we've seen so far, and it can become extremely complicated. You can write:

 
parse test 1 flag +1 info tape . '-' rest 80 comment
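Here is a sketch of an intermixed template applied to a concrete (hypothetical) log record: a column pattern, a relative pattern, literal patterns, word parsing and a place holder all work together. The script name and record layout are assumptions made for illustration:

```netrexx
-- mixed.nrx (hypothetical name): column, relative, literal, word and
-- place holder patterns in one template
rec = 'Y2024-01-31 backup done, 3 files'
parse rec 1 flag +1 yyyy '-' mm '-' dd msg ',' count .
say flag                 -- Y
say yyyy mm dd           -- 2024 01 31
say msg                  -- backup done
say count                -- 3
```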
 

Strings & Parsing in the real life.

Implement a stack or a queue using a string.

A stack is an example of an abstract data type (see KRUSE, 1987, pg. 150).

Usually a stack is implemented with arrays, which require particular attention to conditions such as empty stack and full stack.

If we make the assumption that you're dealing with numeric quantities (or with space delimited alphanumeric quantities), the implementation of a stack (or a queue) is extremely easy and elegant using a simple string.

This is how you do it:

 
(...)
stack = ''              -- empty stack
(...)
stack = n stack         -- push() n into the stack
(...)
parse stack m stack     -- pop() m from the stack
(...)
entries = stack.words() -- count stack items
(...)
 

To be even more clear, let's follow the example:

 

op                       stack
--                       -----
stack = ''               ''
stack = 1 stack          1
stack = 2 stack          2 1
stack = 3 stack          3 2 1
parse stack m stack      2 1       m = 3
stack = 4 stack          4 2 1
parse stack n stack      2 1       n = 4
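The same idea gives a queue: push new items at the end instead of at the front, and keep taking the first word. A minimal sketch (script name hypothetical); strip() is only used to hide the harmless leading or trailing blank that blank concatenation with an empty string leaves behind:

```netrexx
-- stackqueue.nrx (hypothetical name): stack and queue in a string
stack = ''
loop n = 1 to 3
  stack = n stack              -- push: the new item goes in front
end
parse stack top stack          -- pop: take the first word
say top '|' stack.strip()      -- 3 | 2 1
say stack.words()              -- 2 items left

queue = ''
loop n = 1 to 3
  queue = queue n              -- enqueue: the new item goes at the end
end
parse queue head queue         -- dequeue: still take the first word
say head '|' queue.strip()     -- 1 | 2 3
```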

 

Parsing a list of words.

You will often find yourself with a string that contains a list of items (words). If you need to process all the items from this list, here is a simple trick for doing it. The basic idea is the following:

 
loop while list <> ''
  parse list item list
  (...)
  -- processing over 'item'
  (...)
end
 

The variable list is parsed into itself: item receives its first word, and list keeps what remains. In fact, we are just 'eating up' list one word per iteration. This small program illustrates the trick:

 
-- pex1.nrx
--
list = 'MARTIN DAVID BOB PETER JEFF'
i = 0
loop while list <> ''
  parse list item list
  i = i+1
  say i.right(2,'0') item.left(10) list
end
exit 0
                                                                pex1.nrx


Here is what you get when you run it.

 
.............................................................  

01 MARTIN     DAVID BOB PETER JEFF
02 DAVID      BOB PETER JEFF
03 BOB        PETER JEFF
04 PETER      JEFF
05 JEFF
.............................................................
                                                 parseex1.out
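If you need to keep the list intact, a variant is to index the words with word() instead of consuming the string. A minimal sketch (script name hypothetical), which prints the same numbered names as pex1.nrx without the shrinking remainder:

```netrexx
-- pex2.nrx (hypothetical name): visit the list without consuming it
list = 'MARTIN DAVID BOB PETER JEFF'
loop i = 1 to list.words()
  say i.right(2, '0') list.word(i)
end
say list                       -- the list is still intact
```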

Sorting.

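In the NetRexx language there is no built-in function for sorting the words of a string.

Sorting a string.

The following method, sort(), is part of the xstring.nrx library and sorts the words of a string. It is not a built-in function, but you can call it almost as if it were:

 
 
sorted = xstring.sort(string, 'R')
 
 

where string is the unsorted string, and the second argument selects the order: 'R' for a reverse (descending) sort, anything else (for example 'A') for an ascending one. The method implements a Shell sort over the words of the string. The code is: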

 
  -- method......: sort
  -- purpose.....: Sort the words of a string (Shell sort)
  --               A = Ascending: A B C D ...
  --               R = Reverse:   ... D C B A
  --
  method sort(stri=Rexx,mode=Rexx) public static
    if mode <> 'R' then mode = ''
    ws = stri.words()
    incr = ws%2                        -- initial gap: half the word count
    loop while incr > 0
      loop i = incr+1 to ws            -- gapped insertion sort
        j = i-incr
        loop while j > 0
          k = j+incr
          wj = stri.word(j)
          wk = stri.word(k)
          -- stop as soon as the two words are in the wanted order
          if mode = 'R'
            then do ; if wj >= wk then leave ; end
            else do ; if wj <  wk then leave ; end
          -- otherwise swap word j and word k
          stri = stri.subword(1,j-1) wk     -
                 stri.subword(j+1,k-j-1) wj -
                 stri.subword(k+1)
          j = j-incr
        end
      end
      incr = incr%2                    -- halve the gap
    end
    stri = stri.space()                -- exactly one blank between words
    return stri
                                                xstring.nrx(Method:sort)

A sample program that calls such a routine is:

 
-- composers.nrx
--
composers = 'Bach Vivaldi Verdi Mozart Beethoven Monteverdi'

say 'Unsorted:' composers'.'
say 'Sorted..:' xstring.sort(composers,'A')'.'
say 'Sorted.R:' xstring.sort(composers,'R')'.'
exit 0
                                                           composers.nrx

and here is a sample output:

 
....................................................................
rsl3pm1 (110) java composers
Unsorted: Bach Vivaldi Verdi Mozart Beethoven Monteverdi.
Sorted..: Bach Beethoven Monteverdi Mozart Verdi Vivaldi.
Sorted.R: Vivaldi Verdi Mozart Monteverdi Beethoven Bach.
rsl3pm1 (111) 
....................................................................
                                                            eso1.out

Other string manipulation examples

A simple "censor"

The following code is a simple implementation of a "censor" over a string. Suppose you want to remove every occurrence of a string from another string, or to replace it with 'XXX' characters (like real censors do). The small method below might help you.

 
  -- method......: censure
  -- purpose.....: get totally rid of a string sequence (s2)
  --               inside a string (s1), or mask it with 'ch'
  --
  method censure(s1=Rexx,s2=Rexx,ch=Rexx) public static
    -- initialization: the replacement is 'ch' repeated once for
    -- each character of s2, or the null string if ch is empty
    os = ''
    repl = ''
    if ch.length() > 0 then repl = ch.copies(s2.length())

    -- do the job: this is really easy with parse
    loop while s1.pos(s2) > 0
      parse s1 p1 (s2) s1            -- p1: text before s2; s1: text after
      os = os||p1||repl
    end

    -- all done: append what follows the last occurrence
    return os||s1

  method censure(s1=Rexx,s2=Rexx) public static
    return censure(s1,s2,'')
                                             xstring.nrx(Method:censure)

You should look at the way it is implemented: the string is consumed piece by piece, using:

 
  parse s1 p1 (s2) s1
 

where s2, the string to search for, is a value known only at run time: p1 receives the text that precedes it, and s1 the text that follows it.

An animated status line.

Some programs take a long time to run, so that the person sitting in front of the terminal might ask "What ARE they doing?". So it is often nice to show the user 'where' the program is in the processing. For example, if a program has to process 300 files, and each file takes one or more seconds to process, you might want to use the routine that follows, in order to keep the person sitting at the terminal informed as to how many files the program has done, and how many there are yet to go. The following routine shows:

 
 
 1. a 'rotating' symbol                  : (- \ | / -)
 2. the number of 'done' items           : nnnn/NNNN
 3. a graphic scale of 'done' items      : [****.....]
 4. a numeric percent                    : nnn%
 5. an additional information message    : string
 
 

The routine that is really of interest to you is called display. In the complete example, nothing really happens between one display and the next (just a sleep instruction); replace that sleep with the computation-intensive, time-consuming part of your code.

 
+----------------------------------------------------------------------+
| -- method......: display                                             |62
| -- purpose.....:                                                     |63
| --                                                                   |64
|   method display(i1=Rexx,i2=Rexx,rest=Rexx) public                   |65
|     pt = dinfop//4 +1                                                |66
|     f1 = '-\\|/'.substr(pt,1)                                        |67
|     dinfop = dinfop+1                                                |68
|     n1 = i1/i2*20                                                    |69
|     n2 = i1/i2*100                                                   |70
|     n1 = n1.format(3,0)                                              |71
|     n2 = n2.format(3,0)                                              |72
|     cu = '.'.copies(20)                                              |73
|     cu = cu.overlay('*',1,n1,'*')                                    |74
|     s1 = i1.right(4,'0')                                             |75
|     str = f1 s1||'/'||i2.right(4,'0') '['cu']' n2'% -' rest          |76
|     System.out.print(str'\x0D')                                      |77
|                                                                      |78
+----------------------------------------------------------------------+
                                             xstring.nrx(Method:display)
Download the complete source for the xstring.nrx library

Of course, you cannot see the motion in the figure, but you can use your imagination. You should simply try it on a real terminal, and you will get, on the very same line, something that 'moves' and shows (more or less) this:

 
.....................................................................
rsl3pm1 (80) display
 
\ 0001/0010 [**..................]  10%  |
(...)                                    | ALWAYS
| 0005/0010 [**********..........]  50%  | ON
/ 0006/0010 [************........]  60%  | THE
(...)                                    | SAME
- 0010/0010 [********************] 100%  | LINE
 
rsl3pm1 (81)
.....................................................................
                                                      display example
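The same status line can be sketched in Java. This is only an approximation of display: the class and method names are hypothetical, the spinner position comes from an explicit tick counter instead of the dinfop property, and Thread.sleep stands in for the real work. As in the NetRexx version, the line is redrawn in place by ending it with a carriage return ('\r') instead of a newline, which assumes a terminal that honours it.

```java
public class StatusLine {
    // Build one status line: spinner, done/total, 20-cell bar, percent, message.
    static String statusLine(int done, int total, int tick, String rest) {
        char spinner = "-\\|/".charAt(tick % 4);              // rotating symbol
        int stars = (int) Math.round(done * 20.0 / total);     // filled cells out of 20
        int percent = (int) Math.round(done * 100.0 / total);  // numeric percent
        String bar = "*".repeat(stars) + ".".repeat(20 - stars);
        return String.format("%c %04d/%04d [%s] %3d%% - %s",
                spinner, done, total, bar, percent, rest);
    }

    public static void main(String[] args) throws InterruptedException {
        int total = 10;
        for (int i = 1; i <= total; i++) {
            Thread.sleep(200);                                 // stand-in for the real work
            System.out.print(statusLine(i, total, i, "working") + "\r");
        }
        System.out.println();                                  // leave the final line visible
    }
}
```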

A hashing function.

I will not discuss the concepts of hashing in detail; I leave this to more specialised literature [KRUSE, LEUNG, TONDO; 1991]. I will simply note that hashing is used to perform fast searches in databases, and that hashing functions are used to index a hash table. The basic idea of a hash table is to map each key to a location in an array by means of an index function (the hash function); different keys may end up in the same location. For the moment we are not interested in implementing a full hash table algorithm, so we will concentrate on the hashing function itself. We need an algorithm that takes a key (a string) and builds a number. The algorithm must be quick to compute and should distribute the keys that occur evenly across the range of indices. The following function hash can be used for hashing keys of alphanumeric characters into an integer in the range:

 
  0 ... hash_size-1      (0 ... 1022 with hash_size = 1023)
 

You call the function issuing:

 
  nn = hash(key)
 

 
+----------------------------------------------------------------------+
| -- method......: hash                                                |02
| -- purpose.....:                                                     |03
| --                                                                   |04
|   method hash(str=Rexx) public static                                |05
|     hash_size = 1023                                                 |06
|     t = 0                      -- zero total                         |07
|     l = str.length()           -- str length                         |08
|     loop while str <> ''       -- loop over all CHARS                |09
|       parse str ch +1 str      --  get one                           |10
|       t = t+ch.c2d()           --  add to total                      |11
|     end                        --                                    |12
|     out = (t*l)//hash_size     -- fold it to SIZE                    |13
|     return out                                                       |14
|                                                                      |15
+----------------------------------------------------------------------+
                                                xstring.nrx(Method:hash)
Download the complete source for the xstring.nrx library

The algorithm shown is rather fast, and produces a relatively even distribution. The core of it is the loop that adds up the decimal value of each character. I then multiply this total by the original length of the string, and take the remainder of the division by hash_size (the // operator).
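A direct Java transcription of hash can be handy for experimenting with the distribution of your own keys. It is a sketch: the class and constant names are hypothetical, and a long accumulator is used so that the sum cannot overflow for long keys.

```java
public class Hash {
    static final int HASH_SIZE = 1023;

    // Sum the character codes, multiply by the key length,
    // and fold the result into 0 .. HASH_SIZE-1.
    static int hash(String key) {
        long total = 0;
        for (int i = 0; i < key.length(); i++) {
            total += key.charAt(i);               // like ch.c2d() for each character
        }
        return (int) ((total * key.length()) % HASH_SIZE);
    }

    public static void main(String[] args) {
        for (String k : new String[] {"node", "os", "rsl3pm1"}) {
            System.out.println(k + " -> " + hash(k));
        }
    }
}
```

Because the character codes are simply added together, anagrams such as "AB" and "BA" always hash to the same value; whether that matters depends on your keys.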

Converting from/to BASE64 (MIME).

The programs that we analyse in this section are merely two small examples of how you can implement a BASE-64 converter. You can find more information on Sun's implementation (the BASE64Decoder and BASE64Encoder classes) at these URLs:

 
http://www.java.no/javaBIN/docs/api/sun.misc.BASE64Decoder.html
http://www.java.no/javaBIN/docs/api/sun.misc.BASE64Encoder.html
 

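Keep in mind that MIME (see RFC 1341 and 1342) is a mechanism by which you can send binary files by mail; Base64 is one of the encodings it defines for turning binary data into printable text. The basic idea is the following: you take the input three bytes (24 bits) at a time, split each group into four chunks of 6 bits, and map each chunk to one of 64 printable characters (2**6 = 64). If the last group is shorter than three bytes, the output is padded with '=' characters. Suppose you want to translate the string "Thi" to base 64. Here is the procedure: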

 
  1. Original string:
     'Thi'
 
  2. Translated in HEX:
     '54 68 69'
 
  3. translated in BINARY:
     '01010100 01101000 01101001'
 
  4. ditto (group by 6):
     '010101 000110 100001 101001'
 
  5. Add '00' in front of each 6 bits:
     '00010101 00000110 00100001 00101001'
 
  6. New quantities (in HEX):
     '15 06 21 29'
 
  7. Convert to Base 64:
     'VGhp'
 
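The seven steps can be checked with a few lines of Java. This sketch packs the three bytes into one 24-bit integer instead of writing out the bits as text, then slices it 6 bits at a time, most significant first; java.util.Base64 (part of the JDK since Java 8) is used only to confirm the result. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Steps {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encode exactly three bytes into four Base64 characters (steps 2 to 7).
    static String encodeGroup(byte[] three) {
        // Steps 2-3: pack the three bytes into one 24-bit value.
        int bits = ((three[0] & 0xFF) << 16) | ((three[1] & 0xFF) << 8) | (three[2] & 0xFF);
        StringBuilder out = new StringBuilder();
        // Steps 4-7: take 6 bits at a time and use each value (0..63)
        // as an index into the Base64 alphabet.
        for (int shift = 18; shift >= 0; shift -= 6) {
            out.append(ALPHABET.charAt((bits >> shift) & 0x3F));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] thi = "Thi".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeGroup(thi));                        // VGhp
        System.out.println(Base64.getEncoder().encodeToString(thi)); // VGhp (library check)
    }
}
```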

The following two methods convert a generic string to a BASE-64 string (a2m) and back (m2a). Look at the listing for a2m: the comments /* 1 */ to /* 7 */ mark the conversion steps described above (note how each step takes at most one instruction). The whole algorithm is based on parse and the translate() method.

 
+----------------------------------------------------------------------+
| -- method......: a2m                                                 |16
| -- purpose.....: Convert a string from ASCII to MIME                 |17
| --                                                                   |18
|   method a2m(str=Rexx) public static                                 |19
|     b64 = '\x00'.sequence('\x3F')                                    |20
|     e64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" || -                          |21
|           "abcdefghijklmnopqrstuvwxyz" || -                          |22
|           "0123456789+/"                                             |23
|                                                                      |24
|     out = ''                                                         |25
|     loop while str <> ''                                             |26
|       parse str bl +3 str                               /* 1 */      |27
|       bit = bl.c2x().x2b()                              /* 2 , 3 */  |28
|       parse bit p1 +6 p2 +6 p3 +6 p4                    /* 4 */      |29
|       bitn = '00'p1'00'p2'00'p3'00'p4                   /* 5 */      |30
|       bln = bitn.b2x().x2c()                            /* 6 */      |31
|       base = bln.translate(e64,b64)                     /* 7 */      |32
|       if base.length()<>4 then                                       |33
|         do                                                           |34
|           app = '='.copies(4-base.length())                          |35
|           base = base||app                                           |36
|         end                                                          |37
|       out = out||base                                                |38
|     end                                                              |39
|     return out                                                       |40
|                                                                      |41
+----------------------------------------------------------------------+
                                                 xstring.nrx(Method:a2m)
Download the complete source for the xstring.nrx library
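For comparison, a Java sketch of a2m that works on integers rather than on strings of '0' and '1' characters. Like the NetRexx method, it processes three input bytes at a time and pads a short final group with '='. The class name is hypothetical; java.util.Base64 (standard since Java 8) appears only in main, to cross-check the output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class A2m {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encode bytes to Base64, three input bytes at a time,
    // padding the last group with '=' when it is shorter than three bytes.
    static String a2m(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            int n = Math.min(3, data.length - i);      // bytes in this group (1..3)
            int bits = 0;
            for (int j = 0; j < 3; j++) {              // missing bytes count as zero bits
                bits = (bits << 8) | (j < n ? data[i + j] & 0xFF : 0);
            }
            for (int k = 0; k < 4; k++) {
                if (k <= n) {                          // n bytes yield n+1 real characters
                    out.append(ALPHABET.charAt((bits >> (18 - 6 * k)) & 0x3F));
                } else {
                    out.append('=');                   // padding
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "This is a test".getBytes(StandardCharsets.US_ASCII);
        System.out.println(a2m(bytes));
        System.out.println(Base64.getEncoder().encodeToString(bytes)); // should match
    }
}
```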

The opposite of a2m is m2a:

 
+----------------------------------------------------------------------+
| -- method......: m2a                                                 |42
| -- purpose.....: Convert a string from MIME to ASCII                 |43
| --                                                                   |44
|   method m2a(str=Rexx) public static                                 |45
|     b64 = '\x00'.sequence('\x3F')                                    |46
|     e64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" || -                          |47
|           "abcdefghijklmnopqrstuvwxyz" || -                          |48
|           "0123456789+/"                                             |49
|                                                                      |50
|     out = ''                                                         |51
|     loop while str <> ''                                             |52
|       parse str bl +4 str                                            |53
|       npad = bl.length() - bl.strip('T','=').length() -- pad count   |54
|       base = bl.translate(b64,e64)                                   |55
|       bit = base.c2x().x2b()                                         |56
|       parse bit 3 p1 9 11 p2 17 19 p3 25 27 p4 33                    |57
|       bitn = p1||p2||p3||p4                                          |58
|       new = bitn.b2x().x2c()                                         |59
|       out = out||new.left(new.length()-npad)                         |60
|     end                                                              |61
|     return out                                                       |62
|                                                                      |63
+----------------------------------------------------------------------+
                                                 xstring.nrx(Method:m2a)
Download the complete source for the xstring.nrx library
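A matching Java sketch of m2a. Each group of four characters yields 24 bits, and every '=' in the group drops one byte from the end of the decoded output. The class name is hypothetical, and the sketch assumes clean input: no line breaks and a length that is a multiple of four (an incomplete trailing group is ignored).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class M2a {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decode Base64 text four characters at a time; each '=' removes one output byte.
    static byte[] m2a(String b64) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 4 <= b64.length(); i += 4) {
            int bits = 0;
            int pad = 0;
            for (int k = 0; k < 4; k++) {
                char c = b64.charAt(i + k);
                int v;
                if (c == '=') {
                    pad++;
                    v = 0;                             // padding contributes zero bits
                } else {
                    v = ALPHABET.indexOf(c);
                    if (v < 0) throw new IllegalArgumentException("not Base64: " + c);
                }
                bits = (bits << 6) | v;
            }
            for (int j = 0; j < 3 - pad; j++) {        // 24 bits -> up to 3 bytes
                out.write((bits >> (16 - 8 * j)) & 0xFF);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(new String(m2a("VGhp"), StandardCharsets.US_ASCII));
    }
}
```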

These methods could be used as building blocks for a real MIME packer/unpacker routine. Note that you will need quite a bit of work to make them really useful: what is missing is proper handling of line splitting in the output (in a2m) and, correspondingly, of line breaks in the input (in m2a).

Tricks with strings

TRICK: Avoid multiple substr() calls with just one parse. If you find yourself using more than one substr() function in a row, you should probably consider rewriting your code using a more appropriate parse instruction. Suppose you have to split a time stamp into its components.

 
 YYMMDDhhmmss         (timestamp)
 | | | | | |
 | | | | | +--------- second
 | | | | +----------- minute
 | | | +------------- hour
 | | +--------------- day
 | +----------------- month
 +------------------- year
 

The first and most obvious approach is the following:

 
year  = substr(timestamp,1,2)
month = substr(timestamp,3,2)
(...)
 

And so on. The alternative using parse is:

 
parse var timestamp year +2 month +2 day +2 ,
                    hour +2 minute +2 second +2
 
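Java has no parse instruction, but the idea of describing the whole layout once, instead of issuing one substr() call per field, carries over. A minimal sketch, with a hypothetical helper that takes the field widths in order (the equivalent of the +2 relative positions in the template); fields that run past the end of the string come back empty, as they would with parse.

```java
import java.util.Arrays;

public class FixedWidth {
    // Split `s` into consecutive fields of the given widths,
    // like a parse template made of relative positions (+2 +2 ...).
    static String[] split(String s, int... widths) {
        String[] fields = new String[widths.length];
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            int start = Math.min(pos, s.length());
            int end = Math.min(pos + widths[i], s.length()); // short input gives short or empty fields
            fields[i] = s.substring(start, end);
            pos += widths[i];
        }
        return fields;
    }

    public static void main(String[] args) {
        // YYMMDDhhmmss -> year, month, day, hour, minute, second
        System.out.println(Arrays.toString(split("980518214740", 2, 2, 2, 2, 2, 2)));
    }
}
```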

The gain (both in terms of execution speed and coding) is clear: you use one instruction instead of six. Your code is also easier to modify (and to adapt to different time-stamp formats).

TRICK: Use parse with '.' to avoid the need for issuing a space() afterwards. The title of this trick is a cryptogram in itself. "How's that?" Simple. Suppose you need to parse lines of this format:

 
  node=rsl3pm1
  os=AIX
 

Depending on what the left term of the '=' sign is (we will call it the key), you will need to perform certain actions. What you can do is something along these lines:

 
  parse var line key '=' attributes
  if key == 'node' then  (...)
  if key == 'os' then (...)
 

This works well as long as there are no extra spaces between the key and the '=' sign. But extra spaces are precisely what you will get as soon as someone edits the file containing these lines: you can be 100% sure that sooner or later someone will write:

 
  node = rsl3pm1
  os   = AIX
 

Now the values of key will be "node " and "os ", and this is not exactly what we expect. The first solution that comes to mind is the following (at least it was the first that came to my mind before learning this trick):

 

  parse var line key '=' attributes
  key = space(key)
  if key == 'node' then  (...)
  if key == 'os' then (...)

 

The trick (finally we come to it) is to use a '.' in the parse, as here:

 

  parse var line key . '=' attributes
  if key == 'node' then  (...)
  if key == 'os' then (...)

 
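A Java sketch of the same 'space-eater' idea: take the text before '=' and keep only its first word, so "node=rsl3pm1" and "node = rsl3pm1" yield the same key. The class and method names are hypothetical, and the sketch splits the key part on any whitespace.

```java
public class KeyValue {
    // Return {key, value} for a "key = value" line. The key is the first
    // word before '=', playing the role of `key .` in the parse template.
    static String[] parse(String line) {
        int eq = line.indexOf('=');
        String left = eq < 0 ? line : line.substring(0, eq);
        String value = eq < 0 ? "" : line.substring(eq + 1);
        String trimmed = left.trim();
        String key = trimmed.isEmpty() ? "" : trimmed.split("\\s+")[0];
        return new String[] {key, value};
    }

    public static void main(String[] args) {
        for (String line : new String[] {"node=rsl3pm1", "os   = AIX"}) {
            String[] kv = parse(line);
            System.out.println("[" + kv[0] + "] [" + kv[1].trim() + "]");
        }
    }
}
```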

This avoids any space() instruction: the '.' acts as a 'space-eater', absorbing whatever follows the first word (including the trailing blanks) before the '=' sign.

TRICK: Avoid unexpected results from a missing wordpos(). I learned this particular trick from Eric Thomas, the author of LISTSERV(tm) (probably the most popular mailing list server software). Here is a concrete example: suppose you want to write a program that translates a given TCP/IP port number into its "human" meaning, i.e. a program that tells you that port 21 is FTP, port 23 is TELNET, etc. You write two lists, one containing the port numbers, the other the 'human meaning'. These lists will then be:

 
portl    = '21  23     37'
servicel = 'ftp telnet time'
 

Note that these two lists are "ordered": 21 is the port number for FTP, 23 for TELNET, and so on, i.e. the nth element of the list portl corresponds to the nth element of the list servicel. This one-to-one correspondence is the basis of our discussion. Suppose that the port number whose 'human meaning' we want is contained in the variable port. The obvious way to find its meaning is, first, to find the position of port in the string portl (using wordpos()), and second, to use this number to extract the corresponding value from the list servicel (using word()). All this translates into a single statement:

 

service = servicel.word(portl.wordpos(port))

 

This code looks correct, but it is 'buggy': what happens if you enter a port number that is not in portl? The result of wordpos() will be 0, and asking word() for word number zero will not give you a sensible service name (instead of a clean 'unknown' you get an error or a misleading answer such as "ftp"). We could check that port is in portl before calling wordpos(), but there is a simpler solution:

 
service = ('unknown' servicel).word(portl.wordpos(port) + 1)
 

The trick is simple: we add a term in front of servicel (the 'unknown' term) and we add 1 to the result of wordpos(). In this way we are sure to cover the case where port is not in portl: wordpos() returns 0, and word(1) is 'unknown'. The code is now correct, and can handle unexpected input. Here is the full final code as a summary:

 
+----------------------------------------------------------------------+
| -- portn.nrx                                                         |01
| --                                                                   |02
| parse arg port .                                                     |03
| portl    = '21  23     37'                                           |04
| servicel = 'ftp telnet time'                                         |05
| service = ('unknown '||servicel).word(portl.wordpos(port)+1)         |06
| say service                                                          |07
| exit 0                                                               |08
+----------------------------------------------------------------------+
                                                               portn.nrx
Download the source for the portn.nrx example
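The 'unknown' sentinel works in any language where the 'not found' result is exactly one less than the first valid position. In Java, List.indexOf returns -1 for a missing element and list indices start at 0, so the same +1 shift applies. A sketch with hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

public class PortName {
    static final List<String> PORTS = Arrays.asList("21", "23", "37");
    // "unknown" sits in front, like ('unknown' servicel) in portn.nrx
    static final List<String> SERVICES = Arrays.asList("unknown", "ftp", "telnet", "time");

    // indexOf gives -1 for a missing port, so +1 lands on "unknown" (index 0).
    static String service(String port) {
        return SERVICES.get(PORTS.indexOf(port) + 1);
    }

    public static void main(String[] args) {
        String port = args.length > 0 ? args[0] : "23";
        System.out.println(service(port));
    }
}
```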

Of course there are many more services (look at /etc/services if you want to see them). Note also that this is NOT the way to find out the service name from the port number; rather, see the chapter on sockets to discover how to obtain it from the system itself.

Chapter FAQ

QUESTION: How do I know the program's name at run time? This is a real FAQ. Suppose that you have written (or created, to make your work sound more important) a program called toto. How does toto know its name? You could put the information inside a variable in toto, but that is UGLY, and whenever you rename the program you will need to remember to change that variable. The solution is the parse source instruction:

 
parse SOURCE . . myname .
 

SMALL ADDENDUM for UNIX users. If you place the program toto in a directory in your PATH (for example, /usr/local/bin) and you execute it, you will notice an interesting effect: myname is no longer toto, but /usr/local/bin/toto. This can be useful, since you can now find out the directory containing your program; but how do you eliminate the (probably unwanted) /usr/local/bin/ part? You do it by coding:

 
myname = myname.substr(myname.lastpos('/') + 1)
 
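The same one-liner in Java, as a sketch with a hypothetical method name: String.lastIndexOf returns -1 when there is no '/', so adding 1 gives 0 and the whole name is kept, just as lastpos() returning 0 does in the NetRexx version.

```java
public class BaseName {
    // Keep only what follows the last '/'; a name without '/' is returned unchanged.
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("/usr/local/bin/toto"));
    }
}
```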

QUESTION: Can I put the character '00'X in a string? Yes. The only thing you need to remember is to write the byte as a hexadecimal escape sequence (\x00), as here:

 
string = 'this is a' '\x00' 'test'
string = '\x00\x00\x00'
 

As a rule of thumb, you can put any character you like in a string; just remember that you might have problems if you try to say such a string.

QUESTION: How do I display strings containing control characters? You can use the c2x() method to see the string in HEX. A more elegant way is to translate all the non-printable characters to a '.' (or to any other character you prefer). This small program shows you how to do it:

 
+----------------------------------------------------------------------+
| -- nodisp                                                            |01
| --                                                                   |02
| str = 'This is a \x03\x09\xFE test.'                                 |03
| tablei = '\x00'.sequence('\x1F')||'\x80'.sequence('\xFF')            |04
| tableo = '.'.copies(tablei.length())                                 |05
| say str.translate(tableo,tablei)                                     |06
| exit 0                                                               |07
+----------------------------------------------------------------------+
                                                              nodisp.nrx
Download the source for the nodisp.nrx example

Note how I build tablei: I use sequence() over all the unprintable characters (from '00'x to '1F'x, and from '80'x to 'FF'x). tableo is simply a string of '.' characters, as long as tablei. That is all I need. Note, however, that this will only work on ASCII systems: EBCDIC systems will require a different tablei.
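For comparison, a Java sketch of nodisp that tests character ranges directly instead of building translation tables. The class and method names are hypothetical. Java strings hold UTF-16 characters, so anything above 'FF'x is left untouched, and, as in the NetRexx program, the DEL character ('7F'x) is not replaced.

```java
public class NoDisp {
    // Replace the characters nodisp.nrx treats as unprintable
    // ('00'x-'1F'x and '80'x-'FF'x) with '.', keeping everything else.
    static String printable(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean unprintable = c <= 0x1F || (c >= 0x80 && c <= 0xFF);
            out.append(unprintable ? '.' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(printable("This is a \u0003\t\u00FE test."));
    }
}
```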

Summary

Here is a summary of some of the concepts we have encountered in this chapter.

 
_ concatenate a string            | ||  or  abuttal
  (with no spaces)                | - ex.: s1||s2
                                  | - ex.: n1'%'
                                  |
_ concatenate a string            | blank
  (with spaces)                   | - ex.: s1 s2
                                  |
 

 
 *** This section is incomplete
 *** and will be available in next releases


File: nr_8.html.

The contents of this WEB page are Copyright © 1997 by Pierantonio Marchesini / ETH Zurich.

Last update was done on 18 May 1998 21:47:40(GMT +2).