An edited version of this article appeard in FD 1/94

(The code in this article has been tested under FPC 3.56 by Jeff Fox)


PARALLEL FORTH: The new approach

Michael Montvelishsky, Saransk Russia

As a rule the arrival of a new programming language is caused by the arrival of some new programming method or paradigm. Thus Algol-60 had marked the appearance of the structured programming approach and Pascal the advanced user-defined data types. Object-oriented programming was born coupled with Smalltalk, but C++ has marked it's coming to the "professional league". New programming paradigms demand new programming language. But Forth users need not change languages. Of course Forth is extensible and can easily adopt new paradigms. The subject of this paper demonstrates how Forth can adapt a parallel programming paradigm.

One of the "HOT" areas of Programming Science is parallel computing. C.A.R. Hoar, the author of "Communication Sequental Processes"-book is the first theoretical founder of Parallel Computing. The programming language OCCAM is based on Hoar's theory. It is considered to be one of a main parallel programming languages. There are two main features of this interesting language:

1. Channels for information exchange and synchronization between processes.

2. The ability of the currently running process to run several (the number may to be very large) "son" parallel processes. Parent processes stop themselves and run "son" processes then they start to run again after all son processes are done.

Another popular parallel programming model is LINDA by D. Gelernter. LINDA is not a programming language, but a parallelizing extension to a conventional programming language. There is C-LINDA, FORTRAN-LINDA and FORTH-LINDA. LINDA the paradigm is based on the idea of active and passive tuples. Active tuples are the processes started by some other process and running with it in co-operation.
Passive tuples are the tools for information exchange. The passive tuples exchange information like notes on a bulletin board.

Paradigms of OCCAM and LINDA do not conflict with one another, but rather complement each other. It is hard to emulate LINDA in OCCAM and it is hard to emulate OCCAM in LINDA efficiently.

The subject of this paper is a FORTH Parallel Extension. What new has arrived with Parallel Forth? Very little.

MULTI           Enable multiprocessing mode;

SINGLE          Disable multiprocessing mode;

||( ... )       Coupled words to add a new parallel branch
                (all words inside parenthesis) into the
                current mounted (not running!) processes'
                ring;

EV( ... )       Coupled words to add a new parallel branch
                into the current running processes' ring. It's
                like the active tuple in LINDA and the syntax
                has come from FORTH-LINDA by Jeff Fox;

PAR             Stop the current process and start a "son"
                processes' ring mounted by the "parent"
                process by means of a ||( ... ) -construction.
                Wait until all "son" processes are done and then
                run the "parent" process again.

PAUSE           Deferred (for now!) word. In MULTI -mode -
                switch to the next process in the current
                ring. In SINGLE -mode - NOOP;

STOP            Stop a process and remove it from the current
                processes' ring;

STOP-ALL        Stop all processes in the current ring and
                resume the "parent" process. Very useful to
                emulate OCCAM ALT -processes;

DATA-STACK-SIZE
RETURN-STACK-SIZE
USER-AREA-SIZE  Variables to control USER -area's and stack's
                sizes. It's because different processes have
                different demands for stacks and the number of
                user-variables;

BUFFER          Generic word to define buffers. Use buffers for
                the most efficient data exchange between processes
                (without synchronisation);

>B ( n b -- )   Output a 16-bit value to the buffer b;

B> ( b -- n )   Input a 16-bit value from the buffer b;

CHAN            Generic word to define channels. A channel is the
                buffer of one-equal-length.  A channel is useful
                for the exchange of information with
                synchronization.

[]CHAN          Generic word to define channels' area.

>C ( n c -- )   Output a 16-bit value to the channel c;

C> ( c -- n )   Input word a 16-bit value from the channel c;

[]WORD          Just like F-PC's ARRAY with the same aim.
Several notes, about the source code:

1. All the user-variables of "son" that are defined in a program have the same value as its "parent's" ones before ||( ... ) or EV( ... ) -constructions. Thus it's possible to use user-variables to pass parameters from a "parent" to a "son" or to a co-operative "parent".

2. Make sure the USER-AREAs' and stacks' size for each process are big enough to run without hanging up. It's easy to write special words to fill these areas by some char and then to check how many of them were used.

3. If You've start to mount "son" processes' ring by ||( ... ) then you can't run co-operator by EV( ... ) until PAR starts this ring.

4. There is automatic memory allocation for ||(...) and EV(...) processes. But automatic memory deallocation is only for ||(...) -processes (and ALL processes inside it)! Can you guess why?

5. All processes start with empty stacks.

About the programs now. The source of the parallel FORTH extension is in Figure 1. I've used F-PC ver. 3.53. It's one of most complete FORTHs for the IBM-PC compatible and it's PUBLIC DOMAIN! The source doesn't contain CODE words, so you can transfer it easily to another FORTH platform with only minor changes. You can rewrite PAUSE in assembler to increase efficiency, but the best way is to implement Parallel Forth on some real multiprocessor hardware :-)

A parallel program example is shown in Figure 2. I've choosen a scalar vector multiplication as example of using the parallel wordset. The task is like the task X4 from chapter 4.3 of C.A.R.Hoare's "Communication Sequental Processes". I've included the equivalent OCCAM program code as comments. Sorry, I have no transputer so I have not checked out this OCCAM code, but I hope it's valid.

The second example program (Figure 3.) is less "multiprocessing" but more useful. It's an alarm clock. Type:

21 00 ALARM" It's time to go home!"

to start experimenting with Parallel FORTH!


\ ==========================  Fig. 1  ============================
\ Dr. Michael B. Montvelishsky, Saransk Russia 1993

\ PARALLELISM WORDSET        by Michael Montvelishsky, 25-Sep-93
 ANEW PARALLEL

 CR .FREE

\ ============================================================ \
\ It is just a simple troock     :-) "TROOCK" is  the Russian
\ To replace the pause hook      :-) pronounce of the "TRICK"
\ PAUSE is CODE-word now.
 ' PAUSE                \ OLD
 DEFER PAUSE ' NOOP IS PAUSE
 ' PAUSE                \ OLD NEW
 2DUP SWAP -            \ OLD NEW SHIFT
 OVER 1+ +!             \ OLD NEW ( ADJUSTED!)
 TUCK                   \ NEW OLD NEW
 HERE SWAP -            \ NEW OLD SIZE
 CMOVE
 FORGET PAUSE
\ PAUSE is DEFERed NOW!
\ ============================================================ \

 USER
 VARIABLE (NEST)                \ Point to the last mounted proc
 VARIABLE (PARENT)              \ Point to the parent proc
 VARIABLE (NEXT)                \ Point to the next proc in ring
 VARIABLE RP                    \ To save
 VARIABLE SP                    \        stacks' value
 FORTH

 VARIABLE RETURN-STACK-SIZE     \ To change return stack size
 VARIABLE DATA-STACK-SIZE       \    -\\-   data stack size
 VARIABLE USER-AREA-SIZE        \    -\\-   user area size

\ GLOBAL memory allocation tool:
 DP CONSTANT (HEAP)
 : MALLOC (HEAP) +! ;
 : HEAP (HEAP) @ ;

\ Initiate variables :
 UP @ (PARENT) !
 UP @ (NEXT) !
 0    (NEST) !
 64   USER-AREA-SIZE !
 128  RETURN-STACK-SIZE !
 64   DATA-STACK-SIZE !

 : PROC-SIZE  ( -- current-proc-size )
   USER-AREA-SIZE @ RETURN-STACK-SIZE @ + DATA-STACK-SIZE @ +
 ;
 : GO-NEST   SP@ RP@ RP ! SP ! (NEST) @ UP ! RP @ RP! SP @ SP! ;
 : (PAUSE)   SP@ RP@ RP ! SP ! (NEXT) @ UP ! RP @ RP! SP @ SP! ;

\ Stop all current processes and go to parent process
 : STOP-ALL  (PARENT) @ (NEXT) ! (PAUSE) ;

\ Parallelism enable/disable
 : MULTI     ['] (PAUSE) IS PAUSE ;
 : SINGLE    ['] NOOP    IS PAUSE ;

\ Stop itself
 : STOP ( --- : Excludes proc from procs' ring )
   UP @ DUP (NEXT) @ - IF  \ Last in the ring?
     UP @ BEGIN (NEXT) 2DUP @ - WHILE @ UP ! REPEAT
     SWAP UP ! (NEXT) @ SWAP ! (PAUSE)
   ELSE
     STOP-ALL
   THEN
 ;

\ Create NEW process with specified PFA PARRENT -process and
\ process to be PREVious in processes' ring
 : MOUNT ( PFA PARRENT PREV --- : Create and mount new process)
   HEAP PROC-SIZE MALLOC                   \ Allocate proc space
   UP @ DUP >R OVER USER-AREA-SIZE @ CMOVE \ Copy parent's users
   OVER UP ! (NEXT) @ SWAP UP ! (NEXT) !   \ Adjusts new  (NEXT)
   UP @ USER-AREA-SIZE @ +
   RETURN-STACK-SIZE @ +                   \ Adjust for the NEW :
   DUP DUP RP0 ! 4 - RP !                  \    RP0 & RP
   DATA-STACK-SIZE @ + DUP SP0 ! SP !      \    SP0 & SP
   SWAP (PARENT) !                         \    (PARENT)
   (NEST) 0!                               \    (NEST)
   -ROT RP @  2!                           \ PFA to begin with
   UP @ SWAP UP ! (NEXT) !                 \ Point PREV's (NEXT)
   R> UP !                                 \   to the NEW
 ;

\ Run-time words for mounting and evaluating new process
 : (||()
   2R@ 4 + UP @ (NEST) @ ?DUP 0=
   IF HEAP DUP DP ! THEN   HEAP (NEST) ! MOUNT
 ;
 : (EV()
   (NEST) @ 0= IF
     2R@ 4 + (PARENT) @ (NEXT) @ MOUNT
   THEN
 ;

 : ||(  COMPILE (||() COMPILE BRANCH ?>MARK ; IMMEDIATE
 : EV(  COMPILE (EV() COMPILE BRANCH ?>MARK ; IMMEDIATE
 : )    COMPILE STOP ?>RESOLVE ; IMMEDIATE
 : PAR  (NEST) @ IF GO-NEST DP @ (HEAP) ! (NEST) 0! THEN ;

\ Use chans for information exchanging & SYNCHRONISATION
 : CHAN 2VARIABLE ;
 : C> ( CHAN -- W : Get word from the channel )
   >R BEGIN R@ @ 0= WHILE (PAUSE) REPEAT R@ 2+ @ R>  0!
 ;
 : >C ( W CHAN -- : put word to the channel )
   >R BEGIN R@ @    WHILE (PAUSE) REPEAT R@ 2+ ! R> -1!
 ;

\ Use buffers for EFFICIENT information exchanging
\ BUFF-SIZE and MASK may be increased!
 16 CONSTANT BUFF-SIZE  15 CONSTANT MASK
 : BUFF  CREATE 0 , BUFF-SIZE ALLOT  ;
 : .OUT 1+ ;        ( IN +0 )
 : .BUF 2+ ;
 : B> ( BUFF -- W : Get word from the buffer )
   >R BEGIN R@ C@ R@ .OUT C@ = WHILE (PAUSE) REPEAT
   R@ .OUT C@ DUP R@ .BUF + @ SWAP 2+ MASK AND R> .OUT C!
 ;
 : >B ( W BUFF -- : Put word to the buffer )
   >R R@ C@ 2+ MASK AND
   BEGIN DUP R@ .OUT C@ = WHILE (PAUSE) REPEAT
   SWAP R@ C@ R@ .BUF + ! R> C!
 ;

 : 4* 2* 2* ;

\ Generic words for word and channel array
 : []CHAN
   CREATE 4* HERE OVER ERASE ALLOT ?STACK
   DOES> SWAP 4* +
 ;
\ IMHO []WORD is more useful than F-PC's ARRAY
 : []WORD
   CREATE 2* HERE OVER ERASE ALLOT ?STACK
   DOES> SWAP 2* +
 ;

 MULTI

 CR .FREE

\ ================================================================

\ ==========================  Fig. 2  ============================
\ Dr. Michael B. Montvelishsky, Saransk Russia 1993

fload figure1

 30 CONSTANT SIZE                \ DEF  SIZE = 30 :
 USER VARIABLE U.I FORTH         \ VAR  U.I     ,
 SIZE []WORD V1                  \      V1[SIZE],
 SIZE []WORD V2                  \      V2[SIZE]:
 SIZE 1+ []CHAN N                \ CHAN N[SIZE+1],
 SIZE    []CHAN W                \      W[SIZE] ,
 SIZE    []CHAN E                \      E[SIZE] :
                                 \ VAR  T1,T2,T3,T4:                                 \
 : VECT-MULT                     \ PROC VECT.MULT =
                                 \ SEQ
   SIZE 0 DO                     \   U.I = [0 FOR SIZE]
     I 1+ I V1 !                 \     V1[U.I] := U.I + 1
     I 1+ I V2 !                 \     V2[U.I] := U.I + 1
   LOOP                          \   PAR
   SIZE 0 DO I U.I !             \     U.I = [0 FOR SIZE]
     I U.I !                     \       PAR
     ||( U.I @ V1 @ U.I @ W >C ) \         W[U.I] ! V1[U.I]
     ||( U.I @ V2 @ U.I @ E >C ) \         E[U.I] ! V2[U.I]
     ||(                         \         SEQ
       U.I @ W C>                \           W[U.I]   ? T1
       U.I @ E C> *              \           E[U.I]   ? T2
       U.I @ N C> +              \           N[U.I]   ? T3
       U.I @ 1+ N >C             \           N[U.I+1] ! T1*T2+T3
     )                           \
   LOOP                          \
   ||( 0 0 N >C )                \     N[0] ! 0
   ||( SIZE N C> U. )            \     SEQ
                                 \       N[SIZE] ? T4
                                 \       WRITE(T4) :
   PAR
 ;
\ ================================================================

\ ==========================  Fig. 3  ============================
\ Dr. Michael B. Montvelishsky, Saransk Russia 1993

fload figure1

\ Alarm clock

 ANEW ALARM
 HIDDEN ALSO EDITOR ALSO

 USER
 VARIABLE ALARM-HM
 VARIABLE ALARM-STRING
 FORTH

 : ALARM" ( HOURS MINUTES | " )
   SWAP FLIP + 0 0. B>T DROP ALARM-HM ! \ Get time
   HERE ," ALARM-STRING !               \ Get alarming string
   EV(
     ALARM-HM @
     BEGIN                              \ Wait ...
       1000 FOR PAUSE NEXT              \ Skipping for efficiency
       DUP GETTIME DROP U<=
     UNTIL
     DROP SINGLE                        \ Disable parallelism
     SAVESCR SAVECURSOR                 \ Save screen & cursor
     DKGRAY >BG WHITE >FG               \ Select colours
     TRUE ALARM-STRING @ COUNT ?SOFTERROR       \ Alarm !!
     RESTCURSOR RESTSCR MULTI           \ Restore screen&cursor
   )                                    \ Enable parallelism
 ;

 ONLY FORTH ALSO DEFINITIONS

\ ================================================================+
Dr. Michael B. Montvelishksy:

I'm from Saransk (republic Mordovia capital, 600kM to the South-East from the Moscow). I've get my Electronics engineer diploma in 1982 in Electronics department of Mordovia State University. I've came to Leningrad (St.Petersbourg now) University in 1982 to get the postgraduate. I dealt with Robot software and got my Ph.D degree on this matter in 1989. I've get start with Forth in 1988. It's my favourite so far, but I write programs on C and Pascal sometimes. There are many of project I've done (alone or in couple with Max) on Forth. Latest and greatest from its is the embedded system for CNC of Electroerosion machinetool with High Level Geometric (Forth-based) language and CAD-futures. I'm reader (docent) in Mordovia University now. I'm using Forth in my course of programming.

To: FIG
From: Jeff Fox
Ultra Technology
Dr. Montvelishsky is currently doing work for Ultra Technology. He has submitted several routines for the MuP21 and F21 microprocessors under development at Computer Cowboys by Charles Moore. Dr. Montvelishsky's CORDIC function executes 50 times faster on MuP21 than on a 387. Here at Ultra Technology Dr. Montvelishsky will be consulting on the design of parallel Forth extensions for the F21 Parallel Forth engine, robotics, scientific programming, and on using Forth to teach computer science.

Dr. Michael B. Montvelishsky

avenue Lenin 21 - 29

RUSSIA 430000 Saransk