(The code in this article has been tested under FPC 3.56 by Jeff Fox)
As a rule the arrival of a new programming language is caused by the arrival of some new programming method or paradigm. Thus Algol-60 had marked the appearance of the structured programming approach and Pascal the advanced user-defined data types. Object-oriented programming was born coupled with Smalltalk, but C++ has marked it's coming to the "professional league". New programming paradigms demand new programming language. But Forth users need not change languages. Of course Forth is extensible and can easily adopt new paradigms. The subject of this paper demonstrates how Forth can adapt a parallel programming paradigm.
One of the "HOT" areas of Programming Science is parallel computing. C.A.R. Hoar, the author of "Communication Sequental Processes"-book is the first theoretical founder of Parallel Computing. The programming language OCCAM is based on Hoar's theory. It is considered to be one of a main parallel programming languages. There are two main features of this interesting language:
Another popular parallel programming model is LINDA by D. Gelernter.
LINDA is not a programming language, but a parallelizing extension to
a conventional programming language. There is C-LINDA, FORTRAN-LINDA
and FORTH-LINDA. LINDA the paradigm is based on the idea of active
and passive tuples. Active tuples are the processes started by some
other process and running with it in co-operation.
Passive tuples are the tools for information exchange. The passive
tuples exchange information like notes on a bulletin board.
Paradigms of OCCAM and LINDA do not conflict with one another, but rather complement each other. It is hard to emulate LINDA in OCCAM and it is hard to emulate OCCAM in LINDA efficiently.
The subject of this paper is a FORTH Parallel Extension. What new has arrived with Parallel Forth? Very little.
MULTI Enable multiprocessing mode;
SINGLE Disable multiprocessing mode;
||( ... ) Coupled words to add a new parallel branch
(all words inside parenthesis) into the
current mounted (not running!) processes'
ring;
EV( ... ) Coupled words to add a new parallel branch
into the current running processes' ring. It's
like the active tuple in LINDA and the syntax
has come from FORTH-LINDA by Jeff Fox;
PAR Stop the current process and start a "son"
processes' ring mounted by the "parent"
process by means of a ||( ... ) -construction.
Wait until all "son" processes are done and then
run the "parent" process again.
PAUSE Deferred (for now!) word. In MULTI -mode -
switch to the next process in the current
ring. In SINGLE -mode - NOOP;
STOP Stop a process and remove it from the current
processes' ring;
STOP-ALL Stop all processes in the current ring and
resume the "parent" process. Very useful to
emulate OCCAM ALT -processes;
DATA-STACK-SIZE
RETURN-STACK-SIZE
USER-AREA-SIZE Variables to control USER -area's and stack's
sizes. It's because different processes have
different demands for stacks and the number of
user-variables;
BUFFER Generic word to define buffers. Use buffers for
the most efficient data exchange between processes
(without synchronisation);
>B ( n b -- ) Output a 16-bit value to the buffer b;
B> ( b -- n ) Input a 16-bit value from the buffer b;
CHAN Generic word to define channels. A channel is the
buffer of one-equal-length. A channel is useful
for the exchange of information with
synchronization.
[]CHAN Generic word to define channels' area.
>C ( n c -- ) Output a 16-bit value to the channel c;
C> ( c -- n ) Input word a 16-bit value from the channel c;
[]WORD Just like F-PC's ARRAY with the same aim.
Several notes, about the source code:1. All the user-variables of "son" that are defined in a program have the same value as its "parent's" ones before ||( ... ) or EV( ... ) -constructions. Thus it's possible to use user-variables to pass parameters from a "parent" to a "son" or to a co-operative "parent".
2. Make sure the USER-AREAs' and stacks' size for each process are big enough to run without hanging up. It's easy to write special words to fill these areas by some char and then to check how many of them were used.
3. If You've start to mount "son" processes' ring by ||( ... ) then you can't run co-operator by EV( ... ) until PAR starts this ring.
4. There is automatic memory allocation for ||(...) and EV(...) processes. But automatic memory deallocation is only for ||(...) -processes (and ALL processes inside it)! Can you guess why?
5. All processes start with empty stacks.
About the programs now. The source of the parallel FORTH extension is in Figure 1. I've used F-PC ver. 3.53. It's one of most complete FORTHs for the IBM-PC compatible and it's PUBLIC DOMAIN! The source doesn't contain CODE words, so you can transfer it easily to another FORTH platform with only minor changes. You can rewrite PAUSE in assembler to increase efficiency, but the best way is to implement Parallel Forth on some real multiprocessor hardware :-)
A parallel program example is shown in Figure 2. I've choosen a scalar vector multiplication as example of using the parallel wordset. The task is like the task X4 from chapter 4.3 of C.A.R.Hoare's "Communication Sequental Processes". I've included the equivalent OCCAM program code as comments. Sorry, I have no transputer so I have not checked out this OCCAM code, but I hope it's valid.
The second example program (Figure 3.) is less "multiprocessing" but more useful. It's an alarm clock. Type:
21 00 ALARM" It's time to go home!"
to start experimenting with Parallel FORTH!
\ ========================== Fig. 1 ============================
\ Dr. Michael B. Montvelishsky, Saransk Russia 1993
\ PARALLELISM WORDSET by Michael Montvelishsky, 25-Sep-93
ANEW PARALLEL
CR .FREE
\ ============================================================ \
\ It is just a simple troock :-) "TROOCK" is the Russian
\ To replace the pause hook :-) pronounce of the "TRICK"
\ PAUSE is CODE-word now.
' PAUSE \ OLD
DEFER PAUSE ' NOOP IS PAUSE
' PAUSE \ OLD NEW
2DUP SWAP - \ OLD NEW SHIFT
OVER 1+ +! \ OLD NEW ( ADJUSTED!)
TUCK \ NEW OLD NEW
HERE SWAP - \ NEW OLD SIZE
CMOVE
FORGET PAUSE
\ PAUSE is DEFERed NOW!
\ ============================================================ \
USER
VARIABLE (NEST) \ Point to the last mounted proc
VARIABLE (PARENT) \ Point to the parent proc
VARIABLE (NEXT) \ Point to the next proc in ring
VARIABLE RP \ To save
VARIABLE SP \ stacks' value
FORTH
VARIABLE RETURN-STACK-SIZE \ To change return stack size
VARIABLE DATA-STACK-SIZE \ -\\- data stack size
VARIABLE USER-AREA-SIZE \ -\\- user area size
\ GLOBAL memory allocation tool:
DP CONSTANT (HEAP)
: MALLOC (HEAP) +! ;
: HEAP (HEAP) @ ;
\ Initiate variables :
UP @ (PARENT) !
UP @ (NEXT) !
0 (NEST) !
64 USER-AREA-SIZE !
128 RETURN-STACK-SIZE !
64 DATA-STACK-SIZE !
: PROC-SIZE ( -- current-proc-size )
USER-AREA-SIZE @ RETURN-STACK-SIZE @ + DATA-STACK-SIZE @ +
;
: GO-NEST SP@ RP@ RP ! SP ! (NEST) @ UP ! RP @ RP! SP @ SP! ;
: (PAUSE) SP@ RP@ RP ! SP ! (NEXT) @ UP ! RP @ RP! SP @ SP! ;
\ Stop all current processes and go to parent process
: STOP-ALL (PARENT) @ (NEXT) ! (PAUSE) ;
\ Parallelism enable/disable
: MULTI ['] (PAUSE) IS PAUSE ;
: SINGLE ['] NOOP IS PAUSE ;
\ Stop itself
: STOP ( --- : Excludes proc from procs' ring )
UP @ DUP (NEXT) @ - IF \ Last in the ring?
UP @ BEGIN (NEXT) 2DUP @ - WHILE @ UP ! REPEAT
SWAP UP ! (NEXT) @ SWAP ! (PAUSE)
ELSE
STOP-ALL
THEN
;
\ Create NEW process with specified PFA PARRENT -process and
\ process to be PREVious in processes' ring
: MOUNT ( PFA PARRENT PREV --- : Create and mount new process)
HEAP PROC-SIZE MALLOC \ Allocate proc space
UP @ DUP >R OVER USER-AREA-SIZE @ CMOVE \ Copy parent's users
OVER UP ! (NEXT) @ SWAP UP ! (NEXT) ! \ Adjusts new (NEXT)
UP @ USER-AREA-SIZE @ +
RETURN-STACK-SIZE @ + \ Adjust for the NEW :
DUP DUP RP0 ! 4 - RP ! \ RP0 & RP
DATA-STACK-SIZE @ + DUP SP0 ! SP ! \ SP0 & SP
SWAP (PARENT) ! \ (PARENT)
(NEST) 0! \ (NEST)
-ROT RP @ 2! \ PFA to begin with
UP @ SWAP UP ! (NEXT) ! \ Point PREV's (NEXT)
R> UP ! \ to the NEW
;
\ Run-time words for mounting and evaluating new process
: (||()
2R@ 4 + UP @ (NEST) @ ?DUP 0=
IF HEAP DUP DP ! THEN HEAP (NEST) ! MOUNT
;
: (EV()
(NEST) @ 0= IF
2R@ 4 + (PARENT) @ (NEXT) @ MOUNT
THEN
;
: ||( COMPILE (||() COMPILE BRANCH ?>MARK ; IMMEDIATE
: EV( COMPILE (EV() COMPILE BRANCH ?>MARK ; IMMEDIATE
: ) COMPILE STOP ?>RESOLVE ; IMMEDIATE
: PAR (NEST) @ IF GO-NEST DP @ (HEAP) ! (NEST) 0! THEN ;
\ Use chans for information exchanging & SYNCHRONISATION
: CHAN 2VARIABLE ;
: C> ( CHAN -- W : Get word from the channel )
>R BEGIN R@ @ 0= WHILE (PAUSE) REPEAT R@ 2+ @ R> 0!
;
: >C ( W CHAN -- : put word to the channel )
>R BEGIN R@ @ WHILE (PAUSE) REPEAT R@ 2+ ! R> -1!
;
\ Use buffers for EFFICIENT information exchanging
\ BUFF-SIZE and MASK may be increased!
16 CONSTANT BUFF-SIZE 15 CONSTANT MASK
: BUFF CREATE 0 , BUFF-SIZE ALLOT ;
: .OUT 1+ ; ( IN +0 )
: .BUF 2+ ;
: B> ( BUFF -- W : Get word from the buffer )
>R BEGIN R@ C@ R@ .OUT C@ = WHILE (PAUSE) REPEAT
R@ .OUT C@ DUP R@ .BUF + @ SWAP 2+ MASK AND R> .OUT C!
;
: >B ( W BUFF -- : Put word to the buffer )
>R R@ C@ 2+ MASK AND
BEGIN DUP R@ .OUT C@ = WHILE (PAUSE) REPEAT
SWAP R@ C@ R@ .BUF + ! R> C!
;
: 4* 2* 2* ;
\ Generic words for word and channel array
: []CHAN
CREATE 4* HERE OVER ERASE ALLOT ?STACK
DOES> SWAP 4* +
;
\ IMHO []WORD is more useful than F-PC's ARRAY
: []WORD
CREATE 2* HERE OVER ERASE ALLOT ?STACK
DOES> SWAP 2* +
;
MULTI
CR .FREE
\ ================================================================
\ ========================== Fig. 2 ============================
\ Dr. Michael B. Montvelishsky, Saransk Russia 1993
fload figure1
30 CONSTANT SIZE \ DEF SIZE = 30 :
USER VARIABLE U.I FORTH \ VAR U.I ,
SIZE []WORD V1 \ V1[SIZE],
SIZE []WORD V2 \ V2[SIZE]:
SIZE 1+ []CHAN N \ CHAN N[SIZE+1],
SIZE []CHAN W \ W[SIZE] ,
SIZE []CHAN E \ E[SIZE] :
\ VAR T1,T2,T3,T4: \
: VECT-MULT \ PROC VECT.MULT =
\ SEQ
SIZE 0 DO \ U.I = [0 FOR SIZE]
I 1+ I V1 ! \ V1[U.I] := U.I + 1
I 1+ I V2 ! \ V2[U.I] := U.I + 1
LOOP \ PAR
SIZE 0 DO I U.I ! \ U.I = [0 FOR SIZE]
I U.I ! \ PAR
||( U.I @ V1 @ U.I @ W >C ) \ W[U.I] ! V1[U.I]
||( U.I @ V2 @ U.I @ E >C ) \ E[U.I] ! V2[U.I]
||( \ SEQ
U.I @ W C> \ W[U.I] ? T1
U.I @ E C> * \ E[U.I] ? T2
U.I @ N C> + \ N[U.I] ? T3
U.I @ 1+ N >C \ N[U.I+1] ! T1*T2+T3
) \
LOOP \
||( 0 0 N >C ) \ N[0] ! 0
||( SIZE N C> U. ) \ SEQ
\ N[SIZE] ? T4
\ WRITE(T4) :
PAR
;
\ ================================================================
\ ========================== Fig. 3 ============================
\ Dr. Michael B. Montvelishsky, Saransk Russia 1993
fload figure1
\ Alarm clock
ANEW ALARM
HIDDEN ALSO EDITOR ALSO
USER
VARIABLE ALARM-HM
VARIABLE ALARM-STRING
FORTH
: ALARM" ( HOURS MINUTES | " )
SWAP FLIP + 0 0. B>T DROP ALARM-HM ! \ Get time
HERE ," ALARM-STRING ! \ Get alarming string
EV(
ALARM-HM @
BEGIN \ Wait ...
1000 FOR PAUSE NEXT \ Skipping for efficiency
DUP GETTIME DROP U<=
UNTIL
DROP SINGLE \ Disable parallelism
SAVESCR SAVECURSOR \ Save screen & cursor
DKGRAY >BG WHITE >FG \ Select colours
TRUE ALARM-STRING @ COUNT ?SOFTERROR \ Alarm !!
RESTCURSOR RESTSCR MULTI \ Restore screen&cursor
) \ Enable parallelism
;
ONLY FORTH ALSO DEFINITIONS
\ ================================================================+
Dr. Michael B. Montvelishksy:I'm from Saransk (republic Mordovia capital, 600kM to the South-East from the Moscow). I've get my Electronics engineer diploma in 1982 in Electronics department of Mordovia State University. I've came to Leningrad (St.Petersbourg now) University in 1982 to get the postgraduate. I dealt with Robot software and got my Ph.D degree on this matter in 1989. I've get start with Forth in 1988. It's my favourite so far, but I write programs on C and Pascal sometimes. There are many of project I've done (alone or in couple with Max) on Forth. Latest and greatest from its is the embedded system for CNC of Electroerosion machinetool with High Level Geometric (Forth-based) language and CAD-futures. I'm reader (docent) in Mordovia University now. I'm using Forth in my course of programming.
To: FIG
From: Jeff Fox
Ultra Technology
Dr. Montvelishsky is currently doing work for Ultra Technology. He has
submitted several routines for the MuP21 and F21 microprocessors under
development at Computer Cowboys by Charles Moore. Dr. Montvelishsky's
CORDIC function executes 50 times faster on MuP21 than on a 387. Here at
Ultra Technology Dr. Montvelishsky will be consulting on the design of
parallel Forth extensions for the F21 Parallel Forth engine, robotics,
scientific programming, and on using Forth to teach computer science.
Dr. Michael B. Montvelishsky
avenue Lenin 21 - 29
RUSSIA 430000 Saransk