
The Implementation

Data Types

In the descriptions below it is assumed that long ints are 32 bits in length. Actually, SCM is written to work with any long int size larger than 31 bits. With some modification, SCM could work with word sizes as small as 24 bits.

All SCM objects are represented by type SCM. Type SCM comes in 2 basic flavors, Immediates and Cells:

Immediates

An immediate is a data type contained in type SCM (long int). The type codes distinguishing immediate types from each other vary in length, but reside in the low order bits.

Macro: IMP x
Macro: NIMP x: Return non-zero if the SCM object x is an immediate or non-immediate type, respectively.

Immediate: inum

immediate 30 bit signed integer. An INUM is flagged by a 1 in the second to low order bit position. The high order 30 bits are used for the integer's value.

Macro: INUMP x
Macro: NINUMP x: Return non-zero if the SCM x is an immediate integer or not an immediate integer, respectively.

Macro: INUM x: Returns the C long integer corresponding to SCM x.

Macro: MAKINUM x: Returns the SCM inum corresponding to C long integer x.

Immediate Constant: INUM0: is equivalent to MAKINUM(0).

Computations on INUMs are performed by converting the arguments to C integers (by a shift), operating on the integers, and converting the result to an inum. The result is checked for overflow by converting back to integer and checking the reverse operation.

The shifts used for conversion need to be signed shifts. If the C implementation does not support signed right shift this fact is detected in a #if statement in `scmfig.h' and a signed right shift, SRS, is constructed in terms of unsigned right shift.
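
The round trip described above can be sketched in plain C. This is only an illustration, assuming a 32-bit word; the names scm_word, make_inum, inum_value and inum_add are hypothetical and are not SCM's macros. The tag is applied with an unsigned shift, and the value is recovered by division, which gives the same result as a signed right shift without depending on one.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t scm_word;   /* hypothetical stand-in for a 32-bit SCM word */

/* An inum has a 1 in the second-lowest bit. */
static int is_inum(scm_word x) { return (x & 2) != 0; }

/* Tag a C integer: value in the high 30 bits, low bits 10.
   The shift is done on an unsigned type to avoid signed overflow. */
static scm_word make_inum(int32_t n)
{
    return (scm_word)(((uint32_t)n << 2) | 2u);
}

/* Recover the C integer. x is 4*n + 2, so the division is exact. */
static int32_t inum_value(scm_word x)
{
    assert(is_inum(x));
    return (x - 2) / 4;
}

/* Add two inums. Returns 1 and stores the result, or returns 0 if the
   sum does not survive conversion to an inum and back (overflow). */
static int inum_add(scm_word a, scm_word b, scm_word *out)
{
    int32_t sum = inum_value(a) + inum_value(b); /* 30-bit operands: no C overflow */
    scm_word r = make_inum(sum);
    if (inum_value(r) != sum)
        return 0;
    *out = r;
    return 1;
}
```

When inum_add reports overflow, an implementation with bignums would presumably fall back to them; the sketch only detects the condition.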

Immediate: ichr

characters.

Macro: ICHRP x: Return non-zero if the SCM object x is a character.

Macro: ICHR x: Returns corresponding unsigned char.

Macro: MAKICHR x: Given char x, returns SCM character.

Immediate: iflags

These are frequently used immediate constants.

Immediate Constant: SCM BOOL_T: #t

Immediate Constant: SCM BOOL_F: #f

Immediate Constant: SCM EOL: (). If SICP is #defined, EOL is #defined to be identical with BOOL_F. In this case, both print as #f.

Immediate Constant: SCM EOF_VAL: end of file token, #<eof>.

Immediate Constant: SCM UNDEFINED: #<undefined> used for variables which have not been defined and absent optional arguments.

Immediate Constant: SCM UNSPECIFIED: #<unspecified> is returned for those procedures whose return values are not specified.

Macro: IFLAGP n: Returns non-zero if n is an ispcsym, isym or iflag.

Macro: ISYMP n: Returns non-zero if n is an ispcsym or isym.

Macro: ISYMNUM n: Given ispcsym, isym, or iflag n, returns its index in the C array isymnames[].

Macro: ISYMCHARS n: Given ispcsym, isym, or iflag n, returns its char * representation (from isymnames[]).

Macro: MAKSPCSYM n: Returns SCM ispcsym n.

Macro: MAKISYM n: Returns SCM isym n.

Macro: MAKIFLAG n: Returns SCM iflag n.

Variable: isymnames: An array of strings containing the external representations of all the ispcsym, isym, and iflag immediates. Defined in `repl.c'.

Constant: NUM_ISPCSYM
Constant: NUM_ISYMS: The number of ispcsyms and ispcsyms+isyms, respectively. Defined in `scm.h'.

Immediate: isym

and, begin, case, cond, define, do, if, lambda, let, let*, letrec, or, quote, set!, #f, #t, #<undefined>, #<eof>, (), and #<unspecified>.

CAR Immediate: ispcsym

special symbols: syntax-checked versions of first 14 isyms

CAR Immediate: iloc: indexes to a variable's location in environment

CAR Immediate: gloc: pointer to a symbol's value cell

Immediate: CELLPTR

pointer to a cell (not really an immediate type, but here for completeness). Since cells are always 8 byte aligned, a pointer to a cell has the low order 3 bits 0.

There is one exception to this rule, CAR Immediates, described next.

A CAR Immediate is an immediate which can only occur in the CARs of evaluated code (as a result of ceval's memoization process).

Cells

Cells represent all SCM objects other than immediates. A cell has a CAR and a CDR. Low-order bits in CAR identify the type of object. The rest of CAR and CDR hold object data. The number after tc specifies how many bits are in the type code. For instance, tc7 indicates that the type code is 7 bits.

Macro: NEWCELL x

Allocates a new cell and stores a pointer to it in SCM local variable x.

Care needs to be taken that stores into the new cell pointed to by x do not create an inconsistent object. See section Signals.

All of the C macros described in this section assume that their argument is of type SCM and points to a cell (CELLPTR).

Macro: CAR x
Macro: CDR x: Returns the car and cdr of cell x, respectively.

Macro: TYP3 x
Macro: TYP7 x
Macro: TYP16 x: Returns the 3, 7, and 16 bit type code of a cell.
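
Since the type code sits in the low-order bits of the CAR, extracting it amounts to masking. The following is a rough sketch on a hypothetical cell model (toy_cell, make_header, typ3, typ7, typ16 and header_length are illustrative names, not SCM's definitions). The bit patterns used in the example come from the Data Type Representations table later in this section.

```c
#include <assert.h>

/* Hypothetical model of a heap cell: two machine words. */
typedef struct { unsigned long car, cdr; } toy_cell;

/* Build a header CAR: length above bit 7, 7-bit type code in the low bits.
   Bit 7 is left clear; the table shows the GC mark there. */
static unsigned long make_header(unsigned long length, unsigned long tc7)
{
    assert(tc7 <= 0x7fUL);
    return (length << 8) | tc7;
}

/* Type codes of increasing width are masks of increasing width. */
static unsigned long typ3(const toy_cell *c)  { return c->car & 0x7UL; }
static unsigned long typ7(const toy_cell *c)  { return c->car & 0x7fUL; }
static unsigned long typ16(const toy_cell *c) { return c->car & 0xffffUL; }

/* The length field of a header occupies the bits above the mark bit. */
static unsigned long header_length(const toy_cell *c) { return c->car >> 8; }
```

For a string header of length 3 (type bits 0001101), typ7 yields 0x0d and header_length yields 3.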

Cell: tc3_cons

scheme cons-cell returned by (cons arg1 arg2).

Macro: CONSP x
Macro: NCONSP x: Returns non-zero if x is a tc3_cons or isn't, respectively.

Cell: tc3_closure

applicable object returned by (lambda (args) ...). tc3_closures have a pointer to the body of the procedure in the CAR and a pointer to the environment in the CDR. Bits 1 and 2 (zero-based) in the CDR indicate a lower bound on the number of required arguments to the closure, which is used to avoid allocating rest argument lists in the environment cache. This encoding precludes an immediate value for the CDR: In the case of an empty environment all bits above 2 in the CDR are zero.

Macro: CLOSUREP x: Returns non-zero if x is a tc3_closure.

Macro: CODE x
Macro: ENV x: Returns the code body or environment of closure x, respectively.

Macro: ARGC x: Returns a lower bound on the number of required arguments to closure x; it cannot exceed 3.
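
The packing of the argument count into the closure's CDR can be illustrated as follows. This is a sketch under the assumption that the environment pointer is 8-byte aligned, as cell pointers are; pack_env, closure_argc and closure_env are hypothetical names, and bit 0 is left alone because it serves as the GC bit.

```c
#include <assert.h>
#include <stdint.h>

/* Combine an 8-byte-aligned environment pointer with an argument count
   stored in bits 1 and 2. Counts above 3 are clamped, since the field
   only records a lower bound. */
static uintptr_t pack_env(uintptr_t env, unsigned argc)
{
    assert((env & 7u) == 0);
    if (argc > 3)
        argc = 3;
    return env | ((uintptr_t)argc << 1);
}

/* Read back the two-bit lower bound. */
static unsigned closure_argc(uintptr_t cdr)
{
    return (unsigned)((cdr >> 1) & 3u);
}

/* Strip the low three bits to recover the environment pointer. */
static uintptr_t closure_env(uintptr_t cdr)
{
    return cdr & ~(uintptr_t)7;
}
```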

Header Cells

Headers are Cells whose CDRs point elsewhere in memory, such as to memory allocated by malloc.

Header: spare: spare tc7 type code

Header: tc7_vector

scheme vector.

Macro: VECTORP x
Macro: NVECTORP x: Returns non-zero if x is a tc7_vector or if not, respectively.

Macro: VELTS x
Macro: LENGTH x: Returns the C array of SCMs holding the elements of vector x or its length, respectively.

Header: tc7_ssymbol

static scheme symbol (part of initial system)

Header: tc7_msymbol

malloced scheme symbol (can be GCed)

Macro: SYMBOLP x: Returns non-zero if x is a tc7_ssymbol or tc7_msymbol.

Macro: CHARS x
Macro: UCHARS x
Macro: LENGTH x: Returns the C array of chars holding the elements of symbol x, the same array as unsigned chars, or the length of x, respectively.

Header: tc7_string

scheme string

Macro: STRINGP x
Macro: NSTRINGP x: Returns non-zero if x is a tc7_string or isn't, respectively.

Macro: CHARS x
Macro: UCHARS x
Macro: LENGTH x: Returns the C array of chars holding the elements of string x, the same array as unsigned chars, or the length of x, respectively.

Header: tc7_bvect: uniform vector of booleans (bit-vector)

Header: tc7_ivect: uniform vector of integers

Header: tc7_uvect: uniform vector of non-negative integers

Header: tc7_fvect: uniform vector of short inexact real numbers

Header: tc7_dvect: uniform vector of double precision inexact real numbers

Header: tc7_cvect: uniform vector of double precision inexact complex numbers

Header: tc7_contin: applicable object produced by call-with-current-continuation

Header: tc7_cclo

Subr and environment for compiled closure

A cclo is similar to a vector (and is GCed like one), but can be applied as a function:

the cclo itself is consed onto the head of the argument list
the first element of the cclo is applied to that list. Cclo invocation is currently not tail recursive when given 2 or more arguments.

Function: makcclo proc len: makes a closure from the subr proc with len-1 extra locations for SCM data. Elements of a cclo are referenced using VELTS(cclo)[n] just as for vectors.

Subr Cells

A Subr is a header whose CDR points to a C code procedure. Scheme primitive procedures are subrs. Except for the arithmetic tc7_cxrs, the C code procedures will be passed arguments (and return results) of type SCM.

Subr: tc7_asubr: associative C function of 2 arguments. Examples are +, -, *, /, max, and min.

Subr: tc7_subr_0: C function of no arguments.

Subr: tc7_subr_1: C function of one argument.

Subr: tc7_cxr

These subrs are handled specially. If inexact numbers are enabled, the CDR should be a function which takes and returns type double. Conversions are handled in the interpreter.

floor, ceiling, truncate, round, $sqrt, $abs, $exp, $log, $sin, $cos, $tan, $asin, $acos, $atan, $sinh, $cosh, $tanh, $asinh, $acosh, $atanh, and exact->inexact are defined this way.

If the CDR is 0 (NULL), the name string of the procedure is used to control traversal of its list structure argument.

car, cdr, caar, cadr, cdar, cddr, caaar, caadr, cadar, caddr, cdaar, cdadr, cddar, cdddr, caaaar, caaadr, caadar, caaddr, cadaar, cadadr, caddar, cadddr, cdaaar, cdaadr, cdadar, cdaddr, cddaar, cddadr, cdddar, and cddddr are defined this way.

Subr: tc7_subr_3: C function of 3 arguments.

Subr: tc7_subr_2: C function of 2 arguments.

Subr: tc7_rpsubr: transitive relational predicate C function of 2 arguments. The C function should return either BOOL_T or BOOL_F.

Subr: tc7_subr_1o: C function of one optional argument. If the optional argument is not present, UNDEFINED is passed in its place.

Subr: tc7_subr_2o: C function of 1 required and 1 optional argument. If the optional argument is not present, UNDEFINED is passed in its place.

Subr: tc7_lsubr_2: C function of 2 arguments and a list of (rest of) SCM arguments.

Subr: tc7_lsubr: C function of list of SCM arguments.

Ptob Cells

A ptob is a port object, capable of delivering or accepting characters. See section `Ports' in Revised(4) Report on the Algorithmic Language Scheme. Unlike the types described so far, new varieties of ptobs can be defined dynamically (see section Defining Ptobs). These are the initial ptobs:

ptob: tc16_inport: input port.

ptob: tc16_outport: output port.

ptob: tc16_ioport: input-output port.

ptob: tc16_inpipe: input pipe created by popen().

ptob: tc16_outpipe: output pipe created by popen().

ptob: tc16_strport: String port created by cwos() or cwis().

ptob: tc16_sfport: Software (virtual) port created by mksfpt() (see section Soft Ports).

Macro: PORTP x
Macro: OPPORTP x
Macro: OPINPORTP x
Macro: OPOUTPORTP x
Macro: INPORTP x
Macro: OUTPORTP x: Returns non-zero if x is a port, open port, open input-port, open output-port, input-port, or output-port, respectively.

Macro: OPENP x
Macro: CLOSEDP x: Returns non-zero if port x is open or closed, respectively.

Macro: STREAM x: Returns the FILE * stream for port x.

Ports which are particularly well behaved are called fports. Advanced operations like file-position and reopen-file only work for fports.

Macro: FPORTP x
Macro: OPFPORTP x
Macro: OPINFPORTP x
Macro: OPOUTFPORTP x: Returns non-zero if x is an fport, open fport, open input fport, or open output fport, respectively.

Smob Cells

A smob is a miscellaneous datatype. The type code and GCMARK bit occupy the lower order 16 bits of the CAR half of the cell. The rest of the CAR can be used for sub-type or other information. The CDR contains data of size long and is often a pointer to allocated memory.

Like ptobs, new varieties of smobs can be defined dynamically (see section Defining Smobs). These are the initial smobs:

smob: tc_free_cell: unused cell on the freelist.

smob: tc16_flo

single-precision float.

Inexact number data types are subtypes of type tc16_flo. If the sub-type is:

0: a single precision float is contained in the CDR.
1: CDR is a pointer to a malloced double.
3: CDR is a pointer to a malloced pair of doubles.

smob: tc_dblr: double-precision float.

smob: tc_dblc: double-precision complex.

smob: tc16_bigpos

smob: tc16_bigneg

positive and negative bignums, respectively.

Scm has large precision integers called bignums. They are stored in sign-magnitude form with the sign occurring in the type code of the SMOBs bigpos and bigneg. The magnitude is stored as a malloced array of type BIGDIG which must be an unsigned integral type with size smaller than long. BIGRAD is the radix associated with BIGDIG.
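
To make the sign-magnitude layout concrete, here is a small sketch that splits a C long into digits and reassembles it. It assumes 16-bit digits stored least significant first; both the digit width and the digit order are assumptions for illustration, and bigdig, BIG_RADIX, long_to_digits and digits_to_long are hypothetical names rather than SCM's own.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short bigdig;   /* assumed 16-bit digit type, smaller than long */
#define BIG_RADIX 65536L         /* radix matching a 16-bit digit */

/* Split n into digits, least significant first. Returns the digit count
   and records the sign separately, as a sign-magnitude form requires. */
static size_t long_to_digits(long n, bigdig *digits, size_t cap, int *negative)
{
    unsigned long mag;
    size_t len = 0;
    *negative = n < 0;
    mag = *negative ? 0UL - (unsigned long)n : (unsigned long)n;
    while (mag != 0) {
        assert(len < cap);
        digits[len++] = (bigdig)(mag % BIG_RADIX);
        mag /= BIG_RADIX;
    }
    return len;
}

/* Reassemble a value; only meaningful when the result fits in a long. */
static long digits_to_long(const bigdig *digits, size_t len, int negative)
{
    long v = 0;
    while (len > 0)
        v = v * BIG_RADIX + digits[--len];
    return negative ? -v : v;
}
```

For example, -70000 becomes the two digits 4464 and 1 with the sign flag set, since 70000 = 1 * 65536 + 4464.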

smob: tc16_promise: made by DELAY. See section `Control features' in Revised(4) Scheme.

smob: tc16_arbiter: synchronization object. See section Process Synchronization.

smob: tc16_macro: macro expanding function. See section Low Level Syntactic Hooks.

smob: tc16_array

multi-dimensional array. See section Arrays.

This type implements both conventional arrays (those with arbitrary data as elements; see section Conventional Arrays) and uniform arrays (those with elements of a uniform type; see section Uniform Array).

Conventional Arrays have a pointer to a vector for their CDR. Uniform Arrays have a pointer to a Uniform Vector type (string, bvect, ivect, uvect, fvect, dvect, or cvect) in their CDR.

Data Type Representations

IMMEDIATE:      B,D,E,F=data bit, C=flag code, P=pointer address bit
        ................................
inum    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB10
ichr    BBBBBBBBBBBBBBBBBBBBBBBB11110100
iflag                   CCCCCCC101110100
isym                    CCCCCCC001110100
        IMCAR:  only in car of evaluated code, cdr has cell's GC bit
ispcsym                 000CCCC00CCCC100
iloc    0DDDDDDDDDDDEFFFFFFFFFFF11111100
pointer PPPPPPPPPPPPPPPPPPPPPPPPPPPPP000
gloc    PPPPPPPPPPPPPPPPPPPPPPPPPPPPP001

   HEAP CELL:   G=gc_mark; 1 during mark, 0 other times.
        1s and 0s here indicate type.     G missing means sys (not GC'd)
        SIMPLE:
cons    ..........SCM car..............0  ...........SCM cdr.............G
closure ..........SCM code...........011  ...........SCM env...........CCG
        HEADERs:
ssymbol .........long length....G0000101  ..........char *chars...........
msymbol .........long length....G0000111  ..........char *chars...........
string  .........long length....G0001101  ..........char *chars...........
vector  .........long length....G0001111  ...........SCM **elts...........
bvect   .........long length....G0010101  ..........long *words...........
 spare                          G0010111
ivect   .........long length....G0011101  ..........long *words...........
uvect   .........long length....G0011111  ......unsigned long *words......
 spare                          G0100101
 spare                          G0100111
fvect   .........long length....G0101101  .........float *words...........
dvect   .........long length....G0101111  ........double *words...........
cvect   .........long length....G0110101  ........double *words...........

contin  .........long length....G0111101  .............*regs..............
cclo    .........long length....G0111111  ...........SCM **elts...........
        SUBRs:
 spare                          010001x1
 spare                          010011x1
subr_0  ..........int hpoff.....01010101  ...........SCM (*f)()...........
subr_1  ..........int hpoff.....01010111  ...........SCM (*f)()...........
cxr     ..........int hpoff.....01011101  .........double (*f)()..........
subr_3  ..........int hpoff.....01011111  ...........SCM (*f)()...........
subr_2  ..........int hpoff.....01100101  ...........SCM (*f)()...........
asubr   ..........int hpoff.....01100111  ...........SCM (*f)()...........
subr_1o ..........int hpoff.....01101101  ...........SCM (*f)()...........
subr_2o ..........int hpoff.....01101111  ...........SCM (*f)()...........
lsubr_2 ..........int hpoff.....01110101  ...........SCM (*f)()...........
rpsubr  ..........int hpoff.....01111101  ...........SCM (*f)()...........
                        PTOBs:
   port            0bwroxxxxxxxxG1110111  ..........FILE *stream..........
 socket ttttttt    00001xxxxxxxxG1110111  ..........FILE *stream..........
 inport uuuuuuuuuuU00011xxxxxxxxG1110111  ..........FILE *stream..........
outport 0000000000000101xxxxxxxxG1110111  ..........FILE *stream..........
 ioport uuuuuuuuuuU00111xxxxxxxxG1110111  ..........FILE *stream..........
fport              00   00000000G1110111  ..........FILE *stream..........
pipe               00   00000001G1110111  ..........FILE *stream..........
strport            00   00000010G1110111  ..........FILE *stream..........
sfport             00   00000011G1110111  ..........FILE *stream..........
                        SMOBs:
free_cell
        000000000000000000000000G1111111  ...........*free_cell........000
flo     000000000000000000000001G1111111  ...........float num............
dblr    000000000000000100000001G1111111  ..........double *real..........
dblc    000000000000001100000001G1111111  .........complex *cmpx..........
bignum  ...int length...0000001 G1111111  .........short *digits..........
bigpos  ...int length...00000010G1111111  .........short *digits..........
bigneg  ...int length...00000011G1111111  .........short *digits..........
                        xxxxxxxx = code assigned by newsmob();
promise 000000000000000fxxxxxxxxG1111111  ...........SCM val..............
arbiter 000000000000000lxxxxxxxxG1111111  ...........SCM name.............
macro   000000000000000mxxxxxxxxG1111111  ...........SCM name.............
array   ...short rank..cxxxxxxxxG1111111  ............*array..............
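
The low-order tag patterns in the IMMEDIATE part of the table can be tested with a few masks. The sketch below is an illustration derived from those bit patterns only; the enum and the functions classify and ichr_value are hypothetical names, and the order of the tests is a choice made here, not necessarily the order SCM's macros use.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    K_INUM,            /* ......10 */
    K_ICHR,            /* 11110100 */
    K_OTHER_IMMEDIATE, /* remaining patterns ending in 100: iflag, isym, ispcsym, iloc */
    K_CELL_POINTER,    /* .....000 */
    K_GLOC,            /* .....001 */
    K_UNKNOWN
} word_kind;

/* Classify a 32-bit word by its low-order bits, following the table. */
static word_kind classify(uint32_t w)
{
    if (w & 2u)               return K_INUM;
    if ((w & 0xffu) == 0xf4u) return K_ICHR;
    if ((w & 7u) == 4u)       return K_OTHER_IMMEDIATE;
    if ((w & 7u) == 0u)       return K_CELL_POINTER;
    if ((w & 7u) == 1u)       return K_GLOC;
    return K_UNKNOWN;
}

/* The character code of an ichr sits above its 8-bit tag. */
static unsigned char ichr_value(uint32_t w)
{
    assert(classify(w) == K_ICHR);
    return (unsigned char)(w >> 8);
}
```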

Operations

Garbage Collection

The garbage collector is in the latter half of `sys.c'. The primary goal of garbage collection (or GC) is to recycle those cells no longer in use. Immediates always appear as parts of other objects, so they are not subject to explicit garbage collection.

All cells reside in the heap (composed of heap segments). Note that this is different from what Computer Science usually defines as a heap.

Marking Cells

The first step in garbage collection is to mark all heap objects in use. Each heap cell has a bit reserved for this purpose. For pairs (cons cells) the lowest order bit (0) of the CDR is used. For other types, bit 7 of the CAR (the eighth bit, just above the 7-bit type code) is used. The GC bits are never set except during garbage collection. Special C macros are defined in `scm.h' to allow easy manipulation when GC bits are possibly set. CAR, TYP3, and TYP7 can be used on GC marked cells as they are.

Macro: GCCDR x: Returns the CDR of a cons cell, even if that cell has been GC marked.

Macro: GCTYP16 x: Returns the 16 bit type code of a cell.

We need to (recursively) mark only a few objects in order to assure that all accessible objects are marked. Those objects are sys_protects[] (for example, dynwinds), the current C-stack and the hash table for symbols, symhash.

Function: void gc_mark (SCM obj): The function gc_mark() is used for marking SCM cells. If obj is marked, gc_mark() returns. If obj is unmarked, gc_mark sets the mark bit in obj, then calls gc_mark() on any SCM components of obj. The last call to gc_mark() is tail-called (looped).

Function: void mark_locations (STACKITEM x[], sizet len)

The function mark_locations is used for marking segments of C-stack or saved segments of C-stack (marked continuations). The argument len is the size of the stack in units of sizeof(STACKITEM).

Each longword in the stack is tried to see if it is a valid cell pointer into the heap. If it is, the object itself and any objects it points to are marked using gc_mark. If the stack is word rather than longword aligned (#define WORD_ALIGN), both alignments are tried. This arrangement will occasionally mark an object which is no longer used. This has not been a problem in practice and the advantage of using the c-stack far outweighs it.
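
The conservative test applied to each stack word can be sketched like this. It is a simplified model, not SCM's code: the heap is a single static array, marks are kept in a side table instead of in the cells, and objects reachable from a marked cell are not followed. The names toy_heap, toy_marked, looks_like_cell and scan_words are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t car, cdr; } toy_cell;

#define HEAP_CELLS 8
static toy_cell toy_heap[HEAP_CELLS];
static unsigned char toy_marked[HEAP_CELLS]; /* side table, for brevity only */

/* A word is treated as a cell pointer if it lies inside the heap and
   falls exactly on a cell boundary. */
static int looks_like_cell(uintptr_t w)
{
    uintptr_t lo = (uintptr_t)&toy_heap[0];
    uintptr_t hi = lo + sizeof toy_heap;
    return w >= lo && w < hi && (w - lo) % sizeof(toy_cell) == 0;
}

/* Scan len words and mark every cell one of them appears to point at.
   Returns how many words passed the test. */
static size_t scan_words(const uintptr_t *words, size_t len)
{
    size_t i, hits = 0;
    assert(words != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (looks_like_cell(words[i])) {
            size_t idx = (words[i] - (uintptr_t)&toy_heap[0]) / sizeof(toy_cell);
            toy_marked[idx] = 1;
            hits++;
        }
    }
    return hits;
}
```

An integer that happens to look like a cell address passes the same test, which is why such a scan can occasionally keep a dead object alive.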

Sweeping the Heap

After all found objects have been marked, the heap is swept.

The storage for strings, vectors, continuations, doubles, complexes, and bignums is managed by malloc. There is only one pointer to each malloc object from its type-header cell in the heap. This allows malloc objects to be freed when the associated heap object is garbage collected.

Function: static void gc_sweep ()

The function gc_sweep scans through all heap segments. The mark bit is cleared from marked cells. Unmarked cells are spliced into freelist, where they can again be returned by invocations of NEWCELL.

If a type-header cell pointing to malloc space is unmarked, the malloc object is freed. If the type header of smob is collected, the smob's free procedure is called to free its storage.
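
The two phases can be seen together in a toy model. The sketch below uses array indices instead of pointers and a separate mark field instead of a bit inside the cell, and it has no malloc-backed objects to free; toy_init, toy_mark and toy_sweep are hypothetical names. The marking routine recurses on the car and loops on the cdr, mirroring the tail call described for gc_mark.

```c
#include <assert.h>

#define NCELLS 6
#define NIL (-1)

typedef struct {
    int car, cdr;   /* indices of other cells, or NIL */
    int mark;       /* stand-in for the GC bit */
} toy_cell;

static toy_cell heap[NCELLS];
static int freelist = NIL;

static void toy_init(void)
{
    int i;
    for (i = 0; i < NCELLS; i++) {
        heap[i].car = heap[i].cdr = NIL;
        heap[i].mark = 0;
    }
    freelist = NIL;
}

/* Mark everything reachable from cell i. */
static void toy_mark(int i)
{
    while (i != NIL) {
        assert(i >= 0 && i < NCELLS);
        if (heap[i].mark)
            return;               /* already visited */
        heap[i].mark = 1;
        toy_mark(heap[i].car);    /* recurse on the car */
        i = heap[i].cdr;          /* loop on the cdr */
    }
}

/* Clear marks on live cells and splice dead ones into the freelist.
   Returns the number of cells placed on the freelist. */
static int toy_sweep(void)
{
    int i, freed = 0;
    freelist = NIL;
    for (i = 0; i < NCELLS; i++) {
        if (heap[i].mark) {
            heap[i].mark = 0;
        } else {
            heap[i].car = NIL;
            heap[i].cdr = freelist;
            freelist = i;
            freed++;
        }
    }
    return freed;
}
```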

Memory Management for Environments

Ecache was designed and implemented by Radey Shouman.
This documentation of ecache was written by Tom Lord.

The memory management component of SCM contains special features which optimize the allocation and garbage collection of environments.

The optimizations are based on certain facts and assumptions:

The SCM evaluator creates many environments with short lifetimes and these account for a large portion of the total number of objects allocated.

The general purpose allocator allocates objects from a freelist, and collects using a mark/sweep algorithm. Research into garbage collection suggests that such an allocator is sub-optimal for object populations containing a large portion of short-lived members and that allocation strategies involving a copying collector are more appropriate.

It is a property of SCM, reflected throughout the source code, that a simple copying collector can not be used as the general purpose memory manager: much code assumes that the run-time stack can be treated as a garbage collection root set using conservative garbage collection techniques, which are incompatible with objects that change location.

Nevertheless, it is possible to use a mostly-separate copying-collector, just for environments. Roughly speaking, cons pairs making up environments are initially allocated from a small heap that is collected by a precise copying collector. These objects must be handled specially for the collector to work. The (presumably) small number of these objects that survive one collection of the copying heap are copied to the general purpose heap, where they will later be collected by the mark/sweep collector. The remaining pairs are more rapidly collected than they would otherwise be and all of this collection is accomplished without having to mark or sweep any other segment of the heap.

Allocating cons pairs for environments from this special heap is a heuristic that approximates the (unachievable) goal:

allocate all short-lived objects from the copying-heap, at no extra cost in allocation time.

Implementation Details

A separate heap (ecache_v) is maintained for the copying collector. Pairs are allocated from this heap in a stack-like fashion. Objects in this heap may be protected from garbage collection by:

Pushing a reference to the object on a stack specially maintained for that purpose. This stack (scm_estk) is used in place of the C run-time stack by the SCM evaluator to hold local variables which refer to the copying heap.
Saving a reference to every object in the mark/sweep heap which directly references the copying heap in a root set that is specially maintained for that purpose (scm_egc_roots). If no object in the mark/sweep heap directly references an object from the copying heap, that object can be preserved by storing a direct reference to it in the copying-collector root set.
Keeping no other references to these objects, except references between the objects themselves, during copying collection.

When the copying heap or root-set becomes full, the copying collector is invoked. All protected objects are copied to the mark-sweep heap. All references to those objects are updated. The copying collector root-set and heap are emptied.
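
The evacuation step can be sketched with a toy model. This is an illustration only, with many simplifications: pairs are array slots, references are integers, only an explicit root array protects objects, and references from the main heap into the cache (the role of scm_egc_roots) are not modelled. All names (cache, mainheap, roots, cache_cons, evacuate, collect_cache) are hypothetical.

```c
#include <assert.h>

#define CACHE_SIZE 4
#define MAIN_SIZE  16
#define NIL        (-1)

/* References >= 0 index the main heap; references <= -2 name cache slots. */
#define CACHE_REF(i) (-2 - (i))
#define IS_CACHE(r)  ((r) <= -2)
#define CACHE_IDX(r) (-2 - (r))

typedef struct { int value; int next; int forward; } toy_pair;

static toy_pair cache[CACHE_SIZE];   /* copying heap, filled like a stack */
static toy_pair mainheap[MAIN_SIZE]; /* stand-in for the mark/sweep heap */
static int cache_top, main_top;
static int roots[8], nroots;         /* stand-in for the protecting stack */

/* Allocate a pair in the cache. The caller collects first when it is full. */
static int cache_cons(int value, int next)
{
    assert(cache_top < CACHE_SIZE);
    cache[cache_top].value = value;
    cache[cache_top].next = next;
    cache[cache_top].forward = NIL;
    return CACHE_REF(cache_top++);
}

/* Copy the chain starting at r out of the cache; returns its new reference. */
static int evacuate(int r)
{
    int i, copy;
    if (!IS_CACHE(r))
        return r;                         /* NIL or already in the main heap */
    i = CACHE_IDX(r);
    if (cache[i].forward != NIL)
        return cache[i].forward;          /* moved earlier in this collection */
    assert(main_top < MAIN_SIZE);
    copy = main_top++;
    cache[i].forward = copy;
    mainheap[copy].value = cache[i].value;
    mainheap[copy].forward = NIL;
    mainheap[copy].next = evacuate(cache[i].next);
    return copy;
}

/* Move every protected pair to the main heap, update the roots,
   and empty the cache. Unprotected pairs simply disappear. */
static void collect_cache(void)
{
    int k;
    for (k = 0; k < nroots; k++)
        roots[k] = evacuate(roots[k]);
    cache_top = 0;
}
```

Pairs that were never reachable from a root cost nothing to reclaim: resetting cache_top discards them all at once.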

References to pairs allocated specifically for environments are inaccessible to the Scheme procedures evaluated by SCM. These pairs are manipulated by only a small number of code fragments in the interpreter. To support copying collection, those code fragments (mostly in `eval.c') have been modified to protect environments from garbage collection using the three rules listed above.

During a mark-sweep collection, the copying collector heap is marked and swept almost like any ordinary segment of the general purpose heap. The only difference is that pairs from the copying heap that become free during a sweep phase are not added to the freelist.

The environment cache is disabled by adding #define NO_ENV_CACHE to `eval.c'; all environment cells are then allocated from the regular heap.

Relation to Other Work

This work seems to build upon a considerable body of previous work on garbage collection techniques, about which extensive literature is available.

Signals

Function: init_signals: (in `scm.c') initializes handlers for SIGINT and SIGALRM if they are supported by the C implementation. All of the signal handlers immediately reestablish themselves by a call to signal().

Function: int_signal sig
Function: alrm_signal sig: The low level handlers for SIGINT and SIGALRM.

If an interrupt handler is defined when the interrupt is received, the code is interpreted. If the code returns, execution resumes from where the interrupt happened. Call-with-current-continuation allows the stack to be saved and restored.

SCM does not use any signal masking system calls. These are not a portable feature. However, code can run uninterrupted by use of the C macros DEFER_INTS and ALLOW_INTS.

Macro: DEFER_INTS

sets the global variable ints_disabled to 1. If an interrupt occurs during a time when ints_disabled is 1, then deferred_proc is set to non-zero, one of the global variables SIGINT_deferred or SIGALRM_deferred is set to 1, and the handler returns.

Macro: ALLOW_INTS

Checks the deferred variables; if one is set, the appropriate handler is called.

Calls to DEFER_INTS can not be nested. An ALLOW_INTS must happen before another DEFER_INTS can be done. In order to check that this constraint is satisfied #define CAREFUL_INTS in `scmfig.h'.
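
The deferral scheme can be sketched in standard C for a single signal. This is a simplified model, not SCM's source: the flag and macro names (ints_disabled_flag, TOY_DEFER_INTS, TOY_ALLOW_INTS and so on) are hypothetical, only SIGINT is handled, and the assert inside TOY_DEFER_INTS merely stands in for the kind of nesting check that CAREFUL_INTS enables.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t ints_disabled_flag; /* 1 while interrupts are deferred */
static volatile sig_atomic_t sigint_deferred;    /* 1 if a SIGINT arrived meanwhile */
static volatile sig_atomic_t sigint_handled;     /* counts interrupts acted upon */

/* Stand-in for the real work done on an interrupt. */
static void handle_sigint_now(void) { sigint_handled++; }

/* Low-level handler: re-establish itself, then either defer or act. */
static void on_sigint(int sig)
{
    signal(sig, on_sigint);
    if (ints_disabled_flag) {
        sigint_deferred = 1;
        return;
    }
    handle_sigint_now();
}

/* Enter a region that must not be interrupted. Nesting is an error. */
#define TOY_DEFER_INTS \
    do { assert(!ints_disabled_flag); ints_disabled_flag = 1; } while (0)

/* Leave the region and run any handler that was postponed. */
#define TOY_ALLOW_INTS \
    do { \
        ints_disabled_flag = 0; \
        if (sigint_deferred) { sigint_deferred = 0; handle_sigint_now(); } \
    } while (0)
```

A typical use brackets a store sequence that would leave a cell inconsistent if interrupted: TOY_DEFER_INTS, the stores, then TOY_ALLOW_INTS.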

C Macros

Macro: ASSERT cond arg pos subr

signals an error if the expression (cond) is 0. arg is the offending object, subr is the string naming the subr, and pos indicates the position or type of error. pos can be one of

ARGn (> 5 or unknown ARG number)
ARG1
ARG2
ARG3
ARG4
ARG5
WNA (wrong number of args)
OVFLOW
OUTOFRANGE
NALLOC
EXIT
HUP_SIGNAL
INT_SIGNAL
FPE_SIGNAL
BUS_SIGNAL
SEGV_SIGNAL
ALRM_SIGNAL
a C string (char *)

Error checking is not done by ASSERT if the flag RECKLESS is defined. An error condition can still be signaled in this case with a call to wta(arg, pos, subr).

Macro: ASRTGO cond label: goto label if the expression (cond) is 0. Like ASSERT, ASRTGO is not active if the flag RECKLESS is defined.
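
The shape of these two macros, and the effect of RECKLESS, can be sketched as follows. The macro and function names (TOY_ASSERT, TOY_ASRTGO, toy_wta, checked_half) are hypothetical, pos is passed as a string for simplicity, and unlike a real error routine toy_wta returns to its caller after reporting.

```c
#include <assert.h>
#include <stdio.h>

static int error_count;   /* hypothetical: number of errors signaled so far */

/* Stand-in for an error routine: report and count, then return. */
static void toy_wta(long arg, const char *pos, const char *subr)
{
    assert(pos != NULL && subr != NULL);
    error_count++;
    fprintf(stderr, "%s: bad argument %ld (%s)\n", subr, arg, pos);
}

#ifndef RECKLESS
#define TOY_ASSERT(cond, arg, pos, subr) \
    do { if (!(cond)) toy_wta((arg), (pos), (subr)); } while (0)
#define TOY_ASRTGO(cond, label) \
    do { if (!(cond)) goto label; } while (0)
#else
/* With RECKLESS defined, both checks compile to nothing. */
#define TOY_ASSERT(cond, arg, pos, subr) do { } while (0)
#define TOY_ASRTGO(cond, label) do { } while (0)
#endif

/* Example subr body: the check jumps to a shared error exit. */
static long checked_half(long n)
{
    TOY_ASRTGO(n % 2 == 0, bad);
    return n / 2;
bad:
    toy_wta(n, "ARG1", "checked-half");
    return 0;
}
```

Routing several checks to one label keeps the error call out of the fast path and lets it be written once per routine.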

Changing Scm

When writing C-code for SCM, a precaution is recommended. If your routine allocates a non-cons cell which will not be incorporated into a SCM object which is returned, you need to make sure that a SCM variable in your routine points to that cell as long as part of it might be referenced by your code.

In order to make sure this SCM variable does not get optimized out you can put this assignment after its last possible use:

SCM_dummy1 = foo;

or put this assignment somewhere in your routine:

SCM_dummy1 = (SCM) &foo;

SCM_dummy variables are not currently defined. Passing the address of the local SCM variable to any procedure also protects it. The procedure scm_protect_temp is provided for this purpose.

Also, if you maintain a static pointer to some (non-immediate) SCM object, you must either make your pointer be the value cell of a symbol (see errobj for an example) or make your pointer be one of the sys_protects (see dynwinds for an example). The former method is preferred since it does not require any changes to the SCM distribution.

To add a C routine to scm:

choose the appropriate subr type from the type list.
write the code and put into `scm.c'.
add a make_subr or make_gsubr call to init_scm. Or put an entry into the appropriate iproc structure.

To add a package of new procedures to scm (see `crs.c' for example):

create a new C file (`foo.c').

at the front of `foo.c' put declarations for strings for your procedure names.

static char s_twiddle_bits[]="twiddle-bits!";
static char s_bitsp[]="bits?";

choose the appropriate subr types from the type list in `code.doc'.
write the code for the procedures and put into `foo.c'

create one iproc structure for each subr type used in `foo.c'

static iproc subr3s[]= {
        {s_twiddle_bits,twiddle_bits},
        {s_bitsp,bitsp},
        {0,0} };

create an init_<name of file> routine at the end of the file which calls init_iprocs with the correct type for each of the iprocs created in step 5.
```
void init_foo()
{
  init_iprocs(subr1s, tc7_subr_1);
  init_iprocs(subr3s, tc7_subr_3);
}
```
If your package needs to have a finalization routine called to free up storage, close files, etc, then also have a line in init_foo like:
```
add_final(final_foo);
```
final_foo should be a (void) procedure of no arguments. The finals will be called in opposite order from their definition. The line:
```
add_feature("foo");
```
will append a symbol 'foo to the (list) value of *features*.
put any scheme code which needs to be run as part of your package into `Ifoo.scm'.
put an if into `Init5c4.scm' which loads `Ifoo.scm' if your package is included:
```
(if (defined? twiddle-bits!)
    (load (in-vicinity (implementation-vicinity)
                       "Ifoo"
                       (scheme-file-suffix))))
```
or use (provided? 'foo) instead of (defined? twiddle-bits!) if you have added the feature.
put documentation of the new procedures into `foo.doc'
add lines to your `Makefile' to compile and link SCM with your object file. Add an init_foo\; to the INITS=... line at the beginning of the makefile.

These steps should allow your package to be linked into SCM with a minimum of difficulty. Your package should also work with dynamic linking if your SCM has this capability.
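
The table-driven registration used in the steps above follows a common C pattern: an array of name and function pairs ending in a zero entry, walked by one init routine. The sketch below shows only that pattern; toy_iproc, toy_init_iprocs, toy_lookup and the registry are hypothetical stand-ins and do not reproduce SCM's iproc or init_iprocs.

```c
#include <assert.h>
#include <string.h>

typedef long (*toy_fn)(long);
typedef struct { const char *name; toy_fn fn; } toy_iproc;

static long twice(long x)  { return 2 * x; }
static long negate(long x) { return -x; }

/* One table per procedure type, terminated by a zero entry. */
static toy_iproc toy_subr1s[] = {
    {"twice", twice},
    {"negate", negate},
    {0, 0}
};

#define MAX_PROCS 8
static toy_iproc registry[MAX_PROCS];
static int registry_len;

/* Walk a table up to its terminator and register each entry. */
static void toy_init_iprocs(const toy_iproc *table)
{
    for (; table->name; table++) {
        assert(registry_len < MAX_PROCS);
        registry[registry_len++] = *table;
    }
}

/* Find a registered procedure by name; returns 0 if absent. */
static toy_fn toy_lookup(const char *name)
{
    int i;
    for (i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].fn;
    return 0;
}
```

Adding a procedure then means adding one line to a table, with no change to the init routine.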

Special forms (new syntax) can be added to scm.

define a new MAKISYM in `scm.h' and increment NUM_ISYMS.
add a string with the new name in the corresponding place in isymnames in `repl.c'.
add case: clause to ceval() near i_quasiquote (in `eval.c').

New syntax can now be added without recompiling SCM by the use of the procedure->syntax, procedure->macro, procedure->memoizing-macro, and defmacro. For details, see section Syntax Extensions.

Defining Subrs

If CCLO is #defined when compiling, the compiled closure feature will be enabled. It is automatically enabled if dynamic linking is enabled.

The SCM interpreter directly recognizes subrs taking small numbers of arguments. In order to create subrs taking larger numbers of arguments use:

Function: make_gsubr name req opt rest fcn

returns a cclo (compiled closure) object of name char * name which takes int req required arguments, int opt optional arguments, and a list of rest arguments if int rest is 1 (0 for not).

SCM (*fcn)() is a pointer to a C function to do the work.

The C function will always be called with req + opt + rest arguments; optional arguments not supplied will be passed as UNDEFINED. An error will be signaled if the subr is called with too many or too few arguments. Currently a total of 10 arguments may be specified, but increasing this limit should not be difficult.

/* A silly example, taking 2 required args,
   1 optional, and a list of rest args */

#include <scm.h>

SCM gsubr_21l(req1,req2,opt,rst)
     SCM req1,req2,opt,rst;
{
  lputs("gsubr-2-1-l:\n req1: ", cur_outp);
  display(req1,cur_outp);
  lputs("\n req2: ", cur_outp);
  display(req2,cur_outp);
  lputs("\n opt: ", cur_outp);
  display(opt,cur_outp);
  lputs("\n rest: ", cur_outp);
  display(rst,cur_outp);
  newline(cur_outp);
  return UNSPECIFIED;
}

void init_gsubr211()
{
  make_gsubr("gsubr-2-1-l", 2, 1, 1, gsubr_21l);
}

Defining Smobs

Here is an example of how to add a new type named foo to SCM. The following lines need to be added to your code:

long tc16_foo;

The type code which will be used to identify the new type.

static smobfuns foosmob = {markfoo,freefoo,printfoo,equalpfoo};

smobfuns is a structure composed of 4 functions:

typedef struct {
  SCM   (*mark)P((SCM));
  sizet (*free)P((CELLPTR));
  int   (*print)P((SCM exp, SCM port, int writing));
  SCM   (*equalp)P((SCM, SCM));
} smobfuns;

smob.mark

is a function of one argument of type SCM (the cell to mark) and returns type SCM which will then be marked. If no further objects need to be marked then return an immediate object such as BOOL_F. 2 functions are provided:

markcdr(ptr): which marks the current cell and returns CDR(ptr).
mark0(ptr): which marks the current cell and returns BOOL_F.

smob.free

is a function of one argument of type CELLPTR (the cell to be collected) and returns type sizet which is the number of malloced bytes which were freed. Smob.free should free any malloced storage associated with this object. The function free0(ptr) is provided which does not free any storage and returns 0.

smob.print

is 0 or a function of 3 arguments. The first, of type SCM, is the smob object. The second, of type SCM, is the stream on which to write the result. The third, of type int, is 1 if the object should be written, 0 if it should be displayed. This function should return non-zero if it printed, and zero otherwise (in which case a hexadecimal number will be printed).

smob.equalp

is 0 or a function of 2 SCM arguments. Both of these arguments will be of type tc16_foo. This function should return BOOL_T if the smobs are equal, BOOL_F if they are not. If smob.equalp is 0, equal? will return BOOL_F if they are not eq?.

tc16_foo = newsmob(&foosmob);

Allocates the new type with the functions from foosmob. This line goes in an init_ routine.

Promises and macros in `eval.c' and arbiters in `repl.c' provide examples of SMOBs. There are a maximum of 256 SMOBs. Smobs that must allocate blocks of memory should use, for example, must_malloc rather than malloc (see section Allocating memory).
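
The registration and equal? dispatch described above can be pictured with a small stand-alone sketch. Everything named sketch_ here is hypothetical; the real smobfuns structure has four members and works on SCM cells, whereas this illustration keeps only print and equalp and uses plain pointers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SMOBS 256      /* maximum quoted in the text above */

typedef struct {
    int (*print)(const void *obj);                  /* may be 0 */
    int (*equalp)(const void *a, const void *b);    /* may be 0 */
} sketch_smobfuns;

static sketch_smobfuns sketch_smobs[SKETCH_MAX_SMOBS];
static int sketch_numsmob = 0;

/* Copies the function table into the registry and returns the new
   type index, or -1 when the registry is full. */
int sketch_newsmob(const sketch_smobfuns *funs)
{
    assert(funs != NULL);
    if (sketch_numsmob >= SKETCH_MAX_SMOBS)
        return -1;
    sketch_smobs[sketch_numsmob] = *funs;
    return sketch_numsmob++;
}

/* equal?-style comparison of two objects of the same registered type:
   identical objects are equal; otherwise defer to equalp, and treat
   a missing equalp as "not equal". */
int sketch_equal(int type, const void *a, const void *b)
{
    assert(type >= 0 && type < sketch_numsmob);
    if (a == b)
        return 1;
    if (sketch_smobs[type].equalp == 0)
        return 0;
    return sketch_smobs[type].equalp(a, b);
}

/* Example equalp for a type whose payload is a single int. */
int sketch_int_equalp(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}
```

A type registered without an equalp function then compares equal only to itself, which mirrors the eq? fallback described for smob.equalp.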

Defining Ptobs

ptobs are similar to smobs but define new types of port to which SCM procedures can read or write. The following functions are defined in the ptobfuns:

typedef struct {
  SCM   (*mark)P((SCM ptr));
  int   (*free)P((FILE *p));
  int   (*print)P((SCM exp, SCM port, int writing));
  SCM   (*equalp)P((SCM, SCM));
  int   (*fputc)P((int c, FILE *p));
  int   (*fputs)P((char *s, FILE *p));
  sizet (*fwrite)P((char *s, sizet siz, sizet num, FILE *p));
  int   (*fflush)P((FILE *stream));
  int   (*fgetc)P((FILE *p));
  int   (*fclose)P((FILE *p));
} ptobfuns;

The .free component of the structure takes a FILE * or other C construct as its argument, unlike .free in a smob, which takes the whole smob cell. Often, .free and .fclose can be the same function. See fptob and pipob in `sys.c' for examples of how to define ptobs. Ptobs that must allocate blocks of memory should use, for example, must_malloc rather than malloc (see section Allocating memory).

Allocating memory

SCM maintains a count of bytes allocated using malloc, and calls the garbage collector when that number exceeds a dynamically managed limit. In order for this to work properly, malloc and free should not be called directly to manage memory freeable by garbage collection. The following functions are provided for that purpose:

Function: SCM must_malloc_cell (long len, SCM c, char *what)
Function: char *must_malloc (long len, char *what): len is the number of bytes that should be allocated, what is a string to be used in error or gc messages. must_malloc returns a pointer to newly allocated memory. must_malloc_cell returns a newly allocated cell whose car is c and whose cdr is a pointer to newly allocated memory.

Function: void must_realloc_cell (SCM z, long olen, long len, char *what)

Function: char *must_realloc (char *where, long olen, long len, char *what)

must_realloc_cell takes as argument z a cell whose cdr should be a pointer to a block of memory of length olen allocated with must_malloc_cell and modifies the cdr to point to a block of memory of length len. must_realloc takes as argument where the address of a block of memory of length olen allocated by must_malloc and returns the address of a block of length len.

The contents of the reallocated block will be unchanged up to the minimum of the old and new sizes.

what is a pointer to a string used for error and gc messages.

must_malloc, must_malloc_cell, must_realloc, and must_realloc_cell must be called with interrupts deferred (see section Signals).

Function: void must_free (char *ptr, sizet len): must_free is used to free a block of memory allocated by the above functions and pointed to by ptr. len is the length of the block in bytes, but this value is used only for debugging purposes. If it is difficult or expensive to calculate then zero may be used instead.
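
The byte-counting idea behind these functions can be sketched in isolation. The following is a rough illustration under stated assumptions, not SCM's allocator: the names, the initial limit of 1024 bytes, and the policy of doubling the limit when a collection does not help are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

unsigned long sketch_mallocated = 0;   /* bytes handed out so far */
unsigned long sketch_limit = 1024;     /* hypothetical initial limit */
int sketch_gc_runs = 0;                /* how often the collector ran */

/* Stand-in for the garbage collector; a real one would free
   unreachable storage and lower sketch_mallocated accordingly. */
static void sketch_gc(void)
{
    sketch_gc_runs++;
}

/* Allocates len bytes, running the collector first when the request
   would push the running total past the current limit. */
void *sketch_must_malloc(unsigned long len, const char *what)
{
    void *p;
    assert(what != NULL);              /* used only for messages */
    if (sketch_mallocated + len > sketch_limit) {
        sketch_gc();
        /* Assumed policy: still over the limit, so raise the limit. */
        if (sketch_mallocated + len > sketch_limit)
            sketch_limit = 2 * (sketch_mallocated + len);
    }
    p = malloc(len ? len : 1);
    if (p == NULL)
        return NULL;
    sketch_mallocated += len;
    return p;
}
```

Calling malloc directly would bypass the counter, so the collector would never be prompted by that storage; this is the reason the text asks for the must_ functions to be used.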

Calling Scheme From C

To use SCM as a whole from another program call init_scm or run_scm as is done in main() in `scm.c'.

In order to call individual Scheme procedures from C code more is required; SCM's storage system needs to be initialized. The simplest way to do this for a statically linked single-thread program is to:

make a SCM procedure which calls your code's startup routine.
use the #define RTL flag when compiling `scm.c' to elide SCM's main().
In your main(), call run_scm with arguments (argc and argv) to invoke your code's startup routine.
link your code with SCM at compile time.

For a dynamically linked single-thread program:

make an init_ procedure for your code which will set up any Scheme definitions you need and then call your startup routine (see section Changing Scm).
Start SCM with command line arguments to dynamically link your code. After your module is linked, the init_ procedure will be called, and hence your startup routine.

Now use apply (and perhaps intern) to call Scheme procedures from your C code. For example:

/* If this apply fails, SCM will catch the error */
apply(CDR(intern("srv:startup",sizeof("srv:startup")-1)),
      mksproc(srvproc),
      listofnull);

func = CDR(intern(rpcname,strlen(rpcname)));
retval = apply(func, cons(mksproc(srvproc), args), EOL);

Callbacks

SCM now has routines to make calling back to Scheme procedures easier. The source code for these routines is found in `rope.c'.

Function: int scm_ldfile (char *file): Loads the Scheme source file file. Returns 0 if successful, non-0 if not. This function is used to load SCM's initialization file `Init5c4.scm'.

Function: int scm_ldprog (char *file)

Loads the Scheme source file (in-vicinity (program-vicinity) file). Returns 0 if successful, non-0 if not.

This function is useful for compiled code init_ functions to load non-compiled Scheme (source) files. program-vicinity is the directory from which the calling code was loaded (see section `Vicinity' in SLIB).

Function: SCM scm_evstr (char *str): Returns the result of reading an expression from str and evaluating it.

Function: void scm_ldstr (char *str): Reads and evaluates all the expressions from str.

If you wish to catch errors during execution of Scheme code, then you can use a wrapper like this for your Scheme procedures:

(define (srv:protect proc)
  (lambda args
    (define result #f)                  ; put default value here
    (call-with-current-continuation
     (lambda (cont)
       (dynamic-wind (lambda () #t)
                     (lambda ()
                       (set! result (apply proc args))
                       (set! cont #f))
                     (lambda ()
                       (if cont (cont #f))))))
    result))

Calls to procedures so wrapped will return even if an error occurs.

Type Conversions

These type conversion functions are very useful for connecting SCM and C code. Most are defined in `rope.c'.

Function: SCM long2num (long n)

Function: SCM ulong2num (unsigned long n)

Return an object of type SCM corresponding to the long or unsigned long argument n. If n cannot be converted, BOOL_F is returned. Which numbers can be converted depends on whether SCM was compiled with the BIGDIG or FLOATS flags.

To convert integer numbers of smaller types (short or char), use the macro MAKINUM(n).
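
Whether a C integer can become an immediate depends on the 30-bit inum range described under Immediates. The sketch below illustrates that range check and the tagging arithmetic with fixed-width types; the sketch_ names are hypothetical, and multiplication and division are used in place of SCM's shifts so that the example does not depend on how the compiler shifts negative numbers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INUM_MAX ((int32_t)0x1FFFFFFF)    /*  2^29 - 1 */
#define SKETCH_INUM_MIN (-SKETCH_INUM_MAX - 1)   /* -2^29     */

/* Non-zero if n fits in a 30 bit signed immediate integer. */
int sketch_fixable(long n)
{
    return n >= SKETCH_INUM_MIN && n <= SKETCH_INUM_MAX;
}

/* Tags n: value in the high 30 bits, a 1 in the second to low
   order bit, a 0 in the low order bit. */
int32_t sketch_makinum(long n)
{
    assert(sketch_fixable(n));
    return (int32_t)(n * 4 + 2);
}

/* Non-zero if w carries the inum flag bit. */
int sketch_inump(int32_t w)
{
    return (w & 2) != 0;
}

/* Recovers the C integer from a tagged word. */
long sketch_inum(int32_t w)
{
    assert(sketch_inump(w));
    return (w - 2) / 4;    /* exact: w - 2 is a multiple of 4 */
}
```

Numbers outside this range are the ones for which the result of long2num depends on the BIGDIG and FLOATS flags mentioned above.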

Function: long num2long (SCM num, char *pos, char *s_caller)

Function: unsigned long num2ulong (SCM num, char *pos, char *s_caller)

Function: unsigned short num2ushort (SCM num, char *pos, char *s_caller)

Function: unsigned char num2uchar (SCM num, char *pos, char *s_caller)

These functions are used to check and convert SCM arguments to the named C type. The first argument num is checked to see if it is within the range of the destination type. If so, the converted number is returned. If not, the ASSERT macro calls wta with num and strings pos and s_caller. For a listing of useful predefined pos macros, see section C Macros.

Note: Inexact numbers are accepted only by num2long and num2ulong (for when SCM is compiled without bignums). To convert inexact numbers to exact numbers, see section `Numerical operations' in Revised(4) Scheme.

Function: unsigned long scm_addr (SCM args, char *s_name)

Returns a pointer (cast to an unsigned long) to the storage corresponding to the location accessed by aref(CAR(args),CDR(args)). The string s_name is used in any messages from error calls by scm_addr.

scm_addr is useful for performing C operations on strings or other uniform arrays (see section Uniform Array).

Note: While you use a pointer returned from scm_addr you must keep a pointer to the associated SCM object in a stack allocated variable or GC-protected location in order to assure that SCM does not reuse that storage before you are done with it.

Function: SCM makfrom0str (char *src)
Function: SCM makfromstr (char *src, sizet len): Return a newly allocated string SCM object copy of the null-terminated string src or the string src of length len, respectively.

Function: SCM makfromstrs (int argc, char **argv): Returns a newly allocated SCM list of strings corresponding to the argc length array of null-terminated strings argv. If argc is less than 0, argv is assumed to be NULL terminated. makfromstrs is used by run_scm to convert the arguments SCM was called with to a SCM list which is the value of SCM procedure calls to program-arguments (see section SCM Session).

Function: char **makargvfrmstrs (SCM args, char *s_name)

Returns a NULL terminated list of null-terminated strings copied from the SCM list of strings args. The string s_name is used in messages from error calls by makargvfrmstrs.

makargvfrmstrs is useful for constructing argument lists suitable for passing to main functions.

Function: void must_free_argv (char **argv): Frees the storage allocated to create argv by a call to makargvfrmstrs.
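
The shape of the data that makargvfrmstrs builds and must_free_argv releases (a NULL terminated vector of separately allocated strings) can be sketched in plain C. The functions below are hypothetical stand-ins that copy from a C array rather than a SCM list and use malloc directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts the strings in a NULL terminated vector. */
int sketch_count_argv(char **argv)
{
    int n = 0;
    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}

/* Copies n strings into a newly allocated NULL terminated vector.
   Returns NULL if any allocation fails. */
char **sketch_make_argv(const char *const *strs, int n)
{
    char **argv;
    int i;
    assert(strs != NULL && n >= 0);
    argv = malloc((size_t)(n + 1) * sizeof *argv);
    if (argv == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;
        argv[i] = malloc(len);
        if (argv[i] == NULL) {          /* undo the partial copy */
            while (i-- > 0)
                free(argv[i]);
            free(argv);
            return NULL;
        }
        memcpy(argv[i], strs[i], len);
    }
    argv[n] = NULL;
    return argv;
}

/* Frees every string and then the vector itself. */
void sketch_free_argv(char **argv)
{
    int i;
    if (argv == NULL)
        return;
    for (i = 0; argv[i] != NULL; i++)
        free(argv[i]);
    free(argv);
}
```

Because each string is a separate allocation, freeing only the vector would leak the strings, which is why a dedicated routine such as must_free_argv is provided.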

Continuations

The source files `continue.h' and `continue.c' are designed to function as an independent resource for programs wishing to use continuations, but without all the rest of the SCM machinery. The concept of continuations is explained in section `Control features' in Revised(4) Scheme.

The C constructs jmp_buf, setjmp, and longjmp implement escape continuations. On VAX and Cray platforms, the setjmp provided does not save all the registers. The source files `setjump.mar', `setjump.s', and `ugsetjump.s' provide implementations which do meet this criterion.

SCM uses the names jump_buf, setjump, and longjump in lieu of jmp_buf, setjmp, and longjmp to prevent name and declaration conflicts.

Data type: CONTINUATION jmpbuf length stkbse other parent: is a typedefed structure holding all the information needed to represent a continuation. The other slot can be used to hold any data the user wishes to put there by defining the macro CONTINUATION_OTHER.

Macro: SHORT_ALIGN: If SHORT_ALIGN is #defined (in `scmfig.h'), then it is assumed that pointers in the stack can be aligned on short int boundaries.

Data type: STACKITEM: is a pointer to objects of the size specified by SHORT_ALIGN being #defined or not.

Macro: CHEAP_CONTINUATIONS

If CHEAP_CONTINUATIONS is #defined (in `scmfig.h') each CONTINUATION has size sizeof CONTINUATION. Otherwise, all but root CONTINUATIONs have additional storage (immediately following) to contain a copy of part of the stack.

Note: On systems with nonlinear stack disciplines (multiple stacks or non-contiguous stack frames) copying the stack will not work properly. These systems need to #define CHEAP_CONTINUATIONS in `scmfig.h'.

Macro: STACK_GROWS_UP: Expresses which way the stack grows by its being #defined or not.

Variable: long thrown_value: Gets set to the value passed to throw_to_continuation.

Function: long stack_size (STACKITEM *start): Returns the number of units of size STACKITEM which fit between start and the current top of stack. No check is done in this routine to ensure that start is actually in the current stack segment.

Function: CONTINUATION *make_root_continuation (STACKITEM *stack_base): Allocates (malloc) storage for a CONTINUATION of the current extent of stack. This newly allocated CONTINUATION is returned if successful, 0 if not. After make_root_continuation returns, the calling routine still needs to setjump(new_continuation->jmpbuf) in order to complete the capture of this continuation.

Function: CONTINUATION *make_continuation (CONTINUATION *parent_cont): Allocates storage for the current CONTINUATION, copying (or encapsulating) the stack state from parent_cont->stkbse to the current top of stack. The newly allocated CONTINUATION is returned if successful, 0 if not. After make_continuation returns, the calling routine still needs to setjump(new_continuation->jmpbuf) in order to complete the capture of this continuation.

Function: void free_continuation (CONTINUATION *cont): Frees the storage pointed to by cont. Remember to free storage pointed to by cont->other.

Function: void throw_to_continuation (CONTINUATION *cont, long value, CONTINUATION *root_cont)

Sets thrown_value to value and returns from the continuation cont.

If CHEAP_CONTINUATIONS is #defined, then throw_to_continuation does longjump(cont->jmpbuf, val).

If CHEAP_CONTINUATIONS is not #defined, the CONTINUATION cont contains a copy of a portion of the C stack (whose bound must be CONT(root_cont)->stkbse). Then:

the stack is grown larger than the saved stack, if necessary.
the saved stack is copied back into its original position.
longjump(cont->jmpbuf, val);
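
The CHEAP_CONTINUATIONS case, where no stack is copied and a throw is just a long jump, can be sketched with the standard C setjmp and longjmp. This is only an illustration of an escape continuation: the sketch_ names and the cut-down structure are hypothetical and are not the CONTINUATION type declared in `continue.h'.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Cut-down stand-in for a continuation: only the jump buffer. */
typedef struct sketch_cont {
    jmp_buf jmpbuf;
    struct sketch_cont *parent;
} sketch_cont;

/* Plays the role of thrown_value. */
long sketch_thrown_value;

/* Records the value and escapes to where cont was captured. */
void sketch_throw(sketch_cont *cont, long value)
{
    assert(cont != NULL);
    sketch_thrown_value = value;
    longjmp(cont->jmpbuf, 1);
}

/* Recurses depth levels and then escapes instead of returning. */
void sketch_deep_work(sketch_cont *escape, int depth)
{
    if (depth == 0)
        sketch_throw(escape, 42L);
    sketch_deep_work(escape, depth - 1);
}

/* Captures an escape point, runs the work, and returns the value
   that was thrown back to it. */
long sketch_run(int depth)
{
    sketch_cont cont;
    cont.parent = NULL;
    if (setjmp(cont.jmpbuf) == 0) {
        sketch_deep_work(&cont, depth);
        return -1L;                 /* not reached in this sketch */
    }
    return sketch_thrown_value;     /* arrived here via longjmp */
}
```

Such a continuation is valid only while the function that captured it is still active; re-entering a continuation after its frame has returned is what the stack-copying (non-cheap) variant exists for.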

Evaluation

SCM uses its type representations to speed evaluation. All of the subr types (see section Subr Cells) are tc7 types. Since the tc7 field is in the low order bit position of the CAR it can be retrieved and dispatched on quickly by dereferencing the SCM pointer pointing to it and masking the result.

All the SCM Special Forms get translated to immediate symbols (isym) the first time they are encountered by the interpreter (ceval). The representation of these immediate symbols is engineered to occupy the same bits as tc7. All the isyms occur only in the CAR of lists.

If the CAR of an expression to evaluate is not immediate, then it may be a symbol. If so, the first time it is encountered it will be converted to an immediate type ILOC or GLOC (see section Immediates). The lower 7 bits of the codes for ILOC and GLOC distinguish them from all the other types we have discussed.

Once it has determined that the expression to evaluate is not immediate, ceval need only retrieve and dispatch on the low order 7 bits of the CAR of that cell, regardless of whether that cell is a closure, header, or subr, or a cons containing ILOC or GLOC.

In order to be able to convert a SCM symbol pointer to an immediate ILOC or GLOC, the evaluator must be holding the pointer to the list in which that symbol pointer occurs. Turning this requirement to an advantage, ceval does not recursively call itself to evaluate symbols in lists; it instead calls the macro EVALCAR. EVALCAR does symbol lookup and memoization for symbols, retrieval of values for ILOCs and GLOCs, returns other immediates, and otherwise recursively calls itself with the CAR of the list.

ceval inlines evaluation (using EVALCAR) of almost all procedure call arguments. When ceval needs to evaluate a list of more than length 3, the procedure eval_args is called. So ceval can be said to have one level lookahead. The avoidance of recursive invocations of ceval for the most common cases (special forms and procedure calls) results in faster execution. The speed of the interpreter is currently limited on most machines by interpreter size, probably having to do with its cache footprint. In order to keep the size down, certain EVALCAR calls which don't need to be fast (because they rarely occur or because they are part of expensive operations) are instead calls to the C function evalcar.

Variable: symhash: Top level symbol values are stored in the symhash table. symhash is an array of lists of ISYMs and pairs of symbols and values.

Immediate: ILOC: Whenever a symbol's value is found in the local environment the pointer to the symbol in the code is replaced with an immediate object (ILOC) which specifies how many environment frames down and how far in to go for the value. When this immediate object is subsequently encountered, the value can be retrieved quickly.

ILOCs work up to a maximum depth of 4096 frames or 4096 identifiers in a frame. Radey Shouman added FARLOC to handle cases exceeding these limits. A FARLOC consists of a pair whose CAR is the immediate type IM_FARLOC_CAR or IM_FARLOC_CDR, and whose CDR is a pair of INUMs specifying the frame and distance with a larger range than ILOCs span.
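
What an ILOC encodes, a count of frames to skip and an index within the frame reached, can be sketched with an explicit environment structure. The types below are hypothetical; SCM packs both numbers into one immediate word and represents environments as lists. Treating 4096 as an exclusive bound is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ILOC_LIMIT 4096     /* range quoted in the text above */

typedef struct sketch_frame {
    const long *values;            /* values bound in this frame */
    size_t count;                  /* number of values */
    struct sketch_frame *outer;    /* enclosing environment frame */
} sketch_frame;

typedef struct {
    unsigned frames_down;          /* how many frames to skip */
    unsigned distance;             /* how far into that frame */
} sketch_iloc;

/* Non-zero if the location fits the ILOC range; otherwise a FARLOC
   style representation would be needed. */
int sketch_iloc_ok(unsigned frames_down, unsigned distance)
{
    return frames_down < SKETCH_ILOC_LIMIT && distance < SKETCH_ILOC_LIMIT;
}

/* Retrieves the value without any symbol comparison: walk outward
   frames_down times, then index into the frame. */
long sketch_iloc_ref(const sketch_frame *env, sketch_iloc loc)
{
    unsigned i;
    for (i = 0; i < loc.frames_down; i++) {
        assert(env != NULL);
        env = env->outer;
    }
    assert(env != NULL && loc.distance < env->count);
    return env->values[loc.distance];
}
```

The point of the memoization is visible here: once the two numbers are known, retrieval involves no name lookup at all.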

Adding #define TEST_FARLOC to `eval.c' causes FARLOCs to be generated for all local identifiers; this is useful only for testing memoization.

Immediate: GLOC: Pointers to symbols not defined in local environments are changed to one plus the value cell address in symhash. This incremented pointer is called a GLOC. The low order bit is normally reserved for GCmark; but, since references to variables in the code always occur in the CAR position and the GCmark is in the CDR, there is no conflict.
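
The "one plus the address" trick relies on value cells being aligned, so that the low order bit of a genuine pointer is always zero. A minimal sketch of the tagging follows, with hypothetical names and with the assumption that the value is held in the CDR of the value cell:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a value cell in symhash. */
typedef struct {
    long car;    /* stand-in for the symbol */
    long cdr;    /* stand-in for the symbol's value */
} sketch_cell;

/* Tags a value cell address by adding one to it. */
uintptr_t sketch_make_gloc(sketch_cell *vcell)
{
    assert(((uintptr_t)vcell & 1u) == 0);   /* cells are aligned */
    return (uintptr_t)vcell + 1u;
}

/* Non-zero if the word has the low order bit set. */
int sketch_glocp(uintptr_t w)
{
    return (w & 1u) != 0;
}

/* Strips the tag and fetches the value from the cell. */
long sketch_gloc_value(uintptr_t gloc)
{
    assert(sketch_glocp(gloc));
    return ((sketch_cell *)(gloc - 1u))->cdr;
}
```

Fetching a global variable's value is then a subtraction and one memory reference, with no hash table lookup.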

If the compile FLAG CAUTIOUS is #defined then the number of arguments is always checked for application of closures. If the compile FLAG RECKLESS is #defined then they are not checked. Otherwise, number of argument checks for closures are made only when the function position (whose value is the closure) of a combination is not an ILOC or GLOC. When the function position of a combination is a symbol it will be checked only the first time it is evaluated because it will then be replaced with an ILOC or GLOC.

Macro: EVAL expression env

Macro: SIDEVAL expression env

EVAL returns the result of evaluating expression in env. SIDEVAL evaluates expression in env when the value of the expression is not used.

Both of these macros alter the list structure of expression as it is memoized and hence should be used only when it is known that expression will not be referenced again. The C function eval is safe from this problem.

Function: SCM eval (SCM expression): Returns the result of evaluating expression in the top-level environment. eval copies expression so that memoization does not modify expression.

Program Self-Knowledge

File-System Habitat

Where should software reside? Although individually a minor annoyance, cumulatively this question represents many thousands of frustrated user hours spent trying to find support files or guessing where packages need to be installed. Even simple programs require proper habitat; games need to find their score files.

Aren't there standards for this? Some Operating Systems have devised regimes of software habitats -- only to have them violated by large software packages and imports from other OS varieties.

In some programs, the expected locations of support files are fixed at time of compilation. This means that the program may not run on configurations unanticipated by the authors. Compiling locations into a program also can make it immovable -- necessitating recompilation to install it.

Programs of the world unite! You have nothing to lose but loss itself.

The function scm_find_impl_file in `scm.c' is an attempt to create a utility (for inclusion in programs) which will hide the details of platform-dependent file habitat conventions. It takes as input the pathname of the executable file which is running. If there are systems for which this information is either not available or unrelated to the locations of support files, then a higher level interface will be needed.

Function: char *scm_find_impl_file (char *exec_path, char *generic_name, char *initname, char *sep): Given the pathname of this executable (exec_path), test for the existence of initname in the implementation-vicinity of this program. Return a newly allocated string of the path if successful, 0 if not. The sep argument is a null-terminated string of the character used to separate directory components.

One convention is to install the support files for an executable program in the same directory as the program. This possibility is tried first, which satisfies not only programs using this convention, but also uninstalled builds when testing new releases, etc.
Another convention is to install the executables in a directory named `bin', `BIN', `exe', or `EXE' and support files in a directory named `lib', which is a peer of the executable directory. This arrangement allows multiple executables to be stored in a single directory. For example, the executable might be in `/usr/local/bin/' and initialization file in `/usr/local/lib/'. If the executable directory name matches, the peer directory `lib' is tested for initname.
Sometimes `lib' directories become too crowded. So we look in any subdirectories of `lib' or `src' having the name (sans type suffix such as `.EXE') of the program we are running. For example, the executable might be `/usr/local/bin/foo' and initialization file in `/usr/local/lib/foo/'.
But the executable name may not be the usual program name, so also look in any generic_name subdirectories of `lib' or `src' peers.
Finally, if the name of the executable file being run has a (system dependent) suffix which is not needed to invoke the program, then look in a subdirectory (of the one containing the executable file) named for the executable (without the suffix), and look in a generic_name subdirectory. For example, the executable might be `C:\foo\bar.exe' and the initialization file in `C:\foo\bar\'.
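
The first two conventions in this list can be sketched as functions that build candidate pathnames from exec_path. This is a simplified illustration, not the code of scm_find_impl_file: the sketch_ names are hypothetical, the separator is a single character, only a lower-case `bin' directory is recognized, and the caller is left to test whether the candidate file exists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies the directory part of path, including the trailing
   separator, into out.  Returns 0 on success, -1 otherwise. */
int sketch_dirname(const char *path, char sep, char *out, size_t outsz)
{
    const char *last;
    size_t len;
    assert(path != NULL && out != NULL);
    last = strrchr(path, sep);
    if (last == NULL)
        return -1;
    len = (size_t)(last - path) + 1;
    if (len >= outsz)
        return -1;
    memcpy(out, path, len);
    out[len] = '\0';
    return 0;
}

/* Candidate 1: initname in the same directory as the executable. */
int sketch_same_dir(const char *exec_path, char sep, const char *initname,
                    char *out, size_t outsz)
{
    char dir[256];
    int n;
    if (sketch_dirname(exec_path, sep, dir, sizeof dir) != 0)
        return -1;
    n = snprintf(out, outsz, "%s%s", dir, initname);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Candidate 2: initname in a `lib' peer of a `bin' directory. */
int sketch_peer_lib(const char *exec_path, char sep, const char *initname,
                    char *out, size_t outsz)
{
    char dir[256];
    size_t len;
    int n;
    if (sketch_dirname(exec_path, sep, dir, sizeof dir) != 0)
        return -1;
    len = strlen(dir);              /* dir ends with sep */
    if (len < 5 || dir[len - 5] != sep
        || strncmp(dir + len - 4, "bin", 3) != 0)
        return -1;                  /* executable is not in .../bin/ */
    dir[len - 4] = '\0';            /* keep the parent and its sep */
    n = snprintf(out, outsz, "%slib%c%s", dir, sep, initname);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For the executable `/usr/local/bin/foo' these yield `/usr/local/bin/Init5c4.scm' and `/usr/local/lib/Init5c4.scm' respectively; the remaining conventions would add further candidates in the same manner.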

Executable Pathname

For purposes of finding `Init5c4.scm', dumping an executable, and dynamic linking, a SCM session needs the pathname of its executable image.

When a program is executed by MS-DOS, the full pathname of that executable is available in argv[0]. This value can be passed directly to scm_find_impl_file (see section File-System Habitat).

In order to find the habitat for a unix program, we first need to know the full pathname for the associated executable file.

Function: char *dld_find_executable (const char *command)

dld_find_executable returns the absolute path name of the file that would be executed if command were given as a command. It looks up the environment variable PATH, searches in each of the directories listed for command, and returns the absolute path name for the first occurrence. Thus, it is advisable to invoke dld_init as:

main (int argc, char **argv)
{
    ...
    if (dld_init (dld_find_executable (argv[0]))) {
        ...
    }
    ...
}

Note: If the current process is executed using the execve call without passing the correct path name as argument 0, dld_find_executable (argv[0]) will also fail to locate the executable file.

dld_find_executable returns zero if command is not found in any of the directories listed in PATH.
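
The PATH search itself is simple enough to sketch. The function below is a hypothetical illustration, not the dld library's code: it takes the directory list and an existence test as parameters (so that it can be tried without touching the file system), assumes `:' separates directories and `/' separates path components, and skips empty entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Caller-supplied test for "this file exists and is executable". */
typedef int (*sketch_exists_fn)(const char *path);

/* Tries each directory of path_list in order and writes the first
   dir/command accepted by exists into out.  Returns 0 when found,
   -1 (with out set to the empty string) when not. */
int sketch_find_in_path(const char *command, const char *path_list,
                        sketch_exists_fn exists, char *out, size_t outsz)
{
    const char *p = path_list;
    assert(command != NULL && path_list != NULL);
    assert(exists != NULL && out != NULL);
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        size_t dirlen = end ? (size_t)(end - p) : strlen(p);
        if (dirlen > 0) {
            int n = snprintf(out, outsz, "%.*s/%s", (int)dirlen, p, command);
            if (n > 0 && (size_t)n < outsz && exists(out))
                return 0;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    if (outsz > 0)
        out[0] = '\0';
    return -1;
}

/* Example predicate that pretends exactly one file exists. */
int sketch_fake_exists(const char *path)
{
    return strcmp(path, "/usr/local/bin/scm") == 0;
}
```

On a real system the predicate would typically check the file with a call such as access, and path_list would come from getenv("PATH").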

Script Support

Source code for these C functions is in the file `script.c'. See section Shell Scripts for a description of script argument processing.

script_find_executable is only defined on unix systems.

Function: char *script_find_executable (const char *name): script_find_executable returns the path name of the executable which is invoked by the script file name; name if it is a binary executable (not a script); or 0 if name does not exist or is not executable.

Function: char **script_process_argv (int argc, char **argv)

Given a main-style argument vector argv and the number of arguments, argc, script_process_argv returns a newly allocated argument vector in which the second line of the script being invoked is substituted for the corresponding meta-argument.

If the script does not have a meta-argument, or if the file named by the argument following a meta-argument cannot be opened for reading, then 0 is returned.

script_process_argv correctly processes argument vectors of nested script invocations.

Function: int script_count_argv(char **argv): Returns the number of argument strings in argv.

Improvements To Make

Allow users to set limits for malloc() storage.
Prefix and make more uniform all C function, variable, and constant names. Provide a file full of #define's to provide backward compatibility.
lgcd() needs to generate at most one bignum, but currently generates more.
divide() could use shifts instead of multiply and divide when scaling.
Currently, dumping an executable does not preserve ports. When loading a dumped executable, disk files could be reopened to the same file and position as they had when the executable was dumped.
Copying all of the stack is wasteful of storage. Any time a call-with-current-continuation is called the stack could be re-rooted with a frame which calls the contin just created. This in combination with checking stack depth could also be used to allow stacks deeper than 64K on the IBM PC.
In the quest for speed, there has been some discussion about a "Forth" style Scheme interpreter.

Provided there is still type code space available in SCM, if we devote some of the IMCAR codes to "inlined" operations, we should get a significant performance boost. What is eliminated is having to look up a GLOC or ILOC and then dispatch on the subr type. The IMCAR operation would be dispatched to directly. Another way to view this is that we make available special form versions of CAR, CDR, etc. Since the actual operation code is localized in the interpreter, it is much easier than uncompilation and then recompilation to handle (trace car); for instance, a switch gets set which tells the interpreter to instead always look up the values of the associated symbols.

Finishing Dynamic Linking

Scott Schwartz <schwartz@galapagos.cse.psu.edu> suggests: One way to tidy up the dynamic loading stuff would be to grab the code from perl5.

VMS

George Carrette (gjc@mitech.com) outlines how to dynamically link on VMS. There is already some code in `dynl.c' to do this, but someone with a VMS system needs to finish and debug it.

Say you have this `main.c' program:
```
main()
{init_lisp();
 lisp_repl();}
```
and you have your lisp in files `repl.c', `gc.c', `eval.c' and there are some toplevel non-static variables in use called the_heap, the_environment, and some read-only toplevel structures, such as the_subr_table.
```
$ LINK/SHARE=LISPRTL.EXE/DEBUG REPL.OBJ,GC.OBJ,EVAL.OBJ,LISPRTL.OPT/OPT
```
where `LISPRTL.OPT' must contain at least this:
```
SYS$LIBRARY:VAXCRTL/SHARE
UNIVERSAL=init_lisp
UNIVERSAL=lisp_repl
PSECT_ATTR=the_subr_table,SHR,NOWRT,LCL
PSECT_ATTR=the_heap,NOSHR,LCL
PSECT_ATTR=the_environment,NOSHR,LCL
```
Notice: The psect (Program Section) attributes.

LCL
means to keep the name local to the shared library. You almost always want to do that for a good clean library.
SHR,NOWRT
means shared-read-only, which is the default for code and is also good for efficiency of some data structures.
NOSHR,LCL
is what you want for everything else.
Note: If you do not have a handy list of all these toplevel variables, do not despair. Just do your link with the /MAP=LISPRTL.MAP/FULL and then search the map file,
```
$SEARCH/OUT=LISPRTL.LOSERS LISPRTL.MAP  ",  SHR,NOEXE,  RD,  WRT"
```
And use an emacs keyboard macro to muck the result into the proper form. Of course only the programmer can tell if things can be made read-only. I have a DCL command procedure to do this if you want it.
Now MAIN.EXE would be linked thusly:
```
$ DEFINE LISPRTL USER$DISK:[JAFFER]LISPRTL.EXE

$LINK MAIN.OBJ,SYS$INPUT:/OPT
 SYS$LIBRARY:VAXCRTL/SHARE
 LISPRTL/SHARE
```
Note the definition of the LISPRTL logical name. Without such a definition you will need to copy `LISPRTL.EXE' over to `SYS$SHARE:' (aka `SYS$LIBRARY:') in order to invoke the main program once it is linked.
Now say you have a file of optional subrs, `MYSUBRS.C'. And there is a routine INIT_MYSUBRS that must be called before using it.
```
$ CC MYSUBRS.C
$ LINK/SHARE=MYSUBRS.EXE MYSUBRS.OBJ,SYS$INPUT:/OPT
  SYS$LIBRARY:VAXCRTL/SHARE
  LISPRTL/SHARE
  UNIVERSAL=INIT_MYSUBRS
```
Ok. Another hint is that you can avoid having to add the PSECT declaration of NOSHR,LCL by declaring variables static in the C language source. That works great for most things.
Then the dynamic loader would have to do this:
```
{void (*init_fcn)();
 long retval;
 retval = lib$find_image_symbol("MYSUBRS","INIT_MYSUBRS",&init_fcn,
                                "SYS$DISK:[].EXE");
 if (retval != SS$_NORMAL) error(...);
 (*init_fcn)();}
```
But of course all string arguments must be (struct dsc$descriptor *) and the last argument is optional if MYSUBRS is defined as a logical name or if `MYSUBRS.EXE' has been copied over to `SYS$SHARE'. The other consideration is that you will want to turn off C-c or other interrupt handling while you are inside most lib$ calls.

As for the generation of all the UNIVERSAL=... declarations, you would do well to have them generated automatically from the public `LISPRTL.H' file, of course.

VMS has a good manual called the Guide to Writing Modular Procedures or something like that, which covers this whole area rather well, and also talks about advanced techniques, such as a way to declare a program section with a pointer to a procedure that will be automatically invoked whenever any shared image is dynamically activated. It also covers how to set up a handler for normal or abnormal program exit so that you can clean up side effects (such as opening a database). But for use with LISPRTL you probably don't need that hair.

One fancier option that is useful under VMS for `LISPLIB.EXE' is to define all your exported procedures through a call vector instead of having them just be pointers into random places in the image, which is what you get by using UNIVERSAL. If you set up the call vector thing correctly it will allow you to modify and relink `LISPLIB.EXE' without having to relink programs that have been linked against it.

Windows NT

George Carrette (gjc@mitech.com) outlines how to dynamically link on Windows NT:

The Software Developers Kit has a sample called SIMPLDLL. Here is the gist of it, following along the lines of the VMS description above (contents of a makefile for the SDK NMAKE)

LISPLIB.exp:
LISPLIB.lib: LISPLIB.def
    $(implib) -machine:$(CPU) -def:LISPLIB.def -out:LISPLIB.lib

LISPLIB.DLL : $(LISPLIB_OBJS) LISPLIB.EXP
    $(link) $(linkdebug)              \
    -dll                 \
    -out:LISPLIB.DLL     \
    LISPLIB.EXP $(LISPLIB_OBJS) $(conlibsdll)

The `LISPLIB.DEF' file has this:

LIBRARY lisplib
EXPORT
 init_lisp
 init_repl

And `MAIN.EXE' using:

CLINK = $(link) $(ldebug) $(conflags) -out:$*.exe $** $(conlibsdll)

MAIN.EXE : MAIN.OBJ LISPLIB.LIB
 $(CLINK)

And `MYSUBRS.DLL' is produced using:

mysubrs.exp:
mysubrs.lib: mysubrs.def
    $(implib) -machine:$(CPU) -def:MYSUBRS.def -out:MYSUBRS.lib

mysubrs.dll : mysubrs.obj mysubrs.exp mysubrs.lib
    $(link) $(linkdebug) \
    -dll                 \
    -out:mysubrs.dll     \
    MYSUBRS.OBJ MYSUBRS.EXP LISPLIB.LIB $(conlibsdll)

Where `MYSUBRS.DEF' has
```
LIBRARY mysubrs
EXPORT
 INIT_MYSUBRS
```

And the dynamic loader looks something like this, calling the two procedures LoadLibrary and GetProcAddress.

LISP share_image_load(LISP fname)
{long iflag;
 LISP retval,(*fcn)(void);
 HANDLE hLib;
 DWORD err;
 char *libname,fcnname[64];
 iflag = nointerrupt(1);
 libname = c_string(fname);
 _snprintf(fcnname,sizeof(fcnname),"INIT_%s",libname);
 if (!(hLib = LoadLibrary(libname)))
   {err = GetLastError();
    retval = list2(fname,LSPNUM(err));
    serror1("library failed to load",retval);}
 if (!(fcn = (LISP (*)(void)) GetProcAddress(hLib,fcnname)))
   {err = GetLastError();
    retval = list2(fname,LSPNUM(err));
    serror1("could not find library init procedure",retval);}
 retval = (*fcn)();
 nointerrupt(iflag);
 return(retval);}

Note: in VMS the linker and dynamic loader are case sensitive, but all the language compilers, including C, will by default upper-case external symbols for use by the linker, although the debugger gets its own symbols and case sensitivity is language mode dependent. In Windows NT things are case sensitive generally except for file and device names, which are case canonicalizing like in the Symbolics filesystem.
Also: All this WINDOWS NT stuff will work in MS-DOS MS-Windows 3.1 too, by a method of compiling and linking under Windows NT, and then copying various files over to MS-DOS/WINDOWS.
