In the descriptions below it is assumed that long ints are 32
bits in length. Actually, SCM is written to work with any long int
size larger than 31 bits. With some modification, SCM could work
with word sizes as small as 24 bits.
All SCM objects are represented by type SCM. Objects of type SCM come
in 2 basic flavors, Immediates and Cells:
An immediate is a data type contained in type SCM
(long int
). The type codes distinguishing immediate types from
each other vary in length, but reside in the low order bits.
SCM
object x is an immediate or
non-immediate type, respectively.
1
in
the second to low order bit position. The high order 30 bits are used
for the integer's value.
SCM
x is an immediate integer or not
an immediate integer, respectively.
long integer
corresponding to SCM
x.
SCM
inum corresponding to C long integer
x.
MAKINUM(0)
.
Computations on INUMs are performed by converting the arguments to C integers (by a shift), operating on the integers, and converting the result to an inum. The result is checked for overflow by converting back to integer and checking the reverse operation.
The shifts used for conversion need to be signed shifts. If the C
implementation does not support signed right shift this fact is detected
in a #if statement in `scmfig.h' and a signed right shift,
SRS
, is constructed in terms of unsigned right shift.
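The conversions described above can be sketched roughly as follows. This is only an illustration that models a 32 bit long int with int32_t; the names toy_srs, toy_makinum, toy_inum, toy_inump and toy_fits_inum are hypothetical and are not the macros from `scm.h'. The tag (binary 10 in the two low order bits) follows the description in this section.

```c
#include <stdint.h>

typedef int32_t toy_word;          /* stands in for a 32 bit long int */

/* Signed right shift built from unsigned shifts, for compilers whose
   >> on negative values does not propagate the sign bit. */
static toy_word toy_srs(toy_word x, int n)
{
    uint32_t u = (uint32_t)x;
    if (x < 0)
        return (toy_word)~(~u >> n);   /* shift the complement, flip back */
    return (toy_word)(u >> n);
}

/* Tag a C integer as an inum: value in the high 30 bits, 10 below. */
static toy_word toy_makinum(toy_word x)
{
    return (toy_word)(((uint32_t)x << 2) | 2u);
}

/* Nonzero if the second to low order bit is set. */
static int toy_inump(toy_word w)
{
    return (w & 2) != 0;
}

/* Recover the C integer from an inum by a signed shift. */
static toy_word toy_inum(toy_word w)
{
    return toy_srs(w, 2);
}

/* Overflow check: convert to an inum and back, then compare. */
static int toy_fits_inum(toy_word x)
{
    return toy_inum(toy_makinum(x)) == x;
}
```

In this model the integers from -536870912 to 536870911 survive the round trip; anything outside that range is reported as an overflow.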
SCM
object x is a character.
unsigned char
.
char
x, returns SCM
character.
#t
#f
()
. If SICP
is #define
d, EOL
is
#define
d to be identical with BOOL_F
. In this case, both
print as #f
.
#<eof>
.
#<undefined>
used for variables which have not been defined and
absent optional arguments.
#<unspecified>
is returned for those procedures whose return
values are not specified.
isymnames[]
.
char *
representation (from isymnames[]
).
SCM
ispcsym n.
SCM
iisym n.
SCM
iflag n.
and
, begin
, case
, cond
, define
,
do
, if
, lambda
, let
, let*
,
letrec
, or
, quote
, set!
, #f
,
#t
, #<undefined>
, #<eof>
, ()
, and
#<unspecified>
.
0
.
There is one exception to this rule, CAR Immediates, described next.
A CAR Immediate is an Immediate point which can only occur in the
CAR
s of evaluated code (as a result of ceval
's memoization
process).
Cells represent all SCM objects other than immediates. A cell has
a CAR
and a CDR
. Low-order bits in CAR
identify
the type of object. The rest of CAR
and CDR
hold object
data. The number after tc
specifies how many bits are in the
type code. For instance, tc7
indicates that the type code is 7
bits.
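As a minimal sketch of how a type code in the low order bits of a CAR might be read, the following masks off 3 or 7 bits. All names are hypothetical; the tag values (0001111 for a vector, 0001101 for a string, 011 for a closure) and the position of the header length are assumptions read off the layout summary in this document, not definitions copied from `scm.h'.

```c
typedef unsigned long toy_car;

enum {
    TOY_TC3_CLOSURE = 0x03,   /* binary 011 */
    TOY_TC7_STRING  = 0x0d,   /* binary 0001101 */
    TOY_TC7_VECTOR  = 0x0f    /* binary 0001111 */
};

/* Low 3 bits of the CAR. */
static unsigned toy_typ3(toy_car car)
{
    return (unsigned)(car & 0x7UL);
}

/* Low 7 bits of the CAR: the tc7 type code of a header cell. */
static unsigned toy_typ7(toy_car car)
{
    return (unsigned)(car & 0x7fUL);
}

/* Header length: the bits above the low order 8 bits. */
static unsigned long toy_length(toy_car car)
{
    return car >> 8;
}

/* Build the CAR of a vector header of the given length (GC bit clear). */
static toy_car toy_vector_header(unsigned long len)
{
    return (len << 8) | (toy_car)TOY_TC7_VECTOR;
}
```

Because the mask covers only the low 7 bits, a GC mark stored just above them does not disturb the result.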
SCM
local
variable x.
Care needs to be taken that stores into the new cell pointed to by x do not create an inconsistent object. See section Signals.
All of the C macros described in this section assume that their argument
is of type SCM
and points to a cell (CELLPTR
).
car
and cdr
of cell x, respectively.
tc3_cons
or isn't, respectively.
tc3_closure
s have a pointer to the body of the procedure in the
CAR
and a pointer to the environment in the CDR
. Bits 1
and 2 (zero-based) in the CDR
indicate a lower bound on the
number of required arguments to the closure, which is used to avoid
allocating rest argument lists in the environment cache. This encoding
precludes an immediate value for the CDR
: In the case of
an empty environment all bits above 2 in the CDR
are zero.
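The packing of the argument count into the closure's CDR can be illustrated with a small sketch. The function names are hypothetical, and the assumption that environment addresses have their three low order bits clear is taken from the layout summary in this document.

```c
#include <stdint.h>

/* Lower bound on required arguments: bits 1 and 2 of the CDR. */
static unsigned toy_closure_min_args(uintptr_t cdr)
{
    return (unsigned)((cdr >> 1) & 3u);
}

/* Environment pointer: the CDR with bits 0 through 2 cleared. */
static uintptr_t toy_closure_env(uintptr_t cdr)
{
    return cdr & ~(uintptr_t)7;
}

/* Pack an 8 byte aligned environment address with a count (0 to 3). */
static uintptr_t toy_closure_cdr(uintptr_t env, unsigned min_args)
{
    return env | ((uintptr_t)(min_args & 3u) << 1);
}
```

With an empty environment the packed word still carries the count in bits 1 and 2, while all higher bits are zero.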
tc3_closure
.
Headers are Cells whose CDR
s point elsewhere in memory,
such as to memory allocated by malloc
.
tc7
type code
tc7_vector
or if not, respectively.
SCM
s holding the elements of vector
x or its length, respectively.
malloc
ed scheme symbol (can be GCed)
tc7_ssymbol
or
tc7_msymbol
.
char
s or as unsigned char
s holding
the elements of symbol x or its length, respectively.
tc7_string
or isn't,
respectively.
char
s or as unsigned char
s holding
the elements of string x or its length, respectively.
A cclo is similar to a vector (and is GCed like one), but can be applied as a function:
SCM
data. Elements of a cclo are referenced
using VELTS(cclo)[n]
just as for vectors.
A Subr is a header whose CDR
points to a C code procedure.
Scheme primitive procedures are subrs. Except for the arithmetic
tc7_cxr
s, the C code procedures will be passed arguments (and
return results) of type SCM
.
+
, -
,
*
, /
, max
, and min
.
CDR
should be a function which takes and returns type
double
. Conversions are handled in the interpreter.
floor
, ceiling
, truncate
, round
,
$sqrt
, $abs
, $exp
, $log
, $sin
,
$cos
, $tan
, $asin
, $acos
, $atan
,
$sinh
, $cosh
, $tanh
, $asinh
, $acosh
,
$atanh
, and exact->inexact
are defined this way.
If the CDR
is 0
(NULL
), the name string of the
procedure is used to control traversal of its list structure argument.
car
, cdr
, caar
, cadr
, cdar
,
cddr
, caaar
, caadr
, cadar
, caddr
,
cdaar
, cdadr
, cddar
, cdddr
, caaaar
,
caaadr
, caadar
, caaddr
, cadaar
,
cadadr
, caddar
, cadddr
, cdaaar
,
cdaadr
, cdadar
, cdaddr
, cddaar
,
cddadr
, cdddar
, and cddddr
are defined this way.
BOOL_T
or BOOL_F
.
UNDEFINED
is passed in its place.
UNDEFINED
is passed in its place.
SCM
arguments.
SCM
arguments.
A ptob is a port object, capable of delivering or accepting characters. See section `Ports' in Revised(4) Report on the Algorithmic Language Scheme. Unlike the types described so far, new varieties of ptobs can be defined dynamically (see section Defining Ptobs). These are the initial ptobs:
popen()
.
popen()
.
cwos()
or cwis()
.
mksfpt()
(see section Soft Ports).
FILE *
stream for port x.
Ports which are particularly well behaved are called fports.
Advanced operations like file-position
and reopen-file
only work for fports.
A smob is a miscellaneous datatype. The type code and GCMARK bit
occupy the lower order 16 bits of the CAR
half of the cell. The
rest of the CAR
can be used for sub-type or other information.
The CDR
contains data of size long and is often a pointer to
allocated memory.
Like ptobs, new varieties of smobs can be defined dynamically (see section Defining Smobs). These are the initial smobs:
Inexact number data types are subtypes of type tc16_flo
. If the
sub-type is:
CDR
.
CDR
is a pointer to a malloc
ed double.
CDR
is a pointer to a malloc
ed pair of doubles.
Scm has large precision integers called bignums. They are stored in
sign-magnitude form with the sign occurring in the type code of the SMOBs
bigpos and bigneg. The magnitude is stored as a malloc
ed array
of type BIGDIG
which must be an unsigned integral type with size
smaller than long
. BIGRAD
is the radix associated with
BIGDIG
.
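To illustrate the sign-magnitude idea, here is a minimal sketch that splits a C long into digits of radix 65536. The type toy_bigdig, the constant TOY_BIGRAD and the function toy_long2big are hypothetical, unsigned short is assumed to be 16 bits wide, and storing the least significant digit first is an assumption of the sketch rather than a statement about SCM's layout.

```c
#include <stddef.h>

typedef unsigned short toy_bigdig;   /* hypothetical stand-in for BIGDIG */
#define TOY_BIGRAD 65536UL           /* radix for 16 bit digits */

typedef struct {
    int negative;            /* sign, kept apart from the magnitude */
    size_t ndigits;          /* number of digits in use */
    toy_bigdig digits[4];    /* least significant digit first */
} toy_big;

/* Convert a long to sign and magnitude digits. */
static toy_big toy_long2big(long n)
{
    toy_big b = { 0, 0, { 0, 0, 0, 0 } };
    /* Negate in unsigned arithmetic so the most negative long is safe. */
    unsigned long mag = n < 0 ? 0UL - (unsigned long)n : (unsigned long)n;

    b.negative = n < 0;
    while (mag != 0 && b.ndigits < 4) {
        b.digits[b.ndigits++] = (toy_bigdig)(mag % TOY_BIGRAD);
        mag /= TOY_BIGRAD;
    }
    return b;
}
```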
This type implements both conventional arrays (those with arbitrary data as elements see section Conventional Arrays) and uniform arrays (those with elements of a uniform type see section Uniform Array).
Conventional Arrays have a pointer to a vector for their CDR
.
Uniform Arrays have a pointer to a Uniform Vector type (string, bvect,
ivect, uvect, fvect, dvect, or cvect) in their CDR
.
IMMEDIATE:  B,D,E,F=data bit, C=flag code, P=pointer address bit
          ................................
inum      BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB10
ichr      BBBBBBBBBBBBBBBBBBBBBBBB11110100
iflag                     CCCCCCC101110100
isym                      CCCCCCC001110100
IMCAR:  only in car of evaluated code, cdr has cell's GC bit
ispcsym                   000CCCC00CCCC100
iloc      0DDDDDDDDDDDEFFFFFFFFFFF11111100
pointer   PPPPPPPPPPPPPPPPPPPPPPPPPPPPP000
gloc      PPPPPPPPPPPPPPPPPPPPPPPPPPPPP001

HEAP CELL:  G=gc_mark; 1 during mark, 0 other times.
            1s and 0s here indicate type.  G missing means sys (not GC'd)
SIMPLE:
cons      ..........SCM car..............0  ...........SCM cdr.............G
closure   ..........SCM code...........011  ...........SCM env...........CCG
HEADERs:
ssymbol   .........long length....G0000101  ..........char *chars...........
msymbol   .........long length....G0000111  ..........char *chars...........
string    .........long length....G0001101  ..........char *chars...........
vector    .........long length....G0001111  ...........SCM **elts...........
bvect     .........long length....G0010101  ..........long *words...........
spare                             G0010111
ivect     .........long length....G0011101  ..........long *words...........
uvect     .........long length....G0011111  ......unsigned long *words......
spare                             G0100101
spare                             G0100111
fvect     .........long length....G0101101  .........float *words...........
dvect     .........long length....G0101111  ........double *words...........
cvect     .........long length....G0110101  ........double *words...........
contin    .........long length....G0111101  .............*regs..............
cclo      .........long length....G0111111  ...........SCM **elts...........
SUBRs:
spare                             010001x1
spare                             010011x1
subr_0    ..........int hpoff.....01010101  ...........SCM (*f)()...........
subr_1    ..........int hpoff.....01010111  ...........SCM (*f)()...........
cxr       ..........int hpoff.....01011101  .........double (*f)()..........
subr_3    ..........int hpoff.....01011111  ...........SCM (*f)()...........
subr_2    ..........int hpoff.....01100101  ...........SCM (*f)()...........
asubr     ..........int hpoff.....01100111  ...........SCM (*f)()...........
subr_1o   ..........int hpoff.....01101101  ...........SCM (*f)()...........
subr_2o   ..........int hpoff.....01101111  ...........SCM (*f)()...........
lsubr_2   ..........int hpoff.....01110101  ...........SCM (*f)()...........
rpsubr    ..........int hpoff.....01111101  ...........SCM (*f)()...........
PTOBs:
port                 0bwroxxxxxxxxG1110111  ..........FILE *stream..........
socket    ttttttt    00001xxxxxxxxG1110111  ..........FILE *stream..........
inport    uuuuuuuuuuU00011xxxxxxxxG1110111  ..........FILE *stream..........
outport   0000000000000101xxxxxxxxG1110111  ..........FILE *stream..........
ioport    uuuuuuuuuuU00111xxxxxxxxG1110111  ..........FILE *stream..........
fport     00              00000000G1110111  ..........FILE *stream..........
pipe      00              00000001G1110111  ..........FILE *stream..........
strport   00              00000010G1110111  ..........FILE *stream..........
sfport    00              00000011G1110111  ..........FILE *stream..........
SMOBs:
free_cell 000000000000000000000000G1111111  ...........*free_cell........000
flo       000000000000000000000001G1111111  ...........float num............
dblr      000000000000000100000001G1111111  ..........double *real..........
dblc      000000000000001100000001G1111111  .........complex *cmpx..........
bignum    ...int length...0000001 G1111111  .........short *digits..........
bigpos    ...int length...00000010G1111111  .........short *digits..........
bigneg    ...int length...00000011G1111111  .........short *digits..........
          xxxxxxxx = code assigned by newsmob();
promise   000000000000000fxxxxxxxxG1111111  ...........SCM val..............
arbiter   000000000000000lxxxxxxxxG1111111  ...........SCM name.............
macro     000000000000000mxxxxxxxxG1111111  ...........SCM name.............
array     ...short rank..cxxxxxxxxG1111111  ............*array..............
The garbage collector is in the latter half of `sys.c'. The primary goal of garbage collection (or GC) is to recycle those cells no longer in use. Immediates always appear as parts of other objects, so they are not subject to explicit garbage collection.
All cells reside in the heap (composed of heap segments). Note that this is different from what Computer Science usually defines as a heap.
The first step in garbage collection is to mark all heap objects
in use. Each heap cell has a bit reserved for this purpose. For pairs
(cons cells) the lowest order bit (0) of the CDR is used. For other
types, bit 8 of the CAR is used. The GC bits are never set except
during garbage collection. Special C macros are defined in `scm.h'
to allow easy manipulation when GC bits are possibly set. CAR
,
TYP3
, and TYP7
can be used on GC marked cells as they are.
We need to (recursively) mark only a few objects in order to assure that
all accessible objects are marked. Those objects are
sys_protects[]
(for example, dynwinds
), the current
C-stack and the hash table for symbols, symhash.
gc_mark()
is used for marking SCM cells. If
obj is marked, gc_mark()
returns. If obj is
unmarked, gc_mark sets the mark bit in obj, then calls
gc_mark()
on any SCM components of obj. The last call to
gc_mark()
is tail-called (looped).
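The shape of such a marking routine can be sketched on a toy heap of pairs. The type toy_cell and the function toy_mark are hypothetical and far simpler than the real gc_mark(), which dispatches on the type of each cell; the sketch only shows the early return on marked objects, the recursive call, and the final call turned into a loop.

```c
#include <stddef.h>

typedef struct toy_cell {
    struct toy_cell *car;
    struct toy_cell *cdr;
    int marked;
} toy_cell;

/* Mark obj and everything reachable from it. */
static void toy_mark(toy_cell *obj)
{
    /* Stop at the end of a chain or at an object already marked,
       which also keeps cyclic structures from looping forever. */
    while (obj != NULL && !obj->marked) {
        obj->marked = 1;
        toy_mark(obj->car);   /* recursive call for the first component */
        obj = obj->cdr;       /* last component handled by looping */
    }
}
```

Looping on the last component keeps the C stack from growing with the length of a list.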
mark_locations
is used for marking segments of
C-stack or saved segments of C-stack (marked continuations). The
argument len is the size of the stack in units of size
(STACKITEM)
.
Each longword in the stack is tried to see if it is a valid cell pointer
into the heap. If it is, the object itself and any objects it points to
are marked using gc_mark
. If the stack is word rather than
longword aligned (#define WORD_ALIGN)
, both alignments are tried.
This arrangement will occasionally mark an object which is no longer
used. This has not been a problem in practice and the advantage of
using the C-stack far outweighs it.
After all found objects have been marked, the heap is swept.
The storage for strings, vectors, continuations, doubles, complexes, and bignums is managed by malloc. There is only one pointer to each malloc object from its type-header cell in the heap. This allows malloc objects to be freed when the associated heap object is garbage collected.
gc_sweep
scans through all heap segments. The mark
bit is cleared from marked cells. Unmarked cells are spliced into
freelist, where they can again be returned by invocations of
NEWCELL
.
If a type-header cell pointing to malloc space is unmarked, the malloc
object is freed. If the type header of smob is collected, the smob's
free
procedure is called to free its storage.
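A minimal sketch of the sweep over one toy heap segment follows. The names toy_heap_cell and toy_sweep are hypothetical, and the sketch leaves out the freeing of malloc objects and the smob free procedures described above; it shows only the clearing of mark bits and the splicing of unmarked cells into a freelist.

```c
#include <stddef.h>

typedef struct toy_heap_cell {
    struct toy_heap_cell *next_free;   /* link used while on the freelist */
    int marked;
} toy_heap_cell;

/* Sweep n cells; return how many were spliced into the freelist. */
static size_t toy_sweep(toy_heap_cell *heap, size_t n,
                        toy_heap_cell **freelist)
{
    size_t freed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (heap[i].marked) {
            heap[i].marked = 0;               /* survivor: clear the mark */
        } else {
            heap[i].next_free = *freelist;    /* garbage: push on freelist */
            *freelist = &heap[i];
            freed++;
        }
    }
    return freed;
}
```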
The memory management component of SCM contains special features which optimize the allocation and garbage collection of environments.
The optimizations are based on certain facts and assumptions:
The SCM evaluator creates many environments with short lifetimes and these account for a large portion of the total number of objects allocated.
The general purpose allocator allocates objects from a freelist, and collects using a mark/sweep algorithm. Research into garbage collection suggests that such an allocator is sub-optimal for object populations containing a large portion of short-lived members and that allocation strategies involving a copying collector are more appropriate.
It is a property of SCM, reflected throughout the source code, that a simple copying collector can not be used as the general purpose memory manager: much code assumes that the run-time stack can be treated as a garbage collection root set using conservative garbage collection techniques, which are incompatible with objects that change location.
Nevertheless, it is possible to use a mostly-separate copying-collector, just for environments. Roughly speaking, cons pairs making up environments are initially allocated from a small heap that is collected by a precise copying collector. These objects must be handled specially for the collector to work. The (presumably) small number of these objects that survive one collection of the copying heap are copied to the general purpose heap, where they will later be collected by the mark/sweep collector. The remaining pairs are more rapidly collected than they would otherwise be and all of this collection is accomplished without having to mark or sweep any other segment of the heap.
Allocating cons pairs for environments from this special heap is a heuristic that approximates the (unachievable) goal:
allocate all short-lived objects from the copying-heap, at no extra cost in allocation time.
Implementation Details
A separate heap (ecache_v
) is maintained for the copying
collector. Pairs are allocated from this heap in a stack-like fashion.
Objects in this heap may be protected from garbage collection by:
scm_estk
) is used in place of the C
run-time stack by the SCM evaluator to hold local variables which refer
to the copying heap.
scm_egc_roots
). If no object in the mark/sweep
heap directly references an object from the copying heap, that object
can be preserved by storing a direct reference to it in the
copying-collector root set.
When the copying heap or root-set becomes full, the copying collector is invoked. All protected objects are copied to the mark-sweep heap. All references to those objects are updated. The copying collector root-set and heap are emptied.
References to pairs allocated specifically for environments are inaccessible to the Scheme procedures evaluated by SCM. These pairs are manipulated by only a small number of code fragments in the interpreter. To support copying collection, those code fragments (mostly in `eval.c') have been modified to protect environments from garbage collection using the three rules listed above.
During a mark-sweep collection, the copying collector heap is marked and swept almost like any ordinary segment of the general purpose heap. The only difference is that pairs from the copying heap that become free during a sweep phase are not added to the freelist.
The environment cache is disabled by adding #define NO_ENV_CACHE
to `eval.c'; all environment cells are then allocated from the
regular heap.
This work seems to build upon a considerable amount of previous work on garbage collection techniques, about which much literature is available.
SIGINT
and
SIGALRM
if they are supported by the C implementation. All of
the signal handlers immediately reestablish themselves by a call to
signal()
.
SIGINT
and SIGALRM
.
If an interrupt handler is defined when the interrupt is received, the
code is interpreted. If the code returns, execution resumes from where
the interrupt happened. Call-with-current-continuation
allows
the stack to be saved and restored.
SCM does not use any signal masking system calls. These are not a
portable feature. However, code can run uninterrupted by use of the C
macros DEFER_INTS
and ALLOW_INTS
.
ints_disabled
to 1. If an interrupt
occurs during a time when ints_disabled
is 1, then
deferred_proc
is set to non-zero, one of the global variables
SIGINT_deferred
or SIGALRM_deferred
is set to 1, and the
handler returns.
Calls to DEFER_INTS
can not be nested. An ALLOW_INTS
must
happen before another DEFER_INTS
can be done. In order to check
that this constraint is satisfied #define CAREFUL_INTS
in
`scmfig.h'.
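The deferral scheme can be modelled without real signals. In the sketch below every name is hypothetical, the handler is an ordinary function called by hand, and the behavior on allowing interrupts (running the handler that was postponed) is an assumption of the sketch, not a description of the actual macros.

```c
static int toy_ints_disabled = 0;     /* nonzero while interrupts are deferred */
static int toy_sigint_deferred = 0;   /* set when an interrupt was postponed */
static int toy_handled = 0;           /* counts interrupts actually serviced */

/* Stand-in for a signal handler. */
static void toy_sigint_handler(void)
{
    if (toy_ints_disabled) {
        toy_sigint_deferred = 1;      /* remember it and return at once */
        return;
    }
    toy_handled++;
}

/* Begin a section that must not be interrupted. */
static void toy_defer_ints(void)
{
    toy_ints_disabled = 1;
}

/* End the section and service anything that arrived meanwhile. */
static void toy_allow_ints(void)
{
    toy_ints_disabled = 0;
    if (toy_sigint_deferred) {
        toy_sigint_deferred = 0;
        toy_sigint_handler();
    }
}
```

A single flag is enough here because, as noted above, deferred sections are not nested.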
ARGn
(> 5 or unknown ARG number)
ARG1
ARG2
ARG3
ARG4
ARG5
WNA
(wrong number of args)
OVFLOW
OUTOFRANGE
NALLOC
EXIT
HUP_SIGNAL
INT_SIGNAL
FPE_SIGNAL
BUS_SIGNAL
SEGV_SIGNAL
ALRM_SIGNAL
(char *)
Error checking is not done by ASSERT
if the flag RECKLESS
is defined. An error condition can still be signaled in this case with
a call to wta(arg, pos, subr)
.
goto
label if the expression (cond) is 0. Like
ASSERT
, ASRTGO
is not active if the flag
RECKLESS
is defined.
When writing C-code for SCM, a precaution is recommended. If your
routine allocates a non-cons cell which will not be incorporated
into a SCM
object which is returned, you need to make sure that a
SCM
variable in your routine points to that cell as long as part
of it might be referenced by your code.
In order to make sure this SCM
variable does not get optimized
out you can put this assignment after its last possible use:
SCM_dummy1 = foo;
or put this assignment somewhere in your routine:
SCM_dummy1 = (SCM) &foo;
SCM_dummy
variables are not currently defined. Passing the
address of the local SCM
variable to any procedure also
protects it. The procedure scm_protect_temp
is provided for
this purpose.
Also, if you maintain a static pointer to some (non-immediate)
SCM
object, you must either make your pointer be the value cell
of a symbol (see errobj
for an example) or make your pointer be
one of the sys_protects
(see dynwinds
for an example).
The former method is preferred since it does not require any changes to
the SCM distribution.
To add a C routine to scm:
make_subr
or make_gsubr
call to init_scm
. Or
put an entry into the appropriate iproc
structure.
To add a package of new procedures to scm (see `crs.c' for example):
static char s_twiddle_bits[]="twiddle-bits!";
static char s_bitsp[]="bits?";
iproc
structure for each subr type used in `foo.c'
static iproc subr3s[]= {
        {s_twiddle_bits,twiddle_bits},
        {s_bitsp,bitsp},
        {0,0} };
init_<name of file>
routine at the end of the file
which calls init_iprocs
with the correct type for each of the
iproc
s created in step 5.
void init_foo()
{
  init_iprocs(subr1s, tc7_subr_1);
  init_iprocs(subr3s, tc7_subr_3);
}

If your package needs to have a finalization routine called to free up storage, close files, etc, then also have a line in
init_foo
like:
add_final(final_foo);
final_foo
should be a (void) procedure of no arguments. The
finals will be called in opposite order from their definition.
The line:
add_feature("foo");

will append a symbol
'foo
to the (list) value of
*features*
.
if
into `Init5c4.scm' which loads `Ifoo.scm' if
your package is included:
(if (defined? twiddle-bits!)
    (load (in-vicinity (implementation-vicinity)
                       "Ifoo"
                       (scheme-file-suffix))))

or use
(provided? 'foo)
instead of (defined?
twiddle-bits!)
if you have added the feature.
init_foo\(\)\;
to the INITS=...
line at the beginning of the makefile.
These steps should allow your package to be linked into SCM with a minimum of difficulty. Your package should also work with dynamic linking if your SCM has this capability.
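The reverse calling order of finalization routines mentioned in the steps above can be pictured as a stack of function pointers. This is a hypothetical sketch; toy_add_final, toy_run_finals and the fixed table size are assumptions and do not reproduce the implementation of add_final.

```c
#define TOY_MAX_FINALS 16

static void (*toy_finals[TOY_MAX_FINALS])(void);
static int toy_num_finals = 0;

/* Record of the order in which the example finals ran. */
static char toy_log[8];
static int toy_log_len = 0;

static void toy_final_a(void) { toy_log[toy_log_len++] = 'a'; }
static void toy_final_b(void) { toy_log[toy_log_len++] = 'b'; }

/* Register a finalization routine; returns 0, or -1 if the table is full. */
static int toy_add_final(void (*final)(void))
{
    if (toy_num_finals >= TOY_MAX_FINALS)
        return -1;
    toy_finals[toy_num_finals++] = final;
    return 0;
}

/* Call the registered routines, most recently added first. */
static void toy_run_finals(void)
{
    while (toy_num_finals > 0)
        toy_finals[--toy_num_finals]();
}
```

Running later registrations first lets a package that depends on an earlier one clean up before the package it depends on does.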
Special forms (new syntax) can be added to scm.
MAKISYM
in `scm.h' and increment
NUM_ISYMS
.
isymnames
in `repl.c'.
case:
clause to ceval()
near i_quasiquote
(in
`eval.c').
New syntax can now be added without recompiling SCM by the use of the
procedure->syntax
, procedure->macro
,
procedure->memoizing-macro
, and defmacro
. For details,
see section Syntax Extensions.
If CCLO is #define
d when compiling, the compiled closure
feature will be enabled. It is automatically enabled if dynamic linking
is enabled.
The SCM interpreter directly recognizes subrs taking small numbers of arguments. In order to create subrs taking larger numbers of arguments use:
char *
name which takes int
req required arguments,
int
opt optional arguments, and a list of rest arguments if
int
rest is 1 (0 for not).
SCM (*fcn)()
is a pointer to a C function to do the work.
The C function will always be called with req + opt +
rest arguments, optional arguments not supplied will be passed
UNDEFINED
. An error will be signaled if the subr is called with
too many or too few arguments. Currently a total of 10 arguments may be
specified, but increasing this limit should not be difficult.
/* A silly example, taking 2 required args, 1 optional, and
   a list of rest args */

#include <scm.h>

SCM gsubr_21l(req1,req2,opt,rst)
     SCM req1,req2,opt,rst;
{
  lputs("gsubr-2-1-l:\n req1: ", cur_outp);
  display(req1,cur_outp);
  lputs("\n req2: ", cur_outp);
  display(req2,cur_outp);
  lputs("\n opt: ", cur_outp);
  display(opt,cur_outp);
  lputs("\n rest: ", cur_outp);
  display(rst,cur_outp);
  newline(cur_outp);
  return UNSPECIFIED;
}

void init_gsubr211()
{
  make_gsubr("gsubr-2-1-l", 2, 1, 1, gsubr_21l);
}
Here is an example of how to add a new type named foo
to SCM.
The following lines need to be added to your code:
long tc16_foo;
static smobfuns foosmob = {markfoo,freefoo,printfoo,equalpfoo};
typedef struct {
  SCM (*mark)P((SCM));
  sizet (*free)P((CELLPTR));
  int (*print)P((SCM exp, SCM port, int writing));
  SCM (*equalp)P((SCM, SCM));
} smobfuns;
smob.mark
SCM
(the cell to mark) and
returns type SCM
which will then be marked. If no further
objects need to be marked then return an immediate object such as
BOOL_F
. 2 functions are provided:
markcdr(ptr)
CDR(ptr)
.
mark0(ptr)
BOOL_F
.
smob.free
CELLPTR
(the cell to be
collected) and returns type sizet
which is the number of
malloc
ed bytes which were freed. Smob.free
should free
any malloc
ed storage associated with this object. The function
free0(ptr) is provided which does not free any storage and returns 0.
smob.print
SCM
, is
the smob object. The second, of type SCM
, is the stream on which
to write the result. The third, of type int, is 1 if the object should
be write
n, 0 if it should be display
ed. This function
should return non-zero if it printed, and zero otherwise (in which case
a hexadecimal number will be printed).
smob.equalp
SCM
arguments. Both of these arguments
will be of type tc16_foo
. This function should return
BOOL_T
if the smobs are equal, BOOL_F
if they are not. If
smob.equalp
is 0, equal?
will return BOOL_F
if they
are not eq?
.
tc16_foo = newsmob(&foosmob);
foosmob
. This
line goes in an init_
routine.
Promises and macros in `eval.c' and arbiters in `repl.c'
provide examples of SMOBs. There are a maximum of 256 SMOBs.
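How a table of function structures might yield 16 bit type codes can be sketched as follows. Everything here is hypothetical: toy_smobfuns has a single slot instead of four, and the type code formula (the registration index placed above a low byte of 1111111) is an assumption read off the layout summary in this document.

```c
#include <stddef.h>

typedef struct {
    int (*print)(const void *obj);   /* simplified: one slot only */
} toy_smobfuns;

#define TOY_MAX_SMOBS 256
#define TOY_TC7_SMOB 0x7fL

static const toy_smobfuns *toy_smobs[TOY_MAX_SMOBS];
static int toy_numsmob = 0;

/* Example print function for a hypothetical foo smob. */
static int toy_print_foo(const void *obj)
{
    (void)obj;
    return 1;   /* nonzero: it printed */
}

static const toy_smobfuns toy_foosmob = { toy_print_foo };

/* Register a function table; return its type code, or -1 if full. */
static long toy_newsmob(const toy_smobfuns *funs)
{
    if (toy_numsmob >= TOY_MAX_SMOBS)
        return -1;
    toy_smobs[toy_numsmob] = funs;
    return TOY_TC7_SMOB + ((long)toy_numsmob++ << 8);
}

/* Look up the function table from the low 16 bits of a CAR. */
static const toy_smobfuns *toy_smob_funs(long car)
{
    return toy_smobs[(car >> 8) & 0xff];
}
```

An 8 bit index is what limits such a table to 256 entries.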
Smobs that must allocate blocks of memory should use, for example,
must_malloc
rather than malloc.
See section Allocating memory.
ptobs are similar to smobs but define new types of port to which
SCM procedures can read or write. The following functions are defined
in the ptobfuns
:
typedef struct {
  SCM (*mark)P((SCM ptr));
  int (*free)P((FILE *p));
  int (*print)P((SCM exp, SCM port, int writing));
  SCM (*equalp)P((SCM, SCM));
  int (*fputc)P((int c, FILE *p));
  int (*fputs)P((char *s, FILE *p));
  sizet (*fwrite)P((char *s, sizet siz, sizet num, FILE *p));
  int (*fflush)P((FILE *stream));
  int (*fgetc)P((FILE *p));
  int (*fclose)P((FILE *p));
} ptobfuns;
The .free
component to the structure takes a FILE *
or
other C construct as its argument, unlike .free
in a smob, which
takes the whole smob cell. Often, .free
and .fclose
can be
the same function. See fptob
and pipob
in `sys.c'
for examples of how to define ptobs.
Ptobs that must allocate blocks of memory should use, for example,
must_malloc
rather than malloc.
See section Allocating memory.
SCM maintains a count of bytes allocated using malloc, and calls the
garbage collector when that number exceeds a dynamically managed limit.
In order for this to work properly, malloc
and free
should
not be called directly to manage memory freeable by garbage collection.
The following functions are provided for that purpose:
must_malloc
returns
a pointer to newly allocated memory. must_malloc_cell
returns a
newly allocated cell whose car
is c and whose cdr
is
a pointer to newly allocated memory.
must_realloc_cell
takes as argument z a cell whose
cdr
should be a pointer to a block of memory of length olen
allocated with must_malloc_cell
and modifies the cdr
to point
to a block of memory of length len. must_realloc
takes as
argument where the address of a block of memory of length olen
allocated by must_malloc
and returns the address of a block of
length len.
The contents of the reallocated block will be unchanged up to the minimum of the old and new sizes.
what is a pointer to a string used for error and gc messages.
must_malloc
, must_malloc_cell
, must_realloc
, and
must_realloc_cell
must be called with interrupts deferred.
See section Signals.
must_free
is used to free a block of memory allocated by the
above functions and pointed to by ptr. len is the length of
the block in bytes, but this value is used only for debugging purposes.
If it is difficult or expensive to calculate then zero may be used
instead.
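The byte counting idea behind these functions can be sketched as below. All names are hypothetical, the placeholder collector reclaims nothing, and doubling the limit is an assumption chosen only to show a limit that moves; none of this reproduces the code in SCM.

```c
#include <stdio.h>
#include <stdlib.h>

static size_t toy_mallocated = 0;    /* bytes handed out so far */
static size_t toy_mtrigger = 1024;   /* limit that triggers a collection */
static int toy_gc_calls = 0;

/* Placeholder collector: counts calls and raises the limit. */
static void toy_gc(void)
{
    toy_gc_calls++;
    toy_mtrigger *= 2;
}

/* Allocate len bytes, collecting first if the limit would be passed. */
static void *toy_must_malloc(size_t len, const char *what)
{
    void *p;

    if (toy_mallocated + len > toy_mtrigger)
        toy_gc();
    p = malloc(len);
    if (p == NULL) {
        fprintf(stderr, "%s: could not allocate\n", what);
        abort();
    }
    toy_mallocated += len;
    return p;
}

/* Free a block; len is informational only in this sketch. */
static void toy_must_free(void *ptr, size_t len)
{
    (void)len;
    free(ptr);
}
```

Routing every allocation through one function is what lets the count, and therefore the decision to collect, stay accurate.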
To use SCM as a whole from another program call init_scm
or
run_scm
as is done in main()
in `scm.c'.
In order to call individual Scheme procedures from C code more is required; SCM's storage system needs to be initialized. The simplest way to do this for a statically linked single-thread program is to:
#define RTL
flag when compiling `scm.c' to elide
SCM's main()
.
main()
, call run_scm
with arguments (argc
and argv
) to invoke your code's startup routine.
For a dynamically linked single-thread program:
init_
procedure for your code which will set up any Scheme
definitions you need and then call your startup routine
(see section Changing Scm).
init_
procedure will be called, and
hence your startup routine.
Now use apply
(and perhaps intern
) to call Scheme
procedures from your C code. For example:
/* If this apply fails, SCM will catch the error */
apply(CDR(intern("srv:startup",sizeof("srv:startup")-1)),
      mksproc(srvproc),
      listofnull);

func = CDR(intern(rpcname,strlen(rpcname)));
retval = apply(func, cons(mksproc(srvproc), args), EOL);
SCM now has routines to make calling back to Scheme procedures easier. The source code for these routines is found in `rope.c'.
(in-vicinity (program-vicinity)
file)
. Returns 0 if successful, non-0 if not.
This function is useful for compiled code init_ functions to load
non-compiled Scheme (source) files. program-vicinity
is the
directory from which the calling code was loaded (see section `Vicinity' in SLIB).
If you wish to catch errors during execution of Scheme code, then you can use a wrapper like this for your Scheme procedures:
(define (srv:protect proc)
  (lambda args
    (define result #f)                  ; put default value here
    (call-with-current-continuation
     (lambda (cont)
       (dynamic-wind (lambda () #t)
                     (lambda ()
                       (set! result (apply proc args))
                       (set! cont #f))
                     (lambda ()
                       (if cont (cont #f))))))
    result))
Calls to procedures so wrapped will return even if an error occurs.
These type conversion functions are very useful for connecting SCM and C code. Most are defined in `rope.c'.
SCM
corresponding to the long
or
unsigned long
argument n. If n cannot be converted,
BOOL_F
is returned. Which numbers can be converted depends on
whether SCM was compiled with the BIGDIG
or FLOATS
flags.
To convert integer numbers of smaller types (short
or
char
), use the macro MAKINUM(n)
.
SCM
arguments to
the named C type. The first argument num is checked to see if it
is within the range of the destination type. If so, the converted
number is returned. If not, the ASSERT
macro calls wta
with num and strings pos and s_caller. For a listing
of useful predefined pos macros, see section C Macros.
Note: Inexact numbers are accepted only by num2long
and
num2ulong
(for when SCM
is compiled without bignums). To
convert inexact numbers to exact numbers, see section `Numerical operations' in Revised(4) Scheme.
unsigned long
) to the storage
corresponding to the location accessed by
aref(CAR(args),CDR(args))
. The string s_name is used in
any messages from error calls by scm_addr
.
scm_addr
is useful for performing C operations on strings or
other uniform arrays (see section Uniform Array).
Note: While you use a pointer returned from scm_addr
you
must keep a pointer to the associated SCM
object in a stack
allocated variable or GC-protected location in order to assure that SCM
does not reuse that storage before you are done with it.
SCM
object copy of the
null-terminated string src or the string src of length
len, respectively.
SCM
list of strings corresponding to
the argc length array of null-terminated strings argv. If
argc is less than 0
, argv is assumed to be
NULL
terminated. makfromstrs
is used by run_scm
to
convert the arguments SCM was called with to a SCM
list which is
the value of SCM procedure calls to program-arguments
(see section SCM Session).
NULL
terminated list of null-terminated strings copied
from the SCM
list of strings args. The string s_name
is used in messages from error calls by makargvfrmstrs
.
makargvfrmstrs
is useful for constructing argument lists suitable
for passing to main
functions.
makargvfrmstrs
.
The source files `continue.h' and `continue.c' are designed to function as an independent resource for programs wishing to use continuations, but without all the rest of the SCM machinery. The concept of continuations is explained in section `Control features' in Revised(4) Scheme.
The C constructs jmp_buf
, setjmp
, and longjmp
implement escape continuations. On VAX and Cray platforms, the setjmp
provided does not save all the registers. The source files
`setjump.mar', `setjump.s', and `ugsetjump.s' provide
implementations which do meet this criterion.
SCM uses the names jump_buf
, setjump
, and longjump
in lieu of jmp_buf
, setjmp
, and longjmp
to prevent
name and declaration conflicts.
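An escape continuation built from the standard C constructs can be sketched as follows. The names are hypothetical and the example uses the standard setjmp and longjmp rather than SCM's setjump and longjump; passing the value through a static variable is an assumption made to keep the use of setjmp within the contexts the C standard allows.

```c
#include <setjmp.h>

static jmp_buf toy_escape;         /* where to escape to */
static long toy_thrown_value;      /* value carried by the escape */

/* Store the value, then unwind to the saved point. */
static void toy_throw(long value)
{
    toy_thrown_value = value;
    longjmp(toy_escape, 1);
}

/* Recurse to the given depth, then escape from the innermost call. */
static void toy_inner(int depth)
{
    if (depth == 0)
        toy_throw(42L);            /* does not return */
    toy_inner(depth - 1);
}

/* Capture the escape point, run the computation, return the thrown value. */
static long toy_call_with_escape(int depth)
{
    if (setjmp(toy_escape) != 0)
        return toy_thrown_value;   /* arrived here through longjmp */
    toy_inner(depth);
    return -1L;                    /* reached only if nothing was thrown */
}
```

Such a continuation can only be used while the function that called setjmp is still active, which is why it is an escape only.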
typedef
ed structure holding all the information needed to
represent a continuation. The other slot can be used to hold any
data the user wishes to put there by defining the macro
CONTINUATION_OTHER
.
SHORT_ALIGN
is #define
d (in `scmfig.h'), then
it is assumed that pointers in the stack can be aligned on short
int
boundaries.
SHORT_ALIGN
being #define
d or not.
If CHEAP_CONTINUATIONS is #defined (in `scmfig.h'), each CONTINUATION has size sizeof CONTINUATION. Otherwise, all but root CONTINUATIONs have additional storage (immediately following) to contain a copy of part of the stack.
Note: On systems with nonlinear stack disciplines (multiple stacks or non-contiguous stack frames) copying the stack will not work properly. These systems need to #define CHEAP_CONTINUATIONS in `scmfig.h'.
#define
d or not.
throw_to_continuation
.
STACKITEM
which fit between
start and the current top of stack. No check is done in this
routine to ensure that start is actually in the current stack
segment.
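The measurement described above can be sketched as follows. This is an assumption-laden illustration, not the routine in `continue.c': demo_stackitem stands in for STACKITEM, stack_items_between is a hypothetical name, and the address of a local variable is taken as an approximation of the current top of stack. Like the routine described above, it does not check that start really lies in the current stack segment.

```c
#include <stddef.h>
#include <stdint.h>

typedef long demo_stackitem;   /* hypothetical stand-in for STACKITEM */

/* Number of demo_stackitem-sized slots between start and (roughly) the
   current top of stack.  The absolute difference is used so the sketch
   works whether the stack grows up or down. */
size_t stack_items_between(const demo_stackitem *start)
{
    demo_stackitem top_marker = 0;           /* lives near the top of stack */
    uintptr_t a = (uintptr_t)start;
    uintptr_t b = (uintptr_t)&top_marker;
    uintptr_t bytes = a > b ? a - b : b - a;
    return (size_t)(bytes / sizeof(demo_stackitem));
}
```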
malloc
) storage for a CONTINUATION
of the
current extent of stack. This newly allocated CONTINUATION
is
returned if successful, 0
if not. After
make_root_continuation
returns, the calling routine still needs
to setjump(new_continuation->jmpbuf)
in order to complete
the capture of this continuation.
CONTINUATION, copying (or encapsulating) the stack state from parent_cont->stkbse to the current top of stack. The newly allocated CONTINUATION is returned if successful, 0 if not. After make_continuation returns, the calling routine still needs to setjump(new_continuation->jmpbuf) in order to complete the capture of this continuation.
cont->other
.
thrown_value
to value and returns from the
continuation cont.
If CHEAP_CONTINUATIONS
is #define
d, then
throw_to_continuation
does longjump(cont->jmpbuf, val)
.
If CHEAP_CONTINUATIONS
is not #define
d, the CONTINUATION
cont contains a copy of a portion of the C stack (whose bound must
be CONT(root_cont)->stkbse
). Then:
longjump(cont->jmpbuf, val)
;
SCM uses its type representations to speed evaluation. All of the subr types (see section Subr Cells) are tc7 types. Since the tc7 field is in the low order bit position of the CAR, it can be retrieved and dispatched on quickly by dereferencing the SCM pointer pointing to it and masking the result.
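The retrieve-and-mask step can be sketched as below. Everything named here is hypothetical: struct demo_cell, DEMO_TYP7 and the DEMO_TC7_ values are made up for illustration and are not SCM's actual type codes. The only thing taken from the text is that the type code occupies the low order 7 bits of the CAR.

```c
/* Hypothetical two-word cell; the CAR carries the type code in its
   low order 7 bits. */
struct demo_cell {
    unsigned long car;
    unsigned long cdr;
};

/* Made-up 7-bit type codes, for illustration only. */
enum {
    DEMO_TC7_SUBR_1 = 0x05,
    DEMO_TC7_SUBR_2 = 0x0d,
    DEMO_TC7_SUBR_3 = 0x15
};

/* Dereference the cell pointer and mask off everything but 7 bits. */
#define DEMO_TYP7(p) ((p)->car & 0x7fUL)

/* Dispatch on the type code: returns the number of arguments the
   (hypothetical) subr type takes, or 0 for any other type. */
int demo_subr_arity(const struct demo_cell *p)
{
    switch (DEMO_TYP7(p)) {
    case DEMO_TC7_SUBR_1: return 1;
    case DEMO_TC7_SUBR_2: return 2;
    case DEMO_TC7_SUBR_3: return 3;
    default:              return 0;
    }
}
```

Because the dispatch is a single load, a mask, and a switch, whatever is stored in the upper bits of the CAR does not slow it down.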
All the SCM Special Forms get translated to immediate symbols (isym) the first time they are encountered by the interpreter (ceval). The representation of these immediate symbols is engineered to occupy the same bits as tc7. All the isyms occur only in the CAR of lists.
If the CAR of an expression to evaluate is not immediate, then it may be a symbol. If so, the first time it is encountered it will be converted to an immediate type ILOC or GLOC (see section Immediates). The lower 7 bits of the codes for ILOC and GLOC distinguish them from all the other types we have discussed.
Once it has determined that the expression to evaluate is not immediate, ceval need only retrieve and dispatch on the low order 7 bits of the CAR of that cell, regardless of whether that cell is a closure, header, or subr, or a cons containing ILOC or GLOC.
In order to be able to convert a SCM symbol pointer to an immediate ILOC or GLOC, the evaluator must be holding the pointer to the list in which that symbol pointer occurs. Turning this requirement to an advantage, ceval does not recursively call itself to evaluate symbols in lists; it instead calls the macro EVALCAR. EVALCAR does symbol lookup and memoization for symbols, retrieval of values for ILOCs and GLOCs, returns other immediates, and otherwise recursively calls itself with the CAR of the list.
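The memoization idea (look a symbol up once, then overwrite the slot that held it with a direct location) can be sketched in a few lines. This is a toy model under loud assumptions: struct demo_item, demo_eval_item and the fixed three-variable environment are hypothetical and bear no relation to SCM's actual cell layout. It only shows why the evaluator needs a pointer to the place holding the symbol, since that place is what gets rewritten.

```c
#include <stddef.h>
#include <string.h>

enum demo_kind { DEMO_SYMBOL, DEMO_LOC };

/* One position in an expression: a symbol name before memoization,
   a slot index (playing the role of an ILOC or GLOC) afterwards. */
struct demo_item {
    enum demo_kind kind;
    const char *name;
    size_t slot;
};

/* A fixed toy environment. */
static const char *demo_names[] = { "x", "y", "z" };
static long demo_values[] = { 10, 20, 30 };
unsigned demo_lookups;   /* counts slow-path symbol lookups */

/* Evaluate one item, memoizing in place.  Returns -1 for an unbound
   symbol (a real evaluator would signal an error instead). */
long demo_eval_item(struct demo_item *it)
{
    if (it->kind == DEMO_SYMBOL) {
        size_t i;
        demo_lookups++;
        for (i = 0; i < 3; i++)
            if (strcmp(demo_names[i], it->name) == 0) break;
        if (i == 3) return -1;
        it->kind = DEMO_LOC;     /* overwrite the symbol with its location */
        it->slot = i;
    }
    return demo_values[it->slot];
}
```

The second evaluation of the same item skips the string comparison entirely and goes straight to the slot.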
ceval inlines evaluation (using EVALCAR) of almost all procedure call arguments. When ceval needs to evaluate a list of length greater than 3, the procedure eval_args is called. So ceval can be said to have one level lookahead. The avoidance of recursive invocations of ceval for the most common cases (special forms and procedure calls) results in faster execution. The speed of the interpreter is currently limited on most machines by interpreter size, probably having to do with its cache footprint. In order to keep the size down, certain EVALCAR calls which don't need to be fast (because they rarely occur or because they are part of expensive operations) are instead calls to the C function evalcar.
symhash
table.
symhash
is an array of lists of ISYM
s and pairs of symbols
and values.
ILOC
) which specifies how many environment frames down and how
far in to go for the value. When this immediate object is subsequently
encountered, the value can be retrieved quickly.
ILOCs work up to a maximum depth of 4096 frames or 4096 identifiers in a frame. Radey Shouman added FARLOC to handle cases exceeding these limits. A FARLOC consists of a pair whose CAR is the immediate type IM_FARLOC_CAR or IM_FARLOC_CDR, and whose CDR is a pair of INUMs specifying the frame and distance with a larger range than ILOCs span.
Adding #define TEST_FARLOC to `eval.c' causes FARLOCs to be generated for all local identifiers; this is useful only for testing memoization.
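A frame-and-distance reference of the kind described above can be sketched as two 12-bit fields packed into one word, which is one way to arrive at limits of 4096 frames and 4096 identifiers per frame. The layout is an assumption made for illustration: the real ILOC also carries type code bits, and the names demo_iloc, demo_make_iloc and struct demo_frame are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIELD_BITS 12
#define DEMO_FIELD_MAX  (1u << DEMO_FIELD_BITS)   /* 4096 */

typedef unsigned long demo_iloc;

/* Pack "how many frames down" and "how far in" into one word. */
demo_iloc demo_make_iloc(unsigned frame, unsigned dist)
{
    assert(frame < DEMO_FIELD_MAX && dist < DEMO_FIELD_MAX);
    return ((demo_iloc)frame << DEMO_FIELD_BITS) | dist;
}

unsigned demo_iloc_frame(demo_iloc x)
{
    return (unsigned)(x >> DEMO_FIELD_BITS) & (DEMO_FIELD_MAX - 1);
}

unsigned demo_iloc_dist(demo_iloc x)
{
    return (unsigned)x & (DEMO_FIELD_MAX - 1);
}

/* Toy environment: a chain of frames, innermost first. */
struct demo_frame {
    const long *slots;
    const struct demo_frame *outer;
};

/* Follow the chain `frame` steps outward, then index `dist` slots in. */
long demo_iloc_ref(const struct demo_frame *env, demo_iloc x)
{
    for (unsigned f = demo_iloc_frame(x); f > 0; f--)
        env = env->outer;
    return env->slots[demo_iloc_dist(x)];
}
```

A reference outside these ranges would not fit in the packed word, which is the situation FARLOC addresses with a pair of INUMs.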
GLOC. The low order bit is normally reserved for GCmark; but, since references to variables in the code always occur in the CAR position and the GCmark is in the CDR, there is no conflict.
If the compile FLAG CAUTIOUS is #defined then the number of arguments is always checked for application of closures. If the compile FLAG RECKLESS is #defined then they are not checked. Otherwise, number of argument checks for closures are made only when the function position (whose value is the closure) of a combination is not an ILOC or GLOC. When the function position of a combination is a symbol it will be checked only the first time it is evaluated because it will then be replaced with an ILOC or GLOC.
EVAL returns the result of evaluating expression in env. SIDEVAL evaluates expression in env when the value of the expression is not used.
Both of these macros alter the list structure of expression as it is memoized and hence should be used only when it is known that expression will not be referenced again. The C function eval is safe from this problem.
eval copies expression so that memoization does not modify expression.
Where should software reside? Although individually a minor annoyance, cumulatively this question represents many thousands of frustrated user hours spent trying to find support files or guessing where packages need to be installed. Even simple programs require proper habitat; games need to find their score files.
Aren't there standards for this? Some Operating Systems have devised regimes of software habitats -- only to have them violated by large software packages and imports from other OS varieties.
In some programs, the expected locations of support files are fixed at time of compilation. This means that the program may not run on configurations unanticipated by the authors. Compiling locations into a program also can make it immovable -- necessitating recompilation to install it.
Programs of the world unite! You have nothing to lose but loss itself.
The function scm_find_impl_file in `scm.c' is an attempt to create a utility (for inclusion in programs) which will hide the details of platform-dependent file habitat conventions. It takes as input the pathname of the executable file which is running. If there are systems for which this information is either not available or unrelated to the locations of support files, then a higher level interface will be needed.
For purposes of finding `Init5c4.scm', dumping an executable, and dynamic linking, a SCM session needs the pathname of its executable image.
When a program is executed by MS-DOS, the full pathname of that executable is available in argv[0]. This value can be passed directly to scm_find_impl_file (see section File-System Habitat).
In order to find the habitat for a unix program, we first need to know the full pathname for the associated executable file.
dld_find_executable returns the absolute path name of the file that would be executed if command were given as a command. It looks up the environment variable PATH, searches in each of the directories listed for command, and returns the absolute path name for the first occurrence. Thus, it is advisable to invoke dld_init as:
    main (int argc, char **argv)
    {
        ...
        if (dld_init (dld_find_executable (argv[0]))) {
            ...
        }
        ...
    }
Note: If the current process is executed using the execve call without passing the correct path name as argument 0, dld_find_executable (argv[0]) will also fail to locate the executable file.
dld_find_executable returns zero if command is not found in any of the directories listed in PATH.
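The PATH search described above can be sketched with POSIX calls. This is not the source of dld_find_executable; find_in_path and is_executable_file are hypothetical names, the directory list is passed in explicitly (a caller would typically pass getenv("PATH")), and commands containing a slash, relative PATH entries and empty PATH entries are deliberately left out of the sketch.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if p names a regular file that the caller may execute. */
static int is_executable_file(const char *p)
{
    struct stat st;
    return stat(p, &st) == 0 && S_ISREG(st.st_mode) && access(p, X_OK) == 0;
}

/* Search the colon-separated directory list `path` for `command`.
   On success the full path of the first occurrence is written to
   `out` and `out` is returned; otherwise NULL is returned. */
char *find_in_path(const char *command, const char *path,
                   char *out, size_t outlen)
{
    if (command == NULL || path == NULL) return NULL;
    while (*path != '\0') {
        size_t dirlen = strcspn(path, ":");
        if (dirlen > 0) {                       /* skip empty entries */
            int n = snprintf(out, outlen, "%.*s/%s",
                             (int)dirlen, path, command);
            if (n > 0 && (size_t)n < outlen && is_executable_file(out))
                return out;
        }
        path += dirlen;
        if (*path == ':') path++;
    }
    return NULL;
}
```

As with dld_find_executable, a NULL (zero) result means the command was not found in any listed directory.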
Source code for these C functions is in the file `script.c'. See section Shell Scripts for a description of script argument processing.
script_find_executable is only defined on unix systems.
script_find_executable returns the path name of the executable which is invoked by the script file name; name if it is a binary executable (not a script); or 0 if name does not exist or is not executable.
script_process_argv
returns a newly
allocated argument vector in which the second line of the script being
invoked is substituted for the corresponding meta-argument.
If the script does not have a meta-argument, or if the file named by the argument following a meta-argument cannot be opened for reading, then 0 is returned.
script_process_argv
correctly processes argument vectors of
nested script invocations.
malloc()
storage.
lgcd()
needs to generate at most one bignum, but currently
generates more.
divide()
could use shifts instead of multiply and divide when
scaling.
dumping an executable does not preserve ports. When loading a dumped executable, disk files could be reopened to the same file and position as they had when the executable was dumped.
Provided there is still type code space available in SCM, if we devote some of the IMCAR codes to "inlined" operations, we should get a significant performance boost. What is eliminated is having to look up a GLOC or ILOC and then dispatch on the subr type. The IMCAR operation would be dispatched to directly. Another way to view this is that we make available special form versions of CAR, CDR, etc. Since the actual operation code is localized in the interpreter, it is much easier than uncompilation and then recompilation to handle (trace car); for instance, a switch gets set which tells the interpreter to instead always look up the values of the associated symbols.
Scott Schwartz <schwartz@galapagos.cse.psu.edu> suggests: One way to tidy up the dynamic loading stuff would be to grab the code from perl5.
George Carrette (gjc@mitech.com) outlines how to dynamically link on VMS. There is already some code in `dynl.c' to do this, but someone with a VMS system needs to finish and debug it.
main() {init_lisp(); lisp_repl();}
eval.c
and there are some toplevel non-static variables in use
called the_heap
, the_environment
, and some read-only
toplevel structures, such as the_subr_table
.
    $ LINK/SHARE=LISPRTL.EXE/DEBUG REPL.OBJ,GC.OBJ,EVAL.OBJ,LISPRTL.OPT/OPT

    SYS$LIBRARY:VAXCRTL/SHARE
    UNIVERSAL=init_lisp
    UNIVERSAL=lisp_repl
    PSECT_ATTR=the_subr_table,SHR,NOWRT,LCL
    PSECT_ATTR=the_heap,NOSHR,LCL
    PSECT_ATTR=the_environment,NOSHR,LCL

Notice: The psect (Program Section) attributes.
LCL
SHR,NOWRT
NOSHR,LCL
    $SEARCH/OUT=LISPRTL.LOSERS LISPRTL.MAP ", SHR,NOEXE, RD, WRT"

And use an emacs keyboard macro to muck the result into the proper form. Of course only the programmer can tell if things can be made read-only. I have a DCL command procedure to do this if you want it.
    $ DEFINE LISPRTL USER$DISK:[JAFFER]LISPRTL.EXE
    $ LINK MAIN.OBJ,SYS$INPUT:/OPT
    SYS$LIBRARY:VAXCRTL/SHARE
    LISPRTL/SHARE

Note the definition of the LISPRTL logical name. Without such a definition you will need to copy `LISPRTL.EXE' over to `SYS$SHARE:' (aka `SYS$LIBRARY:') in order to invoke the main program once it is linked.
INIT_MYSUBRS
that must be called before using it.
    $ CC MYSUBRS.C
    $ LINK/SHARE=MYSUBRS.EXE MYSUBRS.OBJ,SYS$INPUT:/OPT
    SYS$LIBRARY:VAXCRTL/SHARE
    LISPRTL/SHARE
    UNIVERSAL=INIT_MYSUBRS

Ok. Another hint is that you can avoid having to add the PSECT declaration of NOSHR,LCL by declaring variables static in the C language source. That works great for most things.
    {void (*init_fcn)();
     long retval;
     retval = lib$find_image_symbol("MYSUBRS","INIT_MYSUBRS",&init_fcn,
                                    "SYS$DISK:[].EXE");
     if (retval != SS$_NORMAL) error(...);
     (*init_fcn)();}

But of course all string arguments must be (struct dsc$descriptor *) and the last argument is optional if MYSUBRS is defined as a logical name or if `MYSUBRS.EXE' has been copied over to `SYS$SHARE'. The other consideration is that you will want to turn off C-c or other interrupt handling while you are inside most lib$ calls.
As for the generation of all the UNIVERSAL=... declarations: you would do well to have them automatically generated from the public `LISPRTL.H' file, of course.
VMS has a good manual called the Guide to Writing Modular
Procedures or something like that, which covers this whole area rather
well, and also talks about advanced techniques, such as a way to declare
a program section with a pointer to a procedure that will be
automatically invoked whenever any shared image is dynamically
activated. Also, how to set up a handler for normal or abnormal program
exit so that you can clean up side effects (such as opening a database).
But for use with LISPRTL
you probably don't need that hair.
One fancier option that is useful under VMS for `LISPLIB.EXE' is to define all your exported procedures through a call vector instead of having them just be pointers into random places in the image, which is what you get by using UNIVERSAL.
If you set up the call vector thing correctly it will allow you to
modify and relink `LISPLIB.EXE' without having to relink programs
that have been linked against it.
George Carrette (gjc@mitech.com) outlines how to dynamically link on Windows NT:
    LISPLIB.exp: LISPLIB.lib: LISPLIB.def
        $(implib) -machine:$(CPU) -def:LISPLIB.def -out:LISPLIB.lib

    LISPLIB.DLL : $(LISPLIB_OBJS) LISPLIB.EXP
        $(link) $(linkdebug) \
           -dll \
           -out:LISPLIB.DLL \
           LISPLIB.EXP $(LISPLIB_OBJS) $(conlibsdll)

    LIBRARY lisplib
    EXPORT
       init_lisp
       init_repl

    CLINK = $(link) $(ldebug) $(conflags) -out:$*.exe $** $(conlibsdll)

    MAIN.EXE : MAIN.OBJ LISPLIB.LIB
        $(CLINK)

    mysubrs.exp: mysubrs.lib: mysubrs.def
        $(implib) -machine:$(CPU) -def:MYSUBRS.def -out:MYSUBRS.lib

    mysubrs.dll : mysubrs.obj mysubrs.exp mysubrs.lib
        $(link) $(linkdebug) \
           -dll \
           -out:mysubrs.dll \
           MYSUBRS.OBJ MYSUBRS.EXP LISPLIB.LIB $(conlibsdll)

    LIBRARY mysubrs
    EXPORT
       INIT_MYSUBRS
LoadLibrary
and GetProcAddress
.
    LISP share_image_load(LISP fname)
    {long iflag;
     LISP retval,(*fcn)(void);
     HANDLE hLib;
     DWORD err;
     char *libname,fcnname[64];
     iflag = nointerrupt(1);
     libname = c_string(fname);
     _snprintf(fcnname,sizeof(fcnname),"INIT_%s",libname);
     if (!(hLib = LoadLibrary(libname)))
       {err = GetLastError();
        retval = list2(fname,LSPNUM(err));
        serror1("library failed to load",retval);}
     if (!(fcn = (LISP (*)(void)) GetProcAddress(hLib,fcnname)))
       {err = GetLastError();
        retval = list2(fname,LSPNUM(err));
        serror1("could not find library init procedure",retval);}
     retval = (*fcn)();
     nointerrupt(iflag);
     return(retval);}