|
Unsuitable on Unsuitable: General Object Store
Samuel A. Falvo II
kc5tja -at- arrl.net 2010 Jun 27 00:00 PDT
Out of the box, Forth lacks support for persistent text strings. A blog depends heavily on the concept of persistent string data to store its articles. The General Object Store was written specifically to address this problem by providing the persistent string abstract data type. Since the data type exhibits persistency as a core feature, many parts of the blog software exhibits structural simplicity over competing blog engines written in more mainstream languages, for they must marshall data to/from SQL constructs. 1 General Object Store1.1 OverviewTwo things compose the GOS: the handle table and the actual data store. The former table tells where pieces of data reside in the latter. The rest of the Unsuitable implementation identifies GOS-resident data through opaque integers (called handles). The GOS module implementation, however, synthesizes handles from offsets into the handle table arrays. This implementation detail permits a certain degree of innocuous laziness in writing and maintaining the GOS module. The following illustration demonstrates how object 12 references a 10-byte-long chunk of data at persistent location $0123.
addrs GOS
+-------+ +-----+----------+---------------+
0 | | | |//////////| |
+-------+ | |//////////| |
4 | | +-----+----------+---------------+
+-------+ ^ ^
8 | | | |
+-------+ | .---. |
12 | $0123 |------------------*--| + |---'
+-------+ `---'
16 | ... | ^
|
lens |
+------+ |
0 | | |
+------+ |
4 | | |
+------+ |
8 | | |
+------+ |
12 | 10 |------------------------'
+------+
16 | ... |
The handle table consists of two persistent parallel arrays. addrs elements hold either the sentinel -1, indicating an unused handle, or the persistent address of some data. lens elements hold, for all used handles, the lengths of the corresponding data. As I write this, I have not defined the value of a lens element for unused handles. Each column in the table consumes /hfields bytes of persistent space. The data pointed to by an element of addrs exists in the GOS storage space starting at gorg (GOS origin), and extending to gend (GOS end). 1.2 Code Tour
The lrn (logical record number) variable indicates which variable lrn The addr and length words query the object's location and size in the GOS. Note that addr returns a persistent pointer, not a dictionary pointer. You can change these fields using the setters addr! and length!. : addr addrs lrn @ + @f ; : length lens lrn @ + @f ; : addr! addrs lrn @ + !f ; : length! lens lrn @ + !f ; Fetching from and storing to the lrn variable directly in software breaks the declarative nature of coding I love so much. So, I write some more declarative accessors for the variable. : gob lrn @ ; : gob! lrn ! ; gob (general object) returns the current object we're set to work with. gob! alters this context. Before storing anything in the object store, we need to assign the next available handle to it: : available addrs lens begin 2dup < while over @f -1 = if
drop addrs - exit then swap cell+ swap repeat
abort" Out of GOB handles." ;
If no available handles exist, we throw an exception and quit out of the entire program. Thus, if available returns at all, we know with total certainty that a fresh object handle exists, thus eliminating the need for error checking elsewhere in Unsuitable1. The phrase addrs - computes the handle identifier from the persistent address used as the loop index. To actually commit data to the object store, you need to put it there. : put available gob! fence -rot dup length! archive addr! ; It works by first allocating a handle for your object, then it invokes archive to store the data you passed as a (address, length) tuple on the stack. Since archive consumes this tuple, we need to first duplicate some bits to remember the data size (dup lenth!) and where to find it (fence -rot ... addr!). Note that put returns no value on the stack; applications must invoke gob to query the placed data's object handle immediately after invoking put. The word archive works by transferring data into block storage in chunks not exceeding 1024 bytes, the maximum size you can move with ANSI Forth's block system. This word also deals with any alignment computation tasks as well, preserving the byte granularity we hold dear to minimize fragmentation. : reserve dup fence + gend > abort" Out of general storage space" g-fencepost +! update ; : place >r fence >core r@ move update r> reserve ; : partial 1024 fence 1023 and - min ; : part 2dup partial dup >r place r> /string ; : archive begin dup while part repeat 2drop ; When retrieving data, we need some place to put it. One approach was to let the application decide where to put it, since only the application knows best how it wants to manage its memory. I may yet go this route in a future revision, but so far, all retrievals of data from the GOS were best placed at HERE, Forth's dictionary fencepost/memory allocation pointer. Most words which place data at HERE end with a comma (since you're literally compiling into the dictionary), so I decided that get, should as well: : get, addr length retrieve, ; Analogously to archive, retrieve, deals with the dirty work of copying data out of block storage 1024-bytes at a time, dealing with cross-block alignment issues, and the like. : remaining over 1023 and 1024 swap - min ; : obtain swap >core swap here swap dup >r move r> allot ; : part 2dup remaining dup >r obtain r> /string ; : retrieve, begin dup while part repeat 2drop ; 2 general.fs Source ListingBelow lies the complete listing for general.fs: : fence g-fencepost @ gorg + ;
: reserve dup fence + gend > abort" Out of general storage space" g-fencepost +! update ;
: place >r fence >core r@ move update r> reserve ;
: partial 1024 fence 1023 and - min ;
: part 2dup partial dup >r place r> /string ;
: archive begin dup while part repeat 2drop ;
: remaining over 1023 and 1024 swap - min ;
: obtain swap >core swap here swap dup >r move r> allot ;
: part 2dup remaining dup >r obtain r> /string ;
: retrieve, begin dup while part repeat 2drop ;
variable lrn
: addr addrs lrn @ + @f ;
: length lens lrn @ + @f ;
: addr! addrs lrn @ + !f ;
: length! lens lrn @ + !f ;
: gob lrn @ ;
: gob! lrn ! ;
: available addrs lens begin 2dup < while over @f -1 = if
drop addrs - exit then swap cell+ swap repeat
abort" Out of GOB handles." ;
: put available gob! fence -rot dup length! archive addr! ;
: get, addr length retrieve, ;
3 GOS Issues and Work-AroundsNo convenient means yet exists to free previously used GOS space. You may mark a handle as unused by setting its addrs field to -1, and relying on some external utility to collapse the hole in the GOS space. At present, a few such holes exist, but I'm not yet motivated to write the defragmentation tool. put works by placing data at fence, the GOS analog of HERE. Thus, since only one fence exists for all CGI handler instances, only one CGI handler can write to the GOS at a time. I work around this synchronization nightmare by making all CGI handlers read-only, leaving the only write-capable software accessible through an SSH session. Since I'm the only author of articles on the blog and I no longer accept comments due to spam issues, I find this work-around actually works to my advantage. I am open to changing this arrangement with sufficient demand from readers, however. 4 ConclusionThis article detailed the GOS, how it's laid out in persistent memory space, and the primary entry-points to the module. In the next article, I hope to cover the articles relation. See you then. 1 I wrote this code long before I stumbled upon the Declarative, Imperative, then Interrogative pattern. We can see the seeds of discovery of this pattern starting to germinate here. |