ZX Spectrum and loaders – Part One

[youtube uEZWAcr39lg nolink]

Article written by Martin Maly,
and published at www.uelectronics.info


Everyone who had once owned a ZX Spectrum, surely had at least once seen an amazing loading effect and wondered: How are they doing this? So, come and have a read… It will be a long story and perhaps you might even learn something.

Because ZX Spectrum didn’t have any dedicated circuit for storing and retrieving programs to and from a cassette tape (such as PMD), it had to do it purely by software. At the highest level, programs were loaded as a pair of “header + data”. In the header file there was information about the file (name, type, length, etc.) and the data file contained actual custom data.

The header file has a length of 17 bytes and a following structure:

Position Length Description
0 1 Type (0=program, 1=number array, 2=character array, 3=code)
1 10 Name (right padded with spaces to 10 characters)
11 2 Length of data block
13 2 Parameter 1. Eg. for programs it is a parameter LINE, for ‘code’ it is the address of beginning of the code block
15 2 Parameter 2. Eg. for programs it is the beginning of variables

Data block doesn’t have any such structure, it just contains data.

Let us descend one level: Each block was preceded by one flag byte ($00 for a header, $ff for data) and terminated with a checksum (all bytes, including the flag byte, XORed together). Header was therefore actually 19 bytes long – it began with 0 and ended with the checksum, but these bytes were “consumed” by the system and you couldn’t get your hands on them.

This is where the level at which you could get by calling routines from the ROM ends. The rest of it was just

Ones and zeroes

Data on the tape was written as a sequence of pulses. The processor took care of generating of these pulses, including the proper timing.

Each entry was preceded by a so-called pilot tone. It was created by a rectangular signal (where ON and OFF states were regularly alternated), which means that MIC output in the ON state was 2168 T long (T is the processor clock), and in the OFF state, 2168 T long as well. Boot tone prepared input circuits (if there were any) in a tape recorder to the correct volume level. This boot tone before the header lasted 5 seconds and before the data block two seconds. The duration of the phase tone was determined by a flag byte, all values less than 128 meant a “long” phase tone and more than 128 meant a short one.

After the pilot tone a sync pulse was generated. Its purpose was to let the loading routine know that the pilot tone ends and to begin to process the data. Sync pulse was 667 T ON and 735 T OFF long.

The sync pulse was then followed by actual bytes of data. First was the flag byte, then the contents, and eventually checksum. Bytes were sent bit by bit from the highest one. Every bit was represented by a pulse (state switching to ON and OFF), but differed in length. For logical 0 it was 855T in the ON state and 855T in the OFF state, for logical 1 it was doubled, thus 1710T and 1710T, respectively.

When reading data (which is what we are after here) processor then measured the time between each change of the input signal (ie. time between transitions, or the so-called “Edges”). The recording was therefore not sensitive to polarity (such as for aforementioned PMD in the first version), however the processor did not have much time to do anything else, since it had to listen to EAR input, count loops until the signal changed, and check for SPACE key being pressed (which interrupted the loading, as you surely remember).

Assembler time

Now this is the moment to extract the ROM contents (taken from ‘The Complete Spectrum ROM Disassembly’ by Dr. Ian Logan and Dr. Frank O’Hara, as published by Melbourne House in 1983, available at WOS):


This subroutine is called to LOAD the header information (from 076E) and later
LOAD, or VERIFY, an actual block of data (from 0802).

0556 LD-BYTES   INC  D                  This resets the zero flag (D cannot
                                        hold +FF)
                EX   AF,AF'             The A register holds +00 for a header
                                        and +FF for a block of data
                                        The carry flag is reset for VERIFYing
                                        and set for LOADing
                DEC  D                  Restore D to its original value
                DI                      The maskable interrupt is now disabled
                LD   A,+0F              The border is made WHITE
                OUT  (+FE),A
                LD   HL,+053F           Pre-load the machine stack with the
                PUSH HL                 address - SA/LD-RET
                IN   A,(+FE)            Make an initial read of port 254
                RRA                     Rotate the byte obtained
                AND  +20                but keep only the EAR bit
                OR   +02                Signal RED border
                LD   C,A                Store the value in the C register
                                        (+22 for 'off' and +02 for 'on' - the
                                        present EAR state)
                CP   A                  Set the zero flag

The first stage of reading a tape involves showing that a pulsing signal
actually exists. (i.e. 'On/off' or 'off/on' edges.)

056B LD-BREAK   RET  NZ                 Return if the BREAK key is being pressed
056C LD-START   CALL 05E7,LD-EDGE-1     Return with the carry flag reset if
                JR   NC,056B,LD-BREAK   there is no 'edge' within approx.
                                        14,000 T states. But if an 'edge' is
                                        found the border will go CYAN

The next stage involves waiting a while and then showing that the signal is
still pulsing.

                LD   HL,+0415           The length of this waiting period will
0574 LD-WAIT    DJNZ 0574,LD-WAIT       be almost one second in duration.
                DEC  HL
                LD   A,H
                OR   L
                JR   NZ,0574,LD-WAIT
                CALL 05E3,LD-EDGE-2     Continue only if two edges are found
                JR   NC,056B,LD-BREAK   within the allowed time period.

Now accept only a 'leader signal'.

0580 LD-LEADER  LD   B,+9C              The timing constant
                CALL 05E3,LD-EDGE-2     Continue only if two edges are found
                JR   NC,056B,LD-BREAK   within the allowed time period
                LD   A,+C6              However the edges must have been found
                CP   B                  within about 3,000 T states of each
                JR   NC,056C,LD-START   other
                INC  H                  Count the pair of edges in the H
                JR   NZ,0580,LD-LEADER  register until 256 pairs have been found

After the leader come the 'off' and 'on' parts of the sync pulse.

058F LD-SYNC    LD   B,+C9              The timing constant
                CALL 05E7,LD-EDGE-1     Every edge is considered until two edges
                JR   NC,056B,LD-BREAK   are found close together - these will be
                LD   A,B                the start and finishing edges of the
                CP   +D4                'off' sync pulse
                JR   NC,058F,LD-SYNC
                CALL 05E7,LD-EDGE-1     The finishing edge of the 'on' pulse
                RET  NC                 must exist
                                        (Return carry flag reset)

The bytes of the header or the program/data block can now be LOADed or VERIFied.
But the first byte is the flag byte.

                LD   A,C                The border colours from now on will be
                XOR  +03                BLUE & YELLOW
                LD   C,A
                LD   H,+00              Initialize the 'parity matching' byte
                                        to zero
                LD   B,+B0              Set the timing constant for the flag
                JR   05C8,LD-MARKER     Jump forward into the byte LOADing loop

The byte LOADing loop is used to fetch the bytes one at a time. The flag byte is
first. This is followed by the data bytes and the last byte is the 'parity'

05A9 LD-LOOP    EX   AF,AF'             Fetch the flags
                JR   NZ,05B3,LD-FLAG    Jump forward only when handling the
                                        first byte
                JR   NC,05BD,LD-VERIFY  Jump forward is VERIFYing a tape
                LD   (IX+00),L          Make the actual LOAD when required
                JR   05C2,LD-NEXT       Jump forward to LOAD the next byte
05B3 LD-FLAG    RL   C                  Keep the carry flag in a safe place
                XOR  L                  Return now if the flag byte does not
                RET  NZ                 match the first byte on the tape
                                        (Carry flag reset)
                LD   A,C                Restore the carry flag now
                LD   C,A
                INC  DE                 Increase the counter to compensate for
                JR   05C4,LD-DEC        its decrease after the jump

If a data block is being verified then the freshly loaded byte is tested against
the original byte.

05BD LD-VERIFY  LD   A,(IX+00)          Fetch the original byte
                XOR  L                  Match it against the new byte
                RET  NZ                 Return if 'no match' (Carry flag reset)

A new byte can now be collected from the tape.

05C2 LD-NEXT    INC  IX                 Increase the 'destination'
05C4 LD-DEC     DEC  DE                 Decrease the 'counter'
                EX   AF,AF'             Save the flags
                LD   B,+B2              Set the timing constant
05C8 LD-MARKER  LD   L,+01              Clear the 'object' register apart from
                                        a 'marker' bit

The 'LD-8-BITS' loop is used to build up a byte in the L register.

05CA LD-8-BITS  CALL 05E3,LD-EDGE-2     Find the length of the 'off' and 'on'
                                        pulses of the next bit
                RET  NC                 Return if the time period is exceeded
                                        (Carry flag reset)
                LD   A,+CB              Compare the length against approx.
                CP   B                  2,400 T states; resetting the carry flag
                                        for a '0' and setting it for a '1'
                RL   L                  Include the new bit in the L register
                LD   B,+B0              Set the timing constant for the next bit
                JP   NC,05CA,LD-8-BITS  Jump back whilst there are still bits to
                                        be fetched

The 'parity matching' byte has to be updated with each new byte.

                LD   A,H                Fetch the 'parity matching' byte and
                XOR  L                  include the new byte
                LD   H,A                Save it once again

Passes round the loop are made until the 'counter' reaches zero. At that point
the 'parity matching' byte should be holding zero.

                LD   A,D                Make a furter pass if the DE register
                OR   E                  pair does not hold zero
                JR   NZ,05A9,LD-LOOP
                LD   A,H                Fetch the 'parity matching' byte
                CP   +01                Return with the carry flag set if the
                RET                     value is zero (Carry flag reset if in

These two subroutines form the most important part of the LOAD/VERIFY operation.
The subroutines are entered with a timing constant in the B register, and the
previous border colour and 'edge-type' in the C register.
The subroutines return with the carry flag set if the required number of 'edges'
have been found in the time allowed; and the change to the value in the B
register shows just how long it took to find the 'edge(s)'.
The carry flag will be reset if there is an error. The zero flag then signals
'BREAK pressed' by being reset, or 'time-up' by being set.
The entry point LD-EDGE-2 is used when the length of a complete pulse is
required and LD-EDGE-1 is used to find the time before the next 'edge'.

05E3 LD-EDGE-2  CALL 05E7,LD-EDGE-1     In effect call LD-EDGE-1 twice;
                RET  NC                 returning in between in there is an
05E7 LD-EDGE-1  LD   A,+16              Wait 358 T states before entering the
05E9 LD-DELAY   DEC  A                  sampling loop
                JR   NZ,05E9,LD-DELAY
                AND  A

The sampling loop is now entered. The value in the B register is incremented for
each pass; 'time-up' is given when B reaches zero.

05ED LD-SAMPLE  INC  B                  Count each pass
                RET  Z                  Return carry reset & zero set if
                LD   A,+7F              Read from port +7FFE
                IN   A,(+FE)            i.e. BREAK and EAR
                RRA                     Shift the byte
                RET  NC                 Return carry reset & zero reset if BREAK
                                        was pressed
                XOR  C                  Now test the byte against the 'last
                AND  +20                edge-type'
                JR   Z,05ED,LD-SAMPLE   Jump back unless it has changed

A new 'edge' has been found within the time period allowed for the search.
So change the border colour and set the carry flag.

                LD   A,C                Change the 'last edge-type' and border
                CPL                     colour
                LD   C,A
                AND  +07                Keep only the border colour
                OR   +08                Signal 'MIC off'
                OUT  (+FE),A            Change the border colour (RED/CYAN or
                SCF                     Signal the successful search before
                RET                     returning

Note: The LD-EDGE-1 subroutine takes 464 T states, plus an additional 59 T
states for each unsuccessful pass around the sampling loop.
For example, therefore, when awaiting the sync pulse (see LD-SYNC at 058F)
allowance is made for ten additional passes through the sampling loop.
The search is thereby for the next edge to be found within, roughly, 1,100 T
states (464 + 10 * 59 overhead).
This will prove successful for the sync 'off' pulse that comes after the long
'leader pulses'.

In this article there is no space for detailed description of how the algorithm works, but let’s quickly go over some facts.

Routine disables interrupts. This is logical, since it is dependent on exact timing. That’s also the reason why loaders stored in slow RAM ($4000- $7FFF) will not work.

The last state of EAR input is stored in the C register, but it is shifted by 1 bit to the right, and the lowest three bits contains border color. For the leader it is 02 (red) / 05 (cyan), for data it is 01 (blue) / 06 (yellow). Why the right shift? During loading, a sequence LD A, $7F and IN A, ($FE) is processed. The value $7F is sent to the upper 8 bits of the address, therefore when reading a keyboard input it selects a line of keys B – N – M – Symbol Shift – Space. The status of these keys is in the lowest five bits (0-4), bit 6 contains an EAR state. Using the RRC instruction the lowest bit (status of the SPACE key) moves to the flag CY and EAR state moves to the 5th bit position. Therefore, if CY is zero, it means that a SPACE was pressed and loading ends.

The routine begins by detecting a signal that corresponds to the pilot tone (LD-LEADER). If 255 of such pulses were read, it was then believed that this was a pilot tone and the routine waits for a shorter synchronization pulse. When it arrives (LD-SYNC), you can retrieve data.

LD-MARKER reads 1 byte to register L. It begins with a value of 01, to serve as a counter. Gradually it fills with bits from the right by instruction RL L, the highest bits then passes into CY. If CY = 0, they keep loading further, but once CY = 1, it means that a complete set of eight bits was retrieved.

Key routines are LD-EDGE-n (wherein n is 1 or 2). LD-EDGE-1 first waits for a certain period of time (465T) and then determines whether the value of the EAR input has changed and compares it against the last stored value (in register C, see above). If it has not changed, the loop is repeated. For each loop the value in the registry B is increased. Once it gets to zero, it means “timeout” – the edge did not come in the expected time limit.

If the edge is found, the content of the register C is negated. This results both in a change of stored EAR value, but also in a change of the color border.

LD-EDGE-2 actually performs two LD-EDGE-1 in a sequence.

LD-EDGE output is as follows:

  • CY = 0, Z = 1 – during a time interval EAR change did not come (“timeout”)
  • CY = 0, Z = 0 – SPACE was pressed (BREAK)
  • CY = 1 – edge was found, the current value of the counter is in B

The counter in B counts, as I wrote, upwards. During every loop pass, which takes 58 processor ticks, B is incremented by one. For example, when reading a bit, routine LD-EDGE-2 is called, thus seeking two edges. The counter in B is set to $B0. This means that the timeout comes after $4F cycles ($FF- $B0). This represents a 2 x 465T of wait loop + 79 * 58T = 5512T. After all this time, the routine reports a signal failure.

The resulting counter value is compared to $CB. If it is smaller, it is then evaluated as two short pulses of log. 0, if it is larger, then log. 1. The value of $CB means that the loop was done 27 times ($CB-$B0), specificaly that the two edges came at a time of less than 2496T. Let’s recall: For log. 0 the last two pulses take 1710T, for log. 1 they take 3420T. Therefore the difference between these two times is 2565 and this is roughly what I came up with. It is a bit less because of overhead (subroutine calls, evaluation etc., See LD-8-BITS).

Loader hacking

This was not so difficult, was it? So now, let’s make some of those tricks …

First, reading routine is not completely “T-pedantic”, so few Ts here or there do not pose any major problem. If you want a simple effect, we can add it without any complicated adjustments.

Border effects

We can for example change color of the stripes, if we do not like the default two-tone ones. How about a rainbow? Simply rewrite the end of the routine LD-EDGE:


          LD      A,C
          INC     A
          XOR     $20
          AND     $27
          LD      C,A
          AND     $07
          OR      $08
          OUT     ($FE),A

What’s going on here? Instead of negating of the contents of register C, we increase its value by 1 and negate the value of the fifth bit. Thanks to masking of the value of $27, we avoid the overflow which would affect the EAR bit. Therefore it will vary in the range 0-7 and create a rainbow effect border.

Note: If you intend to try this, be sure to place the loading routine in the upper 32 Kbytes of RAM!

If we add a pair of instructions XOR A; OUT ($FE), A to the end and before the instruction SCF, stripes in the border will change into short lines on a black background.

Here we can put any effects that affect a color border either by amending it, or by using it. What about the effect that Busy used with loader for Song In Lines 3 (rounded corners), it looks impressive, huh?

[youtube uEZWAcr39lg nolink]

Yet it is not very hard… Four squares in the corners contain a simple pattern (“rounding”). Pixels that are equal to 1 (the color INK) will look as if they were part of the border and will show streaks. I did not examine how Busy does it, but I’d bet that in principle it’s done somehow like this:


          LD      A,C ;Change of the last "edge type"
          CPL     ;as well as BORDER color.
          LD      C,A
          AND     $07 ;Taking just only the BORDER color
          OR      $08 ;MIC off
          OUT     ($FE),A ; Change border color
          OR      $30 ; A contains: 0 0 1 1 1 b b b (b = border color)
                  ; so PAPER=7, INK=border color,
                  ; BRIGHT 0, FLASH 0
          LD      ($5800),A ;left top corner attribute
          LD      ($581F),A ;right top corner
          LD      ($5AE0),A ;left bottom
          LD      ($5AFF),A ;right bottom

          SCF     ;Set CY=1 as a "success"
          RET     ;before return

Border effects are mostly simple and fast enough, so we can squash them here and not worry about the timing too much. Usually they fit within tolerance.

Simple effects with loaded content

During the loading we can certainly manage simpler operation with loaded data, either at the bit or byte level. Digisynth demos loader was able to perform real-time data extracting using Huffman decompression algorithm (Huffman suits this purpose quite well, you just need to have a decompression tree saved and pass through it according to the loaded bit). For those interested, I have prepared a reconstruction of the loader. But we will have a look at another case, and this will be a well known Mad Load – a routine that loads square images in a certain order. The video shows its improved version.

[youtube 5y2c6eZ7wb0 nolink]

Mad Load used a very simple data format. After the flag byte, data for each square followed. Each took 11 bytes – lower and higher byte of a screen address where the square should be stored, then 8 bytes of video memory and a 1 byte of attribute. This was followed by another square …

This loading routine was slightly modified by Frantisek Fuka (FUXOFT) – he made LD-8-BITS into a subroutine which reads 1 byte to register L. This subroutine is then used in another subroutine to retrieve one of the squares (MAD_SQUARE). Squares loading subroutine is then called over and over again until there is data to be loaded from the tape, and when there isn’t, it stops and returns back. There is no checksum performed or anything.

Here is the Mad Loader source code. I only commented on the parts that differ from the standard code.

          LD      HL,MAD_RETURN
          PUSH    HL
          JP      MAD_LOAD

          IN      A,($FE)
          AND     $20
          LD      C,A
          CP      A
          RET     NZ
          CALL    LD_EDGE_1
          JR      NC,LD_BREAK
          LD      HL,$0415
          DJNZ    LD_WAIT
          DEC     HL
          LD      A,H
          OR      L
          JR      NZ,LD_WAIT
          CALL    LD_EDGE_2
          JR      NC,LD_BREAK

          LD      B,$9C
          CALL    LD_EDGE_2
          JR      NC,LD_BREAK
          LD      A,$C6
          CP      B
          JR      NC,LD_START
          INC     H
          JR      NZ,LD_LEADER

LD_SYNC:  LD      B,$C9
          CALL    LD_EDGE_1
          JR      NC,LD_BREAK
          LD      A,B
          CP      $D4
          JR      NC,LD_SYNC
          CALL    LD_EDGE_1
          RET     NC

                  ; Mad Load itself
          CALL    LD_ONE_BYTE ; The first one is a flag byte. Drop it!
          CALL    MAD_SQUARE
          RET     NC
          JR      MAD_LOOP

          CALL    LD_ONE_BYTE ; Lower byte of address
          LD      A,L
          EX      AF,AF' ; save into AF'
          CALL    LD_ONE_BYTE ; Upper address byte
          RET     NC
          EX      AF,AF'
          LD      H,L ; to the H register
          LD      L,A ; and the first one to the L, so I have a full address in HL
          LD      B,$08 ;8 bitmap bytes for each square
          PUSH    HL ;save the address
          PUSH    BC ;and the counter
          CALL    LD_ONE_BYTE ; read 1 byte
          POP     BC ; restore the counter
          LD      A,L ; byte to accumulator
          POP     HL ; restore address for this byte
          RET     NC ; If some error occured, return
          LD      (HL),A ;Else store the byte into screen memory
          INC     H ; addr + 256 - it means "next screen microline"
          DJNZ    MAD_SCRN ; repeat for all 8 bytes
          LD      A,H ; Convert address from screen memory to attribute memory
          SUB     $08 ; First, sub 8 to get the original value
          RRA     ; H div 8
          AND     $03 ; lowest 2 bits
          OR      $58 ;$58, $59 or $5a - attribute memory address
          LD      H,A ; So now I have an attribute address in HL
          PUSH    HL ; save it
          CALL    LD_ONE_BYTE ; read one byte
          LD      A,L ; save it to accumulator
          POP     HL ; restore the address
          LD      (HL),A ; and put the attribute byte to a proper place
          RET     ; Finished, one square is done

          LD      B,$B2
          LD      L,$01
          CALL    LD_EDGE_2
          RET     NC
          LD      A,$CB
          CP      B
          RL      L
          LD      B,$B0
          JR      NC,LD_8_BITS


          CALL    LD_EDGE_1
FF78:     RET     NC
          LD      A,$16
          DEC     A
FF7C:     JR      NZ,LD_DELAY
FF7E:     AND     A
          INC     B
          RET     Z
          LD      A,$7F
          IN      A,($FE)
          XOR     C
          AND     $20
          JR      Z,LD_SAMPLE
          LD      A,C
          INC     A
          XOR     $20
          AND     $27
          LD      C,A
          AND     $07
          OR      $08
          OUT     ($FE),A

… And at this point I would to finish for now.

Next time we take a look at more complex effects, for example various counters of bytes or time remainging and other horseplay, for which we need more processor ticks, and so we will have to adjust the timing loop and chop our algorithms so that they fit into the time we have available. Meanwhile, please accept this brief “introductory to the mystery of loaders”, try to experiment yourself, and if you create some nice loader, show it off!

This article was first published at www.uelectronics.info by Martin Maly. He has given us permission to reproduce it here.

Author: Paulo Garcia

Leave a Reply

Your email address will not be published. Required fields are marked *