XHEX encoding

Chombit uses a proprietary scheme called expanded hexadecimal (XHEX) to compress certain 32-bit integers, such as $D000_0000, into a single byte. A value is encodable if at least seven of its eight hex digits are 0, or at least seven are F. Base addresses and bitmasks are typically encodable using XHEX.

Consider this assembled code:

$C0_0000: FC 00 A0 00 00    PUSH INT $00A0_0000
$C0_0005: 32 01             POP I:4
$C0_0007: 43 01 5A          MOVE I:4, XHEX $00A0_0000
$C0_000A: 43 02 B8          MOVE I:8, XHEX $FFFF_8FFF

Normally, assigning $00A0_0000 to I:4 requires 7 bytes of machine code: a 5-byte PUSH followed by a 2-byte POP, as seen above. We can't use MOVE I:4, $00A0_0000 because that would require 6 bytes (one for the MOVE opcode, one to indicate I:4, and four to represent $00A0_0000). The Chombit hardware does not permit instructions longer than 5 bytes.

But in this case, we are allowed to use MOVE I:4, XHEX $00A0_0000. This form requires only 3 bytes, less than half the size of the PUSH/POP pair. XHEX compresses the number $00A0_0000 into the single byte $5A.

Encoding scheme

How does it work? A 32-bit number can be represented as XHEX in two circumstances:

Duplicated 0s: if at least seven of the eight hex digits have the value 0 (for example $00A0_0000); or
Duplicated Fs: if at least seven of the eight hex digits have the value F (for example $FFFF_8FFF).

The interesting digit becomes the right nibble of the encoded byte. In our example $00A0_0000, the interesting digit is A.

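The left nibble indicates the position of the interesting digit, counted as the number of digits to its right. Positions range from 0 to 7, so they fit in the low three bits of the nibble. There are five 0s after A in $00A0_0000, so the left nibble is 5. Thus, $00A0_0000 becomes $5A.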

What about $FFFF_8FFF? With duplicated 0s, $0000_8000 would be encoded as $38. To indicate duplicated Fs, we set the high bit by adding $80. Thus, $38 + $80 gives $B8 as the encoding for $FFFF_8FFF.
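The encoding rules above can be sketched in Python. This is an illustrative model only, not Chombit's assembler code; the function name xhex_encode is hypothetical, and returning the lowest valid position when several exist is an assumption chosen to match the examples on this page.

```python
from typing import Optional


def xhex_encode(value: int) -> Optional[int]:
    """Return the XHEX byte for a 32-bit value, or None if not encodable.

    Hypothetical helper: models the scheme described in the text.
    """
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError("value must fit in 32 bits")

    # Try duplicated 0s first (flag $00), then duplicated Fs (flag $80).
    for fill, flag in ((0x0, 0x00), (0xF, 0x80)):
        background = 0xFFFF_FFFF if fill else 0x0000_0000
        for pos in range(8):  # pos = number of hex digits to the right
            shift = 4 * pos
            digit = (value >> shift) & 0xF
            # Overwrite the digit at `pos` with the fill digit. If the result
            # is the pure background, all other digits already match the fill.
            rest = (value & ~(0xF << shift) & 0xFFFF_FFFF) | (fill << shift)
            if rest == background:
                return flag | (pos << 4) | digit
    return None


print(f"${xhex_encode(0x00A0_0000):02X}")  # the $00A0_0000 example
print(f"${xhex_encode(0xFFFF_8FFF):02X}")  # the $FFFF_8FFF example
print(xhex_encode(0x1234_5678))            # not encodable
```

Scanning positions from 0 upward means $0000_0000 and $FFFF_FFFF come out with position 0; other choices would be equally valid encodings.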

Notes

Here are some more examples:

XHEX $C000_0000 = $7C
XHEX $0000_0006 = $06
XHEX $0030_0000 = $53
XHEX $FF3F_FFFF = $D3
XHEX $0000_0000 = $00
XHEX $FFFF_FFFF = $8F

Some numbers have more than one representation. When all eight digits are identical, any position can be treated as the interesting digit. For example, $8F and $FF both decode to $FFFF_FFFF.
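Decoding can be sketched the same way, which also makes the duplicate representations easy to check. As before, this is a hedged Python model of the scheme described on this page; the name xhex_decode is hypothetical and does not come from Chombit itself.

```python
def xhex_decode(byte: int) -> int:
    """Expand an XHEX byte into the 32-bit value it represents.

    Hypothetical helper: models the scheme described in the text.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("XHEX code must be a single byte")

    digit = byte & 0x0F        # right nibble: the interesting digit
    pos = (byte >> 4) & 0x7    # low three bits of the left nibble: position
    shift = 4 * pos

    if byte & 0x80:
        # High bit set: duplicated Fs. Start from all Fs, then drop in the digit.
        return (0xFFFF_FFFF & ~(0xF << shift)) | (digit << shift)
    # High bit clear: duplicated 0s. The digit is the only nonzero part.
    return digit << shift


# $8F and $FF are two encodings of the same value.
assert xhex_decode(0x8F) == xhex_decode(0xFF) == 0xFFFF_FFFF
print(f"${xhex_decode(0x5A):08X}")
print(f"${xhex_decode(0xB8):08X}")
```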
