Skip to main content

XHEX encoding

Chombit uses a proprietary scheme called expanded hexadecimal (XHEX) to compress 32-bit integers such as $d000_0000 into a single byte, if almost all of their digits are 0 or f. Base addresses and bitmasks are typically encodable using XHEX.

Consider this assembled code:

$c0_0000: fc 00 a0 00 00    push int $00a0_0000
$c0_0005: 32 01 pop i:4
$c0_0007: 43 01 5a move i:4, xhex $00a0_0000
$c0_000a: 43 02 b8 move i:8, xhex $ffff_8fff

Normally, assigning $00a0_0000 to i:4 requires 7 bytes of assembly language, as a combination of push and pop instructions seen above. We can't do move i:4, $00a0_0000 because that would require 6 bytes (one for the move opcode, one to indicate i:4, and four bytes to represent $00a0_0000). The Chombit hardware does not permit instructions longer than 5 bytes.

But in this case, we are allowed to use move i:4, xhex $00a0_0000. This form only requires 3 bytes—a significant reduction in size. XHEX compresses the number $00a0_0000 into a single byte $5a.

Encoding scheme

How does it work? A 32-bit number can be represented as XHEX in two circumstances:

  1. Duplicated 0s: if seven of the eight hex digits have the value 0 (for example $00a0_0000); or
  2. Duplicated Fs: if seven of the eight hex digits have the value f (for example $ffff_8fff)

The interesting digit becomes the right nibble of the encoded byte. In our example $00a0_0000, the interesting digit is a.

The left nibble indicates the position of the interesting digit. There are five 0s after a in $00a0_0000, so the left nibble is 5. Thus, $00a0_0000 becomes $5a.

What about $ffff_8fff? With duplicated 0s, $0000_8000 would be encoded as $38. To indicate duplicated Fs, we set the high bit by adding $80. Thus, $38 + $80 gives $b8 as the encoding for $ffff_8fff.

Notes

Here are some more examples:

  • xhex $c000_0000 = $7c
  • xhex $0000_0006 = $06
  • xhex $0030_0000 = $53
  • xhex $ff3f_ffff = $d3
  • xhex $0000_0000 = $00
  • xhex $ffff_ffff = $8f

Some numbers have more than one representation. For example, $8f and $ff both decode to $ffff_ffff.