
Unsafe operator

The unsafe() operator lets arrays be indexed faster by skipping their bounds check, but it can cause memory corruption if used incorrectly.

When computer programs have bugs, usually they are straightforward logic mistakes or incorrect data states. Memory corruption is a different category of bug: if a program's heap memory becomes corrupted somehow, the program may perform thousands or millions of operations, with no visible problem until it finally tries to access the incorrect memory. At that point, it is very difficult to work backwards to determine how the memory became corrupted. Also, the failure may be difficult to reproduce; if you run the program a second time, the memory may get allocated differently, and now the crash happens in a completely different place, or not at all.

The Hybrix language is memory-safe. In other words, unless you are intentionally manipulating memory (for example, using kernel::set_memory_bytes()), your code is very unlikely to cause memory corruption. But this safety depends on runtime checks, and those checks are not free. In the arrays section, we discussed one of them: the bounds check that prevents a program from writing beyond the bounds of an array.

Array bounds checks

Let's examine how that works. Here is a program that assigns a value to an array:

module main
    func example(i: int, array: int[])
        array[i] <- 123
    end func
end module

The array[i] <- 123 statement gets compiled to Chombit assembly language like this:

c0_2484: 46 00 f0     move i:0, i:-16
c0_2487: 61 01 f4     load unsigned i:4, trio [i:-12]
c0_248a: 9b 00 01     compare i:0, i:4
c0_248d: 09           if not less unsigned
c0_248e: 21 13        fail 19
c0_2490: 72 00 02     shift left i:0, 2
c0_2493: 8d 00 f4     add i:0, i:-12
c0_2496: c9 00 03 7b  store [i:0 + 3], int 123

How it works:

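  • load unsigned i:4, trio [i:-12] reads the array size.
  • compare i:0, i:4 compares i against the size.
  • if not less unsigned tests whether i is not less than the size. Because the test is unsigned, it also catches a negative i: read as an unsigned number, a negative value is larger than any valid size.
  • If i is outside the array bounds, then fail 19 reports Array index out of bounds.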
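
The single unsigned comparison can be modeled outside of Hybrix. The sketch below is Python, not Hybrix or Chombit code, and it assumes that int is a 32-bit two's-complement value; the function names are made up for illustration.

```python
# Illustrative model only: this is Python, not Hybrix or Chombit.
# Assumption: an int is a 32-bit two's-complement value.

def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


def index_in_bounds(i: int, size: int) -> bool:
    """Mimic 'compare i, size' followed by 'if not less unsigned / fail 19'."""
    return as_unsigned32(i) < size


print(index_in_bounds(0, 10))   # True
print(index_in_bounds(9, 10))   # True
print(index_in_bounds(10, 10))  # False: one past the end
print(index_in_bounds(-1, 10))  # False: -1 reads as 4294967295
```

One comparison therefore replaces the two separate tests (i >= 0 and i < size) that a signed check would need.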

unsafe() array indexer

The bounds check accounts for four of the eight emitted instructions (the load, the compare, the conditional, and the fail). If a loop is processing thousands of array items, we could potentially double the speed of its array accesses by skipping these checks. The unsafe() operator provides a way to accomplish that:

module main
    func example(i: int, array: int[])
        unsafe(array)[i] <- 123
    end func
end module

Now the output reduces to:

c0_2484: 46 00 f0     move i:0, i:-16
c0_2487: 72 00 02     shift left i:0, 2
c0_248a: 8d 00 f4     add i:0, i:-12
c0_248d: c9 00 03 7b  store [i:0 + 3], int 123

When the index is a literal value, the savings are even more dramatic:

module main
    func example(array: int[])
        array[3] <- 123
    end func
end module

The code for array[3] <- 123:

c0_2500: 61 00 f4     load unsigned i:0, trio [i:-12]
c0_2503: 97 00 03     compare i:0, 3
c0_2506: 0b           if not greater unsigned
c0_2507: 21 13        fail 19
c0_2509: c9 f4 0f 7b  store [i:-12 + 15], int 123

With unsafe(), it reduces to just one assembly instruction:

module main
    func example(array: int[])
        unsafe(array)[3] <- 123
    end func
end module

c0_24fe: c9 f4 0f 7b  store [i:-12 + 15], int 123
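
The constant 15 in store [i:-12 + 15] agrees with the variable-index listings, where the index is shifted left by 2 (multiplied by 4) and the store then adds 3. The Python sketch below reproduces that arithmetic. It assumes, based only on the listings above, that the array begins with a 3-byte size field (the "trio" read by the bounds check) followed by 4-byte int elements; the constant and function names are hypothetical.

```python
# Illustrative model only: this is Python, not Hybrix or Chombit.
HEADER_BYTES = 3   # assumed: the "trio" size field read by the bounds check
ELEMENT_SHIFT = 2  # matches "shift left i:0, 2", i.e. 4 bytes per int


def element_offset(index: int) -> int:
    """Byte offset of element `index` from the start of the array object."""
    return HEADER_BYTES + (index << ELEMENT_SHIFT)


print(element_offset(0))  # 3
print(element_offset(3))  # 15
```

With a literal index, the compiler can fold this whole computation into the store instruction, which is why only one instruction remains.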

When to use unsafe()

As a program becomes more complex, memory corruption becomes one of the most frustrating problems you'll ever have to debug. Therefore, unsafe() should not be used casually. It should only be used when two things are true:

  1. You've measured the program and observed that unsafe() actually provides a meaningful speed improvement.
  2. A person who reads your code can easily prove that the bounds check is not necessary.

The buffer class below provides an example:

class buffer
    var _items: byte[]
    view var count: int

    constructor()
        ._items <- new byte[](100)
    end constructor

    func append(item: byte)
        var count: int
        count <- .count
        if count >= 100 then
            kernel::fail("out of bounds")
        end if
        unsafe(._items)[count] <- item
        .count <- count + 1
    end func

    func get[](i: int): byte
        if i < 0 or i >= .count then
            kernel::fail("out of bounds")
        end if
        var result: byte
        result <- unsafe(._items)[i]
        return result
    end func
end class

Correctness proof:

  1. The _items variable uses the private _ prefix convention, so we don't expect it to be accessed by code outside this buffer class.
  2. The view var declaration guards count from being modified by code outside buffer.
  3. Therefore, our proof only needs to audit this one buffer class.
  4. The array is allocated with exactly 100 elements.
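  5. The append() function is the only code in the class that changes count. It fails when count is already 100 and otherwise increments count by one, so count always stays in the range 0 - 100.
  6. When append() reaches the write, the check above it has already guaranteed that count is in the range 0 - 99. Therefore, append() does not need a bounds check for ._items[count].
  7. The get[] function ensures i ≥ 0 and i < count. Since we already proved count is at most 100, i is in the range 0 - 99, so ._items[i] does not need a bounds check either.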
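
The proof can also be exercised with a small model. The sketch below is Python, not Hybrix: BufferModel is a hypothetical stand-in for the buffer class, it assumes count starts at zero, and the assert statements mark the facts the proof relies on at each point where the Hybrix code uses unsafe().

```python
# Illustrative model only: this is Python, not Hybrix.
CAPACITY = 100


class BufferModel:
    def __init__(self) -> None:
        self._items = bytearray(CAPACITY)  # step 4: exactly 100 elements
        self.count = 0                     # assumed initial value

    def append(self, item: int) -> None:
        count = self.count
        if count >= CAPACITY:
            raise IndexError("out of bounds")
        # Step 6: the check above leaves count in the range 0 - 99.
        assert 0 <= count < CAPACITY
        self._items[count] = item
        self.count = count + 1
        # Step 5: count stays in the range 0 - 100.
        assert 0 <= self.count <= CAPACITY

    def get(self, i: int) -> int:
        if i < 0 or i >= self.count:
            raise IndexError("out of bounds")
        # Step 7: i < count <= 100, so i is in the range 0 - 99.
        assert 0 <= i < CAPACITY
        return self._items[i]


buf = BufferModel()
for value in range(CAPACITY):
    buf.append(value)
print(buf.count)    # 100
print(buf.get(99))  # 99
```

Python still performs its own bounds checks on the bytearray, so the model cannot corrupt memory; it only shows that the explicit checks keep every index inside the range the proof claims.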