Skip to main content

Arrays

Many kinds of data can be represented by variables that store just one value such as an int or bool. But what if you need to manage hundreds of items? We represent them as an array, which contains elements that are accessed using numbers called indexes. The array has a size which determines the total number of elements. The first element has index 0, and the last element has index size - 1.

An element's index is similar to a variable's name, except that we can compute an index. For example, if i is an int whose value is 3, then my_array[i] refers to my_array[3] (which is actually the fourth element when counting from 0). What if i is too big or too small? To avoid memory corruption, the Hybrix runtime performs a bounds check, reporting an Array index out of bounds failure.

Hybrix's two kinds of arrays are distinguished by their bounds checks:

  • Static arrays specify the array size as part of the data type. For example, if an array five_values contains five pair values, we declare it as var five_values: pair[size 5]. Since the size is known, the bounds check is cheap. If the index is a literal number (for example five_values[3]), then the compiler can eliminate the bounds check entirely.

    The downside is that func f(five_values: pair[size 5]) can only accept arrays of a specific size.

  • Dynamic arrays solve that problem. The dynamic array declaration does not specify a size, for example var values: pair[]. Now we can declare func f(values: pair[]) that reads values.size to determine the size at runtime.

    This flexibility complicates the bounds checks, however. For example, values[i] costs 12 CPU cycles versus 4 CPU cycles for five_values[i]. Also, values requires some extra memory to store values.size. Normally these costs are negligible, but for code with lots of array operations, static arrays sometimes are significantly more efficient.

Static arrays​

Suppose we want to play Tic Tac Toe. The game board will have 9 cells, and each cell can have an x or o or blank. We could represent this using 9 variables:

module main
# Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cell_0: int
var cell_1: int
var cell_2: int
var cell_3: int
var cell_4: int
var cell_5: int
var cell_6: int
var cell_7: int
var cell_8: int

func start()
# Initially, make all the cells blank
-1 -> main::cell_0 # (in the debugger, this line takes 3 CPU cycles)
-1 -> main::cell_1
-1 -> main::cell_2
-1 -> main::cell_3
-1 -> main::cell_4
-1 -> main::cell_5
-1 -> main::cell_6
-1 -> main::cell_7
-1 -> main::cell_8
end func
end module

It is awkward to manage so many variables, though. Instead, let's replace it with one variable that is an array of 9 integers. Its Hybrix type will be int[size 9] instead of int:

module main
# An array of Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cells: int[size 9]

func start()
# Create a new array and assign it to main::cells
new int[size 9]() -> main::cells

# Mark all the cells as blank
-1 -> main::cells[0] # (in the debugger, this line takes 7 CPU cycles)
-1 -> main::cells[1]
-1 -> main::cells[2]
-1 -> main::cells[3]
-1 -> main::cells[4]
-1 -> main::cells[5]
-1 -> main::cells[6]
-1 -> main::cells[7]
-1 -> main::cells[8]
end func
end module

Let's shorten this code by computing the index numbers using a variable i:

module main
# An array of Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cells: int[size 9]

func start()
var i: int

# Create a new array and assign it to main::cells
new int[size 9]() -> main::cells

0 -> i
loop
if i >= 9
do drop

-1 -> main::cells[i] # (in the debugger, this line takes 20 CPU cycles)
i + 1 -> i
end loop
end func
end module

Important points:

  • The type int[size 9] indicates an array with nine int elements.

  • The elements of an array must always have the same type.

  • The main::cells variable is a pointer to the array object. Its value is initially null.

  • If you assign this pointer to another variable, the array is not copied; instead, both variables will point to the same array.

  • We create the array object (on the heap) by calling new int[size 9](). new always makes memory blocks that are all zeroes, thus the array elements start out as 0.

  • To read or write an element of the array, we use the [ ] operator to provide the array index. For example: main::cells[1] or main::cells[i].

  • The indexes start from 0, not 1.

  • Thus, an error will occur if you try to access main::cells[9], because that is actually the tenth element.

  • Since i can change, each time we access cells[i], the compiled program will perform a bounds check to make sure i is less than 9.

  • For static arrays, the bounds check depends on the type. You cannot mix types:

    module main
    func example(a: int[size 8])
    var b: int[size 9]

    # ERROR: int[size 8] cannot be assigned to int[size 9]
    a -> b
    end func
    end module

Memory-mapped arrays: From a different perspective, a bigger motivation for static arrays is to represent memory-mapped I/O data structures. If you want to declare an array as inset, it must be a static array.

Dynamic arrays​

Now, suppose we want to write a function that can operate on arrays of any size:

module main
# An array of Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cells: int[]

func create_board(total_cells: int)
var i: int

# Create a new array and assign it to main::cells
new int[](total_cells) -> main::cells

0 -> i
loop
if i >= total_cells
do drop

-1 -> main::cells[i] # (in the debugger, this line takes 24 CPU cycles)
i + 1 -> i
end loop
end func

func start()
# Now we can make boards of any size
main::create_board(9)
end func
end module

Differences from static arrays:

  • The "array of int" type is now written as int[].
  • We create the array object (on the heap) by calling new int[](total_cells).
  • The total_cells in parentheses specifies the number of elements (main::cells.size) for the new array. For dynamic arrays, the size is now a parameter passed to new.
  • If you step through the code using the Hybrix debugger, -1 -> main::cells[i] now requires 24 CPU cycles due to its more complicated bounds check, compared to 3 CPU cycles in our first example.
  • You'll also see that new int[](total_cells) allocates 44 heap bytes, whereas new int[size 9]() only required 40 heap bytes. That extra memory stores 3 bytes for the array size (see below), plus one byte because heap block sizes are always rounded to multiples of 4.
  • In this example, a few extra bytes and CPU cycles won't matter. It's a small price for the added flexibility of dynamic arrays.
  • The array size is sometimes referred to as its "length", but the meaning is the same.

Array storage​

Let's think about how arrays are represented in memory. A single int consists of four bytes, one after another. Suppose our int value is $1122_3344 in hexadecimal, and its memory address is $10_1200. Those 4 bytes might look like this:

int representation:

$10_1200: 11 22 33 44 00 00 00 00   00 00 00 00 00 00 00 00
$10_1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Now suppose we have a static array of three int values:

module main
func start()
var static_array: int[size 3]
static_array <- new int[size 3]()

static_array[0] <- $1122_3344
static_array[1] <- $5555_5555
static_array[2] <- $6666_6666
end func
end module

The memory block size will be 12 bytes: three array elements, with 4 bytes for each int. This is how the Hybrix compiler represents int[size 3]:

int[size 3] static array representation:

$10_1200: 11 22 33 44 55 55 55 55   66 66 66 66 00 00 00 00
$10_1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Now let's compare with a dynamic array:

module main
func start()
var dynamic_array: int[]
dynamic_array <- new int[](3)

dynamic_array[0] <- $1122_3344
dynamic_array[1] <- $5555_5555
dynamic_array[2] <- $6666_6666
end func
end module

How is dynamic_array.size stored? It is stored as a 24-bit unsigned trio value. (Array sizes always fit in three bytes, since the virtual computer only has 256 kilobytes of RAM.)

Here is a dynamic array starting at $10_1200, currently storing 3 int values:

int[] dynamic array representation (size=3):

$10_1200: 00 00 03 11 22 33 44 55   55 55 55 66 66 66 66 00
$10_1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Resizing an array​

What if a program has filled an array up with items, but now needs to add another? We cannot modify size directly; instead, we must allocate a new array and copy the old elements into it. If we do this repeatedly in a loop, it can be very expensive, creating short-lived heap blocks for the garbage collector to clean up. Repeated reallocations can be avoided by allocating extra array elements, using a separate variable such as count to track the number of elements that are actually in use.

The example below increases _items.size in batches of 8:

class list_string
var _items: string[]
view var count: int

constructor()
._items <- new string[](4)
end constructor

func _ensure_size(new_size: int)
var size: int
size <- ._items.size

if new_size > size then
# round up
new_size <- math::bit_and(new_size + 7, math::bit_not(7))

var new_items: string[]
new_items <- new string[](new_size)

# Copy the elements from the old array to the new array
new_items.copy_from(0, ._items, 0, ._items.size)

._items <- new_items
end if
end func

func append(item: string)
var count: int
count <- .count

var new_count: int
new_count <- count + 1

._ensure_size(new_count)

._items[count] <- item
.count <- new_count
end func
end class