Arrays
Many kinds of data can be represented by variables that store just one value such as an int or bool. But what if you need to manage hundreds of items? We represent them as an array, which contains elements that are accessed using numbers called indexes. The array has a size which determines the total number of elements. The first element has index 0, and the last element has index size - 1.
An element's index is similar to a variable's name, except that we can compute an index. For example, if i is an int whose value is 3, then my_array[i] refers to my_array[3] (which is actually the fourth element when counting from 0). What if i is too big or too small? To avoid memory corruption, the Hybrix runtime performs a bounds check, reporting an Array index out of bounds failure.
Hybrix's two kinds of arrays are distinguished by their bounds checks:
-
Static arrays specify the array size as part of the data type. For example, if an array
five_valuescontains fivepairvalues, we declare it asvar five_values: pair[size 5]. Since the size is known, the bounds check is cheap. If the index is a literal number (for examplefive_values[3]), then the compiler can eliminate the bounds check entirely.The downside is that
func f(five_values: pair[size 5])can only accept arrays of a specific size. -
Dynamic arrays solve that problem. The dynamic array declaration does not specify a size, for example
var values: pair[]. Now we can declarefunc f(values: pair[])that readsvalues.sizeto determine the size at runtime.This flexibility complicates the bounds checks, however. For example,
values[i]costs 12 CPU cycles versus 4 CPU cycles forfive_values[i]. Also,valuesrequires some extra memory to storevalues.size. Normally these costs are negligible, but for code with lots of array operations, static arrays sometimes are significantly more efficient.
Static arrays​
Suppose we want to play Tic Tac Toe. The game board will have 9 cells, and each cell can have an x or o or blank. We could represent this using 9 variables:
module main
# Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cell_0: int
var cell_1: int
var cell_2: int
var cell_3: int
var cell_4: int
var cell_5: int
var cell_6: int
var cell_7: int
var cell_8: int
func start()
# Initially, make all the cells blank
-1 -> main::cell_0 # (in the debugger, this line takes 3 CPU cycles)
-1 -> main::cell_1
-1 -> main::cell_2
-1 -> main::cell_3
-1 -> main::cell_4
-1 -> main::cell_5
-1 -> main::cell_6
-1 -> main::cell_7
-1 -> main::cell_8
end func
end module
It is awkward to manage so many variables, though. Instead, let's replace it with one variable that is an array of 9 integers. Its Hybrix type will be int[size 9] instead of int:
module main
# An array of Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cells: int[size 9]
func start()
# Create a new array and assign it to main::cells
new int[size 9]() -> main::cells
# Mark all the cells as blank
-1 -> main::cells[0] # (in the debugger, this line takes 7 CPU cycles)
-1 -> main::cells[1]
-1 -> main::cells[2]
-1 -> main::cells[3]
-1 -> main::cells[4]
-1 -> main::cells[5]
-1 -> main::cells[6]
-1 -> main::cells[7]
-1 -> main::cells[8]
end func
end module
Let's shorten this code by computing the index numbers using a variable i:
module main
# An array of Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cells: int[size 9]
func start()
var i: int
# Create a new array and assign it to main::cells
new int[size 9]() -> main::cells
0 -> i
loop
if i >= 9
do drop
-1 -> main::cells[i] # (in the debugger, this line takes 20 CPU cycles)
i + 1 -> i
end loop
end func
end module
Important points:
-
The type
int[size 9]indicates an array with nineintelements. -
The elements of an array must always have the same type.
-
The
main::cellsvariable is a pointer to the array object. Its value is initiallynull. -
If you assign this pointer to another variable, the array is not copied; instead, both variables will point to the same array.
-
We create the array object (on the heap) by calling
new int[size 9]().newalways makes memory blocks that are all zeroes, thus the array elements start out as 0. -
To read or write an element of the array, we use the
[]operator to provide the array index. For example:main::cells[1]ormain::cells[i]. -
The indexes start from
0, not1. -
Thus, an error will occur if you try to access
main::cells[9], because that is actually the tenth element. -
Since
ican change, each time we accesscells[i], the compiled program will perform a bounds check to make sureiis less than 9. -
For static arrays, the bounds check depends on the type. You cannot mix types:
module main
func example(a: int[size 8])
var b: int[size 9]
# ERROR: int[size 8] cannot be assigned to int[size 9]
a -> b
end func
end module
Memory-mapped arrays: From a different perspective, a bigger motivation for static arrays is to represent memory-mapped I/O data structures. If you want to declare an array as inset, it must be a static array.
Dynamic arrays​
Now, suppose we want to write a function that can operate on arrays of any size:
module main
# An array of Tic Tac Toe cells. The values are: 0="O", 1="X", -1=blank
var cells: int[]
func create_board(total_cells: int)
var i: int
# Create a new array and assign it to main::cells
new int[](total_cells) -> main::cells
0 -> i
loop
if i >= total_cells
do drop
-1 -> main::cells[i] # (in the debugger, this line takes 24 CPU cycles)
i + 1 -> i
end loop
end func
func start()
# Now we can make boards of any size
main::create_board(9)
end func
end module
Differences from static arrays:
- The "array of
int" type is now written asint[]. - We create the array object (on the heap) by calling
new int[](total_cells). - The
total_cellsin parentheses specifies the number of elements (main::cells.size) for the new array. For dynamic arrays, the size is now a parameter passed tonew. - If you step through the code using the Hybrix debugger,
-1 -> main::cells[i]now requires 24 CPU cycles due to its more complicated bounds check, compared to 3 CPU cycles in our first example. - You'll also see that
new int[](total_cells)allocates 44 heap bytes, whereasnew int[size 9]()only required 40 heap bytes. That extra memory stores 3 bytes for the arraysize(see below), plus one byte because heap block sizes are always rounded to multiples of 4. - In this example, a few extra bytes and CPU cycles won't matter. It's a small price for the added flexibility of dynamic arrays.
- The array size is sometimes referred to as its "length", but the meaning is the same.
Array storage​
Let's think about how arrays are represented in memory. A single int consists of four bytes, one after another. Suppose our int value is $1122_3344 in hexadecimal, and its memory address is $10_1200. Those 4 bytes might look like this:
int representation:
$10_1200: 11 22 33 44 00 00 00 00 00 00 00 00 00 00 00 00
$10_1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Now suppose we have a static array of three int values:
module main
func start()
var static_array: int[size 3]
static_array <- new int[size 3]()
static_array[0] <- $1122_3344
static_array[1] <- $5555_5555
static_array[2] <- $6666_6666
end func
end module
The memory block size will be 12 bytes: three array elements, with 4 bytes for each int. This is how the Hybrix compiler represents int[size 3]:
int[size 3] static array representation:
$10_1200: 11 22 33 44 55 55 55 55 66 66 66 66 00 00 00 00
$10_1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Now let's compare with a dynamic array:
module main
func start()
var dynamic_array: int[]
dynamic_array <- new int[](3)
dynamic_array[0] <- $1122_3344
dynamic_array[1] <- $5555_5555
dynamic_array[2] <- $6666_6666
end func
end module
How is dynamic_array.size stored? It is stored as a 24-bit unsigned trio value. (Array sizes always fit in three bytes, since the virtual computer only has 256 kilobytes of RAM.)
Here is a dynamic array starting at $10_1200, currently storing 3 int values:
int[] dynamic array representation (size=3):
$10_1200: 00 00 03 11 22 33 44 55 55 55 55 66 66 66 66 00
$10_1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Resizing an array​
What if a program has filled an array up with items, but now needs to add another? We cannot modify size directly; instead, we must allocate a new array and copy the old elements into it. If we do this repeatedly in a loop, it can be very expensive, creating short-lived heap blocks for the garbage collector to clean up. Repeated reallocations can be avoided by allocating extra array elements, using a separate variable such as count to track the number of elements that are actually in use.
The example below increases _items.size in batches of 8:
class list_string
var _items: string[]
view var count: int
constructor()
._items <- new string[](4)
end constructor
func _ensure_size(new_size: int)
var size: int
size <- ._items.size
if new_size > size then
# round up
new_size <- math::bit_and(new_size + 7, math::bit_not(7))
var new_items: string[]
new_items <- new string[](new_size)
# Copy the elements from the old array to the new array
new_items.copy_from(0, ._items, 0, ._items.size)
._items <- new_items
end if
end func
func append(item: string)
var count: int
count <- .count
var new_count: int
new_count <- count + 1
._ensure_size(new_count)
._items[count] <- item
.count <- new_count
end func
end class