Skip to main content

Arrays

Many kinds of data can be represented by variables that store just one value such as an INT or BOOL. But sometimes we need to manage a list of multiple things. We call such a list an array, which contains things called elements, that are accessed using numbers called indexes. The first array element has index 0. Indexes are an important feature, since the index can be computed, for example allowing a LOOP to process each element one by one.

There are two kinds of arrays:

  • Fixed-size arrays always have the same number of elements. For example, a list of walls (0=north, 1=south, 2=east, 3=west).

  • Variable-size arrays have a counter (SIZE) that keeps track of the number of elements in the array. They also have a maximum size (MAX) based on the way memory is managed by Hybrix. A variable-size array could be useful for tracking a list of words. The list gets longer each time we add another word.

Fixed-size arrays

Suppose we want to play Tic Tac Toe. The game board will have 9 cells, and each cell can have an X or O or blank. We could represent this using 9 variables:

MODULE MAIN
# TIC TAC TOE CELLS. THE VALUES ARE: 0="O", 1="X", -1=BLANK
VAR CELL_0: INT
VAR CELL_1: INT
VAR CELL_2: INT
VAR CELL_3: INT
VAR CELL_4: INT
VAR CELL_5: INT
VAR CELL_6: INT
VAR CELL_7: INT
VAR CELL_8: INT

FUNC START()
# INITIALLY, MAKE ALL THE CELLS BLANK
-1 --> MAIN::CELL_0
-1 --> MAIN::CELL_1
-1 --> MAIN::CELL_2
-1 --> MAIN::CELL_3
-1 --> MAIN::CELL_4
-1 --> MAIN::CELL_5
-1 --> MAIN::CELL_6
-1 --> MAIN::CELL_7
-1 --> MAIN::CELL_8
END FUNC
END MODULE

It is awkward to manage so many variables, though. Instead, let's replace it with one variable that is an array of 9 integers. Its Hybrix type will be INT[SIZE 9] instead of INT:

MODULE MAIN
# AN ARRAY OF TIC TAC TOE CELLS. THE VALUES ARE: 0="O", 1="X", -1=BLANK
VAR CELLS: INT[SIZE 9]

FUNC START()
# CREATE A NEW ARRAY AND ASSIGN IT TO "MAIN::CELLS".
NEW INT[SIZE 9]() --> MAIN::CELLS

# MARK ALL THE CELLS AS BLANK
-1 --> MAIN::CELLS[0]
-1 --> MAIN::CELLS[1]
-1 --> MAIN::CELLS[2]
-1 --> MAIN::CELLS[3]
-1 --> MAIN::CELLS[4]
-1 --> MAIN::CELLS[5]
-1 --> MAIN::CELLS[6]
-1 --> MAIN::CELLS[7]
-1 --> MAIN::CELLS[8]
END FUNC
END MODULE

Let's shorten this code by replacing the index numbers with a variable I:

MODULE MAIN
# AN ARRAY OF CELLS
VAR CELLS: INT[SIZE 9]

FUNC START()
VAR I: INT

# CREATE A NEW ARRAY AND ASSIGN IT TO "MAIN::CELLS".
NEW INT[SIZE 9]() --> MAIN::CELLS

0 -> I
LOOP
-1 --> MAIN::CELLS[I]
IF I >= 8
DO DROP
I + 1 -> I
END LOOP
END FUNC
END MODULE

Important points:

  • The type INT[SIZE 9] indicates an array with nine INT elements.
  • The elements of an array must always have the same type.
  • The MAIN::CELLS variable is really an array pointer, whose value is initially NULL.
  • If you assign this pointer to another variable, the array is not copied; instead, both variables will point to the same array.
  • We create the array object (on the heap) by calling NEW INT[SIZE 9](). NEW always makes memory blocks that are all zeroes, thus the array elements start out as 0.
  • To read or write an element of the array, we use the [ ] operator with an array index number. For example: MAIN::CELLS[1].
  • The indexes start from 0, not 1.
  • Thus, an error will occur if you try to access MAIN::CELLS[9], because that is actually the tenth element.
  • Since I can change, each time we access CELLS[I], the compiled program will check I to make sure it is less than 9.
  • This check depends on the type, therefore you cannot mix types:
MODULE MAIN
FUNC START(A: INT[SIZE 8])
VAR B: INT[SIZE 9]

# ERROR: INT[SIZE 8] CANNOT BE ASSIGNED TO INT[SIZE 9]
A -> B
END FUNC
END MODULE

A thought experiment

Let's think about how variable-size arrays might work. If you imagine how an INT is stored in the computer's memory, it is four bytes, one after another. Suppose our INT value is $1122_3344 in hexadecimal, and its memory address is $10_1202. Those 4 bytes might look like this:

INT representation:

$10_1200: 00 00 11 22 33 44 00 00   00 00 00 00 00 00 00 00

How would we store an array of three INT values? Let's say the other elements are $5555_5555 and $6666_6666. We could just place them immediately afterwards, like this:

INT[SIZE 3] fixed-size representation:

$10_1200: 00 00 11 22 33 44 55 55   55 55 66 66 66 66 00 00

The memory block size would then be 12 bytes: three array elements, with 4 bytes for each INT. This is how the Hybrix compiler represents INT[SIZE 3].

It's a good start, but what if we later need to add a fourth array element? The Hybrix memory manager does not provide a way to resize a memory block; you must instead create a new one. Adding a new element would be rather expensive:

A costly way to add an element

  1. We start with our array of 3 INT elements, and are going to resize it to hold 4 elements
  2. Allocate a new memory block with space for 4 array elements.
  3. Copy the 3 elements from the old array to the new array.
  4. Free the old array.

During step 3, both memory blocks must coexist, so we've actually allocated 12 + 16 = 28 bytes total. This is very wasteful. Imagine if we needed to do all this for an array containing 10,000 elements!

A better design

The way to avoid this problem is to allocate extra memory, before we need it. For example, we could make our array big enough to hold 10 INT elements, even though we initially will only use 3 of them. That way, the expensive copying is only needed occasionally, when we need to add an 11th element (in which case we could double the size to be 20).

With this design, Hybrix arrays must now track two additional pieces of information:

  • ARRAY.MAX: The maximum number of elements that will fit in the array. We don't need to store this number directly. It can be calculated from the size of the memory block, which is already tracked by the memory manager.

  • ARRAY.SIZE: The number of elements being used. SIZE starts out as 0, and it can never be larger than MAX. In our example, increasing the size from 3 to 4 is very cheap, simply changing the SIZE number (as long as 4 is not larger than MAX). Behind the scenes, the compiler stores SIZE as a three byte TRIO at the start of the array.

For completeness, here is a memory block starting at $10_1202, currently storing 3 INT values, whose MAX is 5:

INT[] variable-size representation (SIZE=3, MAX=5):

$10_1200: 00 00 00 00 03 11 22 33   44 55 55 55 55 66 66
$10_1210: 66 66 00 00 00 00 00 00 00 00 00 00 00 00 00

Variable-size arrays

The example below illustrates the basics for working with variable-sized arrays:

MODULE MAIN
# AN ARRAY OF ITEMS THE PLAYER HAS COLLECTED.
# 101=BOOK, 102=PENCIL, 103=KEY, 104=RING
VAR ITEMS: INT[]

FUNC START()
# CREATE A NEW ARRAY AND ASSIGN IT TO "MAIN::ITEMS".
# THE "10" PARAMETER SPECIFIES THE MAX SIZE OF THE ARRAY.
# THE SIZE WILL INITIALLY BE ZERO
NEW INT[](10) --> MAIN::ITEMS

# CHANGE THE SIZE TO BE 3
MAIN::ITEMS.RESIZE(3)

# ASSIGN THE FIRST 3 ELEMENTS, WHOSE INDEXES ARE 0, 1, AND 2
103 --> MAIN::ITEMS[0]
101 --> MAIN::ITEMS[1]
104 --> MAIN::ITEMS[2]
END FUNC
END MODULE

Important points:

  • The "array of INT" type is written as INT[].
  • The MAIN::ITEMS variable is really an array pointer, whose value is initially NULL.
  • We create the array object (on the heap) by calling NEW INT[](10).
  • The 10 in parentheses specifies the maximum number of elements (MAIN::ITEMS.MAX) for this array.
  • To set the actual count of elements (MAIN::ITEMS.SIZE), we can call the RESIZE() member.
  • The elements added or removed by RESIZE() will automatically be set to all zeros.
  • To read or write an element of the array, we use the [ ] operator with an array index number. For example: MAIN::ITEMS[1].
  • The indexes start from 0, not 1.
  • Thus, an error will occur if you try to access MAIN::ITEMS[3], because that is actually the fourth element, when the SIZE is currently 3.

Reallocating while adding

Here's a function ADD_ITEM() that adds new items to the end of the array:

MODULE MAIN
# AN ARRAY OF ITEMS THE PLAYER HAS COLLECTED.
# 101=BOOK, 102=PENCIL, 103=KEY, 104=RING
VAR ITEMS: INT[]

# ADD AN ITEM TO THE END, RESIZING IF NEEDED
FUNC ADD_ITEM(ID: INT)
VAR I: INT
MAIN::ITEMS.SIZE -> I

MAIN::ITEMS.RESIZE(I + 1)

ID -> MAIN::ITEMS[I]
END FUNC
END MODULE

And here's a more advanced version, that reallocates the array whenever SIZE exceeds MAX:

MODULE MAIN
# AN ARRAY OF ITEMS THE PLAYER HAS COLLECTED.
# 101=BOOK, 102=PENCIL, 103=KEY, 104=RING
VAR ITEMS: INT[]

# ADD AN ITEM TO THE END, RESIZING IF NEEDED
FUNC ADD_ITEM(ID: INT)
VAR I: INT, L: INT, OLD_ITEMS: INT[]
MAIN::ITEMS.SIZE -> I

IF I >= MAIN::ITEMS.MAX THEN
# WE NEED TO REALLOCATE THE ARRAY
MAIN::ITEMS -> OLD_ITEMS

# DOUBLE THE MAXIMUM SIZE
NEW INT[](MAIN::ITEMS.MAX*2) -> MAIN::ITEMS

# COPY FROM THE OLD ARRAY TO THE NEW ARRAY
MAIN::ITEMS.RESIZE(I + 1)
0 -> L
LOOP
IF L >= I
DO DROP
OLD_ITEMS[L] --> MAIN::ITEMS[L]
L + 1 -> L
END LOOP
ELSE
MAIN::ITEMS.RESIZE(I + 1)
END IF

ID -> MAIN::ITEMS[I]
END FUNC
END MODULE