Arrays
Many kinds of data can be represented by variables that store just one value such as an INT or BOOL. But sometimes we need to manage a list of multiple things. We call such a list an array, which contains things called elements, that are accessed using numbers called indexes. The first array element has index 0. Indexes are an important feature, since the index can be computed, for example allowing a LOOP to process each element one by one.
There are two kinds of arrays:
-
Fixed-size arrays always have the same number of elements. For example, a list of walls (0=north, 1=south, 2=east, 3=west).
-
Variable-size arrays have a counter (
SIZE) that keeps track of the number of elements in the array. They also have a maximum size (MAX) based on the way memory is managed by Hybrix. A variable-size array could be useful for tracking a list of words. The list gets longer each time we add another word.
Fixed-size arrays
Suppose we want to play Tic Tac Toe. The game board will have 9 cells, and each cell can have an X or O or blank. We could represent this using 9 variables:
MODULE MAIN
# TIC TAC TOE CELLS. THE VALUES ARE: 0="O", 1="X", -1=BLANK
VAR CELL_0: INT
VAR CELL_1: INT
VAR CELL_2: INT
VAR CELL_3: INT
VAR CELL_4: INT
VAR CELL_5: INT
VAR CELL_6: INT
VAR CELL_7: INT
VAR CELL_8: INT
FUNC START()
# INITIALLY, MAKE ALL THE CELLS BLANK
-1 --> MAIN::CELL_0
-1 --> MAIN::CELL_1
-1 --> MAIN::CELL_2
-1 --> MAIN::CELL_3
-1 --> MAIN::CELL_4
-1 --> MAIN::CELL_5
-1 --> MAIN::CELL_6
-1 --> MAIN::CELL_7
-1 --> MAIN::CELL_8
END FUNC
END MODULE
It is awkward to manage so many variables, though. Instead, let's replace it with one variable that is an array of 9 integers. Its Hybrix type will be INT[SIZE 9] instead of INT:
MODULE MAIN
# AN ARRAY OF TIC TAC TOE CELLS. THE VALUES ARE: 0="O", 1="X", -1=BLANK
VAR CELLS: INT[SIZE 9]
FUNC START()
# CREATE A NEW ARRAY AND ASSIGN IT TO "MAIN::CELLS".
NEW INT[SIZE 9]() --> MAIN::CELLS
# MARK ALL THE CELLS AS BLANK
-1 --> MAIN::CELLS[0]
-1 --> MAIN::CELLS[1]
-1 --> MAIN::CELLS[2]
-1 --> MAIN::CELLS[3]
-1 --> MAIN::CELLS[4]
-1 --> MAIN::CELLS[5]
-1 --> MAIN::CELLS[6]
-1 --> MAIN::CELLS[7]
-1 --> MAIN::CELLS[8]
END FUNC
END MODULE
Let's shorten this code by replacing the index numbers with a variable I:
MODULE MAIN
# AN ARRAY OF CELLS
VAR CELLS: INT[SIZE 9]
FUNC START()
VAR I: INT
# CREATE A NEW ARRAY AND ASSIGN IT TO "MAIN::CELLS".
NEW INT[SIZE 9]() --> MAIN::CELLS
0 -> I
LOOP
-1 --> MAIN::CELLS[I]
IF I >= 8
DO DROP
I + 1 -> I
END LOOP
END FUNC
END MODULE
Important points:
- The type
INT[SIZE 9]indicates an array with nineINTelements. - The elements of an array must always have the same type.
- The
MAIN::CELLSvariable is really an array pointer, whose value is initiallyNULL. - If you assign this pointer to another variable, the array is not copied; instead, both variables will point to the same array.
- We create the array object (on the heap) by calling
NEW INT[SIZE 9]().NEWalways makes memory blocks that are all zeroes, thus the array elements start out as 0. - To read or write an element of the array, we use the
[]operator with an array index number. For example:MAIN::CELLS[1]. - The indexes start from
0, not1. - Thus, an error will occur if you try to access
MAIN::CELLS[9], because that is actually the tenth element. - Since
Ican change, each time we accessCELLS[I], the compiled program will checkIto make sure it is less than 9. - This check depends on the type, therefore you cannot mix types:
MODULE MAIN
FUNC START(A: INT[SIZE 8])
VAR B: INT[SIZE 9]
# ERROR: INT[SIZE 8] CANNOT BE ASSIGNED TO INT[SIZE 9]
A -> B
END FUNC
END MODULE
A thought experiment
Let's think about how variable-size arrays might work. If you imagine how an INT is stored in the computer's memory, it is four bytes, one after another. Suppose our INT value is $1122_3344 in hexadecimal, and its memory address is $10_1202. Those 4 bytes might look like this:
INT representation:
$10_1200: 00 00 11 22 33 44 00 00 00 00 00 00 00 00 00 00
How would we store an array of three INT values? Let's say the other elements are $5555_5555 and $6666_6666. We could just place them immediately afterwards, like this:
INT[SIZE 3] fixed-size representation:
$10_1200: 00 00 11 22 33 44 55 55 55 55 66 66 66 66 00 00
The memory block size would then be 12 bytes: three array elements, with 4 bytes for each INT. This is how the Hybrix compiler represents INT[SIZE 3].
It's a good start, but what if we later need to add a fourth array element? The Hybrix memory manager does not provide a way to resize a memory block; you must instead create a new one. Adding a new element would be rather expensive:
A costly way to add an element
- We start with our array of 3
INTelements, and are going to resize it to hold 4 elements - Allocate a new memory block with space for 4 array elements.
- Copy the 3 elements from the old array to the new array.
- Free the old array.
During step 3, both memory blocks must coexist, so we've actually allocated 12 + 16 = 28 bytes total. This is very wasteful. Imagine if we needed to do all this for an array containing 10,000 elements!
A better design
The way to avoid this problem is to allocate extra memory, before we need it. For example, we could make our array big enough to hold 10 INT elements, even though we initially will only use 3 of them. That way, the expensive copying is only needed occasionally, when we need to add an 11th element (in which case we could double the size to be 20).
With this design, Hybrix arrays must now track two additional pieces of information:
-
ARRAY.MAX: The maximum number of elements that will fit in the array. We don't need to store this number directly. It can be calculated from the size of the memory block, which is already tracked by the memory manager. -
ARRAY.SIZE: The number of elements being used.SIZEstarts out as0, and it can never be larger thanMAX. In our example, increasing the size from 3 to 4 is very cheap, simply changing theSIZEnumber (as long as4is not larger thanMAX). Behind the scenes, the compiler storesSIZEas a three byteTRIOat the start of the array.
For completeness, here is a memory block starting at $10_1202, currently storing 3 INT values, whose MAX is 5:
INT[] variable-size representation (SIZE=3, MAX=5):
$10_1200: 00 00 00 00 03 11 22 33 44 55 55 55 55 66 66
$10_1210: 66 66 00 00 00 00 00 00 00 00 00 00 00 00 00
Variable-size arrays
The example below illustrates the basics for working with variable-sized arrays:
MODULE MAIN
# AN ARRAY OF ITEMS THE PLAYER HAS COLLECTED.
# 101=BOOK, 102=PENCIL, 103=KEY, 104=RING
VAR ITEMS: INT[]
FUNC START()
# CREATE A NEW ARRAY AND ASSIGN IT TO "MAIN::ITEMS".
# THE "10" PARAMETER SPECIFIES THE MAX SIZE OF THE ARRAY.
# THE SIZE WILL INITIALLY BE ZERO
NEW INT[](10) --> MAIN::ITEMS
# CHANGE THE SIZE TO BE 3
MAIN::ITEMS.RESIZE(3)
# ASSIGN THE FIRST 3 ELEMENTS, WHOSE INDEXES ARE 0, 1, AND 2
103 --> MAIN::ITEMS[0]
101 --> MAIN::ITEMS[1]
104 --> MAIN::ITEMS[2]
END FUNC
END MODULE
Important points:
- The "array of
INT" type is written asINT[]. - The
MAIN::ITEMSvariable is really an array pointer, whose value is initiallyNULL. - We create the array object (on the heap) by calling
NEW INT[](10). - The
10in parentheses specifies the maximum number of elements (MAIN::ITEMS.MAX) for this array. - To set the actual count of elements (
MAIN::ITEMS.SIZE), we can call theRESIZE()member. - The elements added or removed by
RESIZE()will automatically be set to all zeros. - To read or write an element of the array, we use the
[]operator with an array index number. For example:MAIN::ITEMS[1]. - The indexes start from
0, not1. - Thus, an error will occur if you try to access
MAIN::ITEMS[3], because that is actually the fourth element, when theSIZEis currently3.
Reallocating while adding
Here's a function ADD_ITEM() that adds new items to the end of the array:
MODULE MAIN
# AN ARRAY OF ITEMS THE PLAYER HAS COLLECTED.
# 101=BOOK, 102=PENCIL, 103=KEY, 104=RING
VAR ITEMS: INT[]
# ADD AN ITEM TO THE END, RESIZING IF NEEDED
FUNC ADD_ITEM(ID: INT)
VAR I: INT
MAIN::ITEMS.SIZE -> I
MAIN::ITEMS.RESIZE(I + 1)
ID -> MAIN::ITEMS[I]
END FUNC
END MODULE
And here's a more advanced version, that reallocates the array whenever SIZE exceeds MAX:
MODULE MAIN
# AN ARRAY OF ITEMS THE PLAYER HAS COLLECTED.
# 101=BOOK, 102=PENCIL, 103=KEY, 104=RING
VAR ITEMS: INT[]
# ADD AN ITEM TO THE END, RESIZING IF NEEDED
FUNC ADD_ITEM(ID: INT)
VAR I: INT, L: INT, OLD_ITEMS: INT[]
MAIN::ITEMS.SIZE -> I
IF I >= MAIN::ITEMS.MAX THEN
# WE NEED TO REALLOCATE THE ARRAY
MAIN::ITEMS -> OLD_ITEMS
# DOUBLE THE MAXIMUM SIZE
NEW INT[](MAIN::ITEMS.MAX*2) -> MAIN::ITEMS
# COPY FROM THE OLD ARRAY TO THE NEW ARRAY
MAIN::ITEMS.RESIZE(I + 1)
0 -> L
LOOP
IF L >= I
DO DROP
OLD_ITEMS[L] --> MAIN::ITEMS[L]
L + 1 -> L
END LOOP
ELSE
MAIN::ITEMS.RESIZE(I + 1)
END IF
ID -> MAIN::ITEMS[I]
END FUNC
END MODULE