Searching
This part will cover
- Different search methods on 1D arrays:
- Iota
- Iota underbar
- Epsilon
- Epsilon underbar
- Key
Indices of
The most used searching function is the dyadic indices of, which shares its symbol with the index function ⍳
.
Typing the indices of function ⍳
Prefix method: PREFIX i
Tab method: i i Tab
If you want a simple way to find a value in an array, ⍳
is your guy.
On the left, you give it an array to search in; on the right, you give it an array of values to find.
The function will output the indices where your values are located.
3 5 2 1 4 ⍳ 2
3
'asdfghjkl' ⍳ 'agl'
1 5 9
What do you think happens when you try to search for a value that doesn't exist? Pick your guess:
0
- Big number
VALUE ERROR
LENGTH ERROR
If you picked any of the errors, then you think APL is too mean.
After all, why punish the programmer when they are just trying to search for a value?
No, there's a way better way of handling this issue.
If you picked 0
, you're closer, but still a little off.
It could work if zero was never a valid index; however, ⍳
is index-sensitive. Look:
'asdfghjkl' ⍳ 'agl'
1 5 9
⎕IO ← 0
'asdfghjkl' ⍳ 'agl'
0 4 8
⎕IO ← 1
Instead of returning zero (or a negative number), the APL developers decided to be clever and return a number that's one larger than the length of the array.
'asdfghjkl' ⍳ 'agl'
1 5 9
'asdfghjkl' ⍳ 'aglö'
1 5 9 10
This is actually super useful!
Let's say we have two vectors: STUDENTS
contains student names, and GPAS
contains their grades.
For example,
STUDENTS ← 'ALICE' 'BOB' 'CHARLIE' 'DAVID' 'EEVA'
GPAS ← 5.0 3.4 1.0 4.2 4.1
Say we wanted to get to find David's and Charlie's GPAs from the list, but we didn't know what position they were in.
To do this, we can use iota to find their indices, and then use them along with the brackets []
to get the GPAs from the list:
STUDENTS ⍳ 'DAVID' 'CHARLIE'
4 3
GPAS[STUDENTS ⍳ 'DAVID' 'CHARLIE']
4.2 1
Then, if we try to find a user that doesn't exist, we will get an index that's one larger than the largest item. So, we can just add one more "default" item to the GPA list and use that as a "not found" element!
GPAS,'X'
5 3.4 1 4.2 4.1 X
⍴GPAS,'X'
6
(GPAS,'X')[STUDENTS ⍳ 'DAVID' 'CHARLIE']
4.2 1
(GPAS,'X')[STUDENTS ⍳ 'DAVID' 'CHARLIE' 'JULIET']
4.2 1 X
(GPAS,'X')[STUDENTS ⍳ 'DAVID' 'CHARLIE' 'JULIET' 'BOB']
4.2 1 X 3.4
(GPAS,'X')[STUDENTS ⍳ 'DAVID' 'CHARLIE' 'JULIET' 'BOB' 'ROMEO']
4.2 1 X 3.4 X
Handy!
Here's another APL idiom that you should remember: changing one value to another.
NUMS ← 9 4 1 2 0 3
NUMS
9 4 1 2 0 3
NUMS[NUMS⍳4]
4
NUMS[NUMS⍳4]←6
NUMS
9 6 1 2 0 3
So, using NUMS[NUMS⍳4]
just gets you the number you were searching for (4
), but adding an assignment ←
lets you reset whatever was in the position.
Even though iota is handy, it's not perfect. Let's say our array had repeats in it, and we wanted to find them all...
LETTERS ← 'abcdefgaaahijkl'
LETTERS
abcdefgaaahijkl
LETTERS ⍳ 'a'
1
LETTERS ⍳ 'aa'
1 1
LETTERS ⍳ 'aaa'
1 1 1
Nope. Iota just finds the first occurrence of 'a'
and stops.
We'll learn how to find more than just one value later.
Here's a funny error that can happen with ⍳
:
10 ⍳ 10 11 12
RANK ERROR
10⍳10 11 12
∧
You could think that this should return 1 2 2
(the first element, and two elements not in the list).
But actually, the number 10 is a scalar and not an array! So you can't search inside it. Rip.
Member of
Dyadic iota ⍳
answers the question "where is it?"; dyadic epsilon ∊
answers the question "is it?"
4 ∊ 1 3 4 5 6
1
4 ∊ 1 3 2 5 6
0
4 3 ∊ 1 3 2 5 6
0 1
4 3 4 3 ∊ 1 3 2 5 6
0 1 0 1
It gives a simple yes/no answer, telling you whether the value is in the array or not.
Typing the member of function ∊
Prefix method: PREFIX e
Tab method: e e Tab
Watch out!
Iota ⍳
takes in the array to search on the left and the value(s) to find on the right.
Epsilon ∊
takes in the array to search on the right and the value(s) to find on the left.
Be careful!
This one's farily straightforward.
Where
Let's try answering the question "where are all the values?" instead of the more boring question "where is the first value?"
For this example, we'll use the vector PRICES
, which has a bunch of prices of items.
First, let's try finding where all the items are which have a price of 40
.
Simple ⍳
won't cut the job, since we saw earlier that it only found the first element.
Let's be more creative: first, we want to find all the positions that have the number 40...
PRICES←12 39 40 10 55 40 73
40=PRICES
0 0 1 0 0 1 0
Then, we want to convert these values to indices. A good strategy is to:
- Get the length of the array (
⍴
) - Get a vector of indices of the same length (
⍳
) - Use the bit mask
0 0 1 0 0 1 0
to select the indices (/
)
Step-by-step:
40=PRICES
0 0 1 0 0 1 0
⍴PRICES
7
⍳⍴PRICES
1 2 3 4 5 6 7
(40=PRICES)/⍳⍴PRICES
3 6
Slashiotarho
Look at the three symbols: slash, iota, and rho. This way of finding where values are is so useful that it has its own name: SLASHIOTARHO!
Chant it to yourself before bed and remember it well.
Slashiotarho is also more powerful than the dyadic ⍳
function we looked at earlier, since we can use it to find any condition we like.
For example, if we wanted to find all prices that are at most 40, we can just change one symbol:
PRICES
12 39 40 10 55 40 73
(40=PRICES)/⍳⍴PRICES
3 6
(40≥PRICES)/⍳⍴PRICES
1 2 3 4 6
Slashiotarho is such a useful function, that the APL developers decided to add a whole new symbol: the SUPER-IOTA (officially called "iota underbar") that can be used to perform the "where" function. This is a monadic function, which takes in a list of ones and zeros on the right and returns the location of all the ones. Let's compare slashiotarho with the where function:
PRICES
12 39 40 10 55 40 73
40=PRICES
0 0 1 0 0 1 0
(40=PRICES)/⍳⍴PRICES
3 6
⍸40=PRICES
3 6
40≥PRICES
1 1 1 1 0 1 0
(40≥PRICES)/⍳⍴PRICES
1 2 3 4 6
⍸40≥PRICES
1 2 3 4 6
Typing the where function ⍸
Prefix method: PREFIX I
Tab method: i _ Tab
It's a little more compact! Feel free to use it when needed.
Interval Index
Sometimes, instead of finding exactly where an element is in an array, you'd like to find what elements it's between. Consider the grading function we defined in chapter 3, write problem 12
GRADE 42
0
GRADE 43
1
GRADE 51
2
GRADE 58
3
GRADE 65
4
Grade 72
5
What this function does is try to find where the right argument ⍵
is, between the grade boundaries 0 43 51 58 65 72
for the grades 0 1 2 3 4 5
.
Another example of this kind of problem is looking for a word in a dictionary, where the page with the word ⍵
is the one where ⍵
is alphabetically between the first word of the page and the first word of the next page.
This problem can be solved with the dyadic ⍸ interval index function. The vector left argument ⍺
specifies the boundaries that the right argument ⍵
is placed between.
43 51 58 65 72 ⍸ 10
0
43 51 58 65 72 ⍸ 64
3
43 51 58 65 72 ⍸ 72
5
'aardvark' 'abrasion' 'accessory' 'acquittance' 'adamantine' ⍸ ⊂'absolute'
2
Find
All of the finding operations that we looked at so far have been to find one element in an array.
APL also has a function to find subarrays: the dyadic find function, which is written using the epsilon-underbar symbol ⍷
.
Typing the find function ⍷
Prefix method: PREFIX E
Tab method: e _ Tab
SENTENCE ← 'Hello, this is elsa from melbourne'
'el' ⍷ SENTENCE
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
In this example, the ones are where each 'el'
starts in the long sentence.
You can also do this with arrays of strings:
ANIMALS ← 'cow' 'horse' 'dog' 'horse' 'cow' 'cat' 'horse' 'cow' 'bat'
'horse' 'cow' ⍷ ANIMALS
0 0 0 1 0 0 1 0 0
These occurences can also overlap:
'ooo' ⍷ 'meoooooooow meooow'
0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0
Key
Last one for this part! The key ⌸ operator provides a way of grouping together elements of an array, and apply an arbitrary function to each group
{⍺ ⍵}⌸'ENTENTE'
┌─┬─────┐
│E│1 4 7│
├─┼─────┤
│N│2 5 │
├─┼─────┤
│T│3 6 │
└─┴─────┘
{⎕←⍺}⌸'ENTENTE'
ENT
{⎕←⍵}⌸'ENTENTE'
1 4 7
2 5
3 6
The left argument function for the key ⌸ operator takes as left argument the group value (in the above, the unique letters E N and T) and as right argument the indices of the elements of that group.
PRIMER ← 'CGCAAATGGGCGGTAGGCGTG'
⍝ Number of each nucleotide
{⍺,⍴⍵}⌸PRIMER
C 4
G 10
A 4
T 3
In the dyadic case, the ⌸ key operator groups by the left argument and uses the right argument to number the elements. Consider the following two dimensional array of fast food orders
ORDERS
┌───────────────┬───────┐
│STEVE MCALE │BURGER │
├───────────────┼───────┤
│JOHNSON SWEEMEY│HOT DOG│
├───────────────┼───────┤
│DEREK ARCHIBELD│BURGER │
├───────────────┼───────┤
│MIKE NANDES │BURGER │
├───────────────┼───────┤
│SCOTT WESTON │HOT DOG│
└───────────────┴───────┘
Grouping by the second column, creating a list of names for each menu item,
ORDERS[;1]
┌───────────┬───────────────┬───────────────┬───────────┬────────────┐
│STEVE MCALE│JOHNSON SWEEMEY│DEREK ARCHIBELD│MIKE NANDES│SCOTT WESTON│
└───────────┴───────────────┴───────────────┴───────────┴────────────┘
ORDERS[;2]
┌──────┬───────┬──────┬──────┬───────┐
│BURGER│HOT DOG│BURGER│BURGER│HOT DOG│
└──────┴───────┴──────┴──────┴───────┘
ORDERS[;2]{⍺ ⍵}⌸ORDERS[;1]
┌─────────┬─────────────────────────────────────────┐
│┌──────┐ │┌───────────┬───────────────┬───────────┐│
││BURGER│ ││STEVE MCALE│DEREK ARCHIBELD│MIKE NANDES││
│└──────┘ │└───────────┴───────────────┴───────────┘│
├─────────┼─────────────────────────────────────────┤
│┌───────┐│┌───────────────┬────────────┐ │
││HOT DOG│││JOHNSON SWEEMEY│SCOTT WESTON│ │
│└───────┘│└───────────────┴────────────┘ │
└─────────┴─────────────────────────────────────────┘
Perfect!