Clojure Journey VII – Collections and Sequences

After covering basic data types in the last post of this series, we can talk about vectors, lists, maps and sets. Clojure has a rich set of built-in data structures, and the key is to know the difference between them and when to use each one to your advantage.

Vector

Let’s start with the vector, or “array” for those coming from languages that use this term. It’s probably the most popular and easiest to learn of the structures that we’ll talk about here.

Vectors in Clojure behave much as you would expect from arrays in other languages: they are ordered collections whose items are accessed by index, and the index starts at 0.

=> (vector 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
=> (type (vector 1 2 3 4 5))
clojure.lang.PersistentVector

Note that, as in many other languages, vectors are written with brackets []. You can create a new vector using just the literal, like this:

=> [1 2 3 4 5]
[1 2 3 4 5]

The result is the same as using the vector function, so feel free to use whichever fits the context best. Vectors can hold values of mixed types too, so don’t feel locked into only one type:

=> [1 2 "Don't Panic" :hi]
[1 2 "Don't Panic" :hi]

It’s interesting to note that you don’t need to separate the values with commas. In Clojure, whitespace separates the items, and commas are simply treated as whitespace.

Now that we have our vectors, we need to do things with them. Clojure provides a nice set of functions for this. The simplest one is count which, as the name suggests, returns the number of items in the vector.

=> (count [100 142 42 "Don't Panic"])
4

We can create a new vector with items added at the end using the conj function:

=> (conj [100 142 42 "Don't Panic"] "Hello World" "Ola")
[100 142 42 "Don't Panic" "Hello World" "Ola"]

Note that you can pass any number of values as arguments, and all of them will be added to the new vector.

In the same way that we use conj to add to the end of our vector, we can use cons to return a new sequence with the given item at the beginning.

=> (cons "Hello World" [100 142 42 "Don't Panic"])
("Hello World" 100 142 42 "Don't Panic")

But unlike its cousin, cons accepts only one item to add, and it takes that item as the first argument, before the collection.
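Here is a small sketch of the difference between the two, using only core functions. The names base, with-conj and with-cons are made up for this example:

```clojure
(def base [100 142 42])

;; conj keeps the collection type: we get a vector back
(def with-conj (conj base "end"))

;; cons always returns a sequence, with the new item in front
(def with-cons (cons "start" base))

(prn with-conj)            ; [100 142 42 "end"]
(prn (vector? with-conj))  ; true
(prn with-cons)            ; ("start" 100 142 42)
(prn (vector? with-cons))  ; false
```

So if you need a vector at the end, conj is usually the more convenient choice.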



A Brief Note on Immutability

Did you notice that I used the term “new vector” when referring to the return value of the function? I used this term because Clojure always returns a new vector, and not the same vector with one more item, as some people may expect (depending on the language that they come from).

The proof is simple: let’s bind a vector to an identifier, use conj to add something, and print the vector again:

=> (def our-vector [1 2 3])
#'user/our-vector
=> our-vector
[1 2 3]
=> (conj our-vector 42)
[1 2 3 42]
=> our-vector
[1 2 3]

Note how our-vector doesn’t change. Clojure always returns a new vector, and you can expect this behavior for almost anything while coding in Clojure: immutability is one of its pillars. In a future post I’ll talk only about immutability and its advantages, but for now this is what matters.

If we want to use the new value, we can bind it to a new identifier:

=> (def second-vector (conj our-vector 42))
#'user/second-vector
=> second-vector
[1 2 3 42]
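As a quick sketch of what this buys us, several values can be derived from the same vector without stepping on each other. The names base-vector, with-42 and with-99 are only for illustration:

```clojure
(def base-vector [1 2 3])

;; Each conj returns a new vector; base-vector is never touched
(def with-42 (conj base-vector 42))
(def with-99 (conj base-vector 99))

(prn base-vector) ; [1 2 3]
(prn with-42)     ; [1 2 3 42]
(prn with-99)     ; [1 2 3 99]
```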


Back to the functions that you can use with vectors: another nice one is nth, which returns the value at the given index (0-indexed):

=> our-vector
[1 2 3]
=> (nth our-vector 1)
2
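It may help to know what happens when the index doesn’t exist. The sketch below assumes Clojure running on the JVM (the exception class is a Java one), and the names small-vector, fallback, missing and threw? are made up for the example:

```clojure
(def small-vector [1 2 3])

;; nth with a third argument returns it when the index is out of range
(def fallback (nth small-vector 10 :not-found))

;; get returns nil instead of throwing for a missing index
(def missing (get small-vector 10))

;; without a fallback, nth throws for an out-of-range index
(def threw?
  (try
    (nth small-vector 10)
    false
    (catch IndexOutOfBoundsException _ true)))

(prn fallback) ; :not-found
(prn missing)  ; nil
(prn threw?)   ; true
```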

To finish with vectors, let’s talk about the first and rest functions. As you might imagine, first returns the value at index 0 of your vector:

=> (def our-new-vector ["Don't Panic" 42 "Hello World" 23])
#'user/our-new-vector
=> (first our-new-vector)
"Don't Panic"

And as we already know about immutability, the original vector will not be modified:

=> our-new-vector
["Don't Panic" 42 "Hello World" 23]

And we have our rest function, which returns a new sequence with all values except the first one of our vector:

=> (rest our-new-vector)
(42 "Hello World" 23)

And I don’t need to say again that the original vector will stay the same, right?

Using the first and rest functions we can reach any value of our vector with recursion, but if you just want the last value, you can of course use last.

=> (last our-new-vector)
23
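To make the recursion idea concrete, here is a minimal sketch. It uses defn and recur, which this series hasn’t covered yet, and nth-by-recursion is a hypothetical name rather than a core function. It assumes n is a non-negative integer:

```clojure
;; Walks the collection with first/rest until n reaches zero.
;; Returns nil when n is past the end, because (first ()) is nil.
;; Assumes n is a non-negative integer.
(defn nth-by-recursion [coll n]
  (if (zero? n)
    (first coll)
    (recur (rest coll) (dec n))))

(prn (nth-by-recursion ["Don't Panic" 42 "Hello World" 23] 2)) ; "Hello World"
```

In real code you would just call nth, but it shows that first and rest are enough to walk a whole collection.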

I think we know enough to get started working with vectors. Now let’s talk a bit about lists.

List

I already talked a little about lists in my post about syntax: basically, Clojure code itself is written as lists (if you don’t know what I’m talking about, check that post and you’ll understand). Since lists are what we use for almost everything in Clojure, let’s take a look at how we can use them.

To create a list, just put your data between parentheses, like this:

(1 2 42)

But if you try to create a list this way, it will raise an error, as we discussed in the previous post. Clojure will try to execute this list as code and will expect a function as the first value of the list. So, if we really want to use a list, we need to tell Clojure not to execute it, and we do this by putting an apostrophe (a quote) in front of our list:

=> '(1 2 42)
(1 2 42)

Another nice thing about lists is that count, first, rest, last, nth and cons work the same way as they do on vectors (and we’ll learn why soon), so there’s nothing to learn again! But unlike with a vector, conj adds to the beginning of the list.

=> (conj '(1 2 3 4) 5)
(5 1 2 3 4)

Just like with vectors, if you don’t want to use the literal, you can use a function to do the job for you.

=> (list 1 2 42)
(1 2 42)
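There is one difference between the two forms that is easy to miss, sketched below with made-up names quoted and built: the quote stops evaluation of everything inside it, while list evaluates its arguments like any other function.

```clojure
;; The quote stops evaluation of everything inside the list
(def quoted '(1 (+ 1 1) 42))

;; list is a normal function, so its arguments are evaluated first
(def built (list 1 (+ 1 1) 42))

(prn quoted) ; (1 (+ 1 1) 42)
(prn built)  ; (1 2 42)
```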

Have you noticed that you use a list to create another list?

Maps

It’s hard to imagine programming nowadays without maps, or “hashes” as some languages call them. They are a simple, fast and elegant way to represent structured real-world data, and they get even more attention in functional programming, where they are probably the most common way to do that.

To create a new map in Clojure we have two ways, just as with the other structures: the function and the syntax sugar. Let’s start with the sugar:

=> {:name "Otavio" :country "Brazil"}
{:name "Otavio" :country "Brazil"}

Just put your data inside curly braces and have fun, but pay attention: it needs to follow the “key then value” order, and if you don’t supply an even number of items, it will throw an error.

To create the same map, you can use the function hash-map:

=> (hash-map :name "Otavio" :country "Brazil")
{:name "Otavio", :country "Brazil"}

One thing that you can do to make your maps declaration more readable is to put a comma between each pair.

=> {:name "Otavio", :country "Brazil"}
{:name "Otavio", :country "Brazil"}

Notice that we almost always use keywords as map keys in Clojure!

To access a value in a map we have a few options. The most common way is to call the key as a function, passing the map as its argument.

=> (:country {:name "Otavio" :country "Brazil"})
"Brazil"

We can also pass a default value, so if the key isn’t found, the default value that we passed is returned:

=> (:language {:name "Otavio" :country "Brazil"} :english)
:english

Warning: to access data this way the key needs to be a keyword. With a string key, for example, it will raise an error, because a string can’t be called as a function.

In the opposite order, we can also call the map as a function and pass the key that we want:

=> ({:name "Otavio" :country "Brazil"} :country)
"Brazil"

Of course we can pass a default value here too:

=> ({:name "Otavio" :country "Brazil"} :age 0)
0

As we’re getting used to, there are many ways to do things in Clojure, and I promise this is the last way to access data from a map that I’ll show in this post:

=> (get {:name "Otavio" :country "Brazil"} :country)
"Brazil"

And we can pass a default value to get too:

=> (get {:name "Otavio" :country "Brazil"} :language :english)
:english

So we have a few ways to access data in a map. Feel free to use the one you think is best, but personally I like the first way, passing the keyword and then the map.

If we don’t want to pass default values and just want to know whether the key exists in the map, we can use the contains? function:

=> (contains? {:name "otavio"} :name)
true
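A case where contains? really matters is when a key exists but holds nil, because get alone can’t tell that apart from a missing key. A small sketch, where the profile map is made up for the example:

```clojure
(def profile {:name "Otavio" :nickname nil})

(prn (get profile :nickname))        ; nil, the key exists but holds nil
(prn (get profile :age))             ; nil, the key is missing
(prn (contains? profile :nickname))  ; true
(prn (contains? profile :age))       ; false
```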

As you may imagine, we can create nested maps too, just by putting a new map under a key:

=> (def person {:name "Otavio" :age 199 :address {:country :UK :city :london}})
#'user/person

And accessing data inside nested maps is easy: just access the keys in the right order:

=> (:city (:address person))
:london
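For deeper nesting, the core function get-in takes the whole path as a vector of keys, plus an optional default. A short sketch, using a made-up nested-person map so it runs on its own:

```clojure
(def nested-person {:name "Otavio" :address {:country :UK :city :london}})

;; get-in follows the keys in order
(prn (get-in nested-person [:address :city]))          ; :london

;; and accepts a default for paths that don't exist
(prn (get-in nested-person [:address :zip] :unknown))  ; :unknown
```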

When dealing with maps we have a basic set of functions to work with our data. The first is assoc, which we can use to add a value to a map, or to replace the value of an existing key.

=> (assoc {:name "Otavio" :country "Brazil"} :language :english)
{:name "Otavio", :country "Brazil", :language :english}
=> (assoc {:name "Otavio" :country "Brazil" :language :english} :language :portuguese)
{:name "Otavio", :country "Brazil", :language :portuguese}

And in the same spirit, we can use dissoc to remove a key from the map.

=> (dissoc {:name "Otavio" :country "Brazil"} :country)
{:name "Otavio"}

It’s important to remember that almost everything in Clojure is immutable, and maps are too: every operation that we perform on a map returns a new one.

We can obtain all the values or all the keys of a map using the vals and keys functions:

=> (vals {:name "Otavio" :country "Brazil"})
("Otavio" "Brazil")
=> (keys {:name "Otavio" :country "Brazil"})
(:name :country)

And we can merge maps into one using the merge function. We can pass as many maps as we want; just remember that if two or more maps have the same key, the value from the rightmost map passed to the function wins.

=> (def person {:name "Otavio" :country "Brazil"})
#'user/person
=> (merge person {:age 199 :language :english} {:name "Henrique"})
{:name "Henrique", :country "Brazil", :age 199, :language :english}

Now we know enough about maps, and we’ve already learned a lot about vectors and lists, so it’s time to learn a little about how to do things with our collections.

Sequences, the magic behind collections

When dealing with our collections there are common functions that we can use on all of them (as we saw with cons and first in this post). One great example is the first function:

=> (first [1 2 3 4])
1
=> (first '(1 2 3 4))
1
=> (first {:name "Otavio" :country "Brazil"})
[:name "Otavio"]

If we stop to think about it, it’s a little bit curious. We’re not in an object-oriented language where each object has its own “first” method; we’re calling the same first function on different structures and it returns the right thing each time. Wouldn’t it be natural to have one function for each structure, like first-vector or first-map, just as elisp does? (This elisp example is a beautiful one pointed out by Daniel Higginbotham in his book Clojure for the Brave and True.)

Another option would be to use an if statement to select how to take the first item of each collection, but that doesn’t sound scalable if we have tons of collections in our language, right?

Clojure has a more elegant way to solve this problem, and it’s through abstractions. Did you notice that sometimes in this post I said that a function returns a sequence? For example, if I call cons on a vector, it will not return a new vector, it will return a sequence. Take a look:

=> (cons 5 [1 2 3 4])
(5 1 2 3 4)

It may look like a list, but it’s a sequence, I promise!

cons, like many other Clojure functions, works with any data structure that implements the sequence interface. To implement the sequence interface and be a seq, the data structure needs to respond to three functions: first, rest and cons.

=> (cons 5 [1 2 3 4])
(5 1 2 3 4)
=> (first [1 2 3 4])
1
=> (rest [1 2 3 4])
(2 3 4)

If a data structure responds to these functions, Clojurists call it seqable: we can call the seq function on it and it will return a sequence:

=> (seq [1 2 3 4])
(1 2 3 4)

seq turns our structure into a new sequential structure that behaves like a list. A function like first only expects a structure that implements that interface, so it calls seq on whatever you pass and then performs the operation.
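To see what seq does with the different structures from this post, here is a short sketch. The string example is an extra that this post doesn’t cover, included only to show that seq reaches beyond the collections above:

```clojure
(prn (seq [1 2 3]))           ; (1 2 3)
(prn (seq '(1 2 3)))          ; (1 2 3)
(prn (seq {:name "Otavio"}))  ; ([:name "Otavio"])
(prn (seq "abc"))             ; (\a \b \c)

;; an empty collection gives nil instead of an empty sequence
(prn (seq []))                ; nil
```

Notice that a map becomes a sequence of key/value pairs, which is why first on a map returned [:name "Otavio"] earlier.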

Sequence Functions

Sequences bring great power to Clojure: if a collection is seqable, a rich set of functions can be used on it. Fortunately, all the data structures that we learned about in this post are seqable, so we can use all these functions!

We have already met a few of them, like first, rest, cons and last, but the true magic lies in functions that process our collections, like map, reduce, filter and many others (depending on your previous experience, you may already be used to some of them).

Map

map is an ambiguous term in Clojure, because it names both the data structure that we just saw (commonly called a hash in other languages) and a common function that can be used on sequences. The function is used when you want to create a new sequence by applying a function to each element of a sequence.

Imagine that we have a vector of maps, and each map represents a person:

=> (def person [{:name "Otavio" :country :br}
     {:name "Gabriela" :country :uk}
     {:name "James" :country :us}])

And we want to create a new sequence with only the names of these people. We can use map:

=> (map :name person)
("Otavio" "Gabriela" "James")

In this example we passed the function :name to map, which applies it to each item of our sequence and returns a new sequence with the result of each (:name) call.

It’s important to note that functions that operate on sequences return a new sequence, so if we want a vector again, we can use the into function (another common seq function) to put the items of our sequence into an empty vector:

=> (into [] (map :name person))
["Otavio" "Gabriela" "James"]

When we learn about functions we will use map a lot, and you’ll see the true magic behind it.
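As a small aside, clojure.core also has mapv, which does the same work as map but hands back a vector directly. A sketch, using a made-up people vector so it runs on its own:

```clojure
(def people [{:name "Otavio" :country :br}
             {:name "Gabriela" :country :uk}
             {:name "James" :country :us}])

;; map returns a lazy sequence
(prn (map :name people))   ; ("Otavio" "Gabriela" "James")

;; mapv returns a vector, like (into [] (map ...))
(prn (mapv :name people))  ; ["Otavio" "Gabriela" "James"]
```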

For

Another common tool for working with sequences is for, which looks similar to the loops we see in other languages: it walks through our collection binding the current value to an identifier, and returns a sequence with the result for each item. The same operation from our last example looks like this using for:

=> (for [p person]
     (:name p))
("Otavio" "Gabriela" "James")

In this example we tell Clojure to go through our collection person, bind each value to an identifier p, and then call :name on p.

But let’s try to avoid using for for everything. It’s more elegant and more functionally minded to use functions like map and reduce, and most of the time they are enough to solve our problem.
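To compare the two styles side by side, here is a sketch that picks only the people from Brazil. It uses a :when clause in for, and the #() anonymous function shorthand that this series hasn’t covered yet; the people vector is made up so the example runs on its own:

```clojure
(def people [{:name "Otavio" :country :br}
             {:name "Gabriela" :country :uk}
             {:name "James" :country :us}])

;; for with a :when clause keeps only the matching items
(prn (for [p people :when (= :br (:country p))]
       (:name p)))                                       ; ("Otavio")

;; the same result with filter and map
(prn (map :name (filter #(= :br (:country %)) people)))  ; ("Otavio")
```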

Reduce

The last function that we’ll see in this post is reduce. We pass it a function and a sequence, just like map, but unlike map the result of each function call is given as an argument to the next call, and when the sequence ends, reduce returns the final result. It sounds a little confusing when you read it, but it’s simple. The easiest example is using reduce to obtain the sum of all elements in a vector:

=> (reduce + [26 23 05 42 19 100 120 123])
458

We can also give a starting value if we want, as the second argument:

=> (reduce + 10 [26 23 05 42 19 100 120 123])
468
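If the step-by-step part is still fuzzy, the core function reductions can help: it returns every intermediate result that reduce would produce along the way. A sketch with a shorter, made-up numbers vector:

```clojure
(def numbers [26 23 5 42])

;; every intermediate sum, in order
(prn (reductions + numbers))     ; (26 49 54 96)

;; reduce only gives us the last one
(prn (reduce + numbers))         ; 96

;; with a starting value of 10
(prn (reductions + 10 numbers))  ; (10 36 59 64 106)
```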

Knowing how to use reduce correctly can be very powerful, and we’ll see this in a future post, but for now it’s enough.

Much more

We have a lot of functions to interact with our sequences, but I don’t want to go deep into each of them, so I strongly recommend reading this post to learn other functions if you want, or just waiting to learn them while having fun with Clojure.

Off: What’s the difference between vectors and lists?

After reading this post a lot of questions may come to mind, but the most frequent one is probably: “What’s the difference between vectors and lists? Both look the same to me!” The two structures look the same at first sight, but behind the scenes they are different.

The difference is simple, and for most people (except me) it will bring back college days. A list is implemented as a singly linked list behind the scenes. A vector gives you the indexed access you know from arrays in other languages but, because it is immutable, it is not one flat chunk of memory (do you remember how to use malloc?): Clojure stores it as a shallow tree of small arrays, so that a new vector can share most of its structure with the old one.

This difference in implementation creates advantages and disadvantages for each one. For example, adding a new value to the front of a list is simple: the language only needs to create a new “node” that points to the existing list. Adding a value to the end of a vector is also cheap, because the new vector shares almost all of its structure with the old one instead of copying it. That is why conj adds to the front of a list and to the end of a vector.

On the other hand, if you want to access a value by index, a vector will be faster: it only needs a few steps to reach any position, while a linked list has to traverse the list node by node until it reaches the one you asked for, which takes a while on long lists. (If you want a good post with nice benchmarks, check this post.)

Both structures look similar but are different under an x-ray. Each has its advantages and disadvantages, so you need to know where to use each one.
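You can see the two implementations peeking through in the behavior of conj and into, without any benchmark. A small sketch using only core functions:

```clojure
;; conj adds where it is cheapest for each structure
(prn (conj [1 2 3] 4))    ; [1 2 3 4]
(prn (conj '(1 2 3) 4))   ; (4 1 2 3)

;; so pouring the same items into each one gives opposite orders
(prn (into [] '(1 2 3)))  ; [1 2 3]
(prn (into '() [1 2 3]))  ; (3 2 1)
```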

Conclusions

This was not a short read, but we learned a lot: now we know how to use Clojure’s data structures, the dream of immutability, and the magic of sequences.

We are progressing on our journey of becoming Clojure Wizards.

Next steps

As a good complement to this reading, I strongly recommend reading the sequence documentation to learn about other functions that interact with sequences.

In the next post we’ll talk about functions, higher-order functions, parameters, and much more.

Final thought

If you have any questions that I can help you with, please ask! Send an email (otaviopvaladares at gmail.com), pm me on my Twitter or comment on this post!
