Let's talk about concurrency

Hello, guys, today we'll tackle one of the most complex problem in computer programming called concurrency.

To be honest with you, we'll not really tackle the problem but see how some bright people resolved the issue and see some technologies that are based on these concepts.

Nowadays it's a fact that computing is scaling horizontally, meaning that we do not improve CPU frequencies so much but much more CPU cores number. This is because it is much more costly to do the first and that because the second approach is much more flexible. It is much easier to add a processor than to change his frequency and manage the heat and energy consumption.

We'll talk about these patterns that apply equaly in thread communication as of data management systems and briefly see how it works.

Locks

Most of computer scientist learn this pattern first, because historically it was the first to be implemented. In linux programming they're refered as Semaphores and provides mutual exclusion. The idea is that when you want to access a resource that is being modified by someone else, you need to wait until the modification is done (and you're notified) so you can access it. This ensures you data modification is pure, meaning that it does not matter how much time you run your program you'll have the same expected behavior.

For threads communication a pattern called CSP (Communicating Sequential Processes) is also blocking. When you want to communicate with a thread you wait until the message read is finished. With the years, this model was improved and allows asynchronous channel communication being one step nearer from the Actor pattern. This is the main communication pattern for Go programming language.

For example when you want to edit an entry in an InnoDB MySQL table, the row will be locked until the modification is commited in database. All ACID databases implement this pattern.

Transactions

This pattern is based on the same principle but allows some more flexibility, the idea of the transaction is that you'll get the data when you want, you edit it as you wish on a temporary log and then you commit your modifications hoping that there is no conflict, otherwise you just retry by taking the most fresh data and replay the modifications on it hoping that it works of course. If not you rollback.

In databases this is used a lot to optimize big bunchs of operations on a dataset, in MySQL it is called TRANSACTION. Most ACID databases implement this pattern.

In programming, we call it STM (Software Transactional Memory) but it is exactly the same idea with some improved versions. One of the benefit of this technic is that you can compose your transactions by making transactions of transactions, allowing more heavy operation batching. It is a nice pattern which is implemented in Haskell programming language. See Haskell wiki for more informations.

Actors

When you start working with real world concurrent applications no matter which of the preceeding pattern you use, you find yourself quickly overwhelmed with all the locking / transaction code you need to manage this. That's why the Actors pattern appeared, the idea is to see each thread as an actor that receive a message, act and send another message to someone else if necessary. This way, because the same data is never accessed at the same time by various actors, we do not even have to manage concurrent accesses.



In order to be sure to never have a shared memory usage we need to introduce a new concept, immutability. It ensures that any modification on a specified memory space will be stored in another memory space. One of the problem introduced is that a naive implementation will just blow up your program memory, that's why programming languages like Erlang or Haskell that are immutables, does some memory optimizations under the hood.

Erlang provide a native implementation of this pattern with some fine tools to manage the spawned processes and communication message updates.

As you noticed this time we talked about the threads communication part first, this is because I do not know any database that implements this model for now. The closer data management system that I can think of would be the Blockchain. Were the message is sent from an actor to another until the transaction is verified and completed, with a lot more occuring between these simplified steps.

Conclusion

These are some of the most known patterns, each has his drawbacks and benefits. I presented them in my personal preference order but I'd recommend you to test which one fits best with your needs. For example, actors pattern may not fit for low loaded applications because simplified programming is counterweighted by message queues management and process scheduling.

Disruptor

To finish I wanted to talk briefly about a pattern that is pretty new, and as some of you guys I'm just discovering it, the disruptor pattern. The idea is that you'll have a ring of messages space, one per producer, that will be accessible by as much consumer as you want. A consumer can also be a producer. The concept is interesting and seems to have good performances for some usecases. See Disruptor-1.0.pdf for mor details.

Thanks for reading,

See you soon.

Most seen