ls -a
Cassandra Messaging Tutorial - Part 1

I am currently working on the basic architectural components for a project my buddies and I are putting together. Since forming relationships through communication is going to be our core business, the internal messaging system needs to be built on something solid that can scale out horizontally without the need for replication. Enough about that; the goal here is to show how to build a simple internal messaging system using Cassandra.

Before anyone goes on a rant about the fact that I am thinking about scale here, I want to make one thing clear. Yes, it is way too early to think about ways to scale a service before even testing the waters. I am fully aware of this, and I am in no way encouraging you to assume that your service will need to scale before you have even put a minimum viable product out there. In fact, research has shown that it will probably never grow beyond a couple hundred thousand HTTP requests a day and about the same number of read/write storage transactions.

This is more of a learning experience for me, and an excuse to start doing some more in-depth hacking on distributed data storage systems, using Cassandra as the vehicle.

Loose Pre-requisite

You are using Rails for your front-facing presentation layer (or are at least familiar with it).

Strict Pre-requisite

You have read Evan Weaver’s Cassandra quick start post, and you know how to fire up a Cassandra server running on at least one node. That post is by far the best quick start guide I’ve found on Cassandra. Props to Evan Weaver for writing it.

Mindset

Remember, unlike with an RDBMS, you have to think carefully about modeling the data to suit your application’s queries. You can’t just model relationships and add indexes later to support new queries as you progress. Therefore, denormalization is absolutely necessary. Just recall any college-level database course you took and do the exact opposite when it comes to normalization.

Messaging System Supported Use Cases

1. A user should be able to compose a message and send it to another user.

2. A user should be able to view incoming messages in an inbox, ordered most recent first.

3. A user should be able to view outgoing messages in an outbox, ordered most recent first.

4. A user should be able to view a detailed message thread, including any of its replies.

5. A user should be able to reply to a message thread.

* For the time being, a user will not be able to search through messages. Search will be implemented later, probably with Lucene. I haven’t decided on this yet, but I will write about it when I implement it.

* This also doesn’t cover attaching objects such as virtual gifts and images to messages. I will cover this in a later post, but it’s pretty straightforward.

Step 1 - Data Model

I decided to stick with a simple construct: flat ColumnFamilies, one per kind of data. There are many ways to do this, including a multi-level column family structure, but I chose to keep storing and displaying messages as simple as possible.

I set up one main Keyspace for the project and two ColumnFamilies for the messaging system. Both use the TimeUUIDType comparator, so the columns in each row are sorted by the time embedded in their UUID names.
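To see why the comparator matters, here is a small self-contained sketch that builds version-1 (time-based) UUID strings from a Time and pulls the embedded timestamp back out. The helper names `time_uuid` and `uuid_timestamp` are hypothetical, and the random clock sequence and node are a simplification; this is not how the cassandra gem generates UUIDs. It only shows that the timestamp is split across several fields, so ordering by time means reassembling it, roughly what TimeUUIDType does when comparing column names.

```ruby
require 'securerandom'

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
GREGORIAN_OFFSET = 0x01B21DD213814000

# Hypothetical helper: build a version-1 UUID string for a given Time.
# Clock sequence and node are random here purely for illustration.
def time_uuid(time)
  ts = (time.to_r * 10_000_000).to_i + GREGORIAN_OFFSET
  time_low = ts & 0xFFFF_FFFF
  time_mid = (ts >> 32) & 0xFFFF
  time_hi_and_version = ((ts >> 48) & 0x0FFF) | 0x1000    # version 1
  clock_seq = SecureRandom.random_number(0x4000) | 0x8000 # RFC 4122 variant bits
  node = SecureRandom.random_number(1 << 48)
  format('%08x-%04x-%04x-%04x-%012x',
         time_low, time_mid, time_hi_and_version, clock_seq, node)
end

# Hypothetical helper: reassemble the 60-bit timestamp from its three fields.
def uuid_timestamp(guid)
  hex = guid.delete('-')
  time_low = hex[0, 8].to_i(16)
  time_mid = hex[8, 4].to_i(16)
  time_hi = hex[12, 4].to_i(16) & 0x0FFF
  (time_hi << 48) | (time_mid << 32) | time_low
end

t0 = Time.utc(2010, 1, 1, 12, 0, 0)
ids = [t0 + 60, t0, t0 + 30].map { |t| time_uuid(t) }

# Order by embedded time, oldest first; plain string order is not guaranteed to match,
# because the low-order time bits come first in the string.
by_time = ids.sort_by { |g| uuid_timestamp(g) }
```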

Contents of storage-conf.xml:

<Keyspace Name="MainProject">
<ColumnFamily CompareWith="TimeUUIDType" Name="Messages"/>
<ColumnFamily CompareWith="TimeUUIDType" Name="MessageReplies"/>
</Keyspace>

Messages - Row keys are user ids with an ‘_in’ (inbox) or ‘_out’ (outbox) suffix. Each column name is a unique, time-sortable UUID, and its value is a JSON hash containing the message details.

MessageReplies - Row keys are message ids (the message’s UUID). Each column name is again a unique, time-sortable UUID, and its value is a JSON hash holding the reply details.

This model makes it easy to list the messages for a given user, as well as the details and replies for a given message, if any exist.
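As a rough sketch of how the model maps onto reads, each use case resolves to a single row in a single ColumnFamily. The `MessageKeys` module below is hypothetical (it is not part of the cassandra gem); it only captures the key naming convention used in this post.

```ruby
# Hypothetical helpers: every read in this design is one row in one ColumnFamily.
module MessageKeys
  module_function

  # Use case 2: the recipient's inbox row in Messages.
  def inbox(user_id)
    [:Messages, "#{user_id}_in"]
  end

  # Use case 3: the sender's outbox row in Messages.
  def outbox(user_id)
    [:Messages, "#{user_id}_out"]
  end

  # Use cases 4 and 5: the reply row for one message in MessageReplies.
  def replies(message_guid)
    [:MessageReplies, message_guid.to_s]
  end
end

cf, key = MessageKeys.inbox(1120)
```

With the gem, such a pair could be splatted into a call, for example `messages.get(*MessageKeys.inbox(1120), :reversed => true)`.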

Step 2 - Usage and examples

To illustrate usage, I’ll fire up an interactive Ruby prompt and populate and retrieve some data (just type ‘irb’ in your terminal to get a shell going).

require 'rubygems'

require 'cassandra'

# Pull the gem's constants into the irb session
include Cassandra::Constants

Instantiate a Cassandra client object:

messages = Cassandra.new('MainProject')

A few insertions

Let’s assume Jennifer (user id 1200) is sending Emmanuel (user id 1120) a message. We insert the message once under Emmanuel’s user id with ‘_in’ appended, and once more under Jennifer’s user id with ‘_out’ appended. These two will be our row keys. (Remember that a user can view both an inbox and an outbox, and there is no easy way to query over values, so redundancy is good in this case.) Notice that these are two separate insertions, and Cassandra has no multi-row transactions, so some level of transaction control is needed when this goes into production code.

uuid = UUID.new

messages.insert(:Messages, '1120_in', {uuid=>'{"username":"Jennifer", "age":"24", "location":"Aventura Florida", "subject":"Hey there","body":"Cool profile. Drinks sometime?", "read":"0"}'})

messages.insert(:Messages, '1200_out', {uuid=>'{"username":"Emmanuel", "age":"26", "location":"Miami Florida", "subject":"Hey there", "body":"Cool profile. Drinks sometime?", "read":"0"}'})
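Hand-writing the JSON string works for these examples, but a quote or newline typed by a user would produce invalid JSON. A minimal sketch, using Ruby's standard json library, of building the column value from a hash instead; `message_payload` is a hypothetical helper, and it keeps the post's convention of storing the age and the read flag as strings.

```ruby
require 'json'

# Hypothetical helper: serialize message details for a Messages column value.
# JSON.generate escapes quotes, backslashes and newlines in user input.
def message_payload(username:, age:, location:, subject:, body:, read: false)
  JSON.generate(
    'username' => username,
    'age'      => age.to_s,
    'location' => location,
    'subject'  => subject,
    'body'     => body,
    'read'     => read ? '1' : '0'
  )
end

payload = message_payload(username: 'Jennifer', age: 24, location: 'Aventura Florida',
                          subject: 'Hey there', body: 'Cool profile. "Drinks" sometime?')
details = JSON.parse(payload)
```

The resulting string can be used as the column value, for example `messages.insert(:Messages, '1120_in', {uuid => payload})`.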

Let’s insert another message to the same user, but this time from Natalie (user id 1320):

uuid = UUID.new

messages.insert(:Messages, '1120_in', {uuid=>'{"username":"Natalie", "age":"25", "location":"San Francisco CA", "subject":"Do you like Ninjas?", "body":"It might be a dumb question. Just wanted to know if you did :-)","read":"0"}'})

messages.insert(:Messages, '1320_out', {uuid=>'{"username":"Emmanuel", "age":"26", "location":"Miami FL", "subject":"Do you like Ninjas?", "body":"It might be a dumb question. Just wanted to know if you did :-)", "read":"0"}'})
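One way to live with two separate writes is to make them safe to repeat: both copies use the same uuid as the column name, so re-running both inserts after a partial failure just overwrites identical data. The sketch below models that with a hypothetical in-memory `FakeColumnFamily` (a stand-in, not the cassandra gem) and a hypothetical `deliver` helper; it assumes the caller retries on error, and it is not a substitute for real transaction control.

```ruby
# Hypothetical in-memory stand-in for a ColumnFamily: row key => { column name => value }.
class FakeColumnFamily
  def initialize
    @rows = Hash.new { |hash, key| hash[key] = {} }
  end

  # Like a Cassandra insert, writing an existing column name overwrites its value.
  def insert(key, columns)
    @rows[key].merge!(columns)
  end

  # Return one row's columns sorted by column name (empty if the row is missing).
  def get(key)
    @rows.fetch(key, {}).sort.to_h
  end
end

# Hypothetical helper: write the recipient's inbox copy and the sender's outbox copy
# under the same message id, so the pair can be retried as a unit.
def deliver(store, message_id, sender_id, recipient_id, inbox_json, outbox_json)
  store.insert("#{recipient_id}_in", message_id => inbox_json)
  store.insert("#{sender_id}_out", message_id => outbox_json)
end

store = FakeColumnFamily.new
2.times do # the second call simulates a retry after a failure
  deliver(store, 1001, 1200, 1120, '{"username":"Jennifer"}', '{"username":"Emmanuel"}')
end
```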

That takes care of use case #1. Next, let’s assume Emmanuel replies to Natalie’s message:

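# Key the reply row by the message's guid string, the same form used to read replies back later
messages.insert(:MessageReplies, uuid.to_guid, {UUID.new=>'{"username":"Emmanuel", "age":"26", "location":"Miami FL", "body":"I love Ninjas! OK we can officially go grab a drink now!"}'})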

This takes care of use case #5. Next, we’ll cover the remaining three use cases with some basic retrieval operations.
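Each reply gets its own `UUID.new` column name because a column name identifies one value within a row: a fresh time-based name appends a reply, while reusing a name would overwrite the earlier one. A tiny in-memory illustration, with integers standing in for TimeUUIDs and placeholder reply text (both assumptions made to keep it short):

```ruby
# Hypothetical MessageReplies row for one message: column name => JSON reply.
replies_row = {}

# Two replies, each with its own time-ordered column name, both survive.
replies_row[2001] = '{"username":"Emmanuel","body":"I love Ninjas!"}'
replies_row[2002] = '{"username":"Natalie","body":"(placeholder reply)"}'

# Writing to an existing column name replaces that reply instead of adding one.
rewritten = replies_row.merge(2001 => '{"username":"Emmanuel","body":"(edited)"}')
```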

And now for retrieval

Now I can easily retrieve all messages for a given user from the user id plus the ‘_in’ or ‘_out’ suffix, then iterate over them and display them however I want. This takes care of use cases #2 and #3.

For Emmanuel’s Inbox:

collection = messages.get(:Messages, '1120_in', :reversed=>true).to_a

For Emmanuel’s Outbox:

collection = messages.get(:Messages, '1120_out', :reversed=>true).to_a
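To make the shape of the result concrete: a reversed get returns the row's columns newest first, and `.to_a` turns them into a list of `[column_name, value]` pairs. A self-contained sketch of turning such pairs into inbox lines, using a plain Hash as a hypothetical stand-in for the returned row and integers in place of TimeUUIDs:

```ruby
require 'json'

# Hypothetical stand-in for Emmanuel's inbox row: time-ordered column name => JSON details.
inbox_row = {
  1001 => '{"username":"Jennifer","subject":"Hey there","read":"0"}',
  1002 => '{"username":"Natalie","subject":"Do you like Ninjas?","read":"0"}'
}

# Mimic get(..., :reversed => true).to_a: [[column_name, value], ...], newest first.
collection = inbox_row.sort.reverse

# One display line per message, flagging unread ones.
lines = collection.map do |_column, json|
  msg = JSON.parse(json)
  flag = msg['read'] == '0' ? ' (unread)' : ''
  "#{msg['username']}: #{msg['subject']}#{flag}"
end
```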

On to use case #4. Say you list a user’s messages on a page and, when an entry is clicked, you want to display the message details and replies, a sort of permalink for a message.

All we need now is the message’s UUID. Calling to_guid on it returns a readable string representation of a Cassandra UUID.
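For the curious, a guid is just the UUID's 16 bytes written as hex in 8-4-4-4-12 groups. The sketch below shows that conversion and its reverse; `bytes_to_guid` and `guid_to_bytes` are hypothetical helpers written for illustration, not the gem's `to_guid` implementation.

```ruby
# Hypothetical helper: format 16 raw bytes as a canonical 8-4-4-4-12 guid string.
def bytes_to_guid(bytes)
  raise ArgumentError, 'expected 16 bytes' unless bytes.bytesize == 16

  hex = bytes.unpack1('H*')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

# Hypothetical helper: parse a guid string back into its 16 raw bytes.
def guid_to_bytes(guid)
  [guid.delete('-')].pack('H*')
end

guid = '6ba7b810-9dad-11d1-80b4-00c04fd430c8'
raw = guid_to_bytes(guid)
round_trip = bytes_to_guid(raw)
```

The string form is what makes the guid handy for something like a permalink URL.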

For simplicity, let’s take Emmanuel’s first (most recent) result and get its message guid:

guid = collection[0].first.to_guid

Now you can use the guid to retrieve the message details and any replies associated with it. For this we can use a slice query: return a range of columns starting at a given column name, and limit the size of the result set.

msg_details = messages.get(:Messages, '1120_in', :start=>UUID.new(guid), :count=>1, :reversed=>true).to_a
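This query relies on how a slice behaves: columns come back in comparator order, `:start` is inclusive, `:reversed` walks from the newest column downward, and `:count` caps the result. A toy model of that behavior over an in-memory row (the `slice` method is hypothetical, not the gem's API, and integers stand in for TimeUUIDs) shows why starting at the message's own guid with a count of 1 returns exactly that message:

```ruby
# Hypothetical toy slice over one row: row is { column name => value }.
# :start is inclusive, :reversed walks from the highest name down, :count caps results.
# The default count of 100 is chosen for illustration.
def slice(row, start: nil, count: 100, reversed: false)
  names = row.keys.sort
  names.reverse! if reversed
  unless start.nil?
    names = names.select { |name| reversed ? name <= start : name >= start }
  end
  names.first(count).map { |name| [name, row[name]] }
end

inbox = { 1001 => 'msg A', 1002 => 'msg B', 1003 => 'msg C' }

one = slice(inbox, start: 1002, count: 1, reversed: true) # => [[1002, "msg B"]]
latest_two = slice(inbox, count: 2, reversed: true)       # => [[1003, "msg C"], [1002, "msg B"]]
```

The same count-limited, reversed slice is what returns only the latest replies for a message.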

The last step is to retrieve the replies for this message. To illustrate limiting the result set, let’s return the 20 latest replies.

guid = msg_details.first[0].to_guid

replies = messages.get(:MessageReplies, guid, :count=>20, :reversed=>true).to_a

That’s pretty much it. I’ll respond to the comment thread if anything needs clarification.

The next post will be about adding attachments to the message structure, which should be fairly easy now that the meat of the module is done.
