Twissandra is an example project that provides functionality similar to Twitter's.
Because it helps to consider every use case for your data before deciding how to store it, the actions Twissandra needs to perform are listed here first.
The common actions to be supported are:
Here, a “friend” is somebody that a user follows.
Twissandra uses seven different column families to minimize the number of reads or inserts that need to be made for each type of action. The column families are:
Adding friends requires two sets of writes: one to update the user’s row in FRIENDS, and one set of writes to update FOLLOWERS for each of the friends added.
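The two sets of writes for adding friends can be sketched with plain Python dictionaries standing in for the FRIENDS and FOLLOWERS column families. This is a minimal illustration, not Twissandra's actual code: the `add_friends` helper, the in-memory dictionaries, and the use of a timestamp string as the column value are all assumptions made for the example.

```python
import time
from collections import defaultdict

# In-memory stand-ins for the FRIENDS and FOLLOWERS column families
# (assumption): row key -> {column name: column value}.
FRIENDS = defaultdict(dict)
FOLLOWERS = defaultdict(dict)


def add_friends(username, friend_usernames):
    """Record that `username` now follows each name in `friend_usernames`.

    Hypothetical helper; the column value is an assumed timestamp string.
    """
    now = str(int(time.time() * 1e6))
    # First write: a single update to the user's own row in FRIENDS.
    FRIENDS[username].update({friend: now for friend in friend_usernames})
    # Second set of writes: one update per friend, each to that
    # friend's row in FOLLOWERS.
    for friend in friend_usernames:
        FOLLOWERS[friend][username] = now


add_friends("alice", ["bob", "carol"])
```

Storing the relationship in both directions means that "who does alice follow?" and "who follows bob?" can each be answered by reading a single row.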
Getting a slice of a user’s tweets (such as the last 20 tweets) requires at least two queries, three if we want to attach a user record to each tweet:
- Collecting the usernames from all of the tweets
- Calling multiget() on USER with the usernames as the keys
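The last two steps, collecting the usernames and fetching the matching USER rows in one call, might look like the sketch below. It runs against an in-memory dictionary rather than a real cluster: the `multiget` function here is a local stand-in for the client's multiget() call, and the tweet and user field names (`username`, `body`, `display_name`) are assumptions for illustration only.

```python
# Hypothetical in-memory stand-in for the USER column family:
# row key (username) -> columns.
USER = {
    "alice": {"display_name": "Alice"},
    "bob": {"display_name": "Bob"},
}


def multiget(column_family, keys):
    """Local stand-in for a multiget() call: fetch several rows at once."""
    return {key: column_family[key] for key in keys if key in column_family}


def attach_users(tweets):
    """Return copies of `tweets`, each with its author's user record attached."""
    # Collect the distinct usernames so each user row is requested only once.
    usernames = {tweet["username"] for tweet in tweets}
    # One multiget with the usernames as the keys.
    users = multiget(USER, usernames)
    # Attach the user record (or None if the row is missing) to each tweet.
    return [dict(tweet, user=users.get(tweet["username"])) for tweet in tweets]


tweets = [
    {"username": "alice", "body": "hello"},
    {"username": "bob", "body": "hi"},
    {"username": "alice", "body": "again"},
]
tweets_with_users = attach_users(tweets)
```

Deduplicating the usernames first keeps the extra query small even when one user wrote most of the tweets in the slice.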
In USERLINE, the special PUBLIC_USERLINE_KEY is used to hold a timeline of all tweets. At a Twitter-like scale, holding all tweets in a single row will eventually cause problems. Splitting the public userline row by day or hour, for example, would fix this, but because Twissandra is an educational example, this has not been done.
Other than that, getting a slice of all tweets works exactly as it does for a single user, except that PUBLIC_USERLINE_KEY is used as the row key instead of a username.
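As a rough sketch of what splitting the public userline row by day or hour could look like, the row key can be derived from the tweet's timestamp. Twissandra does not do this; the `public_userline_key` helper, the `:` separator, and the placeholder value given to `PUBLIC_USERLINE_KEY` below are all assumptions.

```python
from datetime import datetime, timezone

# Placeholder value (assumption); use whatever key the application defines.
PUBLIC_USERLINE_KEY = "!PUBLIC!"


def public_userline_key(when, by_hour=False):
    """Build a bucketed row key for the public timeline.

    `when` must be a timezone-aware datetime; it is converted to UTC so
    that every client computes the same bucket for the same instant.
    """
    fmt = "%Y%m%d%H" if by_hour else "%Y%m%d"
    bucket = when.astimezone(timezone.utc).strftime(fmt)
    return PUBLIC_USERLINE_KEY + ":" + bucket


posted_at = datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc)
daily_key = public_userline_key(posted_at)
hourly_key = public_userline_key(posted_at, by_hour=True)
```

With a scheme like this, a reader that needs more tweets than the current bucket holds would have to continue into the previous day's or hour's row, which is the price paid for keeping each row bounded.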
To add a tweet, we have to do the following: