December 3, 2014

Cassandra and Windows: Past, Present, and Future

Josh McKenzie

The Past:

People have been asking about Cassandra on Windows since as far back as 0.4. From adding support for nodetool to tooling in general, work on making Cassandra a better environment on Windows has been going on in one way or another for years. Fast-forward to 2012, and Cassandra on Windows was raised to the level of being endorsed and supported, though still for development only. While it isn't that hard to get up and running, some known issues have kept us from recommending it in a production environment - primarily recurring difficulty with deleting files. Reliable file I/O is critical for a database, and the Cassandra architecture makes extensive use of snapshots and hard links to files, so this is not something we could overlook.

The Present:

While the 2.1 line of Cassandra still has some file access warnings popping up now and again, with 3.0 we hope to make that entire class of errors a thing of the past. JDK 7 introduced a new I/O library that lets us open files with the FILE_SHARE_DELETE flag on Windows, which more closely matches the Linux paradigm of being able to delete files while other processes still have handles open to them. It may sound like something we could work around with strict ordering and/or reference-counted file access, but the real trick comes when working with hard links to files. In brief, a hard link lets you set up multiple 'aliases' that point to a single file on disk. In Cassandra, we use hard links for creating snapshots, for snapshot-based repair, before performing truncates, and now as part of the early re-open optimization for compaction results. The ability to delete a hard-linked file has fewer implications for repair performance now that we've introduced incremental repair; however, it allows the Windows ecosystem to take advantage of the improvements to compaction that were introduced in 2.1. That, and it stops a bunch of annoying messages from popping up in your logs about being "Unable to delete" with FSWriteErrors! Note: These messages also indicate that file deletion will be re-attempted after a garbage collection and, so long as you're not using memory-mapped I/O, they can be ignored.
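To make the hard-link behavior concrete, here is a minimal Java sketch, not Cassandra's actual code. It creates a file, adds a hard link to it with the JDK 7 `java.nio.file` API, and then deletes the original name while a handle to it is still open. The class name and the file names `data.db` and `snapshot.db` are made up for illustration. On Windows, the delete depends on the handle having been opened with FILE_SHARE_DELETE as described above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class HardLinkDeleteSketch {

    /**
     * Deletes a file while a handle to it is open, then returns the
     * content that is still reachable through a hard link.
     */
    static String deleteWhileOpen(Path dir) throws IOException {
        Path original = dir.resolve("data.db");     // hypothetical file name
        Path snapshot = dir.resolve("snapshot.db"); // hypothetical hard-link name
        Files.write(original, "sstable-bytes".getBytes(StandardCharsets.UTF_8));

        // A hard link is a second directory entry for the same on-disk file.
        Files.createLink(snapshot, original);

        // Keep a handle open on the original name while we delete it.
        try (FileChannel reader = FileChannel.open(original, StandardOpenOption.READ)) {
            // On Linux this always succeeds; on Windows it needs the handle
            // above to have been opened with FILE_SHARE_DELETE.
            Files.delete(original);
        }

        // The data is still reachable through the remaining link.
        return new String(Files.readAllBytes(snapshot), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-sketch");
        System.out.println(deleteWhileOpen(dir));
    }
}
```

The point of the sketch is that a snapshot is just another name for the same bytes, so removing the original name should never have to wait on whoever still holds the file open.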

Memory mapping, Windows, and Cassandra:

Which brings me to the other known issue with Cassandra on Windows in both 2.1 and the upcoming 3.0 release: memory-mapped file I/O on Windows is tricky if you're creating hard links to files. From a technical perspective, if a process has a segment of a file memory-mapped on NTFS, you cannot delete either the original file or hard links to it, even with the FILE_SHARE_DELETE flag, unless you've opted to completely skip the page cache (which we almost never want to do). As a user, what this means to you is that we're currently disabling memory-mapped I/O on Windows. This change should be in the upcoming 2.1.3 release; however, you can also take care of it yourself by adding "disk_access_mode: standard" to your cassandra.yaml file. While the default mode of access is memory-mapped I/O for index files, we default to standard access for data files to allow compression, so disabling memory mapping only impacts performance when scanning index files during reads that miss both the key cache and the row cache. This is something we're actively looking into and hope to have a solution for in the 3.X line, but there are no concrete details yet.
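For readers less familiar with the two access modes, the following Java sketch contrasts them using the standard `FileChannel` API. It illustrates the general technique and is not how Cassandra implements its readers; the class and method names are hypothetical. The relevant detail is in the mapped variant: per the JDK documentation, a mapping stays valid until the buffer is garbage-collected, which fits the note above about deletes being re-attempted after a garbage collection.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class AccessModeSketch {

    /** Memory-mapped access: file pages are mapped into the process address space. */
    static byte[] readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[mapped.remaining()];
            // Closing the channel does not unmap the buffer; the mapping is
            // only released once the buffer itself is garbage-collected.
            mapped.get(out);
            return out;
        }
    }

    /** Standard access: explicit reads copy bytes into a buffer we own. */
    static byte[] readStandard(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or we hit end of file
            }
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("access-mode", ".bin");
        Files.write(file, new byte[] {1, 2, 3, 4});
        System.out.println(Arrays.equals(readMapped(file), readStandard(file)));
    }
}
```

Both methods return the same bytes. The difference is who holds on to the file: the standard read lets go when the channel closes, while the mapping can outlive it.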

Enough about files - what else is new on Windows?

Cassandra 2.1 introduced a brand-new set of launch scripts for Windows aimed at duplicating the more complex logic and environmental setup available on Linux. We rewrote the launch environment in PowerShell (keeping the .bat files available for those who don't have PowerShell access enabled on their machines) and it now provides much more robust determination of the JVM environment, GC flags, heap and young gen sizing based on the host, and automatic tuning of TCP keepalive for more deterministic handling of connection failures. Along with the changes to the launch scripts, we've fixed shutdown when run as a service, added ccm support for development environments, and cleaned up a few other things. On the whole you should find the Cassandra development and usage environment on Windows to be a much smoother experience!
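This post doesn't spell out the sizing rules, so treat the following as an illustration only. It assumes the Windows scripts follow the same heuristic as the Linux cassandra-env.sh of this era: the max heap is the larger of (half of RAM, capped at 1024 MB) and (a quarter of RAM, capped at 8192 MB), and the young generation is the smaller of 100 MB per core and a quarter of the heap. The class and method names are hypothetical, and the sketch is in Java rather than PowerShell purely for readability.

```java
class HeapSizingSketch {

    /** Max heap in MB, under the assumed cassandra-env.sh style heuristic. */
    static long maxHeapMb(long systemMemoryMb) {
        long half = Math.min(systemMemoryMb / 2, 1024);    // half of RAM, capped at 1 GB
        long quarter = Math.min(systemMemoryMb / 4, 8192); // quarter of RAM, capped at 8 GB
        return Math.max(half, quarter);
    }

    /** Young generation in MB: 100 MB per core, but at most a quarter of the heap. */
    static long youngGenMb(long maxHeapMb, int cores) {
        return Math.min(100L * cores, maxHeapMb / 4);
    }

    public static void main(String[] args) {
        // Assumed host: pass total RAM in MB as the first argument, default 16 GB.
        long ramMb = args.length > 0 ? Long.parseLong(args[0]) : 16384;
        int cores = Runtime.getRuntime().availableProcessors();
        long heap = maxHeapMb(ramMb);
        System.out.println("-Xmx" + heap + "M -Xmn" + youngGenMb(heap, cores) + "M");
    }
}
```

Under these assumptions, a 16 GB host with 4 cores would get a 4096 MB heap and a 400 MB young generation.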

House cleaning:

Unit testing is a technique that's often used to ensure program correctness and prevent changes from introducing new bugs. With over 850 unit tests now in Cassandra, there were surprisingly few failures on Windows when I took the reins earlier this year of making Windows a first-class citizen for Cassandra (thanks team!). This is something we constantly keep an eye on as we move towards 3.0 and formal Windows support.

The Future:

Cassandra 3.0 is our current target for official Windows support - specifically with the file rename / deletion fixes mentioned above. Moving forward, we plan on looking into replicating some of the page cache optimizations currently available only on Linux, whether by reading files in advance to "dirty-load" them into the page cache, or by using some of the new APIs introduced in Windows 8/Server 2012 to prompt the page cache to pre-load data. As mentioned above, there's still some work to do in getting memory-mapped file I/O to play nicely with our usage of hard links within the database.
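As a rough idea of what the "dirty-load" approach could look like, here is a hedged Java sketch that reads a file sequentially and throws the bytes away, relying on the operating system to keep the pages it touched in its cache. This is not a planned implementation. The class name, method name, and 1 MB buffer size are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PageCacheWarmSketch {

    /**
     * Reads the whole file once and discards the data.
     * Returns the number of bytes read.
     */
    static long warm(Path file) throws IOException {
        long total = 0;
        ByteBuffer scratch = ByteBuffer.allocateDirect(1 << 20); // 1 MB scratch buffer (arbitrary)
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(scratch)) != -1) {
                total += n;
                scratch.clear(); // discard the bytes; only the read itself matters
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("warm", ".bin");
        Files.write(file, new byte[4096]);
        System.out.println(warm(file) + " bytes read");
    }
}
```

Whether the pages actually stay cached is up to the operating system and the memory pressure on the host, so this is a hint at best.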

In summary:

A lot of hard work has gone into making Cassandra on Windows a stable and performant platform. We're on the last mile of the journey to formally declaring Cassandra on Windows a supported platform and, if you have the time and inclination, we'd love it if you could grab the latest development release, kick the tires, and let us know if you run into any problems we don't yet know about!
