April 30, 2015

Cassandra Testing Improvements For Convenience And Confidence

Russell Hatch

In the DataStax Test Engineering group we're always working to provide better testing and developer enablement tools for the Cassandra project. Today I'm happy to tell you about some exciting new tools to help developers get more done, and to help the project as a whole release with more confidence.

Release And Testing Big Picture

If you follow the cassandra-dev mailing list you might have noticed a recent discussion about release process and testing from the Cassandra project committers. To sum it up, Cassandra development is likely moving to a "tick-tock" release cycle where new features and bug fixes will be released on a monthly basis (new features one month, bug fixes the next). Some major motivations cited by contributors are releasing smaller changesets, being able to release new features more quickly, and lower source control complexity.

Releasing more quickly requires a commitment to release-readiness (meaning that code is known to be in a good working state at all times). To help with these efforts we're pitching in some great new enablement tools so code can be tested more completely before it's part of Cassandra.

High Level Contributor Workflow

You may already know that Cassandra developers work from their own personal forks of the Cassandra code base (typically on GitHub). Developers build new features and prepare bug fixes on their fork, and provide patch files representing these changes. When the code is deemed ready by both developer and reviewer(s), it's committed to the code base.

There are a few wrinkles with this approach, particularly in combination with long release cycles. The first problem is that committed code may be only partially tested before joining the code base, in part due to the time required to run a complete suite of functional tests. The second issue is that if bugs are introduced by new code, it takes time to figure this out amid the background noise of a large codebase in flux. Thirdly, once an engineer or an automated test uncovers an issue, the developer has usually moved on to new tasks, which makes it harder to get the problem resolved. One could summarize these issues by saying that better feedback is needed sooner.

Fast Feedback Is On The Way

So, we've got faster releases on the way but we need fast feedback, and we need it earlier in the process. To meet these requirements, there are two great tools taking shape right now on the Cassandra continuous integration server hosted at cassci.datastax.com. The first tool is functional test parallelization. Basically we're teaching the continuous integration system to break a lengthy functional test job into small pieces that can be run simultaneously. With this improvement we're now getting functional test feedback in under an hour, and targeting 30 minutes as a goal.
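As a rough illustration of the parallelization idea, here is a minimal Python sketch that splits a list of tests into pieces and runs the pieces at the same time. It is not the code running on the continuous integration server: the function names, the round-robin split, and the stand-in `run_chunk` worker are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(tests, num_chunks):
    """Distribute tests round-robin across up to num_chunks lists."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    chunks = [[] for _ in range(num_chunks)]
    for index, test in enumerate(tests):
        chunks[index % num_chunks].append(test)
    # Drop empty chunks when there are fewer tests than chunks.
    return [chunk for chunk in chunks if chunk]


def run_chunk(chunk):
    """Hypothetical stand-in for a CI worker.

    A real job would invoke the test runner here; this one just
    reports every test in the chunk as passed.
    """
    return {test: "passed" for test in chunk}


def run_in_parallel(tests, num_chunks):
    """Run every chunk concurrently and merge the per-chunk results."""
    chunks = split_into_chunks(tests, num_chunks)
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        for chunk_result in pool.map(run_chunk, chunks):
            results.update(chunk_result)
    return results
```

With this kind of split, total wall-clock time is bounded by the slowest piece rather than by the sum of all tests, which is why balancing the pieces matters as much as the number of workers.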

Perhaps even more valuable than the speedup from parallelization, we're going to be offering "self-serve" branch testing for Cassandra developers to use as they build new features and fix bugs. Developers will be able to tell the continuous integration server about their branch and it will be tested against the large suite of unit and functional tests. What this means is that test feedback can start arriving less than an hour after a developer pushes a new change to their personal Cassandra fork, and can repeat as needed while development continues. This will add an extra level of confidence before code review even begins, and before code is merged into Cassandra's source.

Quick Start Guide For Frequent Contributors

If you're a frequent Cassandra contributor, stop by the #cassandra-dev IRC channel to ask about self-service testing (ping exlt or ptnapoleon). They will set up the continuous integration system to be aware of your fork. Once the system is aware of your fork it will search for your branches, add test jobs, and start an initial job run (this takes about an hour to get going). At this point you will be added to this developer list.

Next, you'll need to set up a simple post-commit hook on GitHub. This will trigger testing on new commits to your known branches.

  • Log into GitHub and open your personal Cassandra fork repo.
  • Click the right pane "Settings" link.
  • Click the left pane "Webhooks & Services" link.
  • Click "Add webhook".
  • Set the Payload URL (with your username substituted) as http://cassci.datastax.com/git/notifyCommit?url=http://github.com/$GITHUB_USERNAME/cassandra
  • Leave the Content type field as "application/json".
  • Set the "Which events...?" radio button to "Just the push event".
  • Make sure "Active" is checked.
  • Click the "Add webhook" button and you're done!
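To avoid typos when substituting your username into the Payload URL from the steps above, a small helper like the following can build it. This is only a convenience sketch: the function name `build_payload_url` and the `example-user` username are made up for illustration, and the URL format is copied as-is from the step above.

```python
# Endpoint taken from the Payload URL in the setup steps.
CI_NOTIFY_ENDPOINT = "http://cassci.datastax.com/git/notifyCommit"


def build_payload_url(github_username):
    """Return the webhook Payload URL for the given GitHub username."""
    if not github_username or "/" in github_username:
        raise ValueError("expected a bare GitHub username")
    # The fork is assumed to be named "cassandra", as in the steps above.
    repo_url = f"http://github.com/{github_username}/cassandra"
    return f"{CI_NOTIFY_ENDPOINT}?url={repo_url}"


if __name__ == "__main__":
    # "example-user" is a placeholder, not a real account.
    print(build_payload_url("example-user"))
```

Paste the printed value into the Payload URL field when adding the webhook.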

Now, when you push changes, test jobs will be triggered automatically. You can view your test builds by clicking your name in this list.

[Animation: Webhook Setup. A short animation of setting up the webhook and triggering a test run with commit+push.]

For Less Frequent Project Contributors

If you're not doing major Cassandra development but still want to offer patches and take advantage of this new testing infrastructure, we're adding some features to help you out too! Watch (soon) for a blog post about IRC-triggered ad-hoc testing. When this feature is available you'll be able to tell an IRC bot about your branch and it will see that a suite of tests is run on your behalf.

Special Thanks

Some recognition is deserved here for my hard-working peers on the Cassandra Test Engineering team at DataStax who quietly built this new infrastructure and supporting code, and kindly let me blog about their work: Alan Boudreault, Ryan McGuire, Michael Shuler, and Philip Thompson.
