DataStax Developer Blog

DataStax Test Engineering

By Cathy Daw - March 10, 2014

The DataStax engineering team is organized to support and grow an OSS community while simultaneously building an enterprise software company.  To be successful, we are respectful of the philosophies and release processes required for each initiative.

The DataStax Enterprise (DSE) and OpsCenter (OpsC) teams follow a fairly standard enterprise development and test process where the decision to ship occurs after all release activities and must-have fixes are completed. Both enterprise teams follow an agile scrum development methodology with branch based development, where code is merged to the main line only after code reviews and testing are completed.

Our open source projects follow the Apache Software Foundation (ASF) release process, where contributors submit patches that are reviewed and committed by a committer.  DataStax test engineers work on tickets on a per-request basis in collaboration with the C* development team, DataStax field staff and the community.  The decision to ship is a democratic process where a community vote determines the criteria for release.  Our test engineering team provides its test results as input to the ASF release manager at the time of the vote, but we do not block releases.

There are several test engineering teams at DataStax that cover various product lines:

  • DataStax Enterprise
  • OpsCenter
  • Apache Cassandra
  • DevCenter
  • Various clients and drivers supported by DataStax

The Enterprise and Open Source test engineering teams execute under similar QA processes, incorporating the following in their day-to-day activities:

  • Development of test strategies and checklists for each release
  • Authoring, executing and automating test plans for new features
  • Running performance, scale and load tests
  • Building test infrastructure
  • Coding up tools that enable testing

Our test engineers undergo a rigorous technical screen with a pre-interview coding test, and are asked to write a test plan for C* functionality that does not have a clear user-facing surface.  The members of our team come from various backgrounds and development languages.  We are all united by a passion for working on big data technologies and are intrigued by the challenge of testing a distributed database platform.  When your end users are engineers of varying backgrounds, viewpoints and objectives, it can be challenging to anticipate exactly how your product will be used.  To think like an end user we need to be the end user (a developer or DevOps engineer), so we look for the best engineers to use technology to test a complex distributed platform.

The primary missions of our test teams are to:

  • Provide predictable and reliable high quality releases
  • Ensure enterprise quality by exercising customer use cases at scale
  • Ensure our product undergoes strenuous longevity, load and failure testing
  • Ensure integrations and compatibility of all components and products

Enterprise Products – Testing Process

  • Functional Verification
    • All tickets are verified prior to merging to the main line
    • New features and enhancements undergo formal test planning: documented test plans, test cases, specs for tools
    • Automated test coverage is continually evolving to accommodate new test cases and bugs found in the field
  • Upgrade Testing
    • Package upgrades are run to ensure the binary upgrade succeeds for different package types across all supported OS types and versions
    • Rolling upgrades are run to ensure workloads can continue to run while the upgrade is in progress and the cluster contains nodes of various versions
    • Coverage includes verification of all upgrade paths (start / end points), as well as all workload types, data types and installation topologies
  • Platform Testing
    • For all products, all installation mechanisms are validated across the supported platforms (OS types and versions)
    • For OpsCenter we run browser compatibility tests
    • For drivers the platform list includes client operating systems and versions, client compiler versions (Java version x, C# version y), and various C* / DSE server versions
    • For DSE and OpsC we run verifications against all supported cloud vendors
  • Automated Performance Regressions
    • In addition to performance tests required as check-in criteria for specific tickets or new features, we run performance regressions
    • Performance regressions are run using static environments and workloads, varying only the development branch or release under test
    • Automated comparisons between releases ensure no regressions are introduced (a rough sketch of this comparison step appears after this list)
    • We run through various permutations of the C* and DSE benchmarks, as well as in-house developed workloads
    • We persist all run results in a DSE database for future analysis and trending
  • Scale and Time Based Exit Criteria
    • High scale operational tests
      • Patch Requests: 100 node cluster covering multiple data centers
      • Major Releases: 1000 node cluster covering multiple data centers
      • Operational tests exercise gossip and streaming based activities:  Start, Stop, Status, Bootstrap new node, Decommission new node, Add new DC, Decommission DC
      • We ensure we are running continual workloads as the cluster expands and contracts
    • Time based duration tests
      • Duration and stress tests have been written using our different open source drivers (Java, Python, C#, C++)
      • For all releases we run 72-hour duration tests which cycle through heavily loaded, mixed workloads
      • Node faults are randomly simulated, ensuring the server continues to operate smoothly and no data corruption is encountered
  • Test Infrastructure
    • Python-based tool to provision, install, configure and run distributed tests (a minimal sketch of this pattern appears after this list)
      • integrated with our internal build engineering processes
      • provisioning against different cloud vendors and configurations
      • installation from different package types across repos, git branches, etc.
      • abstracted parallel SSH, SCP and log capture functionality for distributed tests
    • Generalized workloads that can be used by all products (OpsCenter, DSE, C*, etc.) to support our write/update once, use everywhere philosophy
    • Framework to iterate through different cluster configurations (size, partitioner, etc.) and reuse existing workloads under various conditions
    • General technologies in use: GitHub for source control, Jira for bug tracking, GreenHopper for project management, Python for system tests, and Java / TestNG for unit tests
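
To make the performance comparison step concrete, here is a rough sketch, under assumed inputs, of how release-to-release throughput numbers might be compared. The data layout, the find_regressions helper and the 5% threshold are illustrative placeholders, not our actual harness, which persists its results to a DSE database as noted above.

```python
from statistics import median

# Hypothetical threshold: flag any workload whose median throughput
# drops by more than 5% between the baseline and candidate builds.
REGRESSION_THRESHOLD = 0.05

def find_regressions(baseline, candidate, threshold=REGRESSION_THRESHOLD):
    """Compare per-workload throughput samples (ops/sec) and return the
    workloads whose median throughput regressed beyond the threshold."""
    regressions = {}
    for workload, base_samples in baseline.items():
        cand_samples = candidate.get(workload)
        if not cand_samples:
            continue  # workload was not run against the candidate build
        base, cand = median(base_samples), median(cand_samples)
        drop = (base - cand) / base
        if drop > threshold:
            regressions[workload] = (base, cand, drop)
    return regressions

if __name__ == "__main__":
    # Illustrative numbers only.
    baseline = {"write-heavy": [51200, 50800, 51050], "read-heavy": [64300, 64900, 64100]}
    candidate = {"write-heavy": [50950, 51100, 50700], "read-heavy": [58800, 59200, 58500]}
    for name, (base, cand, drop) in find_regressions(baseline, candidate).items():
        print("%s: %.0f -> %.0f ops/sec (%.1f%% drop)" % (name, base, cand, drop * 100))
```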
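
Similarly, the following is a minimal sketch of the provision-and-fan-out pattern our Python test infrastructure follows: iterate over cluster configurations, reuse a single workload definition, and run commands on every node in parallel over SSH. The helper names (provision_cluster, run_workload) and the configuration matrix are hypothetical stand-ins; the real tool also handles cloud provisioning, package installation and log capture.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration matrix; the real harness takes this from
# build-engineering inputs rather than hard-coded constants.
CLUSTER_SIZES = [3, 6]
PARTITIONERS = ["Murmur3Partitioner", "RandomPartitioner"]

def run_on_all(hosts, command):
    """Run a shell command on every host in parallel over SSH and
    return a host -> stdout mapping."""
    def run(host):
        result = subprocess.run(["ssh", host, command],
                                capture_output=True, text=True)
        return host, result.stdout
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(pool.map(run, hosts))

def provision_cluster(size, partitioner):
    """Placeholder: provision `size` nodes configured with `partitioner`
    and return their hostnames (cloud provisioning omitted here)."""
    return ["node%d.test.internal" % i for i in range(size)]

def run_workload(hosts):
    """Placeholder for a reusable workload, e.g. a cassandra-stress run."""
    return run_on_all(hosts, "echo workload-placeholder")

if __name__ == "__main__":
    for size, partitioner in itertools.product(CLUSTER_SIZES, PARTITIONERS):
        hosts = provision_cluster(size, partitioner)
        results = run_workload(hosts)
        print("%d nodes / %s: %d nodes reported" % (size, partitioner, len(results)))
```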

Apache Cassandra Testing Process

When we fund C* testing, we treat DataStax Enterprise and OpsCenter as downstream consumers and customers of Apache Cassandra.  DataStax started investing in test resources targeted specifically at the Apache Cassandra open source project in 2013 in order to improve the predictability of DataStax Enterprise release quality by catching issues earlier.  Since Apache C* and our drivers are community-driven projects, the role of the test team is to build and maintain test infrastructure and validate use cases for DataStax’s products and customers.

The test team focuses their efforts on:

  • Verifying critical bugs affecting DataStax customers
  • Writing, executing and automating test plans for new C* features
  • Enhancing and maintaining the build and test infrastructure for Apache Cassandra.  The primary test harnesses used for our open source projects are based on CCM (Cassandra Cluster Manager); a short example follows this list.
  • Managing public-facing Jenkins servers that provide continuous integration and nightly regressions, which run:
    • C* unit tests
    • C* distributed tests based on CCM
    • Various driver test suites (unit, system integration and duration tests).  Note that we treat the driver tests as additional coverage for C* as well as functional tests for the drivers
  • Running static code analysis using Coverity
  • Running code coverage via Cobertura
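
For a flavor of what a CCM-based distributed test looks like, here is a minimal sketch that drives the ccm command line from Python to stand up a local three-node cluster, check that every node reports as up, and tear the cluster down. The cluster name and Cassandra version are arbitrary examples; the real distributed tests use CCM's Python library and exercise far more than a status check.

```python
import subprocess

def ccm(*args):
    """Run a ccm subcommand and return its stdout."""
    return subprocess.run(["ccm"] + list(args), check=True,
                          capture_output=True, text=True).stdout

def test_cluster_starts():
    # Create and start a local 3-node cluster on an example C* version.
    ccm("create", "smoke_test", "-v", "2.0.5", "-n", "3", "-s")
    try:
        status = ccm("status")
        # ccm status lists each node with its state, e.g. "node1: UP"
        assert status.count("UP") == 3, status
    finally:
        ccm("remove")  # tear down the current cluster

if __name__ == "__main__":
    test_cluster_starts()
    print("cluster started and stopped cleanly")
```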

Takeaway

Even though we have different teams and release processes for our enterprise and open source projects, we all operate under the same corporate mission: to be a best-of-breed technology company, focused on providing releases to our customers that are predictable in both delivery and quality.

We’ve evolved from an OSS testing philosophy in 2011 to a standardized enterprise development methodology in 2013.  Today we find ourselves continually evolving toward a more customer-oriented quality engineering process.  Our test engineering team collaborates with various field-facing organizations at DataStax, and we welcome our customers to collaborate with us to ensure their use cases are covered in our regression test harnesses and factored into our enterprise release processes.  As we continue to grow our test engineering discipline and infrastructure, I look forward to bulletproofing our offerings and improving our customers’ experience with our products.


