Woods Hole Delivers 90TB of Oceanographic Data to Researchers
Woods Hole uses DataStax Luna to manage 90 terabytes of Apache Cassandra™ data with greater reliability and performance.
Enables system to access the organization’s six years of data much more effectively
Standardizes configuration across all nodes to speed up garbage cleanup and improve performance
Accelerates the identification of configuration issues to increase developer productivity
Woods Hole Oceanographic Institution (WHOI) is a nonprofit research and educational organization focused on marine science and oceanographic engineering. Founded in 1930 and headquartered in Massachusetts, WHOI is the largest independent oceanographic research institution in the United States, with approximately 1,400 staff and students.
The Ocean Observatories Initiative (OOI), whose Program Management Office (PMO) is located at WHOI, is a science-driven ocean observing network that delivers real-time data from more than 800 instruments to address critical science questions regarding the world’s oceans. Funded by the National Science Foundation to encourage scientific investigation, OOI data are freely available online to anyone with an Internet connection.
The program currently holds about six years’ worth of data, roughly 90 terabytes. Users from around the world can download this data for their own scientific, research, and educational purposes.
Apache Cassandra™, the open-source NoSQL database, sits at the heart of this system, which runs on hybrid infrastructure. Over the last two years, the WHOI team has focused on making the system more transparent, reliable, and performant to improve the user experience and better support its mission of learning more about the ocean.
When Jeff Glatstein, Senior Manager of Cyberinfrastructure, was hired by WHOI to work on the OOI program, one of his tasks was making sure the Cassandra architecture was optimized. The development team was good at figuring out how to load data into Cassandra, but Glatstein and his colleagues realized they weren’t using the best Cassandra configuration, and that was causing performance issues.
“We really had no eyes into Cassandra at all in terms of performance,” Glatstein says. “At the time, we had no idea our partitions were too big and we didn’t know that our configuration actually varied from node to node.”
The cluster started to lose nodes. Initially, maybe one node would go down every couple of months. Then, all of a sudden, three nodes went down in two days. Due to a bug in the version of Cassandra the WHOI team was using, it was taking them over 72 hours to bring a node back online.
Something needed to change.
After researching their options, the WHOI team signed up for DataStax Luna, an expert support offering for Cassandra.
“We were getting to the point where we knew we wouldn’t be able to operate, and that’s when we contacted Luna support,” Glatstein says.
As a result, the Program has been able to optimize its Cassandra architecture, improving performance and helping to stabilize access to oceanographic observation data.
“It just opened up a world to us in terms of realizing we had more tables than we probably should,” Glatstein explains. “That was news to us that there was a limit.”
Prior to enlisting Luna, the WHOI team was encountering a number of issues in their Cassandra deployment.
“Luna was able to get in and work us through those issues right away, and it cleared up a lot of mystery,” Glatstein continues.
The decision to use DataStax Luna to support the Cassandra architecture proved to be a wise one, and the Program has already seen a number of benefits.
Instead of waiting days to bring nodes back online, the OOI team can now bring nodes into service as needed.
“Our Cassandra cluster hadn’t been bounced in three years—which is not good practice—and the Luna team talked us through bouncing it,” Glatstein continues. “By doing that, it alleviates us from that 72-hour waiting period.”
DataStax Luna has also helped improve performance across the cluster. For example, the Luna team identified corruption in some of the files that was hurting performance. Luna also sped up garbage cleanup by standardizing configuration across all nodes.
“Our garbage cleanup was taking way, way too long,” Glatstein explains. “We fixed that so that our compaction and garbage cleanup is no longer impacting our performance.”
For a small team, any time spent on maintenance or on diagnosing health problems is time not spent moving the system forward. Thanks to Luna, OOI’s developers can focus on more pressing support matters instead of tuning a database.
“What should have taken an hour from the Luna team’s point of view took five hours, and from our point of view, would have taken us days,” Glatstein explains. “So, right there was a productivity gain.”
Maximized returns from Cassandra
Thanks to DataStax Luna, the Program has the peace of mind that comes from knowing its Cassandra deployment is optimized, rather than crossing its fingers and hoping that’s the case.
“I recommend anybody using Cassandra who is unsure if they are getting the most out of it to try Luna,” Glatstein concludes. “It’s been great.”