Hadoop, Security, and the Enterprise
eWeek recently published an article/slide deck on 10 reasons why Hadoop poses a big data security risk. As I mentioned a few months ago in a blog post about our release of DataStax Enterprise (DSE) 3.0, lax security in NoSQL databases drew attention in the tech media last year. I’m happy to say that, with DSE 3.0, enterprise-quality security in the NoSQL world is no longer an afterthought.
But the eWeek article demonstrates that the same concerns apply to Hadoop implementations. The article says: “It [Hadoop] was not written to support hardened security, compliance, encryption, policy enablement and risk management.”
Because DSE is a single integrated platform that includes Apache Cassandra for online application use cases, Solr for enterprise search, and Hadoop for batch analytics, we wanted to make sure we had the security bases covered in our platform for each technology. The good news for Hadoop users is that many of the security concerns called out by eWeek are handled in DSE.
For example, eWeek says, “Hadoop also doesn’t support encryption on nodes or on data in transit between nodes.” That’s not true in DSE. Because we use Cassandra rather than HDFS for storage, the transparent data encryption we offer in DSE applies to Hadoop data. Moreover, DSE supplies client-to-node and node-to-node encryption of data for Hadoop as well as for Cassandra and Solr.
eWeek also states, “The distributed nature of Hadoop clusters also renders many traditional backup and recovery methods and policies ineffective. Companies using Hadoop need to replicate, back up and store data in a separate, secured environment.” In the same vein, they state later: “Traditional data security technologies have been built on the concept of protecting a single physical entity (like a database or server), not the uniquely distributed big data computing environments characterized by Hadoop clusters. Traditional security technologies are not effective in this type of distributed, large-scale environment.”
One of the nice things about the Hadoop component of DSE is that automatic redundancy and replication are built into the platform itself, so all of the goodness of Cassandra, which is architected specifically for distributed, large-scale environments, is inherited on the Hadoop (and Solr) side. This means Hadoop data can be easily replicated in one location or many: across one datacenter or several, or across one cloud availability zone or several. Further, it means there is no single point of failure or write bottleneck, since data can be written to and read from any location.
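To make the multi-datacenter replication idea concrete, here is a minimal sketch in Python that builds a CQL CREATE KEYSPACE statement using Cassandra's NetworkTopologyStrategy, which takes a replica count per datacenter. The keyspace name "analytics", the datacenter names "DC1" and "DC2", and the helper name create_keyspace_cql are assumptions for illustration only; real datacenter names must match whatever your cluster's snitch reports, and the right replica counts depend on your deployment.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with per-datacenter replica counts.

    keyspace        -- keyspace name (hypothetical, chosen by the caller)
    replicas_per_dc -- dict mapping datacenter name to replica count
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")

    # NetworkTopologyStrategy places the requested number of replicas
    # in each named datacenter.
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in sorted(replicas_per_dc.items()):
        if count < 1:
            raise ValueError("replica count for %s must be at least 1" % dc)
        options.append("'%s': %d" % (dc, count))

    body = ", ".join(options)
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, body)


if __name__ == "__main__":
    # Hypothetical example: three replicas in one datacenter, two in another.
    print(create_keyspace_cql("analytics", {"DC1": 3, "DC2": 2}))
```

The resulting statement can be run from cqlsh or any CQL client. The sketch only generates the statement text and does not connect to a cluster.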
Backups aren’t hard either: all data is stored in Cassandra column families / tables, so typical snapshot backup and recovery tasks are uniform across a cluster.
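As a rough illustration of what "uniform across a cluster" can look like, the Python sketch below builds the same nodetool snapshot command for every node. The host names, the keyspace name, the tag, and the helper name snapshot_commands are all hypothetical, and the exact nodetool options should be checked against the version you run. The sketch only constructs the commands and does not execute them.

```python
def snapshot_commands(hosts, keyspace, tag):
    """Return one nodetool snapshot command (as an argument list) per host.

    hosts    -- iterable of node addresses (hypothetical values)
    keyspace -- keyspace to snapshot
    tag      -- snapshot name, so the same backup can be found on every node
    """
    if not tag:
        raise ValueError("a snapshot tag is required")

    commands = []
    for host in hosts:
        # The same command shape is used for every node in the cluster.
        commands.append(["nodetool", "-h", host, "snapshot", "-t", tag, keyspace])
    return commands


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    for cmd in snapshot_commands(["node1", "node2", "node3"], "analytics", "nightly"):
        print(" ".join(cmd))
```

Each argument list could be handed to subprocess.run on a machine that has nodetool installed and JMX access to the nodes; that step is left out here because it depends on your environment.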
So if you’re interested in easily integrating Hadoop batch analytics with your modern line-of-business applications and want to ensure both are secured, you should give DSE 3.0 a try. Download DSE, which is completely free to use without restrictions in development environments (note that production deployments do require a software subscription), and see how it can satisfy both your big data needs and your security requirements.