How Data Teams are Changing and Importance of Self-Service Data - A Conversation with Jesse Anderson of Big Data Institute
In this conversation, Sam Ramji, Chief Strategy Officer at DataStax, and Jesse Anderson, a thought leader in managing and building successful data teams, talk about what a data team is, why you need one, and the three different types of data teams. They also explore some of the problems that data teams encounter, the importance of self-service data, and why you should be keeping up with the latest technology.
In this episode of The Open Source Data podcast, Jesse Anderson and I talk about the problems companies have with their data and the obstacles they encounter in accessing data to inform data-driven decisions.
Some organizations think that getting insights from big data is infeasible due to barriers of technology and cost. According to Anderson, sometimes the biggest barriers are people: “There’s a particular thing that I call the ‘No Team’. They’re there to say ‘no’. You want to access your data? No, sorry. Can’t do that.”
Anderson thinks data teams should be working on how to say yes. He noted that there are always limitations and problems with technology. Yet he has also seen plenty of cases where one organization has failed, but another has succeeded—using the very same technologies.
To figure out why some organizations failed and others didn’t, Anderson went on a deep dive, interviewing people and asking them what made them successful. What were they doing differently? What he discovered was that the problem lay in the data teams, not technology.
In the podcast, Anderson said, “Start with the right people on the bus and the right places on the bus.” With a unified data team that knows its roles and responsibilities, an organization can reap the rewards of insights it would otherwise miss, instead of leaving its teams driving the bus blind.
What is a data team?
A data team is responsible for an organization’s big data and for using it to meet business needs through data-driven decisions. To do that, the team must transform raw data into actionable insights that can help drive improvements in production and other key results. A better understanding of effective team structures is essential, but that understanding is not yet widespread.
Why do you need a data team?
In some organizations, there is a dangerous belief that all it takes to manage big data is a single data scientist who handles tasks such as:
- Running data pipelines
- Storing business logic
- Preparing data
- Creating charts and dashboards
The reality is that data scientists may not have the knowledge to do all of that work. You could get lucky and find one person with every skill set you need, but it’s extremely unlikely. Building a team of one is a fragile strategy for such a critical function.
Instead, Anderson suggests segregating duties into three different teams within the data team. “What you’re better off doing is getting these teams, working together, getting symbiotic relationships, good, strong connections with each other, and having them work together. And you’re so much better off than trying to find that unicorn.”
The three data teams
Anderson’s book Data Teams looks at three different teams: data science, data engineering, and operations. Each team has a different responsibility to help get value from data.
The data science team is where you would find the data scientists. Their role is to produce the advanced analytics that the organization uses to make its decisions.
In the podcast, Anderson spoke about the data science team being team members that have “a mathematical background, statistical background, learned some programming and applied that to creating models. And that ‘some programming’ is a key part of that definition.”
The data scientists on the team must be proficient in all of these skills, because it’s the combination that enables them to develop models that highlight helpful insights and present them in a form that is easy to read.
The data engineering team is responsible for creating data products and the architecture needed to build them. Without these, there would be no relevant data to consume and garner insights from.
Creating data pipelines, which take raw data and turn it into something usable, is a crucial task for the data engineering team. The team also continuously makes sure that the organization’s data is in a ready-to-use format that can change with the ebb and flow of the organization’s demand.
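To make the idea of a data pipeline concrete, here is a minimal sketch in Python. It is an illustration only, not something discussed in the podcast: the sample data, the column names, and the functions `clean_rows` and `total_by_region` are all hypothetical. It shows raw, messy records being cleaned and then summarized into something a data scientist or analyst could use.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing, one bad row.
RAW = """order_id,region,amount
1001, emea ,19.99
1002,AMER,not_a_number
1003,apac,5.00
"""

def clean_rows(raw_text):
    """Yield cleaned order records, skipping rows whose amount is not numeric."""
    reader = csv.DictReader(io.StringIO(raw_text))
    for row in reader:
        try:
            amount = float(row["amount"])
        except ValueError:
            # Drop the malformed row; a production pipeline would
            # more likely log it or route it to a quarantine area.
            continue
        yield {
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().upper(),
            "amount": amount,
        }

def total_by_region(records):
    """Sum order amounts per region."""
    totals = {}
    for record in records:
        totals[record["region"]] = totals.get(record["region"], 0.0) + record["amount"]
    return totals

records = list(clean_rows(RAW))
totals = total_by_region(records)
print(totals)
```

Real pipelines typically run on distributed frameworks and handle far larger volumes, which is part of why the work calls for engineers who specialize in big data.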
Anderson said that you should look for a data engineer who is a “software engineer who has specialized their skills in big data.”
The operations team’s primary goal is to put cluster software and custom software into production and keep it running smoothly. The aim is to guarantee customer access to the data service and ensure that the system doesn’t fail in production.
One of the main challenges for the operations team is to make sure they are familiar with how the custom software has been developed. Anderson explained, “There’s also this other key part. And that’s your code. They need to understand your company code, to be able to operate how that works with the framework.”
If your operations team has little knowledge of the internal software, troubleshooting bugs can take longer and require the involvement of more people.
The operations team should be the “data gurus”. They’re the ones who will know the size of the data, the type that’s being sent, and what format it’s in, allowing for better resource allocation planning.
If you find there are problems in your organization accessing data and you see the burden of that falling entirely on data scientists, it might be time to consider making changes to your data team so you can benefit from insightful business decisions.
About Jesse Anderson
Jesse Anderson is the managing director at Big Data Institute, where he works with start-ups and Fortune 500 companies to understand their data problems. He draws on his industry experience to empower organizations and teach best practices for building and managing data teams. Jesse has also trained over 30,000 people and is the author of Data Teams: A Unified Management Model for Successful Data-Focused Teams.