In this episode, Adron talks with Travis about getting started and working in site reliability at a large retail enterprise. They tackle topics ranging from outages, database sizing, monitoring, and observability to disparate workloads and migrations between cloud providers. Then Travis and Adron head into a discussion about distributed cache and some of the questions we would need to ask to determine the functionality needed. The episode wraps up with a few outtakes at the end.
0:15 Introduction. Meet Travis Mattera @slacknroll
1:38 Is it multicloud, or disparate payloads across multiple clouds? Clarifications ensue!
2:11 The way Apache Cassandra/DSE works could make migration to or from multicloud environments easier.
2:40 A culture shift to public managed and hosted services (i.e. clouds) vs. traditional internal data center managed and hosted services.
3:54 Travis' current role around monitoring systems.
4:58 Introducing the idea of observability in systems: the step *beyond* mere systems monitoring.
6:00 The secret is out: we've got a Prometheus meetup coming up (video will be available here) and more around these capabilities.
6:44 A discussion around distributed cache and the complexities that come up, including thrashed databases that led to an outage.
9:30 The Netflix case of CDN endpoints as cache, and the related viable option for retail.
11:10 Was the database possibly underprovisioned? The provisioning limits of the RDBMS paradigm of vertical scaling.
12:20 Travis gets his dance on while we do an outro, and you get to enjoy all the outtakes!