“Craigslist of Norway” Finn.no Manages Services of All Kinds Except Their Cassandra Clusters Thanks to DataStax Astra
Welcome to the next installment of our Q&A series: Behind the Innovator.
Behind the Innovator takes a peek behind the scenes with learnings and best practices from leading architects, operators, and developers building cloud-native, data-driven applications with Apache Cassandra™ and open-source technologies in unprecedented times.
This week, we talked with Espen Amble Kolstad and Benjamin Weima Lager of Finn.no. Finn.no is the largest classified advertisements and eCommerce company in Norway, helping people buy and sell what they want online. Finn.no is part of Schibsted, the largest publishing company in Norway.
Here’s what they had to say.
1. Can you share your technical background to date? Key accomplishments, achievements, etc.
Benjamin Weima Lager, Technical Domain Expert - Data Intelligence, Finn.no: I started my tech career as a developer at Skiinfo, a site that provides all sorts of information for skiers like weather reports, news, location details, and the like. I put together systems for handling messages, emails, alerts, and so on. After this, I moved to Finn.no as a consultant before joining the company full-time.
I’ve worked on several teams within the company - starting off, I worked on Finn’s map service, before I moved into teams covering real estate, cars, and jobs. My most recent move was about two years ago when I joined the Data Intelligence team.
Espen Amble Kolstad, Senior Developer, Finn.no: For my first role, I worked at Telenor, a global telecommunications company, on text messaging services. After my time there, I moved on to a couple of other companies - Aspiro and T-Rank - where I developed search and ranking services. I also started as a consultant at Finn, and then joined the company full-time. I have been there for about 10 years.
Benjamin: We are part of the Data Intelligence team at Finn, working on the company’s recommendations engine and data models. We have a team of three engineers and three to four data scientists, depending on the volume of data modeling that we have to do. Recommendations account for around twenty percent of the traffic to our classified ads, so it’s an important team for the business.
2. What is your current priority for the team and what are you trying to achieve?
Benjamin: We have just been through a massive cloud migration project across the whole company. It’s been a big shift for all our systems, and our Apache Cassandra implementation was no exception. Cassandra has been in use at Finn for eight years, and until now we had always run the open-source version. This year, Finn migrated over to running on Google Cloud Platform.
We started looking at this a couple of years ago. We had some Cassandra skills internally and a great set-up, but we had some concerns around how we would keep on managing this over time. We had our Cassandra expert leave, so we had a choice - we could carry on doing things in the same way, or we could look at managed Cassandra services instead. With our company’s overall migration to the cloud taking place, it made sense to move our Cassandra install to the cloud too.
The team had a hard deadline to meet - September 15th - so we had to have everything prepared and ready to move over before that day. The move was a big project for us, and we worked to make the changeover as smooth as possible. We moved from open-source Cassandra to DataStax Astra, and from our on-premises clusters to GCP.
We decided on Astra for a couple of reasons - firstly, Astra provides Cassandra fully supported by DataStax, and it was recommended to us by someone who used to work at Finn. Secondly, it runs on Google Cloud Platform (GCP), so it fits in with our choice of cloud.
3. What other systems are in place at Finn, and how does Cassandra support them?
Benjamin: In our data science team, we use a lot of different tools to build machine learning models - mostly we use Python, with some use of PyTorch and TensorFlow as well. We use whatever best suits our model - and our data scientists use what they prefer.
We have some other database instances for other areas of Finn - for example, we use PostgreSQL for database clusters - and we have a mix of different languages in place - Scala, Kotlin, Java, and even some Go.
Our data scientists create and test their models based on data from our data lake. Once we have the models finalised, they get published and used as part of our API, which then gets combined with our Cassandra implementation. Together, our API, which contains all our data models, and our Cassandra instance now run our recommendation engine. We use Cassandra as it provides the read performance and the resiliency that we require.
4. What are some key learnings and challenges you've experienced while working on this current project?
Espen: We have deployed on Astra, and we are looking to get the same level of performance as we had on our previous on-premises deployment. We are expanding our monitoring approach and scaling up our systems on Astra. We are looking at our instances and potentially moving up to the next size of cluster.
5. What's your vision for this project?
Benjamin: Recommendation engines are hard, so keeping these systems up to date will be an ongoing challenge for us as a team. However, our recommendations product is getting built into more of our services for customers. For example, we have a travel product that does not yet have recommendations built in, so we are developing our approach here. We can start to offer hotel recommendations as part of our search and advertising results.
Espen: We are having more conversations with business teams across our organisation, helping them understand recommendations and how they can work. We want to make our services smarter so they help customers better. For example, we can show adverts to customers that should match their interests, but we can also check how many times those adverts have been displayed. We can check those results and stop adverts being displayed if they are not of interest. That is an area where we can use data to improve our overall user experience. You have to work at speed here, and Cassandra is best for our needs around read and write performance.
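To illustrate the display check Espen describes, here is a minimal sketch in Python. It is not Finn's implementation: the class name `ImpressionTracker`, the `max_impressions` threshold, and the rule "stop showing an advert after N displays without a click" are all assumptions made for the example, and an in-memory dictionary stands in for whatever data store would hold the counts in production.

```python
from collections import defaultdict


class ImpressionTracker:
    """Hypothetical per-user advert display tracker (in-memory stand-in for a real store)."""

    def __init__(self, max_impressions=3):
        # Assumed threshold: how many times an advert may be shown without a click.
        self.max_impressions = max_impressions
        self._impressions = defaultdict(int)  # (user_id, ad_id) -> display count
        self._clicked = set()                 # (user_id, ad_id) pairs with a click

    def record_impression(self, user_id, ad_id):
        """Count one display of ad_id to user_id."""
        self._impressions[(user_id, ad_id)] += 1

    def record_click(self, user_id, ad_id):
        """Mark the advert as being of interest to this user."""
        self._clicked.add((user_id, ad_id))

    def should_show(self, user_id, ad_id):
        """Keep showing clicked adverts; otherwise stop once the threshold is reached."""
        key = (user_id, ad_id)
        if key in self._clicked:
            return True
        return self._impressions.get(key, 0) < self.max_impressions

    def filter_ads(self, user_id, candidate_ad_ids):
        """Drop candidates the user has seen too often without showing interest."""
        return [ad for ad in candidate_ad_ids if self.should_show(user_id, ad)]


if __name__ == "__main__":
    tracker = ImpressionTracker(max_impressions=2)
    for _ in range(2):
        tracker.record_impression("user-1", "ad-a")
        tracker.record_impression("user-1", "ad-b")
    tracker.record_click("user-1", "ad-b")
    print(tracker.filter_ads("user-1", ["ad-a", "ad-b", "ad-c"]))
```

In this sketch, an advert that has been displayed up to the threshold with no click is filtered out, while clicked and unseen adverts pass through. A real system would also need to decide how long counts are kept and how to read and update them fast enough to serve live traffic.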
6. Do you have any advice for your younger self that you would share now?
Espen: That’s a good question! I would suggest learning more programming languages. The reason behind this is that it would help me see different solutions to the same problem.