Technology, February 2, 2021

The Multiverse Next Door

Joey Anuff, ML Developer

I’ve deployed some personally exciting ML-powered demos recently, but until now nothing personally essential: an app to help translate these growing piles of bootleg comic books.

The bootlegs are almost as new to me as the Machine Learning. Like ML, thrifting international flea-marketplaces (and Facebook Groups) for freak knock-offs has only recently become practical. In fact, my first big score wasn’t until late 2018.

Frustratingly, the internationalization (“i18n”) improvements that make multilingual comics-buying an addicting pastime have yet to make multilingual comics-reading any easier.

I keep expecting Kindle or Google Drive to add some magic “scanlation” button to perform the kind of typeset OCR I need. As we’ll see, their existing APIs make this task fairly straightforward, if not trivial, but I have yet to see any such thing.

A reluctance to antagonize publishers? Maybe. But then again, nobody knew world comics had such a large untranslated corpus. Surely the discovery of an international brotherhood of Dr. Doom clones changes the equation somehow?

This project thus has dual goals: to introduce some crazy new archaeology to comics fandom, and to scaffold a simple tool for deciphering it, narrowly designed for personal, non-commercial use.

What’s the stack?

A minimalist React app, built using the Next.js 10 web framework and chiefly dependent on two powerfully simple APIs: Amazon’s Amplify CLI, which we’ll use to access several cloud ML endpoints, and DataStax’s Stargate APIs for Apache Cassandra, the database we’ll use to store and edit our private translation libraries.

Who is this for?

Language learners and insatiable comic readers, eventually; comic collectors and historians, who’ll enjoy the upcoming galleries and data dumps; but right now mainly developers, particularly those eager to squeeze Machine Learning functions into their full-stack web apps.

Parsing a comic book page is a challenge perfectly suited to modern ML algorithms, and there’s a lot of great new research to draw from, some of it published with ready-to-use code.

The first and most basic ML function of this project, handwriting recognition, is one of the earliest Computer Vision tasks and one of the first things taught in CV courses. It was also the subject of the first how-to of this series, 6 Tricks for Simpler Cloud CV.

Discovering how much easier Amazon’s newer Amplify CLI was for deploying an ML endpoint (compared with Amazon’s older Lambda web console) was the major breakthrough I wanted to share in that essay.

What’s next?

The next installment will constitute another dramatic API upgrade, this time to our project’s database: Apache Cassandra, the database used by Apple, Spotify, Netflix and eBay. As with our OCR, we’ll be setting up some extremely concise serverless CRUD functions with the help of DataStax’s new @astrajs/collections package.

The CV and NLP models we’ll be using are often better than 90% accurate, but whatever errors remain require human correction. Creating solid UX for revising our ML-annotated data, while at the same time optimizing our storage for continuous training and data analysis, is the kind of challenge Apache Cassandra is built for.

With input from the team at DataStax, I’m confident I’ll soon be explaining a solution, if not the solution, to many such database concerns common to ML-driven web apps. Some specific areas I’ll be looking into include:

  • Using the Stargate JavaScript SDK in SSR and SSG situations.
  • Optimizing an Apache Cassandra database for NLP analysis.
  • The easiest approach to non-destructive edits.
  • Setting up a private system for collaborative translation.
  • Quick-starting a pop-up store with Next.js 10 Commerce.
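
To make the "non-destructive edits" item a little more concrete, here is a minimal sketch of one possible shape for it, not the approach this series will necessarily settle on. It assumes each speech balloon is stored as a single document whose revisions are append-only, so the raw OCR output is never overwritten by a human correction. Every name here (`Revision`, `BalloonDoc`, `createBalloon`, `correct`, `currentText`, `originalText`) is hypothetical and invented for illustration; nothing below calls Amplify, Stargate, or any other SDK.

```typescript
// Hypothetical document shape for one OCR'd speech balloon.
// These types are illustrative assumptions, not part of any SDK.
interface Revision {
  text: string;
  source: "ocr" | "human"; // who produced this version of the text
  at: string;              // ISO timestamp, passed in by the caller
}

interface BalloonDoc {
  id: string;
  revisions: Revision[];   // append-only; index 0 is the raw ML output
}

// Start a document from the ML output.
function createBalloon(id: string, ocrText: string, at: string): BalloonDoc {
  return { id, revisions: [{ text: ocrText, source: "ocr", at }] };
}

// Record a human correction by returning a new document with one more
// revision appended. The input document is left unchanged.
function correct(doc: BalloonDoc, text: string, at: string): BalloonDoc {
  return {
    ...doc,
    revisions: [...doc.revisions, { text, source: "human", at }],
  };
}

// The text a reader should see: the most recent revision.
function currentText(doc: BalloonDoc): string {
  return doc.revisions[doc.revisions.length - 1].text;
}

// The untouched ML output, still available for retraining or analysis.
function originalText(doc: BalloonDoc): string {
  return doc.revisions[0].text;
}

// Usage: correct an OCR misread without losing the original.
const raw = createBalloon("page1-balloon1", "DR. D00M", "2021-02-02T00:00:00Z");
const fixed = correct(raw, "DR. DOOM", "2021-02-03T00:00:00Z");
console.log(currentText(fixed), "/", originalText(fixed));
```

Keeping the OCR text alongside every correction is what would let the same records double as training data later: each document pairs what the model read with what a human says it should have read.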

 
