Technology, September 6, 2023

Build a Text and Image Search App with Astra DB Vector Search, NodeJS, Stargate’s New JSON API, and Stargate-Mongoose

Yuqi Du, Engineering

Vector search is a powerful way to bring the magic of generative AI to your applications. And if you have the right tools, it’s not necessarily difficult. Here, I’ll show you a simple way to build a NodeJS application with DataStax Astra DB (and vector search) support by using stargate-mongoose, a driver for Mongoose, together with Stargate’s new JSON API.

I’m a new engineer working on DataStax Stargate. It’s a tradition on the Stargate team that your first ticket is to develop a demo app using Stargate’s API. I decided to build a NodeJS app that incorporates Stargate’s new JSON API and the stargate-mongoose driver.

Computers and cameras are my two obsessions. Whether it’s lines of code or lines that frame a photograph, I enjoy the logic and artistic creativity—it makes my life colorful. So to combine my two passions, I decided to build the app Photography-Site as a way to organize my work.

To make it an AI app, I also decided to incorporate vector similarity search, using the vector search capabilities in Astra DB.

Stargate-mongoose and the JSON API 

I am not particularly proficient with JavaScript, but building this app was still relatively effortless.

Mongoose is a widely used object data modeling (ODM) tool, often paired with the MongoDB driver, and it has an active JavaScript developer community. The open source API framework Stargate offers a new Mongoose driver called stargate-mongoose. It’s an alternative driver for Mongoose, based on Stargate’s new JSON API, which is a stand-alone microservice for Stargate that gives access to data stored in an Apache Cassandra® cluster through a JSON document-based interface.

This combination gives Mongoose developers an open-source way to work with Cassandra. With stargate-mongoose working alongside Mongoose and the new JSON API, JavaScript developers get a JSON-oriented data model and can build on Cassandra’s scalability and performance.

Vector search

The Stargate JSON API and stargate-mongoose provide full support for Astra Vector Search, which lets AI applications find the items in a collection that are most closely related to a given query. A crucial part of this process is the ability to store embedding vectors: arrays of floating-point numbers that represent objects or entities so that similar items end up close together. Astra DB Vector Search integrates this feature into the serverless Astra DB database.
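
To make "most closely related" concrete, here is a minimal brute-force sketch of similarity ranking in plain JavaScript. It assumes cosine similarity as the metric and uses tiny made-up three-dimensional vectors. The function names are illustrative only, and a database would normally rely on an index rather than scanning every vector like this.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k items whose `vector` is most similar to `query`.
function topK(items, query, k) {
  return items
    .map((item) => ({ item, score: cosineSimilarity(item.vector, query) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k)
    .map(({ item }) => item);
}

// Toy data: hypothetical embeddings, far smaller than real ones.
const toyPhotos = [
  { name: 'meadow', vector: [0.9, 0.1, 0.0] },
  { name: 'city', vector: [0.0, 0.2, 0.9] },
  { name: 'pasture', vector: [0.8, 0.3, 0.1] },
];
const nearest = topK(toyPhotos, [1, 0, 0], 2).map((p) => p.name);
console.log(nearest);
```

The query vector points in nearly the same direction as the "meadow" and "pasture" vectors, so those two rank ahead of "city".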

Architecture

The demo app is a Node.js application developed with the Express web application framework. It stores and fetches all data (including vectors) from Astra DB by using stargate-mongoose as the Mongoose driver. Stargate-mongoose relies on the Stargate JSON API to access Astra DB. 

For vector search, the app uses the OpenAI embedding API to generate text embedding vectors and Google MediaPipe to generate image embedding vectors. Both are covered in more detail below.

Photography-Site app walkthrough 

Here, I’ll walk through the various operations that are supported by the app and show you some of the key API calls that make this possible.

Basic functionality

The app supports basic functionality such as browsing images by category, exploring the latest images, showing random images, adding images, and searching for an image by name.

The app presents images by category on the homepage. To store and fetch data in Astra DB using stargate-mongoose, we first need to define the data model. After that, a single find call fetches the data.

// Define the data model for a photo
const photoSchema = new mongoose.Schema({
  // schema fields
});
const Photo = mongoose.model('photo', photoSchema);

// Fetch up to limitNumber photos in the given category
const photosOfCategory = await Photo.find({ category: categoryName }).limit(limitNumber);

Once the app obtains the list of photos using the Photo model, it can populate the home screen.

Clicking on a specific photo brings up its details, including the photo name, category, and description.

Behind the scenes, this uses the Mongoose findById method to get the target photo from Astra DB.

const photo = await Photo.findById(photoId);

The app lets you add a photo by providing its name, description, category, and the image itself.

When the user clicks “Add Photo,” the app creates a new Photo object and calls the save method, which writes the data to Astra DB.

const newPhoto = new Photo({
  name: req.body.name,
  description: req.body.description,
  category: req.body.category,
  image: newImageName,
  $vector: description_embedding, // embedding of the photo description
});
await newPhoto.save();
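
The snippet above uses description_embedding without showing where it comes from. As a rough sketch of the order of operations, the hypothetical helper below first embeds the description and then saves the photo with that vector. The names addPhoto, embed, and fields are illustrative assumptions, not part of the app, and the Photo model and embedding function are passed in so the sketch does not depend on a particular setup.

```javascript
// Hypothetical helper: embed the description, then save the photo with its vector.
// `Photo` is the Mongoose model and `embed` is an async text-embedding function.
async function addPhoto({ Photo, embed }, fields, imageName) {
  // Step 1: turn the description text into an embedding vector.
  const descriptionEmbedding = await embed(fields.description);
  if (!Array.isArray(descriptionEmbedding) || descriptionEmbedding.length === 0) {
    throw new Error('Expected a non-empty embedding array');
  }

  // Step 2: store the photo fields together with the vector.
  const newPhoto = new Photo({
    name: fields.name,
    description: fields.description,
    category: fields.category,
    image: imageName,
    $vector: descriptionEmbedding,
  });
  await newPhoto.save();
  return newPhoto;
}
```

In an Express route handler, fields would typically be req.body and imageName the stored file name of the uploaded image.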

Text similarity search

The app lets you search photos by text similarity. You describe the photo or scene you’re looking for, and the app uses that description as the search input. Behind the scenes, the feature uses text embeddings and DataStax vector search.

Remember that every time we add a photo, the data model requires a photo description. We take this description text and call the OpenAI text embedding API to get the corresponding embedding vector. Similarly, for a text similarity search, we get an embedding vector for the search text. Then we run the vector search by combining the find and sort methods.

const description_embedding = await getTextEmbedding(searchTerm);
const photos = await Photo.find({}).sort({ $vector: { $meta: description_embedding } }).limit(3);
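
The snippet above calls getTextEmbedding without showing its body. One way such a helper might look is sketched below, using the OpenAI embeddings REST endpoint. The assumptions here are mine rather than the app's: Node 18 or later (for the global fetch), an API key in the OPENAI_API_KEY environment variable, the text-embedding-ada-002 model, and an optional options argument that exists only to make the function easy to test.

```javascript
// Sketch of a text-embedding helper built on the OpenAI embeddings REST endpoint.
// The model name, environment variable, and `options` parameter are assumptions;
// the app's real helper may differ.
async function getTextEmbedding(text, options = {}) {
  const {
    apiKey = process.env.OPENAI_API_KEY,
    model = 'text-embedding-ada-002',
    fetchImpl = globalThis.fetch, // injectable so the function can be tested offline
  } = options;

  const response = await fetchImpl('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input: text }),
  });
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }

  // The response carries one embedding per input under data[i].embedding.
  const payload = await response.json();
  return payload.data[0].embedding;
}
```

The returned array can be passed directly as the $meta value in the sort clause, or stored in the $vector field when a photo is added.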

In the following screenshot, you can see that we searched for “a place for cows to eat” and got two photo results. Both contain grass and herbivores, so the text similarity search result makes sense.

Image similarity search

Besides text similarity search, another interesting feature I built is image similarity search. An image embedding vector needs to be generated first. This time, we use Google MediaPipe to generate the image embedding, with mobilenet_v3_large.tflite as the model. The app relies on python-shell to run the Python embedding script from the NodeJS environment. Once we have the image embedding vector, we can run the vector search.

const photo_embedding = await getPhotoEmbedding(image);
const photos = await PhotoEmbedding.find({}).sort({ $vector: { $meta: photo_embedding } }).limit(3);
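
The text and image searches share the same query shape: an empty filter, a sort on $vector, and a limit. As an illustration, that shape could be captured in one small helper. The name findSimilar and its input check are hypothetical additions, and Model stands for any Mongoose model backed by stargate-mongoose.

```javascript
// Hypothetical helper capturing the query shape shared by both searches.
// `Model` is a Mongoose model such as Photo or PhotoEmbedding.
async function findSimilar(Model, embedding, limit = 3) {
  // Guard against passing raw text or an image path instead of a vector.
  if (!Array.isArray(embedding) || embedding.some((x) => typeof x !== 'number')) {
    throw new TypeError('embedding must be an array of numbers');
  }
  // Same find/sort/limit chain as the snippets in this article.
  return Model.find({}).sort({ $vector: { $meta: embedding } }).limit(limit);
}
```

With this helper, the two searches would read findSimilar(Photo, description_embedding) and findSimilar(PhotoEmbedding, photo_embedding).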

Here we have an image of “car in the sunset.” We can search with it to get similar images.

As you can see, we have three results. They all share a sunset color tone and follow the same pattern: light sky above and dark land below.

Give it a try

This demo app is in the stargate-mongoose-sample-apps repo now. You can run this demo in under two minutes by following the README instructions. If you’d like to run this demo using the JSON API in Astra DB, please contact us at json-preview@datastax.com and we will send you detailed instructions to get you started.

