Company | August 17, 2023

My Internship Journey: A Dive into AI

Anisha Rao, Intern

Stepping into the realm of artificial intelligence and machine learning as a software engineering intern at DataStax was like embarking on a journey into the future. As a student, I was keenly interested in turning raw data into intelligent results and in working with professionals in a real-world setting, so this internship was a dream come true. The promise of developing applications that can learn, adapt, and make intelligent decisions was an exhilarating prospect.

Here I’ll reflect on my internship journey, from the initial days of uncertainty to the practical lessons that shaped my knowledge of DataStax Astra DB, vector search, and CassIO and enhanced my understanding of how these solutions can be leveraged to handle AI tasks in real-world scenarios. In the end, I became a confident contributor to the team.

Diving into the unknown

On the first day, I was excited and apprehensive. I met the team and got an overview of the ongoing project. While I had a strong theoretical foundation in machine learning, the complexity of actual business problems was overwhelming. I saw this as an opportunity to expand my knowledge and skill set. To begin, I familiarized myself with DataStax's flagship product, Astra DB, a vector database-as-a-service built on Apache Cassandra, and with the open-source CassIO, a framework for integrating Cassandra with generative AI workloads. My mentor answered every question I had and gave me the chance to gain hands-on experience with the product.

Learning the ropes

One of the most valuable aspects of my internship was the learning environment. My mentor was patient and eager to guide me through the complexity, helping me navigate the tools and software and providing valuable insights into the open-source tool CassIO. Weekly knowledge-sharing sessions helped me understand the methodologies behind the software we were implementing and reinforced the importance of continuous learning in the field of AI/ML. Throughout this internship, I explored how Astra DB aligns with the requirements of generative AI projects.

From collaboration to insights to impact

Collaboration became a cornerstone of my internship experience. The brainstorming sessions, code reviews, and collaborative discussions opened my eyes to alternative approaches and improved my ability to communicate complex ideas effectively. 

During my internship, I was part of the Engineering Integration team, where I had the opportunity to contribute to a variety of tasks. As the weeks went by, I transitioned from spectator to contributor. Here are a few highlights:

  • In the beginning, I delved into the fundamentals of Astra DB and CassIO, gaining a deeper understanding of their potential applications for developers. I reviewed the documentation to grasp essential principles, laying the foundation for my subsequent tasks.
  • Previously, CassIO was confined to integration with large language model (LLM) frameworks, which cover only the text modality. After a careful evaluation, I identified this constraint and suggested extending CassIO to other modalities, including images and audio. To do this, I created two notebooks:
    1. Sound Similarity Search with Vector Database: This notebook provides an interactive user interface for audio input in WAV/MP3 format. The system compares the input audio against a vector database that contains audio embeddings and returns the five most similar audio files. Read more about sound similarity vectors.
    2. Image Similarity Search with Vector Database: In this notebook, users submit images in PNG/JPG/JPEG format through an intuitive user interface. The system compares the input against a vector database that contains image embeddings and returns the five most similar images. Read more about image similarity vectors.

The above two features were pushed to the CassIO website in the production environment and are now available here. This expansion represents a significant evolution in CassIO's functionality, enabling a broader scope of applications and facilitating a richer user experience. This exercise enhanced my understanding of vector search and gave me insight into the CassIO code.
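To make the idea behind both notebooks concrete, here is a minimal sketch of a top-five similarity search. It is not the notebook code: the embeddings are made-up three-dimensional toy vectors (real audio or image embeddings come from an embedding model and have many more dimensions), the names `store`, `query`, and `top_k_similar` are hypothetical, and a plain in-memory dictionary stands in for the vector database. Cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query, store, k=5):
    """Score every stored embedding against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "database": file name -> embedding (illustrative values only).
store = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.9, 0.1, 0.0],
    "clip_c": [0.0, 1.0, 0.0],
    "clip_d": [0.0, 0.0, 1.0],
    "clip_e": [0.5, 0.5, 0.0],
    "clip_f": [-1.0, 0.0, 0.0],
    "clip_g": [0.7, 0.0, 0.7],
}

# Embedding of the file the user uploaded (also illustrative).
query = [1.0, 0.05, 0.0]

results = top_k_similar(query, store, k=5)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy values, the five names come back in the order clip_a, clip_b, clip_e, clip_g, clip_c. A vector database performs the same ranking, but with an index, so it does not have to score every stored embedding.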

  • I also researched how to make it easier and faster to find similar items in a large collection of documents stored in a vector database. I focused on using clustering algorithms to group documents so that similar embeddings reside in the same cluster. This lets the user look for similar embeddings within one cluster only, which reduces the time it takes compared with searching the entire set of documents. It's like putting similar books on the same shelf in a library, so you only need to search one shelf.
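The library-shelf idea can be sketched in a few lines. This is an illustration under stated assumptions, not the approach used during the internship: the post does not name the clustering algorithm, so a small k-means with fixed starting centroids is assumed here, the two-dimensional document vectors are invented, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: euclidean(vector, centroids[i]))


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain k-means: assign each vector to a centroid, then move the centroids."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        assignments = [nearest_index(v, centroids) for v in vectors]
        for i in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return centroids


def build_clusters(docs, centroids):
    """Group document ids by their nearest centroid (the 'shelves')."""
    clusters = {i: [] for i in range(len(centroids))}
    for doc_id, vec in docs.items():
        clusters[nearest_index(vec, centroids)].append(doc_id)
    return clusters


def search(query, docs, centroids, clusters, k=2):
    """Pick the nearest cluster, then rank only the documents inside it."""
    cluster_id = nearest_index(query, centroids)
    ranked = sorted(clusters[cluster_id], key=lambda d: euclidean(query, docs[d]))
    return cluster_id, ranked[:k]


# Invented 2-D embeddings forming two obvious groups.
docs = {
    "doc_a": [0.0, 0.1], "doc_b": [0.2, 0.0], "doc_c": [0.1, 0.3],
    "doc_d": [5.0, 5.1], "doc_e": [5.2, 4.9], "doc_f": [4.8, 5.0],
}

centroids = kmeans(list(docs.values()), [docs["doc_a"], docs["doc_d"]])
clusters = build_clusters(docs, centroids)
cluster_id, top = search([5.1, 5.1], docs, centroids, clusters, k=2)
print(cluster_id, top)
```

The query is compared against two centroids and then three documents, rather than all six. The trade-off is that the result is approximate: a close neighbor that happens to sit just across a cluster boundary is missed unless more than one cluster is searched.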

Overcoming challenges

Not everything went smoothly. I hit roadblocks: models failed to extract features for the entire dataset because there was not enough RAM, unexpected data anomalies cropped up, and the process of creating pull requests for the CassIO website repository was hard to understand at first. These challenges taught me resilience and resourcefulness. I learned to approach problems systematically, seeking advice from mentors and using online resources to find solutions.

As my internship drew to a close, I found myself reflecting on the growth I had experienced. I had come a long way from being a novice intimidated by real-world data problems. I had gained proficiency in using industry-standard tools, understanding the nuances of data, and collaborating effectively with a team. But more importantly, I had developed confidence in my ability to apply techniques that I learned during my graduate school studies to solve practical challenges.

My internship was more than just a job experience; it was a transformative journey that enhanced my technical skills, improved my problem-solving abilities, and emphasized the importance of continuous learning. It taught me about effective communication, collaboration, and translating insights into actionable strategies. As I move forward in my academic and professional journey, I carry with me the invaluable lessons and experiences gained during this internship, ready to tackle new challenges and contribute meaningfully to the world of AI.

Want to take Astra DB’s vector search capabilities for a spin? Get started for free.
