Building a Q&A App on Private Data with Open Source LLMs

Published on: Nov. 30, 2023, 10:23 a.m.


Project Objectives

A Deep Dive into Open Source LLMs for Private Q&A

In my previous posts, I shared my journey with the OpenAI API, specifically building a Q&A application. Curiosity led me to the next challenge: could I achieve similar results with open source LLMs?

I began with Google searches and questions to GPT-4, hoping to find a straightforward guide. Unfortunately, my search initially led me into a maze of clickbait with little substance.

Open source Q&A on private data has vast potential and could transform how organizations manage their knowledge. I've covered the benefits and drawbacks in another blog post.

Perseverance paid off when I returned to an old haunt: Kaggle. Years ago, the platform guided me through the nuances of deep learning, particularly satellite image recognition with CNNs. Now its comprehensive resources paved the way for this new endeavor, and it was there that I found a Jupyter notebook that became the cornerstone of my project.

My objective was clear: run the LLMs locally on my MacBook Pro and gauge how a machine with 16GB of RAM performs without external GPU or TPU resources.
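A rough way to gauge whether a model fits in 16GB of RAM is to estimate the memory taken by its weights: parameters × bits per weight ÷ 8 bytes. The sketch below uses illustrative model sizes (7B, 13B) and bit widths that are assumptions, not the models from the notebook. It ignores the KV cache, the OS, and runtime overhead, and quantized file formats store a little extra data per block, so real usage runs somewhat higher. The `usable_fraction` of 0.5 is likewise an assumed safety margin.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 bytes; the 1e9 factors for "billion" and "GB" cancel out
    return params_billion * bits_per_weight / 8


def fits_in_ram(weights: float, ram_gb: float = 16.0, usable_fraction: float = 0.5) -> bool:
    """Assumption: only `usable_fraction` of RAM is left for the weights after
    the OS, other apps, and the KV cache take their share."""
    return weights <= ram_gb * usable_fraction


for params, bits in [(7, 16), (7, 4), (13, 4)]:
    size = weights_gb(params, bits)
    print(f"{params}B @ {bits}-bit ~ {size:.1f} GB, fits: {fits_in_ram(size)}")
```

Under these assumptions, a 16-bit 7B model (about 14 GB) does not fit, while 4-bit 7B and 13B models (about 3.5 GB and 6.5 GB) do, which is why quantized models are the usual choice on a laptop.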

Adapting the source code for macOS was my first hurdle. I quickly realized that Conda was indispensable for this complex setup, so I left Virtualenv by the wayside.

MacBook users, take note: the LlamaCpp library is crucial. With it I got the LLMs running on my machine, but the code needed modifications, particularly for embeddings, where I switched to OpenAI for simplicity.
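The notebook itself is not reproduced here, but the retrieve-then-generate flow behind a private Q&A app can be sketched in plain Python. `toy_embed` is a hypothetical hashing stand-in for a real embedding model such as the OpenAI embeddings mentioned above; `retrieve`, `build_prompt`, and the prompt wording are illustrative assumptions, not the notebook's code. `answer_with_local_llm` assumes the llama-cpp-python package (the bindings that LangChain's `LlamaCpp` class wraps) and a local GGUF model file; its parameter values are guesses to tune.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Hypothetical stand-in for a real embedding model (e.g. OpenAI embeddings).

    Hashes each lowercase word into one of `dim` buckets, then L2-normalises the counts.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, chunks: List[str],
             embed: Callable[[str], Vector] = toy_embed,
             k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Stuff the retrieved chunks into a single prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_with_local_llm(prompt: str, model_path: str) -> str:
    """Assumes `pip install llama-cpp-python` and a local GGUF model file."""
    from llama_cpp import Llama  # imported lazily so the rest runs without it

    # n_gpu_layers=-1 offloads all layers to the GPU backend if the build has one
    # (Metal on Apple Silicon); 0 keeps inference on the CPU.
    llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=-1, verbose=False)
    out = llm(prompt, max_tokens=256, temperature=0.1)
    return out["choices"][0]["text"].strip()
```

Because `embed` is a parameter, swapping the toy function for a real embedding call changes nothing else in the flow. In a real app the chunk embeddings would be computed once and stored (for example in a vector store) rather than recomputed for every question.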

Gradio's Chat UI brought it all together, delivering a familiar and functional user experience with little need for further refinement. Gradio delivered more than I expected.
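Gradio's `ChatInterface` calls a function with the user's message and the chat history, then displays the string it returns. The sketch below assumes `pip install gradio`; `make_respond`, `build_ui`, and the `answer_question` callable (any function that turns a question into an answer) are hypothetical names, not the notebook's code.

```python
from typing import Callable, List


def make_respond(answer_question: Callable[[str], str]) -> Callable[[str, List], str]:
    """Wrap a question -> answer function in the (message, history) signature
    that Gradio's ChatInterface expects."""
    def respond(message: str, history: List) -> str:
        # `history` holds earlier turns; this sketch ignores it and answers
        # each question independently.
        if not message.strip():
            return "Please enter a question."
        return answer_question(message)
    return respond


def build_ui(answer_question: Callable[[str], str]):
    import gradio as gr  # imported lazily; assumes gradio is installed

    return gr.ChatInterface(fn=make_respond(answer_question), title="Private Q&A")


# Usage (hypothetical pipeline function):
# build_ui(my_answer_function).launch()
```

Keeping the chat wrapper separate from the UI makes the answering logic testable without starting a web server.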


This exploration highlighted not only the viability of building private Q&A systems with open source LLMs, but also the untapped potential of combining community-driven resources like Kaggle with advancing technology.

For anyone looking to explore private data with the help of open source LLMs, the journey has never been more accessible. If integrating such an AI solution into your organization is on the horizon, reaching out could be the first step toward putting that capability to work.

Link to the Jupyter notebook on GitHub.

Lessons Learned
