I got this email this week.
We were building similar solutions at my agency, but we've since redirected our focus elsewhere so I'm referring our clients to established providers, and I think you'd be the best fit of all the custom chatbot SaaS I've researched.
Now, this agency is extremely technical with significant development resources. In fact, the principals even run a tech podcast with thousands of followers.
So what would make them abandon their "Do It Yourself" approach with Langchain?
Short Answer
For the same reason you would not run a server in your basement (you would use Amazon AWS!): once you build and run your own RAG (Retrieval Augmented Generation) system, you become the proud owner of every production-related issue.
The Problems With Langchain
When I jumped on the call, it became obvious what the issues were. Let's get started:
Hallucinations
Dealing with hallucinations requires anti-hallucination measures at each and every step of the RAG pipeline. Hallucination is not a simple "add this to the prompt" type of thing.
For every design decision you make in the entire pipeline, from data ingestion to final response, you need to ask yourself "How does this affect hallucinations?" We've learned this the hard way dealing with thousands of customers.
Hallucinations alone would have taken thousands of engineering man-hours in our system (See why).
Data ingestion issues
The second big killer is data ingestion. On paper, ingesting PDFs sounds simple. But when you get into production use and you are importing thousands of documents and webpages, does the ingestion work seamlessly? Is each document accounted for in a clear audit log? Is the pipeline resilient to failures? How do you refresh or re-process a document?
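To make those questions concrete, here is a minimal sketch of an ingestion loop that retries failures and writes one audit entry per document. Everything in it (`AuditEntry`, `IngestionRun`, `ingest_all`, and the `parse` callback) is a hypothetical name chosen for illustration, not part of Langchain or of our own system.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AuditEntry:
    doc_id: str
    status: str       # "ok" or "failed"
    attempts: int     # how many times parsing was tried
    error: str = ""   # last error message, if any


@dataclass
class IngestionRun:
    entries: List[AuditEntry] = field(default_factory=list)

    def failed(self) -> List[str]:
        """Document ids that need to be re-processed."""
        return [e.doc_id for e in self.entries if e.status == "failed"]


def ingest_all(docs: Dict[str, str],
               parse: Callable[[str], str],
               max_attempts: int = 3,
               delay_s: float = 0.0) -> IngestionRun:
    """Parse every document, retry failures, and record one audit entry each."""
    run = IngestionRun()
    for doc_id, raw in docs.items():
        error = ""
        for attempt in range(1, max_attempts + 1):
            try:
                parse(raw)  # hypothetical parser; storage of the result is omitted
                run.entries.append(AuditEntry(doc_id, "ok", attempt))
                break
            except Exception as exc:
                error = str(exc)
                time.sleep(delay_s)
        else:
            # Reached only when every attempt failed (the loop never hit break).
            run.entries.append(AuditEntry(doc_id, "failed", max_attempts, error))
    return run
```

With a log like this, "refresh or re-process" becomes a matter of feeding `run.failed()` back into `ingest_all`, instead of guessing which documents silently went missing.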
While all the media attention is on AI, when it comes to running a production RAG pipeline, the silent killer is actually data ingestion. There is a reason data ingestion takes up almost 40% of our engineering time: every data format and data source has its own set of intricacies.
For example, try ingesting YouTube videos. You will quickly see what a nightmare that is.
There is a reason that Langchain has FIVE different PDF parsers. Nobody knows which one to use and under what conditions. Those design decisions nicely fall into developer hands. Cross your fingers and hope everything works.
Citations / Sources
The good news with Langchain is that you can quickly prototype and show off results in a very short time. Sweeet!
And then you try to demo it to your boss or CEO and the first question is "Where did that response come from?" or "How was that response computed?" or (even worse!) "Wait — that makes no sense at all!"
The solution: clear citations and sources around the responses. Langchain does not build this for you. You have to build a citation algorithm yourself to provide transparency and earn trust in the responses.
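As a rough illustration of what "a citation algorithm" can mean at its simplest, the sketch below cites every retrieved chunk that shares enough words with the answer. The word-overlap heuristic, the `Chunk` type, and the `min_overlap` threshold are all assumptions made for this example; a production system would need something considerably more robust.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Chunk:
    source: str  # e.g. a URL or file name
    text: str


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cite(answer: str, chunks: List[Chunk], min_overlap: float = 0.5) -> List[str]:
    """Return sources whose chunk contains at least min_overlap of the answer's words."""
    answer_words = _words(answer)
    if not answer_words:
        return []
    cited: List[str] = []
    for chunk in chunks:
        overlap = len(answer_words & _words(chunk.text)) / len(answer_words)
        if overlap >= min_overlap and chunk.source not in cited:
            cited.append(chunk.source)
    return cited
```

An empty result is itself useful: an answer that no retrieved chunk supports is exactly the one your boss will ask about.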
Query relevancy issues
While most tutorials around RAG pipelines deal with the "Happy Case", in real life, users don't know how to query. They enter queries like "Hmm, ok", "yeah, tell me more", "2", "yes", "that one", "ok, yeah" and more.
Unless you've clearly built a process around understanding the query intent, normal users are going to chat with the chatbot the way they would with a real human on live chat. We see this ourselves in our bot.
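One small piece of such a process can be sketched as follows: detect replies that carry no content of their own and attach the previous turn before retrieval. The `FILLER` word list and both function names are hypothetical, and real intent handling goes well beyond this.

```python
from typing import List

# Hypothetical list of words that carry no retrievable meaning on their own.
FILLER = {"hmm", "ok", "okay", "yeah", "yes", "yep", "no", "sure",
          "tell", "me", "more", "that", "one", "this", "it"}


def is_low_information(query: str) -> bool:
    """True when the query has no content words (e.g. "yes", "2", "that one")."""
    tokens = [t.strip(".,!?-") for t in query.lower().split()]
    tokens = [t for t in tokens if t]
    return all(t in FILLER or t.isdigit() for t in tokens)


def contextualize(query: str, history: List[str]) -> str:
    """Prefix a low-information query with the previous turn so retrieval has something to match."""
    if history and is_low_information(query):
        return f"{history[-1]} (user replied: {query})"
    return query
```

Sending a bare "yes" to the vector search returns noise; sending the bot's last question together with the reply at least gives retrieval something to work with.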
Even worse: Some users just don't know how to hold a conversation. Their minds get blocked even though they have clear needs. Does your bot know how to engage users and lead them towards satisfaction? (NPS scores!)
Maintenance and MLOps
This part is mostly overlooked: each time OpenAI comes up with a new feature or releases a new model (e.g. the June 13th release), how is your RAG pipeline affected?
For example: With the June 13th release, for some reason, with some customers, our bot started responding in Spanish to normal English queries.
This was quite frustrating because we had to hunt down the reason and get it fixed. (Side note: It was a single word we were using in the `system` message.)
But this is not the only thing. How do you deal with rate limiting issues? Or API downtime issues?
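For rate limits and brief outages, the usual starting point is retrying with exponential backoff. The sketch below assumes a hypothetical `TransientAPIError` standing in for whatever rate-limit or downtime exception your API client raises; the delays and attempt count are placeholder values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a rate-limit or downtime error from an LLM API."""


def call_with_backoff(fn: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient errors with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Backoff only covers short blips. A longer outage still needs a decision about what the bot tells the user in the meantime.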
And best of all: The investigations when the customer or the boss says "Hey, why is the bot responding like this for this question?"
Economy of Scale
The reason we all love cloud platforms like AWS is: The huge cost of developing them is spread out across millions of customers. And so we get a nice AWS instance for much less than the TCO of running it ourselves.
The same is true of the OpenAI API. OpenAI spent millions to build the LLM and we now all get it for a pay-for-use cost of $1 (or less).
When you "Build It Yourself", the cost of development is "Divided By 1".
When OpenAI fixes an issue in their LLM, we all get the benefit of it. When there is an issue in your self-developed RAG pipeline, you bear the full development cost yourself.
This is why running Langchain in production is so expensive: every little issue has to be debugged by you and your team!
For example: We spent a couple hundred engineering hours fixing our YouTube video ingestion pipeline. Now our thousands of customers ALL get the benefit of it for pennies, NOT tens of thousands of dollars.
Security
One of the benefits of Langchain is that you get to control the data security of your documents, because you can run Langchain within your own VPC or on-premises infrastructure. That is indeed great.
However, there are 3 aspects to security and you will need to consider ALL 3 of them. In particular:
1. Data Security: The ingestion and subsequent deletion of your documents and resources. Plus the at-rest security of your chunks and vectors. In addition, if you need PII removal and/or anonymization, that is something you will need to implement too. Nothing too difficult, but you will need to build it out.
2. Chat Security: If the chatbot is being used by untrusted users, do you have chat security built in? In particular, NSFW queries or jailbreaking attempts?
3. Chat Access Security: Now that you have the chatbot built, who can access it? Do you have SSO built? Or a Teams access feature? Who gets access and is that logged and audited? Will you be building out an entire access control system?
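To give a feel for the PII removal mentioned under Data Security, here is a deliberately naive sketch that masks email addresses and phone-like numbers before text is chunked. The two regular expressions are assumptions made for this example and will miss plenty of real-world formats (names, addresses, IDs), so treat it as a starting point only.

```python
import re

# Naive patterns for illustration only; real PII detection needs far more coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Running this before embedding matters, because anything that reaches the vector store also has to be covered by your deletion and at-rest security story.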
Audits & Analytics
The biggest impediment to Gen AI deployment these days is CISOs (Chief Information Security Officers) blocking deployments (there is just too much FUD!).
Does your deployment plan include full audit trails to see what the AI is saying, along with other aspects of access?
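At its most basic, an audit trail is an append-only record of every exchange. The sketch below writes one JSON line per chat turn; the field names and the `log_chat_event` and `read_events` helpers are hypothetical, and a real deployment would also need retention, access control, and tamper protection on the log itself.

```python
import json
import time
from typing import List, Optional, TextIO


def log_chat_event(log: TextIO, user_id: str, query: str, response: str,
                   sources: List[str], now: Optional[float] = None) -> dict:
    """Append one chat turn to the log as a single JSON line and return the event."""
    event = {
        "ts": time.time() if now is None else now,  # Unix timestamp of the turn
        "user_id": user_id,
        "query": query,
        "response": response,
        "sources": sources,
    }
    log.write(json.dumps(event) + "\n")
    return event


def read_events(log_text: str) -> List[dict]:
    """Parse the JSON-lines log back into a list of events."""
    return [json.loads(line) for line in log_text.splitlines() if line.strip()]
```

The same records can later feed a dashboard, since "what is the bot saying, to whom, and based on which sources" is the question both the CISO and the boss end up asking.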
Or on a positive front: Do you plan on implementing a dashboard and analytics to glean insights from the chat logs? (Especially when the boss or other stakeholders ask "Hey, can you tell me what's going on in the bot?")
Ongoing Development
Last but not least, who will be responsible for ongoing maintenance and continued development as technologies evolve? OpenAI (and others) seem to be releasing new features almost weekly. Who keeps on top of this and incorporates the new features (or, even worse, handles features being sunset) after your solution has been deployed?
Conclusion
While Langchain is really nice to get started and an excellent educational tool, it is NOT designed to take real-life use cases into production.
Building your own RAG pipeline is akin to going down to RadioShack to assemble your own server. It is fun to do, but when push comes to shove and it's time for real production use, running your own RAG system is just like buying each part of a computer and trying to assemble it yourself.
Yeah, we did that decades ago. Now we just buy from Dell or Apple.
The author is CEO @ CustomGPT, a no-code/low-code cloud RAG platform that lets any business build RAG chatbots with their own content. This blog post is based upon experiences working with thousands of business customers over the last 8 months (since the ChatGPT API was introduced).