
Why Doti Switched from GPT-4o to Google Gemini

At Doti, we're committed to delivering seamless, lightning-fast AI experiences for our enterprise search users. They need fast and accurate answers to get company information quickly, so choosing the right infrastructure was crucial. Recently, we made a significant shift in our AI infrastructure: from Azure OpenAI to Google's Gemini Flash 2. This decision wasn't easy, but the results speak for themselves. Here’s the inside story on why we made the switch.
The Azure Challenge: Performance and Consistency
When we first launched our enterprise search platform, Doti, Azure OpenAI was our natural choice. However, we quickly encountered unexpected hurdles.
Customers expect Doti’s AI to save time by connecting them instantly to knowledge scattered across their systems, but our users began experiencing inconsistent response times, especially in US regions. Since user experience is our top priority, eliminating these delays became critical.
We implemented load balancing across multiple Azure regions and Azure accounts. While this improved performance somewhat, it introduced complexity without fully resolving the core issues.
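The failover logic this setup required can be sketched roughly as follows. This is a simplified illustration, not Doti's production code; the endpoint URLs and the `call_llm` helper are hypothetical placeholders:

```python
import random

# Hypothetical pool of Azure OpenAI endpoints across regions and accounts.
ENDPOINTS = [
    "https://doti-eastus.openai.azure.com",
    "https://doti-westeurope.openai.azure.com",
    "https://doti-backup.openai.azure.com",
]

def call_llm(endpoint: str, prompt: str) -> str:
    """Placeholder for a real Azure OpenAI request; here it always times out."""
    raise TimeoutError("simulated slow region")

def query_with_failover(prompt: str, max_attempts: int = 3) -> str:
    """Try endpoints in random order, falling through to the next on timeout."""
    candidates = random.sample(ENDPOINTS, k=min(max_attempts, len(ENDPOINTS)))
    last_error = None
    for endpoint in candidates:
        try:
            return call_llm(endpoint, prompt)
        except TimeoutError as err:
            last_error = err  # slow region: move on to the next endpoint
    raise RuntimeError("all endpoints timed out") from last_error
```

Even a simple scheme like this adds retry state, endpoint health tracking, and per-region configuration that all have to be maintained, which is exactly the complexity referred to above.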
As shown in the chart below, Azure OpenAI usage was highly variable, with significant fluctuations and unpredictability in performance. After transitioning to Gemini Flash 2 (represented by the stable, consistent line), we experienced drastically improved stability and reliability.

Searching for Alternatives: LLaMA and Beyond
Determined to provide our users with a more reliable and responsive experience, we explored several alternative solutions, including LLaMA. Despite promising initial tests, LLaMA fell short of our performance benchmarks and didn't address our infrastructure complexity. Plus, the accuracy and quality of its answers were simply not as good as GPT-4o's.
Discovering the Game Changer: Gemini Flash 2
Then came Google's Gemini Flash 2 — a transformative breakthrough for our team. From our very first tests, Gemini Flash 2 delivered the lightning-fast responses that our users expect, significantly outperforming Azure OpenAI. It also allowed us to streamline our infrastructure, eliminating the need for complex multi-region deployments.
The impact on our budget was equally impressive: Gemini Flash 2 proved nearly 100 times more cost-effective than Azure OpenAI. How?
- Gemini Flash 2 tokens cost about 1/15th of GPT-4o tokens — already a 93% discount on raw pricing
- Gemini's 1M-token context window enabled single API calls, eliminating the thousands of redundant tokens we previously re-sent across multiple GPT-4o requests
This unlocked new capabilities that weren’t economically feasible with GPT-4o, allowing us to offer enterprise-grade quality at affordable prices.
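To see how the two effects compound, here is a back-of-the-envelope calculation. Only the roughly 1/15 per-token price ratio comes from the figures above; the token counts and the per-million price are made-up numbers purely for illustration:

```python
# Hypothetical cost units per 1M tokens; only the ~1/15 ratio is from the post.
GPT4O_PRICE = 15.0
GEMINI_PRICE = GPT4O_PRICE / 15

# With a smaller context window, one user query may need several chained
# calls, each re-sending overlapping context tokens (illustrative numbers).
gpt4o_tokens = 6 * 120_000   # 6 calls x 120k tokens of repeated context
gemini_tokens = 150_000      # one call inside a 1M-token window

gpt4o_cost = gpt4o_tokens / 1_000_000 * GPT4O_PRICE
gemini_cost = gemini_tokens / 1_000_000 * GEMINI_PRICE

# Price ratio and token savings multiply rather than add.
print(round(gpt4o_cost / gemini_cost))  # prints 72
```

With these assumed numbers the 15x price gap and the ~5x token savings multiply to roughly 70x, which is how a "nearly 100 times" difference becomes plausible even though the raw price gap is only 15x.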
Why We Still Use Multiple LLMs
While Gemini Flash 2 now handles most of our AI workloads, we strategically use multiple large language models (LLMs) like Azure OpenAI GPT-4o and LLaMA for specialized tasks.
GPT-4o remains our choice for complex language analysis and tasks requiring multi-language accuracy. Gemini Flash 2, meanwhile, shines with its larger context window, which is ideal for deeply understanding complex topics from extensive documentation across Slack, Salesforce, Jira, and other integrated platforms.
This balanced approach ensures our users consistently get rapid, high-quality, contextually accurate responses.
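A toy router illustrating this division of labor might look like the sketch below. The task labels and the 100k-token threshold are simplified assumptions for the example, not Doti's actual routing logic:

```python
def pick_model(task: str, context_tokens: int) -> str:
    """Route a request to a model by task type and context size (illustrative)."""
    # GPT-4o for multi-language and complex-analysis work, as described above,
    # unless the context is too large for a single call to handle comfortably.
    if task in {"multilingual", "complex_analysis"} and context_tokens <= 100_000:
        return "gpt-4o"
    # Everything else goes to the cheaper, large-context default.
    return "gemini-flash-2"

pick_model("multilingual", 2_000)        # → "gpt-4o"
pick_model("summarize_docs", 400_000)    # → "gemini-flash-2"
```

A real router would also weigh per-task quality metrics, latency budgets, and cost, but the core idea is the same: each request goes to the model whose strengths match it.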
Real-World Benefits for Our Users
Our move to Gemini Flash 2 directly benefits our end users. Here’s how:
- Quicker Information Access: Users can now get critical company information twice as fast, improving their productivity
- Reliability and Trust: Users are more confident in using Doti as a dependable AI tool, thanks to its consistent, predictable performance
- Enhanced Security: Users enjoy tighter security and protection against malicious activity, since Gemini's affordability enables us to run multiple simultaneous verification checks on inputs
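Running several independent verification checks in parallel, as described in the last point, could be sketched like this. The check functions are placeholders; a real pipeline might call moderation models or classifiers instead of string matching:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder checks purely for illustration.
def looks_like_prompt_injection(text: str) -> bool:
    return "ignore previous instructions" in text.lower()

def contains_credentials(text: str) -> bool:
    return "api_key=" in text.lower()

CHECKS = [looks_like_prompt_injection, contains_credentials]

def input_is_safe(text: str) -> bool:
    """Run all checks concurrently; reject if any check flags the input."""
    with ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
        flags = pool.map(lambda check: check(text), CHECKS)
        return not any(flags)
```

The point is economic: each extra check is an extra model call, so checks that were too expensive to run on every request with GPT-4o pricing become affordable at Gemini Flash 2 prices.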
Results That Matter
Since adopting Gemini Flash 2, Doti has delivered dramatically improved performance, consistency, and security. Our infrastructure is now simpler, costs are significantly reduced, and most importantly — our users are experiencing faster, more reliable, and highly accurate enterprise search results when searching for the company information they need.