Search Smarter, Not Harder.

Add Pongo to your existing RAG pipeline with 1 line of code, and reduce incorrect RAG outputs by 80%.

Take Vector Search to the Next Level

Pongo's semantic filter technology can greatly improve the performance of a RAG pipeline that relies on vector or hybrid search alone.

How it Works

Vector search compresses each document into a single vector, which loses information. Combined with a lack of context about the query, this leads to suboptimal search results.

To fix this, our semantic filter analyzes the query and document together, utilizing multiple models to minimize information loss and hallucinations. This results in significantly higher accuracy compared to vector and hybrid search approaches alone.

“Pongo has made it incredibly easy to get accurate results when building RAG pipelines”
- Parsa Khazaeepoul, AI2 Startup Incubator

Works with your Existing Pipeline

Pongo sits right on top of your existing pipeline, whether you use a vector database or Elasticsearch. Just send us your top 100-200 search results and we’ll return the relevant ones.
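As a rough illustration of where such a filtering step sits, here is a minimal sketch of a second-stage filter over first-stage search results. It does not use Pongo's client or API: `filter_candidates`, the `Scorer` signature, and `keyword_overlap_scorer` are hypothetical names invented for this example, and the toy scorer only stands in for a real relevance model.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (query, documents) -> one relevance score per document.
Scorer = Callable[[str, Sequence[str]], Sequence[float]]


def filter_candidates(
    query: str,
    candidates: Sequence[str],
    scorer: Scorer,
    top_k: int = 10,
) -> List[str]:
    """Re-score first-stage candidates against the query and keep the best top_k."""
    scores = scorer(query, candidates)
    if len(scores) != len(candidates):
        raise ValueError("scorer must return exactly one score per candidate")
    # Sort by score, highest first; ties keep their first-stage order.
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def keyword_overlap_scorer(query: str, documents: Sequence[str]) -> List[float]:
    """Toy stand-in scorer: fraction of query words that appear in each document."""
    query_words = set(query.lower().split())
    return [
        len(query_words & set(doc.lower().split())) / max(len(query_words), 1)
        for doc in documents
    ]


if __name__ == "__main__":
    # In a real pipeline these would be the top 100-200 hits from vector or hybrid search.
    hits = [
        "Pongo filters search results",
        "Bananas are yellow",
        "Vector search results can be noisy",
    ]
    print(filter_candidates("search results", hits, keyword_overlap_scorer, top_k=2))
```

The point of the sketch is the shape of the integration: the first-stage retriever stays unchanged, and the filter only reorders and trims the candidates it returns.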

Production Ready

Lightning Fast
Our distributed architecture ensures consistent latency whether you run 100 or 1,000,000 requests a day.
Zero Data Retention
Pongo only operates at runtime. No data from your queries is stored, and no data leaves our AWS VPC.


  • 500 free queries / mo
  • We'll work with you to integrate Pongo
$60 / mo
  • 60K queries / mo
  • Standard compute
  • $8 per addtl. 10k queries
$250 / mo
  • 350K queries / mo
  • 60% faster compute
  • $12 per addtl. 10k queries
  • Optional BYOC Deployment
  • Custom Models
  • 99.99% Uptime SLA
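
To compare the two paid plans at a given query volume, the arithmetic can be sketched as follows. The prices and included query counts come from the list above; the plan keys are hypothetical labels, and the sketch assumes overage is billed per started block of 10k queries, which the pricing list does not state.

```python
import math

# Figures from the pricing list; the dictionary keys are hypothetical labels.
PLANS = {
    "60_per_month": {"base_usd": 60, "included": 60_000, "overage_usd_per_10k": 8},
    "250_per_month": {"base_usd": 250, "included": 350_000, "overage_usd_per_10k": 12},
}


def estimate_monthly_cost(plan: str, queries: int) -> int:
    """Estimate the monthly cost in USD for a plan at a given query volume."""
    p = PLANS[plan]
    extra_queries = max(0, queries - p["included"])
    # Assumption: each started block of 10k extra queries is billed in full.
    extra_blocks = math.ceil(extra_queries / 10_000)
    return p["base_usd"] + extra_blocks * p["overage_usd_per_10k"]


if __name__ == "__main__":
    for volume in (50_000, 100_000, 400_000):
        for plan in PLANS:
            print(plan, volume, estimate_monthly_cost(plan, volume))
```

Under that rounding assumption, 100,000 queries on the $60 plan would come to $92 (the base price plus four additional 10k blocks at $8 each).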


Can I self-host Pongo?

Yes, Pongo can be deployed in a VPC. Just book a call with us, and we'll find the best option for you.

What is Pongo's latency?

The Deploy tier takes 600-650 ms for 100 documents of 512 tokens each, versus 350-400 ms on the Lightning tier. By default, requests are routed to US-West-2 in Oregon. Please contact us if you need a deployment in another region.

Is Pongo secure?

Yes, Pongo only operates at runtime. We store no data, and no data leaves our VPC in AWS. We are in the process of obtaining SOC 2 compliance.

Can I fine-tune Pongo?

Yes. However, fine-tuning Pongo is a complex process because we utilize multiple models, and it requires a non-trivial amount of quality data samples. We do offer fine-tuned models to enterprise customers.