
Vector Databases Explained: A Practical Tutorial

Learn what vector databases are, how they work, and how to build AI applications with them. Complete tutorial with code examples using Pinecone, Chroma, and Weaviate.

Vector Databases · Embeddings · RAG · AI Infrastructure · Semantic Search

Vector databases have become essential infrastructure for AI applications—but if you’re not steeped in machine learning, they can seem mysterious. What exactly are embeddings? Why can’t regular databases handle them? What’s all this about “semantic search”?

I was confused by vector databases when I first encountered them, so I know how steep the learning curve can feel. After building several production systems with them, I’ve come to see them as genuinely elegant solutions to a real problem: making AI applications that actually understand meaning, not just keywords.

This tutorial explains vector databases from first principles, walks through how they work technically, compares the major options, and includes working code examples you can run yourself. By the end, you’ll understand when to use vector databases, how to choose between them, and how to build a basic retrieval-augmented generation (RAG) system.

What are Embeddings?

Before understanding vector databases, you need to understand embeddings—the data they store.

From Text to Numbers

Computers don’t understand words the way humans do. They work with numbers. For decades, we’ve converted text to numbers in crude ways: representing documents as word counts, or converting characters to ASCII codes. These representations captured nothing about meaning.

Modern AI models, however, can convert text (or images, audio, video) into numerical representations that actually capture semantic meaning. These representations are called embeddings.

An embedding is an array of floating-point numbers—typically 384 to 4,096 numbers depending on the model. (Understanding how tokens work in AI helps grasp why embeddings have specific dimensions.) Here’s what a tiny portion of an embedding might look like:

# Embedding for "How to train a machine learning model"
[0.0123, -0.0456, 0.1234, 0.0567, -0.0891, 0.0234, ...]  # 1536 numbers total

The magic is in what these numbers represent. Each dimension captures some aspect of meaning—though not in human-interpretable ways. The key property is that semantically similar items have geometrically similar embeddings.

This means:

  • “How to train a machine learning model” and “ML model training tutorial” produce similar embeddings
  • “Best pizza in New York” produces a very different embedding
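
You can check this property yourself. Here’s a minimal sketch using the open-source sentence-transformers library (the model name is just one common choice; any reasonable embedding model shows the same behavior):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional embeddings

sentences = [
    "How to train a machine learning model",
    "ML model training tutorial",
    "Best pizza in New York",
]
embeddings = model.encode(sentences)

# Cosine similarity between embedding pairs
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: same meaning, different words
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated topics

The first pair scores far higher than the second, even though the first two sentences barely share any words.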

Creating Embeddings

You generate embeddings by passing data through an embedding model. Here’s how with OpenAI’s API:

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="What is a vector database?"
)

embedding = response.data[0].embedding  # List of 3072 floats
print(f"Embedding dimension: {len(embedding)}")  # 3072

Other popular embedding models include:

  • OpenAI text-embedding-3-large/small: High quality, easy to use, API-based
  • Cohere embed-v3: Strong multilingual support
  • Google Gemini embeddings: Integrated with Google Cloud
  • Sentence Transformers (open source): Free, run locally, many model variants
  • Voyage AI: Specialized for code and legal documents

The choice of embedding model matters. Different models produce embeddings of different dimensions and quality. Once you’ve embedded your data with a particular model, you generally need to re-embed everything if you switch models.

Semantic Similarity

The core insight is that embeddings enable semantic similarity—finding things with similar meaning rather than similar words.

Traditional search matches keywords:

  • Query: “car” matches documents containing “car”
  • Misses: documents about “automobile,” “vehicle,” “sedan”

Semantic search with embeddings:

  • Query: “car” finds documents about cars regardless of specific words used
  • Matches: “automobile maintenance,” “vehicle repair,” “sedan reviews”

This works because the embedding model learned semantic relationships from vast training data. It understands that “car” and “automobile” mean the same thing, even though they’re different strings.

What is a Vector Database?

A vector database is a specialized database designed to store, index, and query embeddings efficiently. While you could store embeddings in a regular database, vector databases are optimized for the specific operations AI applications need.

Why Not Regular Databases?

Traditional databases excel at exact matches: “Give me all users where country = ‘USA’ and age > 25.” They use indexes like B-trees that work well for this.

But with embeddings, you’re asking: “Give me the 10 items most similar to this embedding.” This requires comparing your query embedding against potentially millions of stored embeddings—a fundamentally different operation.

Regular databases can do this (PostgreSQL with pgvector, for example), but they’re not optimized for it. At scale, you need specialized data structures and algorithms.

How Vector Databases Work

Vector databases use specialized indexing algorithms to make similarity search fast. The key insight is that you don’t need perfect accuracy—finding “approximately” the most similar items is usually good enough, and much faster.

The most common algorithm is HNSW (Hierarchical Navigable Small World). It creates a multi-layer graph structure where:

  • Higher layers contain fewer nodes, allowing quick navigation to the right neighborhood
  • Lower layers contain more nodes for fine-grained search
  • Queries traverse from top to bottom, progressively refining

Other algorithms include:

  • IVF (Inverted File Index): Clusters vectors and searches only relevant clusters
  • PQ (Product Quantization): Compresses vectors to reduce memory usage
  • Annoy (Approximate Nearest Neighbors Oh Yeah): Tree-based, used by Spotify

These algorithms trade perfect accuracy for speed. A query might find 95% of the true nearest neighbors in 10ms rather than 100% in 10 seconds. For most applications, this tradeoff is excellent.
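
To get a feel for these tradeoffs, here’s a small sketch using the hnswlib library, with random vectors standing in for real embeddings (the parameter values are illustrative, not recommendations):

import hnswlib
import numpy as np

dim, num_vectors = 384, 100_000
vectors = np.random.rand(num_vectors, dim).astype("float32")  # stand-in for real embeddings

# Build an HNSW index; M and ef_construction trade build time and memory for graph quality
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_vectors, M=16, ef_construction=200)
index.add_items(vectors, np.arange(num_vectors))

# ef controls the speed/recall tradeoff at query time
index.set_ef(50)
labels, distances = index.knn_query(vectors[:1], k=10)  # approximate 10 nearest neighbors

Raising M, ef_construction, or ef improves recall at the cost of memory, build time, or query latency, respectively.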

Core Operations

Vector databases typically support:

  1. Insert: Add embeddings with associated metadata
  2. Query: Find k nearest neighbors to a query embedding
  3. Filter: Combine similarity search with metadata filters
  4. Delete/Update: Modify stored data

Here’s what a typical query looks like conceptually:

# Find the 5 most similar documents to a query
results = vector_db.query(
    vector=query_embedding,
    top_k=5,
    filter={"category": "technology", "year": {"$gte": 2024}}
)

Similarity Metrics

Vector databases support different ways to measure similarity:

Cosine Similarity: Measures the angle between vectors. Values range from -1 to 1, where 1 means identical direction. Most common for text embeddings.

Euclidean Distance (L2): Measures straight-line distance between points. Smaller is more similar.

Dot Product: Similar to cosine but also considers magnitude. Useful when magnitude is meaningful.

For normalized embeddings (like most text embedding models produce), cosine similarity and dot product give equivalent rankings.
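
Here’s what the three metrics look like in code: a plain NumPy sketch, not any particular database’s implementation:

import numpy as np

a = np.array([0.1, 0.3, -0.2])
b = np.array([0.2, 0.1, -0.1])

cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean_distance = np.linalg.norm(a - b)
dot_product = np.dot(a, b)

# For unit-length (normalized) vectors, cosine similarity and dot product coincide
a_unit, b_unit = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(np.isclose(np.dot(a_unit, b_unit), cosine_similarity))  # True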

How Similarity Search Works: A Deep Dive

Let’s trace through what happens when you search a vector database.

Step 1: Embed the Query

First, you convert your query into an embedding using the same model used for your stored documents:

query = "How do neural networks learn?"
query_embedding = embed(query)  # [0.023, -0.089, 0.156, ...]

Step 2: Search the Index

The database uses its index structure to find candidates without examining every vector. With HNSW:

  1. Start at an entry point in the top layer
  2. Greedily move toward the query embedding
  3. Drop to the next layer when stuck
  4. Repeat until the bottom layer
  5. Return the nearest neighbors found

This typically examines hundreds of vectors rather than millions, completing in milliseconds.

Step 3: Apply Filters (if any)

If you specified metadata filters, the database applies them, either before the similarity search (pre-filtering) or after (post-filtering). Pre-filtering guarantees every result satisfies the filter but can slow the search; post-filtering is cheaper but may return fewer results than requested when many candidates fail the filter.

Step 4: Return Results

You get back the top-k most similar items with their similarity scores and metadata:

[
    {"id": "doc_123", "score": 0.92, "metadata": {"title": "Deep Learning Basics"}},
    {"id": "doc_456", "score": 0.87, "metadata": {"title": "Neural Network Training"}},
    {"id": "doc_789", "score": 0.81, "metadata": {"title": "Backpropagation Explained"}},
]

Understanding Scores

Similarity scores have different meanings depending on the metric:

  • Cosine similarity: 0.95+ is very similar, 0.7-0.9 is reasonably similar, below 0.5 is probably not relevant
  • Euclidean distance: Lower is better, scale depends on embedding dimensions
  • Dot product: Higher is more similar, scale depends on normalization

These thresholds are rough—what counts as “similar enough” depends on your application.
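
One practical consequence: many applications combine top-k retrieval with a minimum score cutoff so clearly irrelevant results never reach the LLM or the user. A sketch, where the 0.7 threshold is an arbitrary example:

# Hypothetical results in the shape shown above
results = [
    {"id": "doc_123", "score": 0.92, "metadata": {"title": "Deep Learning Basics"}},
    {"id": "doc_456", "score": 0.87, "metadata": {"title": "Neural Network Training"}},
    {"id": "doc_789", "score": 0.41, "metadata": {"title": "Unrelated Article"}},
]

MIN_SCORE = 0.7  # tune per application; depends on the embedding model and metric
relevant = [r for r in results if r["score"] >= MIN_SCORE]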

Comparing the Major Vector Databases

The vector database market has exploded. Here’s my assessment of the major players as of early 2026.

Pinecone

Type: Fully managed cloud service
Best for: Production applications where you want minimal operations burden

Pinecone is the most polished managed vector database. You create an index, insert vectors, and query—that’s it. No infrastructure to manage, no tuning required, auto-scaling handles traffic spikes.

from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "doc1", "values": embedding, "metadata": {"title": "Article One"}}
])

# Query
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)

Pros:

  • Zero ops burden—just works
  • Excellent documentation and SDKs
  • Serverless pricing scales from free tier to enterprise
  • Fast query performance

Cons:

  • Vendor lock-in (cloud only)
  • Cost can add up at scale
  • Less flexibility than self-hosted options

Pricing: Free tier (limited), then usage-based starting around $0.033/hour per pod

Weaviate

Type: Open-source, available as managed cloud or self-hosted
Best for: Teams wanting flexibility with good developer experience

Weaviate combines vector search with traditional keyword search capabilities (“hybrid search”). It also has built-in modules for generating embeddings, so you can insert raw text and let Weaviate handle the embedding step.

import weaviate

client = weaviate.Client("http://localhost:8080")  # v3-style Python client API

# Create schema with auto-vectorization
client.schema.create_class({
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "properties": [{"name": "content", "dataType": ["text"]}]
})

# Insert (Weaviate generates embeddings)
client.data_object.create({"content": "Vector databases are fascinating"}, "Document")

# Query
result = client.query.get("Document", ["content"]).with_near_text({"concepts": ["embeddings tutorial"]}).do()

Pros:

  • Open-source with commercial support available
  • Hybrid search (vector + keyword) built-in
  • Auto-vectorization modules
  • GraphQL API is powerful and flexible
  • Can self-host or use Weaviate Cloud

Cons:

  • More complex to operate self-hosted
  • Performance requires tuning at scale
  • Learning curve for GraphQL API

Pricing: Open-source free, Weaviate Cloud has free tier then usage-based

Chroma

Type: Open-source, primarily local
Best for: Prototyping, development, smaller applications

Chroma is the easiest vector database to start with. Install it via pip, and you’re running a local vector database in seconds. It’s perfect for learning, prototyping, or smaller applications.

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")

# Add documents with auto-embedding
collection.add(
    documents=["Vector databases store embeddings", "Embeddings capture semantic meaning"],
    metadatas=[{"source": "docs"}, {"source": "docs"}],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(query_texts=["What are embeddings?"], n_results=2)

Pros:

  • Incredibly easy to get started
  • Runs locally with no external dependencies
  • Good Python integration
  • Built-in embedding functions
  • Integrates well with LangChain and LlamaIndex

Cons:

  • Not designed for large-scale production
  • Limited enterprise features
  • Single-node architecture
  • Less mature than alternatives

Pricing: Open-source, free

Other Notable Options

Milvus: Open-source, highly scalable, complex to operate. Good for very large deployments.

Qdrant: Open-source, Rust-based, excellent performance. Growing in popularity.

PostgreSQL + pgvector: Add vector capabilities to existing Postgres. Great if you’re already on Postgres and have modest scale.

Supabase Vector: pgvector with Supabase’s developer experience.

MongoDB Atlas Vector Search: Native vector search in MongoDB. Good if you’re already in the MongoDB ecosystem.
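
To give a flavor of the pgvector route, here’s a sketch using the psycopg driver and the pgvector Python package (the connection string and table name are made up for illustration):

import numpy as np
import psycopg
from pgvector.psycopg import register_vector  # from the `pgvector` Python package

# Hypothetical connection string and table name, purely for illustration
conn = psycopg.connect("dbname=mydb user=postgres", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets psycopg send and receive vector values

conn.execute(
    "CREATE TABLE IF NOT EXISTS docs "
    "(id bigserial PRIMARY KEY, content text, embedding vector(1536))"
)

embedding = np.random.rand(1536)  # stand-in for a real embedding
conn.execute(
    "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
    ("Vector databases store embeddings", embedding),
)

# <=> is pgvector's cosine-distance operator (smaller = more similar)
rows = conn.execute(
    "SELECT content FROM docs ORDER BY embedding <=> %s LIMIT 5",
    (embedding,),
).fetchall()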

Building a RAG System: Complete Code Tutorial

Let’s build a complete Retrieval-Augmented Generation (RAG) system. RAG is the most common pattern for connecting LLMs to your own data: retrieve relevant context from a vector database, then pass it to an LLM for generation.

The Architecture

  1. Indexing: Convert your documents to embeddings and store them
  2. Retrieval: Given a query, find relevant documents
  3. Generation: Pass retrieved documents to an LLM as context

Full Working Example

Here’s a complete example using Chroma and OpenAI:

import chromadb
from openai import OpenAI

# Initialize clients
chroma_client = chromadb.Client()
openai_client = OpenAI()

# Create collection with OpenAI embeddings
collection = chroma_client.create_collection(
    name="knowledge_base",
    embedding_function=chromadb.utils.embedding_functions.OpenAIEmbeddingFunction(
        api_key="your-openai-key",
        model_name="text-embedding-3-small"
    )
)

# Index your documents
documents = [
    "Vector databases store high-dimensional vectors called embeddings.",
    "Embeddings are created by AI models and capture semantic meaning.",
    "Similarity search finds the most similar vectors to a query.",
    "RAG combines retrieval with LLM generation for accurate answers.",
    "Pinecone is a managed vector database service.",
    "Chroma is an open-source embedding database for developers.",
]

collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

def ask(question: str) -> str:
    """Answer a question using RAG."""
    # Retrieve relevant documents
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])
    
    # Generate answer with context
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    
    return response.choices[0].message.content

# Try it
answer = ask("What is Chroma and what is it used for?")
print(answer)

What’s Happening

  1. We create a Chroma collection with OpenAI’s embedding function
  2. Documents are automatically embedded when added
  3. When querying, the question is embedded and similar documents are found
  4. Retrieved documents become context for the LLM
  5. The LLM generates an answer grounded in your data

Scaling Up

For production, you’d extend this with:

  • Chunking: Split large documents into smaller pieces
  • Metadata filtering: Store and filter by document source, date, etc.
  • Re-ranking: Use a second model to re-rank retrieved results (see the sketch below)
  • Hybrid search: Combine vector and keyword search
  • Caching: Cache embeddings and responses

Check out frameworks like LangChain and LlamaIndex, which handle many of these concerns for you.
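
As one example of these extensions, re-ranking with a cross-encoder might look roughly like this (the model name is a common open-source choice, and the retrieved documents are placeholders):

from sentence_transformers import CrossEncoder

# A commonly used open-source re-ranking model (an illustrative choice)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do neural networks learn?"
retrieved_docs = [
    "Backpropagation adjusts weights based on the error gradient.",
    "Pizza dough needs time to rise before baking.",
    "Gradient descent minimizes the loss function during training.",
]

# Score each (query, document) pair, then keep the highest-scoring documents first
scores = reranker.predict([(query, doc) for doc in retrieved_docs])
reranked = [doc for _, doc in sorted(zip(scores, retrieved_docs), reverse=True)]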

Choosing the Right Vector Database

The best choice depends on your specific situation.

Choose Pinecone if:

  • You want fully managed, zero-ops experience
  • You’re building production applications
  • You want great documentation and support
  • Budget isn’t the primary constraint

Choose Weaviate if:

  • You need hybrid (vector + keyword) search
  • You want flexibility between cloud and self-hosted
  • You need built-in embedding generation
  • You’re comfortable with moderate operational complexity

Choose Chroma if:

  • You’re prototyping or learning
  • You want the fastest path to working code
  • Your application is relatively small-scale
  • You prefer everything running locally

Choose pgvector if:

  • You’re already using PostgreSQL
  • Your scale is modest (millions of vectors)
  • You want to minimize new infrastructure
  • Your team knows Postgres well

Key Questions to Ask

  1. Scale: How many vectors? Millions? Billions?
  2. Operations: Can you manage infrastructure, or do you need fully managed?
  3. Latency: What query latency is acceptable?
  4. Filtering: Do you need complex metadata filtering?
  5. Hybrid search: Do you need keyword search alongside vector search?
  6. Cost: What’s your budget at expected scale?

Best Practices for Vector Databases

Chunking Strategy

Large documents should be split into chunks before embedding. Common approaches:

  • Fixed-size chunks: 512 tokens, 1000 characters, etc.
  • Semantic chunking: Split at paragraph or section boundaries
  • Overlapping chunks: Include overlap to maintain context

Chunk size affects retrieval quality—too small loses context, too large dilutes relevance.
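
A minimal fixed-size chunker with overlap might look like this (the sizes are illustrative starting points, not recommendations):

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap to preserve context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` characters of context
    return chunks

# Each chunk is embedded and stored as its own vector, with metadata linking back to the source
long_document = "..."  # imagine a full article or PDF extracted to text here
chunks = chunk_text(long_document)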

Metadata Design

Store relevant metadata with each vector:

  • Source document ID
  • Chunk position within document
  • Creation/update timestamps
  • Categories, tags, permissions

Good metadata enables filtering and helps with post-processing.
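
With Chroma, for example, storing and filtering on metadata looks roughly like this (the collection and field names are made up for illustration):

import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("knowledge_base")

# Store each chunk with metadata describing where it came from (hypothetical fields)
collection.add(
    documents=["Q3 pricing decision summary"],
    metadatas=[{"source": "meeting-notes", "year": 2025, "chunk": 3}],
    ids=["mtg-0042-chunk-3"],
)

# Combine similarity search with a metadata filter
results = collection.query(
    query_texts=["What did we decide about pricing?"],
    n_results=5,
    where={"source": "meeting-notes"},
)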

Embedding Model Selection

  • Match embedding dimensions to your quality/speed needs
  • Use the same model for indexing and querying
  • Consider domain-specific models for specialized content
  • Test different models on your actual data

Index Tuning

Most vector databases let you tune index parameters:

  • ef_construction/m (HNSW): Higher = better quality, more memory
  • nlist/nprobe (IVF): Balance between speed and recall

Start with defaults, tune based on your specific recall/latency requirements.

Monitoring

Track:

  • Query latency distributions
  • Index size and memory usage
  • Recall quality (if you have ground truth)
  • Embedding generation costs

Frequently Asked Questions

How many vectors can vector databases handle?

Modern vector databases can handle billions of vectors at production scale. Pinecone, Milvus, and Qdrant are designed specifically for this scale, using distributed architectures that spread vectors across multiple nodes. Smaller deployments (millions of vectors) work well even with simpler solutions like pgvector or Chroma. Your choice depends on expected scale and growth trajectory—start simple, scale up as needed.

How much do vector databases cost?

Costs vary widely based on scale and hosting model:

  • Self-hosted open source (Chroma, Milvus, Qdrant): Free software, but you pay for infrastructure
  • Pinecone: Free tier for development, then usage-based pricing starting around $70/month for basic production use
  • Weaviate Cloud: Similar to Pinecone with generous free tier
  • pgvector: Free if you already have PostgreSQL

At enterprise scale with billions of vectors, expect costs of $500-5,000+/month for managed services. Self-hosting can be cheaper but requires DevOps expertise.

Do I need a vector database for small projects?

Not necessarily. For prototypes or small applications with fewer than 100,000 vectors, you have simpler options:

  • Store embeddings in SQLite or any database
  • Use in-memory structures if your dataset fits
  • Use libraries like FAISS for local similarity search

Vector databases become valuable when you need: fast queries at scale, production reliability, metadata filtering, or features like hybrid search. For a weekend project, don’t overthink it—Chroma in-memory is often enough.
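
For instance, a local FAISS index takes just a few lines (random vectors stand in for real embeddings here):

import faiss
import numpy as np

dim = 384
embeddings = np.random.rand(10_000, dim).astype("float32")  # stand-in for real embeddings

index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 search, no tuning needed
index.add(embeddings)

query = embeddings[:1]
distances, indices = index.search(query, 5)  # 5 nearest neighbors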

How is vector search different from traditional keyword search?

Traditional search (like Elasticsearch) finds documents based on exact keyword matches and uses techniques like TF-IDF or BM25 for ranking. It works well when users search using the same words documents contain.

Vector search finds semantically similar content even when words don’t match at all. “How to fix a slow laptop” matches documents about “computer performance optimization” because the embeddings are similar even though the words are different.

Many applications benefit from combining both approaches—this is called hybrid search. Weaviate and some other databases support this natively.

How do I test if my vector search is working well?

Evaluate retrieval quality using:

  • Manual inspection: Query with test questions and check if retrieved documents are actually relevant
  • Recall@k: If you have labeled data, measure what percentage of known-relevant documents appear in top-k results (see the sketch below)
  • Hit rate: For conversational AI, measure how often retrieved context actually helps answer questions
  • User feedback: In production, track whether users find answers helpful
  • A/B testing: Compare different embedding models, chunk sizes, or configurations

Quality testing is crucial—bad retrieval means bad AI answers regardless of how good your LLM is.
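
A simple recall@k check might look like this, assuming you have a small set of hand-labeled relevant documents per query:

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of known-relevant documents that appear in the top-k retrieved results."""
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)

# Hypothetical labeled example
retrieved = ["doc_123", "doc_456", "doc_789", "doc_007"]
relevant = {"doc_123", "doc_789", "doc_555"}
print(recall_at_k(retrieved, relevant, k=3))  # 2 of 3 relevant docs in the top 3 -> ~0.67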

Can I use vector databases for images or audio?

Absolutely. Embeddings work for any data type that can be converted to vectors:

  • Images: Use CLIP, OpenAI’s image embeddings, or other vision models
  • Audio: Use Whisper embeddings or specialized audio models
  • Video: Embed frames or use video-specific models
  • Code: Use code-specific embedding models like Voyage or CodeBERT

The principle is identical: convert your data to embeddings, store in a vector database, query with embeddings of the same type. A product image search stores image embeddings; a music recommendation system stores audio embeddings.

What about real-time updates?

Most vector databases support real-time upserts and deletes, though performance varies. Considerations:

  • Index updates may briefly impact query performance
  • Some databases (like Pinecone) handle updates more gracefully than others
  • For very high update rates, check benchmarks for your specific database
  • Consider batch updates during low-traffic periods for non-critical updates

How do vector databases handle security and access control?

Enterprise vector databases offer:

  • API key authentication
  • Role-based access control (RBAC)
  • Namespace/collection isolation
  • Encryption at rest and in transit
  • Audit logging

For sensitive data, ensure your vector database meets your compliance requirements (SOC 2, HIPAA, etc.). Managed services like Pinecone Enterprise and Weaviate Cloud Professional offer compliance certifications.

Real-World Use Cases

Understanding common patterns helps you apply vector databases effectively.

Semantic Search

The most common use case: let users search by meaning, not just keywords. An enterprise search system might index:

  • Internal documentation
  • Slack messages
  • Email archives
  • Meeting transcripts
  • Code repositories

Users can then ask natural questions (“What was the decision about pricing in Q3?”) and find relevant content regardless of exact terminology used.

Retrieval-Augmented Generation (RAG)

The pattern we coded earlier: LLMs generate more accurate answers when given relevant context. RAG combines:

  • Vector database for retrieval
  • LLM for generation

This powers most “chat with your data” applications—AI that can answer questions about your documents, codebase, or knowledge base.

Recommendation Systems

Recommend similar items based on embeddings:

  • E-commerce: “Customers who viewed this also viewed…”
  • Content: “Articles similar to this one”
  • Music/Video: “Because you liked X, you might like Y”

Calculate the embedding of what the user engaged with, find nearest neighbors, and recommend.

Duplicate Detection / Deduplication

Find semantically similar content that might be duplicates:

  • Customer support: Find similar past tickets to suggest solutions
  • Content moderation: Detect rephrased harmful content
  • Data quality: Identify near-duplicate database records

Even when text is rephrased, embeddings remain similar, enabling semantic deduplication.
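
A brute-force sketch of semantic deduplication (fine for small batches; at larger scale you’d query the vector index itself rather than comparing every pair):

import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95) -> list[tuple[int, int]]:
    """Return index pairs whose cosine similarity exceeds the threshold."""
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = normalized @ normalized.T  # all pairwise cosine similarities
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if similarity[i, j] >= threshold:
                pairs.append((i, j))
    return pairs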

Anomaly Detection

Train embeddings on normal data, then flag items whose embeddings are distant from any training examples. Applications include:

  • Fraud detection
  • Log analysis
  • Sensor data monitoring
  • Security threat detection

Image and Visual Search

Store embeddings of product images; let users search by uploading similar images or by text description (using multimodal embeddings like CLIP).

Personalization

Embed user behavior and preferences; find similar users for collaborative filtering or embed content and user profiles for content-based recommendations. Modern recommendation systems increasingly use embeddings rather than traditional collaborative filtering because embeddings can capture more nuanced similarities.

The Future of Vector Databases

Vector databases continue to evolve as AI applications become more sophisticated. Several trends are shaping the space:

Hybrid Search Standardization

The combination of vector search and traditional keyword search is becoming standard. Most AI applications benefit from both: keywords for precise matches, vectors for semantic understanding. Expect all major databases to support hybrid search natively within the next year.

Multimodal Embeddings

As multimodal AI models improve (think models that understand images, text, and audio together), vector databases are adapting to store and query multimodal embeddings. This enables applications like “find images that match this text description” or “find documents similar to this video.”

Improved Integration with ML Pipelines

Vector databases are integrating more tightly with ML tooling—automatic embedding generation, model versioning, and pipeline orchestration. The lines between vector database, feature store, and ML platform continue to blur.

Edge Deployment

Smaller, efficient vector databases that can run on edge devices (phones, IoT devices) are emerging. This enables local semantic search without round trips to cloud servers—important for latency-sensitive and privacy-focused applications.

Native AI Assistance

Vector databases will increasingly include AI-native features: automatic chunking suggestions, embedding model recommendations, and intelligent query optimization. The goal is making the right choices without requiring deep expertise.

Getting Started: Your Action Plan

If you’re new to vector databases, here’s a practical path forward:

Week 1: Experiment with Chroma locally. Install via pip, index some documents, run queries. Get comfortable with the concepts before worrying about production concerns.

Week 2: Build a simple RAG application using the code template in this guide. Connect a vector database to an LLM and see how retrieval improves answer quality.

Week 3: Experiment with different embedding models. Try OpenAI’s embeddings, then try an open-source alternative like sentence-transformers. Notice how embedding quality affects retrieval.

Week 4: If you have a real use case, evaluate production options. Consider Pinecone for managed simplicity, Weaviate for flexibility, or pgvector if you want to minimize new infrastructure.

Ongoing: Keep testing. Vector search quality depends heavily on embedding model choice, chunking strategy, and tuning. Invest in evaluation pipelines that measure what matters for your application.

Wrapping Up

Vector databases have become fundamental infrastructure for AI applications. They solve a real problem—efficiently finding semantically similar content—that becomes essential as soon as you want AI to work with your data rather than just its training data.

The core concepts are straightforward: embeddings capture meaning in numbers, and vector databases make finding similar embeddings fast. The ecosystem has matured significantly over 2024-2026, with options ranging from simple local development (Chroma) to fully managed enterprise scale (Pinecone) and everything in between.

What’s exciting is how vector databases enable new categories of applications. Before embeddings, building a “search by meaning” system was a research project. Now it’s an afternoon of coding. Before RAG, connecting LLMs to private data was complex and error-prone. Now it’s an established pattern with well-documented best practices.

If you’re building AI applications in 2026, understanding vector databases isn’t optional—it’s foundational. Whether you’re building RAG systems for customer support, semantic search for enterprise knowledge, recommendation engines for content platforms, or any application that needs to find relevant content intelligently, vector databases provide the infrastructure to do it well.

Start experimenting. The learning curve is gentler than you might expect, and the capabilities you gain enable entirely new kinds of applications. The AI developer’s toolkit keeps expanding, and vector databases are now a core part of what makes modern AI applications work.

For related topics, check out our guides on building AI agents and getting started with AI development. Happy building!
