
Vector Store - Embeddings - Timescale

Michal S

Updated: Feb 25

A vector store is a database that can store information in vector format (vectors are mathematical representations of data in a high-dimensional space — Wikipedia). Each dimension in a vector represents a specific feature or characteristic of the data, such as relationships between words, or color and shape when dealing with image data. The number of dimensions depends on the embedding model: each available model produces embeddings with different semantic properties, and each can return a different number of dimensions.

Higher dimensions capture more detail but require more storage and computation, and too many dimensions can make searches inefficient. So how do we decide how many dimensions to use?


It’s generally best to limit each embedded chunk to a short paragraph, or around 100 tokens, to maintain clarity and efficiency. Available embedding models offer much more, though: for example, OpenAI’s embedding model text-embedding-3-small produces 1536 dimensions by default.
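As a minimal sketch of what this looks like in practice, the snippet below requests an embedding from text-embedding-3-small via the official openai Python client and checks its length. The input sentence is just an illustration; the dimensions parameter (supported by the text-embedding-3 models) is one way to trade detail for storage.

```python
from openai import OpenAI  # pip install openai; expects OPENAI_API_KEY in the environment

client = OpenAI()

# Embed a short chunk of text (illustrative input).
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Timescale extends PostgreSQL with vector search capabilities.",
)
embedding = response.data[0].embedding
print(len(embedding))  # 1536 by default for text-embedding-3-small

# The text-embedding-3 models also accept a `dimensions` parameter
# to shorten the embedding when storage or search speed matters more:
shorter = client.embeddings.create(
    model="text-embedding-3-small",
    input="Timescale extends PostgreSQL with vector search capabilities.",
    dimensions=256,
)
print(len(shorter.data[0].embedding))  # 256
```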


Some data, such as genomic data, is characterized by very high dimensionality. This is where the “curse of dimensionality” shows up: with too many features, the data becomes very difficult to store and compute over.



Choosing the right balance between token length and vector size is key to optimizing performance in similarity searches and AI applications. But we need to remember that everything starts with recognizing the problem domain and the nature of its data, as that is what drives how we transform raw information into vectors.


I want to create my own vector store and experiment — where do I start? There are many available solutions, such as Timescale, PostgreSQL, Pinecone, or Neon. Neon and Timescale are based on PostgreSQL, a mature, open-source relational database server with a large community. PostgreSQL on its own can store vector data with the pgvector extension, which lets you implement NN (nearest neighbor) and ANN (approximate nearest neighbor, recommended for high-dimensional data) searches.
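To make this concrete, here is a rough sketch of the pgvector workflow driven from Python with the psycopg driver. The table name, connection string, and query vector are assumptions for illustration; the SQL itself (the vector column type, the <=> cosine-distance operator, and the HNSW index for ANN) follows pgvector’s documented syntax.

```python
import psycopg  # pip install "psycopg[binary]"

# Connection string is a placeholder; point it at your own server.
with psycopg.connect("postgresql://postgres:password@localhost:5432/postgres") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")

        # 1536 matches text-embedding-3-small's default output size.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS documents (
                id bigserial PRIMARY KEY,
                body text,
                embedding vector(1536)
            );
        """)

        # HNSW index: turns exact NN scans into fast ANN searches.
        cur.execute("""
            CREATE INDEX IF NOT EXISTS documents_embedding_idx
            ON documents USING hnsw (embedding vector_cosine_ops);
        """)

        # Nearest-neighbor query: <=> is pgvector's cosine-distance operator.
        query_embedding = [0.0] * 1536  # placeholder; use a real embedding here
        vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
        cur.execute(
            "SELECT id, body FROM documents ORDER BY embedding <=> %s::vector LIMIT 5;",
            (vec_literal,),
        )
        print(cur.fetchall())
    conn.commit()
```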



(Figure: PostgreSQL database integration with a vector store for optimized AI-driven data retrieval.)

That’s where the Timescale database comes in. It does everything PostgreSQL can do, but on top of that it provides two additional PostgreSQL extensions, pgvectorscale and pgai, which let you fully automate the process of creating embeddings.

In simple words, you insert data into your Timescale database server and Timescale takes care of the rest: it connects to your embedding model, such as text-embedding-3-small from OpenAI. That alone lets you implement NN or ANN search and feed your AI applications.
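As a rough sketch of what that automation looks like, pgai exposes SQL functions that call the embedding provider for you, so embedding happens right next to the data. The snippet below uses ai.openai_embed from the pgai extension; treat the exact function name and arguments as something to verify against the pgai documentation for your version, and the connection string and input text are illustrative.

```python
import psycopg

# Assumes a server with the pgai extension installed and an OpenAI API key
# configured for it; the connection string is a placeholder.
with psycopg.connect("postgresql://postgres:password@localhost:5432/postgres") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS ai CASCADE;")

        # pgai's openai_embed() calls the model from inside SQL and
        # returns a pgvector value ready to store or compare.
        cur.execute(
            "SELECT ai.openai_embed('text-embedding-3-small', %s);",
            ("Timescale automates embedding creation with pgai.",),
        )
        embedding = cur.fetchone()[0]
        print(embedding)  # a 1536-dimensional vector
    conn.commit()
```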

If you would like to experiment in your local environment with Docker containers, Timescale is fully open source and can be run locally with Docker and Docker Compose; a ready-to-use example is here.



What Is a Vector Store?


A vector store is a specialized database designed to store, manage, and search for data in vector format. Vectors are mathematical representations of information, allowing AI models to understand relationships between words, images, or any complex data. Instead of storing raw text or images, a vector store saves embeddings—high-dimensional numerical representations that capture semantic meaning.


Why is this useful? Traditional databases struggle with similarity-based searches, like finding documents with related meaning or images that “look alike.” A vector store enables efficient nearest neighbor (NN) and approximate nearest neighbor (ANN) searches, making it a core component of AI applications, recommendation systems, and semantic search engines.
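To see what a nearest-neighbor search actually does, here is a tiny self-contained sketch of exact NN search over an in-memory collection of vectors using cosine similarity. The three toy vectors are made up for illustration; a real vector store performs the same comparison at scale, with ANN indexes to avoid scanning every row.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: closer to 1.0 means more similar in direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dims).
store = {
    "cat sits on the mat": [0.9, 0.1, 0.0],
    "kitten rests on a rug": [0.8, 0.2, 0.1],
    "quarterly revenue report": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # stands in for the embedding of "a cat on a carpet"

# Exact nearest-neighbor search: compare the query against every stored vector.
best = max(store.items(), key=lambda item: cosine_similarity(query, item[1]))
print(best[0])  # prints the stored text closest in meaning to the query
```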


Many options exist, from PostgreSQL with pgvector to dedicated solutions like Pinecone or Timescale. The right choice depends on your use case—whether you need fine-tuned control, automation, or high-performance scaling. Whatever the path, a vector store is the foundation for making AI smarter and more context-aware.


Leave a comment if you are blocked or struggling with something.

 
 