AI & Vectors

RAG with Permissions

Fine-grain access control with Retrieval Augmented Generation.

Since pgvector is built on top of Postgres, you can implement fine-grain access control on your vector database using Row Level Security (RLS). This means you can restrict which documents are returned during a vector similarity search to users that have access to them. Supabase also supports Foreign Data Wrappers (FDW) which means you can use an external database or data source to determine these permissions if your user data doesn't exist in Supabase.

Use this guide to learn how to restrict access to documents when performing retrieval augmented generation (RAG).

Example

In a typical RAG setup, your documents are chunked into small subsections and similarity is performed over those sections:


_16
-- Track documents/pages/files/etc
_16
create table documents (
_16
id bigint primary key generated always as identity,
_16
name text not null,
_16
owner_id uuid not null references auth.users (id) default auth.uid(),
_16
created_at timestamp with time zone not null default now()
_16
);
_16
_16
-- Store the content and embedding vector for each section in the document
_16
-- with a reference to original document (one-to-many)
_16
create table document_sections (
_16
id bigint primary key generated always as identity,
_16
document_id bigint not null references documents (id),
_16
content text not null,
_16
embedding vector (384)
_16
);

Notice how we record the owner_id on each document. Let's create an RLS policy that restricts access to document_sections based on whether or not they own the linked document:


_12
-- enable row level security
_12
alter table document_sections enable row level security;
_12
_12
-- setup RLS for select operations
_12
create policy "Users can query their own document sections"
_12
on document_sections for select to authenticated using (
_12
document_id in (
_12
select id
_12
from documents
_12
where (owner_id = (select auth.uid()))
_12
)
_12
);

Now every select query executed on document_sections will implicitly filter the returned sections based on whether or not the current user has access to them.

For example, executing:


_10
select * from document_sections;

as an authenticated user will only return rows that they are the owner of (as determined by the linked document). More importantly, semantic search over these sections (or any additional filtering for that matter) will continue to respect these RLS policies:


_10
-- Perform inner product similarity based on a match_threshold
_10
select *
_10
from document_sections
_10
where document_sections.embedding <#> embedding < -match_threshold
_10
order by document_sections.embedding <#> embedding;

The above example only configures select access to users. If you wanted, you could create more RLS policies for inserts, updates, and deletes in order to apply the same permission logic for those other operations. See Row Level Security for a more in-depth guide on RLS policies.

Alternative scenarios

Every app has its own unique requirements and may differ from the above example. Here are some alternative scenarios we often see and how they are implemented in Supabase.

Documents owned by multiple people

Instead of a one-to-many relationship between users and documents, you may require a many-to-many relationship so that multiple people can access the same document. Let's reimplement this using a join table:


_10
create table document_owners (
_10
id bigint primary key generated always as identity,
_10
owner_id uuid not null references auth.users (id) default auth.uid(),
_10
document_id bigint not null references documents (id)
_10
);

Then your RLS policy would change to:


_10
create policy "Users can query their own document sections"
_10
on document_sections for select to authenticated using (
_10
document_id in (
_10
select document_id
_10
from document_owners
_10
where (owner_id = (select auth.uid()))
_10
)
_10
);

Instead of directly querying the documents table, we query the join table.

User and document data live outside of Supabase

You may have an existing system that stores users, documents, and their permissions in a separate database. Let's explore the scenario where this data exists in another Postgres database. We'll use a foreign data wrapper (FDW) to connect to the external DB from within your Supabase DB:

Let's assume your external DB contains a users and documents table like this:


_12
create table public.users (
_12
id bigint primary key generated always as identity,
_12
email text not null,
_12
created_at timestamp with time zone not null default now()
_12
);
_12
_12
create table public.documents (
_12
id bigint primary key generated always as identity,
_12
name text not null,
_12
owner_id bigint not null references public.users (id),
_12
created_at timestamp with time zone not null default now()
_12
);

In your Supabase DB, let's create foreign tables that link to the above tables:


_16
create schema external;
_16
create extension postgres_fdw with schema extensions;
_16
_16
-- Setup the foreign server
_16
create server foreign_server
_16
foreign data wrapper postgres_fdw
_16
options (host '<db-host>', port '<db-port>', dbname '<db-name>');
_16
_16
-- Map local 'authenticated' role to external 'postgres' user
_16
create user mapping for authenticated
_16
server foreign_server
_16
options (user 'postgres', password '<user-password>');
_16
_16
-- Import foreign 'users' and 'documents' tables into 'external' schema
_16
import foreign schema public limit to (users, documents)
_16
from server foreign_server into external;

We'll store document_sections and their embeddings in Supabase so that we can perform similarity search over them via pgvector.


_10
create table document_sections (
_10
id bigint primary key generated always as identity,
_10
document_id bigint not null,
_10
content text not null,
_10
embedding vector (384)
_10
);

We maintain a reference to the foreign document via document_id, but without a foreign key reference since foreign keys can only be added to local tables. Be sure to use the same ID data type that you use on your external documents table.

Since we're managing users and authentication outside of Supabase, we have two options:

  1. Make a direct Postgres connection to the Supabase DB and set the current user every request
  2. Issue a custom JWT from your system and use it to authenticate with the REST API

Direct Postgres connection

You can directly connect to your Supabase Postgres DB using the connection info on your project's database settings page. To use RLS with this method, we use a custom session variable that contains the current user's ID:


_12
-- enable row level security
_12
alter table document_sections enable row level security;
_12
_12
-- setup RLS for select operations
_12
create policy "Users can query their own document sections"
_12
on document_sections for select to authenticated using (
_12
document_id in (
_12
select id
_12
from external.documents
_12
where owner_id = current_setting('app.current_user_id')::bigint
_12
)
_12
);

The session variable is accessed through the current_setting() function. We name the variable app.current_user_id here, but you can modify this to any name you like. We also cast it to a bigint since that was the data type of the user.id column. Change this to whatever data type you use for your ID.

Now for every request, we set the user's ID at the beginning of the session:


_10
set app.current_user_id = '<current-user-id>';

Then all subsequent queries will inherit the permission of that user:


_10
-- Only document sections owned by the user are returned
_10
select *
_10
from document_sections
_10
where document_sections.embedding <#> embedding < -match_threshold
_10
order by document_sections.embedding <#> embedding;

Custom JWT with REST API

If you would like to use the auto-generated REST API to query your Supabase database using JWTs from an external auth provider, you can get your auth provider to issue a custom JWT for Supabase.

See the Clerk Supabase docs for an example of how this can be done. Modify the instructions to work with your own auth provider as needed.

Now we can simply use the same RLS policy from our first example:


_12
-- enable row level security
_12
alter table document_sections enable row level security;
_12
_12
-- setup RLS for select operations
_12
create policy "Users can query their own document sections"
_12
on document_sections for select to authenticated using (
_12
document_id in (
_12
select id
_12
from documents
_12
where (owner_id = (select auth.uid()))
_12
)
_12
);

Under the hood, auth.uid() references current_setting('request.jwt.claim.sub') which corresponds to the JWT's sub (subject) claim. This setting is automatically set at the beginning of each request to the REST API.

All subsequent queries will inherit the permission of that user:


_10
-- Only document sections owned by the user are returned
_10
select *
_10
from document_sections
_10
where document_sections.embedding <#> embedding < -match_threshold
_10
order by document_sections.embedding <#> embedding;

Other scenarios

There are endless approaches to this problem based on the complexities of each system. Luckily Postgres comes with all the primitives needed to provide access control in the way that works best for your project.

If the examples above didn't fit your use case or you need to adjust them slightly to better fit your existing system, feel free to reach out to support and we'll be happy to assist you.