What this does
Receives a URL via webhook, uses Firecrawl to scrape the page into clean markdown, and stores it as vector embeddings in Supabase pgvector. A visual, self-hosted ingestion pipeline for RAG knowledge bases. Adding a new source is as simple as sending a URL.
The second part of the workflow exposes a chat interface where an AI Agent queries the stored knowledge base to answer questions, with Cohere reranking for better retrieval quality.
How it works
Part 1: Ingestion Pipeline
A webhook receives a POST request with a url field. Firecrawl scrapes that URL into clean markdown, which is split into chunks, embedded, and stored in the Supabase pgvector documents table.
Part 2: RAG Chat Agent
A chat trigger passes each question to an AI Agent, which retrieves the most relevant chunks from the vector store, reranks them with Cohere, and answers from the retrieved context.
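Before embedding, the scraped markdown is split into overlapping chunks. A minimal sketch of that chunking step in Python (the chunk size and overlap values here are illustrative assumptions, not settings taken from the workflow):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one of the two neighboring chunks.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each resulting chunk becomes one row in the documents table below, with its embedding and any source metadata.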
Requirements
- A self-hosted or cloud n8n instance
- A Firecrawl API key
- A Supabase project with the pgvector extension enabled
- A Cohere API key for reranking
- Credentials for an embedding model that outputs 1536-dimensional vectors (the schema below assumes one, e.g. OpenAI's text-embedding-3-small)
Setup
Run the following SQL in the Supabase SQL editor to create the schema:
-- Enable the pgvector extension
create extension vector with schema extensions;
-- Create a table to store documents
create table documents (
id bigserial primary key,
content text,
metadata jsonb,
embedding extensions.vector(1536)
);
-- Create a function to search for documents
create function match_documents (
query_embedding extensions.vector(1536),
match_count int default null,
filter jsonb default '{}'
) returns table (
id bigint,
content text,
metadata jsonb,
similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
return query
select
id,
content,
metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where metadata @> filter
order by documents.embedding <=> query_embedding
limit match_count;
end;
$$;
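In match_documents, <=> is pgvector's cosine-distance operator, so 1 - (embedding <=> query_embedding) is cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones. For intuition, the same math in plain Python:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Equivalent of pgvector's 1 - (a <=> b): cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Similar embeddings score near 1.0; unrelated ones near 0.0. The SQL
# function orders by ascending distance, i.e. descending similarity.
```

This is why the function's order by clause returns the closest matches first and why similarity is reported on a 0-to-1 scale for typical embedding models.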
How to use
Send a POST request to the webhook URL:
curl -X POST https://your-n8n-instance/webhook/your-id \
-H "Content-Type: application/json" \
-d '{"url": "https://firecrawl.dev/docs"}'
Then open the chat interface in n8n to ask questions about the ingested content.
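You can also query match_documents directly through Supabase's auto-generated REST RPC endpoint, which is handy for debugging retrieval outside n8n. A sketch using only the standard library (the project URL and key are placeholders you must substitute, and the query embedding must come from the same model used at ingestion):

```python
import json
import urllib.request

# Placeholders -- substitute your own Supabase project URL and API key.
SUPABASE_URL = "https://your-project.supabase.co"
SUPABASE_KEY = "your-service-role-key"

def build_match_request(query_embedding: list[float], match_count: int = 5) -> urllib.request.Request:
    """Build a POST to the PostgREST RPC endpoint for the match_documents function."""
    payload = json.dumps({
        "query_embedding": query_embedding,
        "match_count": match_count,
        "filter": {},
    }).encode()
    return urllib.request.Request(
        f"{SUPABASE_URL}/rest/v1/rpc/match_documents",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "apikey": SUPABASE_KEY,
            "Authorization": f"Bearer {SUPABASE_KEY}",
        },
        method="POST",
    )

# urllib.request.urlopen(build_match_request([0.0] * 1536)) would return
# the top matches as JSON once real credentials and an embedding are supplied.
```

If the RPC call returns sensible matches but the chat agent does not, the problem is in the agent or reranker configuration rather than the vector store.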