To improve this blog, I wanted to suggest further reading by showing related articles at the end of each post. A good solution would be to use embeddings (which are representations of the text of an article as numerical vectors) to find the closest neighbors. However, this blog uses SQLite and therefore can't store vectors natively.
Considering this feature is small and non-essential, I decided to store it on Supabase, which provides managed PostgreSQL databases with the pgvector addon. Here's how I set that up.
First, create your Supabase account. Their free plan will be more than enough. You'll get credentials to help you set up your database.yml file:
Considering this feature is small and non-essential, I decided to store it on Supabase, which provides managed PostgreSQL databases with the pgvector addon. Here's how I set that up.
First, create your Supabase account. Their free plan will be more than enough. You'll get credentials to help you set up your database.yml file:
default: &default
adapter: sqlite3
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
timeout: 5000
development:
primary:
<<: *default
database: storage/development.sqlite3
queue:
<<: *default
database: storage/development_queue.sqlite3
migrations_paths: db/queue_migrate
vector:
<<: *default
adapter: postgresql
migrations_paths: db/vector_migrate
database: postgres
username: <%= Rails.application.credentials.dig(:supabase, :username) %>
password: <%= Rails.application.credentials.dig(:supabase, :password) %>
host: aws-0-eu-west-3.pooler.supabase.com
port: 6543
migrations_paths: db/vector_migrate
prepared_statements: false
lang-yamlRails handles multiple databases effectively and manages migrations just as it does with your primary database. In this case, we define the migration path as db/vector_migrate. We can now enable the pgvector extension and add the table that will store our embeddings:
class CreatePostEmbeddings < ActiveRecord::Migration[8.0]
def change
enable_extension "vector"
create_table :post_embeddings do |t|
t.bigint :blog_id, null: false
t.bigint :post_id, null: false
t.column :embedding, :vector
t.timestamps
end
add_index :post_embeddings, [:blog_id, :post_id], unique: true
end
end
lang-rubyYou can now run the migration: rails db:migrate and create the associated model:
class PostEmbedding < VectorRecord belongs_to :post belongs_to :blog has_neighbors :embedding endlang-ruby
As you can see in the model, we'll be using the neighbor gem to add nearest neighbor search to ActiveRecord. In particular, the gem defines nearest_neighbors method that we'll use to find related articles:
Post.first.post_embeddings.nearest_neighbors(:embedding, distance: "euclidean").where(blog: blog).first(5) lang-ruby
Of course, we'll need to generate the embeddings first. A simple POST request to the OpenAI API will get us our vector. Right now, the text-embedding-3-small model is both very inexpensive and provides decent results, so we'll use that. In any case, it's easy enough to regenerate the embeddings later if we need to.
def insert_embedding
embedding = generate_openai_embedding
PostEmbedding.find_or_create_by!(post: self, blog: blog).update!(embedding: embedding)
end
private
def generate_openai_embedding
response = HTTParty.post(
"https://api.openai.com/v1/embeddings",
headers: {
"Authorization" => "Bearer #{Rails.application.credentials.openai_api_key}",
"Content-Type" => "application/json"
},
body: {
model: "text-embedding-3-small",
input: title.to_s + "\n" + content&.to_plain_text.to_s
}.to_json
)
response["data"][0]["embedding"]
end
lang-rubyWe're ready to display related articles. Since PostEmbedding is a standard Rails model, there's nothing special to do here. However, considering that we're connecting to a remote database, the latency is going to be much higher than usual.
The solution I implemented is to move the related posts section to a turbo frame that gets eager-loaded. We then add a cache to the HTML response:
class RelatedController < BlogController
def show
@post = Post.find(params[:post_id])
expires_in 1.day, public: true
render "posts/related", layout: false, cached: true
end
end
lang-rubyAnd that's it! The page isn't slowed down, cost is negligible, and we can use vectors even with vanilla SQLite.
💬 Comments