Lantern: PostgreSQL Vector Search Extension

GitHub Stats
Stars: 749
Forks: 53
Language: C
Created: 2023-07-11
License: Other

Lantern is an open-source PostgreSQL extension for storing vector data, generating embeddings, and running vector search queries. Its central feature is a new index type, lantern_hnsw, which speeds up queries that use ORDER BY ... LIMIT. The index is built on an HNSW (Hierarchical Navigable Small World) implementation, which suits applications that need fast nearest-neighbor search over large collections of vectors.

Overview

  • Lantern is an open-source PostgreSQL database extension designed to store vector data, generate embeddings, and perform vector search operations.

Main Capabilities

  • Vector Data Handling: Supports storing and querying vector data with a new index type called lantern_hnsw, which optimizes ORDER BY ... LIMIT queries.
  • Embedding Generation: Allows generation of embeddings using popular models like CLIP, Hugging Face models, and custom models.
  • Indexing: Utilizes the HNSW (Hierarchical Navigable Small World) algorithm for efficient indexing. Supports parallel index creation and external index graph generation.
  • Interoperability: Compatible with pgvector’s data type, enabling seamless transition from pgvector to Lantern.
  • Performance: The project reports that it matches or outperforms pgvector and pg_embedding on index creation time, select throughput, and select latency.
  • Customization: Allows customization of lantern_hnsw index parameters such as distance functions, construction parameters, and search parameters.

Installation and Usage

  • Can be installed using Docker, Homebrew, or by building from source code.
  • Retains the standard PostgreSQL interface, ensuring compatibility with existing tools.
  • Supports various distance functions and operator classes for flexible querying.
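To make the distance functions concrete, here is a minimal Python sketch of two metrics commonly used for vector search: squared L2 distance (which the query example in this document computes with l2sq_dist) and cosine distance. The helper names l2sq and cosine_distance are illustrative and are not Lantern functions; which metrics a given Lantern version supports should be checked against its documentation.

```python
import math


def l2sq(a, b):
    """Squared Euclidean distance: the sum of squared coordinate differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(l2sq([0, 0, 1], [0, 0, 0]))       # 1
    print(cosine_distance([1, 0], [0, 1]))  # 1.0 (orthogonal vectors)
```

Squared L2 skips the square root. That leaves the ordering of results unchanged, so it is the cheaper choice when only the ranking matters.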

Additional Features

  • Cloud-hosted version planned.
  • Future support for hardware-accelerated distance metrics, industry-specific application templates, and more tools for generating embeddings.
  • Community contributions and support are encouraged.

Lantern can be used to speed up vector search operations in PostgreSQL databases. Here’s an example:

sql

-- Create a table with a vector column
CREATE TABLE small_world (id integer, vector real[3]);
INSERT INTO small_world (id, vector) VALUES (0, '{0,0,0}'), (1, '{0,0,1}');

-- Create an HNSW index on the vector column
CREATE INDEX ON small_world USING lantern_hnsw (vector);

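-- Query data using the index. Disabling sequential scans steers the planner
-- toward the index, which it might otherwise skip on a table this small.
SET enable_seqscan = false;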
SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;
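As a cross-check on what this query returns, the following Python sketch computes the exact answer by brute force for the same two rows. It assumes that <-> orders rows by the same squared L2 distance that l2sq_dist reports; the function names are illustrative and not part of Lantern.

```python
def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(rows, query, limit):
    """Exact nearest neighbors: score every row, sort by distance, keep `limit`."""
    scored = sorted((l2sq(vector, query), row_id) for row_id, vector in rows)
    return [(row_id, dist) for dist, row_id in scored[:limit]]


# The same rows that were inserted into the small_world table.
small_world = [(0, [0, 0, 0]), (1, [0, 0, 1])]

if __name__ == "__main__":
    print(nearest(small_world, [0, 0, 0], limit=1))  # [(0, 0)]
```

A sequential scan does this work for every row. An HNSW index avoids that by searching a graph and returning approximate nearest neighbors, so on large tables its results can occasionally differ from the exact ones.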

Lantern supports embedding generation for various models, such as CLIP and Hugging Face models.

sql

-- Generate an embedding with a custom model and store it.
-- generate_embedding is a placeholder for your own function, not a Lantern
-- function, and embeddings_table is assumed to exist already.
INSERT INTO embeddings_table (id, embedding)
VALUES (1, generate_embedding('input_text'));

You can customize the lantern_hnsw index parameters to optimize for your specific use case.

sql

-- Create an HNSW index with custom parameters
CREATE INDEX ON small_world USING lantern_hnsw (vector dist_l2sq_ops)
WITH (M=2, ef_construction=10, ef=4, dim=3);
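To give some intuition for the search parameter ef, the sketch below implements a simplified, single-layer version of the best-first graph search described in the HNSW paper. In the standard algorithm, M controls how many neighbors each node is linked to, ef_construction sets the candidate list size while the graph is built, and ef sets it at query time. This is an illustration only, not Lantern's implementation; the toy graph, the function names, and the one-dimensional vectors are assumptions made for the example.

```python
import heapq


def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(graph, vectors, query, entry, ef):
    """Best-first search over one graph layer, keeping at most `ef` results."""
    entry_dist = l2sq(vectors[entry], query)
    visited = {entry}
    candidates = [(entry_dist, entry)]  # min-heap: closest unexpanded node first
    best = [(-entry_dist, entry)]       # max-heap (negated): worst kept result first
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -best[0][0]:
            break  # the closest candidate is worse than every kept result
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = l2sq(vectors[neighbor], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(best, (-d, neighbor))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-neg_dist, node) for neg_dist, node in best)


# Hypothetical toy data: node 1 is a local minimum next to the entry point,
# while the true nearest neighbor (node 4) is only reachable through node 2.
vectors = {0: (0,), 1: (7,), 2: (5,), 3: (6,), 4: (10,)}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2, 4], 4: [3]}
query = (10,)

if __name__ == "__main__":
    print(search_layer(graph, vectors, query, entry=0, ef=1))  # [(9, 1)]
    print(search_layer(graph, vectors, query, entry=0, ef=2))  # [(0, 4), (9, 1)]
```

With ef=1 the search stops at node 1, a local minimum, while ef=2 keeps a second candidate alive long enough to reach the true nearest neighbor, node 4. Larger values generally improve recall at the cost of more distance computations.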

  • Quick Installation: Use Docker or Homebrew to set up Lantern with PostgreSQL.

    bash

    docker run --pull=always --rm -p 5432:5432 -e "POSTGRES_USER=$USER" -e "POSTGRES_PASSWORD=postgres" -v ./lantern_data:/var/lib/postgresql/data lanterndata/lantern:latest-pg15

    or

    bash

    brew tap lanterndata/lantern
    brew install lantern && lantern_install

  • Build from Source: Build Lantern on top of your existing PostgreSQL installation.

    bash

    git clone --recursive https://github.com/lanterndata/lantern.git
    cd lantern
    mkdir build
    cd build
    cmake -DMARCH_NATIVE=ON ..
    make install

Key Points:

  • Vector Data Handling: Lantern is an open-source PostgreSQL extension for storing vector data, generating embeddings, and performing vector search operations.
  • Performance: It introduces a new index type, lantern_hnsw, which speeds up ORDER BY ... LIMIT queries and matches or outperforms existing solutions like pgvector and pg_embedding.
  • Installation: Easy installation via Docker, Homebrew, or building from source code.
  • Compatibility: Retains standard PostgreSQL interface, compatible with existing tools.
  • Features: Supports embedding generation, parallel index creation, and external index graph generation.
  • Roadmap: Plans include cloud-hosted versions, hardware-accelerated distance metrics, industry-specific templates, and enhanced embedding tools.

Future Potential:

  • Performance Improvements: Ongoing enhancements to maintain top performance.
  • New Features: Autotuned index types, support for additional vector elements, and version control for embeddings.
  • Community Engagement: Encourages community contributions and provides support for troubleshooting and feature requests.

For more detail, see the original lanterndata/lantern repository.

Content derived from the lanterndata/lantern repository on GitHub. Original materials are licensed under their respective terms.