VecDB: Simple Vector Embedding Database Tool

GitHub Stats    Value
Stars           31
Forks           1
Language        Go
Created         2024-07-08
License         MIT License

VecDB is a simple vector embedding database for finding items similar to the one you are searching for. It works much like a hash table: it uses a {key => value} data model in which the key is a unique identifier and the value is the vector itself, represented as a list of floats. Its author, a database enthusiast, built it as a fun learning project, though it can also be used in production environments. VecDB is configured through a config.yml file that sets options such as the HTTP server address and the storage driver.

Its main features:

  • Uses a {key => value} model where key is a unique identifier and value is the vector (a list of floats).
  • Configurable via a config.yml file, with options for HTTP server, storage driver (currently supports BoltDB), and embedder settings (supports Gemini).
  • Raw Vectors Layer: Allows writing and searching vectors using endpoints like POST /v1/vectors/write and POST /v1/vectors/search.
  • Embedding Layer (Optional): Enables text embedding and search using endpoints like POST /v1/embeddings/text/write and POST /v1/embeddings/text/search.
  • Supports various request types:
    • VectorWriteRequest: Store a vector.
    • VectorSearchRequest: Search for similar vectors based on cosine similarity.
    • TextEmbeddingWriteRequest: Store text as a vector using an embedder.
    • TextEmbeddingSearchRequest: Search for similar vectors based on text content.
  • Available as a binary or Docker image.
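
To make the search semantics concrete, here is a minimal in-memory sketch of a cosine-similarity lookup over a {key => vector} store. It is an illustration only, not VecDB's implementation: the function names are hypothetical, the parameter names simply mirror the min_cosine_similarity and max_result_count request fields, and returning 0.0 for zero vectors is a convention chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def search(store, query, min_cosine_similarity=0.5, max_result_count=10):
    """Return (key, similarity) pairs above the threshold, best match first."""
    scored = [(key, cosine_similarity(vec, query)) for key, vec in store.items()]
    matches = [(key, score) for key, score in scored if score >= min_cosine_similarity]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:max_result_count]


# Toy data: keys are unique identifiers, values are vectors.
store = {
    "product-id-1": [1.0, 0.0],
    "product-id-2": [0.6, 0.8],
    "product-id-3": [-1.0, 0.0],
}

# product-id-3 points in the opposite direction and falls below the threshold.
results = search(store, [1.0, 0.0], min_cosine_similarity=0.5, max_result_count=10)
print(results)
```

This scans every stored vector on each query, which is fine for small collections; how VecDB organizes its search internally is not described here.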

VecDB is designed for both fun and learning, with the potential for use in production environments.

You can use VecDB to build a product recommendation system. Here’s how:

  • Store Product Vectors: Send a VectorWriteRequest to POST /v1/vectors/write to store each product vector under a unique key (e.g., a product ID).

    json5

    {
      "bucket": "products",
      "key": "product-id-1",
      "vector": [1.929292, 0.3848484, -1.9383838383, ...]
    }
  • Search Similar Products: Send a VectorSearchRequest to POST /v1/vectors/search to find products whose vectors are similar to a given product vector.

    json5

    {
      "bucket": "products",
      "vector": [1.929292, 0.3848484, -1.9383838383, ...],
      "min_cosine_similarity": 0.5,
      "max_result_count": 10
    }
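
The two requests above could be sent from a small client like the following sketch, which uses only the Python standard library. Several details are assumptions: BASE_URL presumes a server reachable on localhost at the port from the sample config.yml, the helper names are hypothetical, the three-element vector is toy data, and the response is returned as raw text because its schema is not described here.

```python
import json
import urllib.request

# Assumption: VecDB is reachable locally on the port used in the sample config.yml.
BASE_URL = "http://localhost:3000"


def vector_write_request(bucket, key, vector):
    """Payload with the fields of a VectorWriteRequest."""
    return {"bucket": bucket, "key": key, "vector": list(vector)}


def vector_search_request(bucket, vector, min_cosine_similarity=0.5, max_result_count=10):
    """Payload with the fields of a VectorSearchRequest."""
    return {
        "bucket": bucket,
        "vector": list(vector),
        "min_cosine_similarity": min_cosine_similarity,
        "max_result_count": max_result_count,
    }


def build_post(path, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


toy_vector = [1.929292, 0.3848484, -1.9383838383]  # toy data, not a real embedding
write_req = build_post(
    "/v1/vectors/write",
    vector_write_request("products", "product-id-1", toy_vector),
)
search_req = build_post(
    "/v1/vectors/search",
    vector_search_request("products", toy_vector),
)

# With a running VecDB instance, uncomment to send the requests:
# print(send(write_req))
# print(send(search_req))
```

Keeping request construction separate from sending makes the payloads easy to inspect or log before any network call is made.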

If you enable the embedder, you can search for similar texts:

  • Store Text Embeddings: Send a TextEmbeddingWriteRequest to POST /v1/embeddings/text/write; the embedder converts the text to a vector, which is stored under the given key.

    json5

    {
      "bucket": "texts",
      "key": "text-id-1",
      "content": "This is some text representing the product"
    }
  • Search Similar Texts: Send a TextEmbeddingSearchRequest to POST /v1/embeddings/text/search to find stored texts similar to a given text.

    json5

    {
      "bucket": "texts",
      "content": "A Product Text",
      "min_cosine_similarity": 0.5,
      "max_result_count": 10
    }
  • Configure VecDB: Use a config.yml file to set up the server, storage, and embedder settings.

    yaml

    server:
      listen: "0.0.0.0:3000"
    store:
      driver: "bolt"
      args:
        database: "./vec.db"
    embedder:
      enabled: true
      driver: gemini
      args:
        api_key: "${GEMINI_API_KEY}"
        text_embedding_model: "text-embedding-004"
  • Deploy Using Docker: You can deploy VecDB using a Docker image for easy setup and management.

In summary:

  • Simple Vector Embedding Database: VecDB works like a hash table that finds items similar to the search item based on vector embeddings.
  • Data Model: A {key => value} model with unique keys and vector values.
  • Configurable: config.yml controls the server, storage, and embedder settings, so VecDB can be adapted to different use cases.
  • Components:
    • Raw Vectors Layer: Writes and searches vectors.
    • Embedding Layer: Optionally generates and stores vectors from text using an embedder such as Gemini.
  • Requests: Dedicated request types for writing and searching vectors and text embeddings.
  • Production Use: Can be used in production environments for similarity searches.
  • Extensibility: Potential to support additional embedder drivers and storage drivers.
  • Ease of Use: A simple API for vector and text embedding operations.
  • Available as a binary or Docker image for easy deployment.
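
The relationship between the two layers can be pictured as "embed the text, then use the raw vector operations". The sketch below illustrates that idea in memory only. It is not VecDB's code: TextLayer and toy_embed are hypothetical names, and toy_embed is a deliberately crude hashed bag-of-words stand-in for a real embedder such as Gemini.

```python
import math
import zlib

DIMENSIONS = 16  # arbitrary vector size for the toy embedder


def toy_embed(text):
    """Hypothetical stand-in for a real embedder: hashed bag-of-words counts."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        vector[zlib.crc32(word.encode("utf-8")) % DIMENSIONS] += 1.0
    return vector


def cosine_similarity(a, b):
    """Cosine similarity; zero vectors score 0.0 by convention in this sketch."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TextLayer:
    """Text write/search expressed as embedding plus raw vector write/search."""

    def __init__(self, embed):
        self.embed = embed
        self.buckets = {}  # bucket name -> {key: vector}

    def write(self, bucket, key, content):
        # Text write: embed the content, then store the vector under its key.
        self.buckets.setdefault(bucket, {})[key] = self.embed(content)

    def search(self, bucket, content, min_cosine_similarity=0.5, max_result_count=10):
        # Text search: embed the query, then run a plain vector search.
        query = self.embed(content)
        scored = [
            (key, cosine_similarity(vector, query))
            for key, vector in self.buckets.get(bucket, {}).items()
        ]
        hits = [pair for pair in scored if pair[1] >= min_cosine_similarity]
        hits.sort(key=lambda pair: pair[1], reverse=True)
        return hits[:max_result_count]


layer = TextLayer(toy_embed)
layer.write("texts", "text-id-1", "This is some text representing the product")
layer.write("texts", "text-id-2", "Completely unrelated weather report")
hits = layer.search("texts", "This is some text representing the product")
print(hits)
```

Because the embedder is passed in as a function, a different one could be swapped in without touching the storage or search logic, which is the kind of separation the optional embedder setting suggests.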

To explore the project further, see the original alash3al/vecdb repository.

Content derived from the alash3al/vecdb repository on GitHub. Original materials are licensed under their respective terms.