vectordb: High-Performance Vector Database Library

GitHub Stats Value
Stars 896
Forks 37
Language C++
Created 2023-07-09
License GNU General Public License v3.0

Vectordb, powered by Epsilla, is an open-source vector database designed to enhance the efficiency and cost-effectiveness of vector search operations. It focuses on scalability, high performance, and bridging the gap between information retrieval and memory retention in Large Language Models. With Vectordb, you can achieve up to 10 times faster and cheaper vector search capabilities compared to other solutions. It is easy to set up using Docker and interact with via a Python client, making it a valuable tool for anyone looking to optimize their vector database needs. Exploring Vectordb can significantly improve your data management and search functionalities.

Epsilla is an open-source vector database designed for high performance, scalability, and cost-effectiveness in vector search.

  • High Performance: Achieves 10 times faster vector search than HNSW with precision levels over 99.9%, leveraging advanced academic parallel graph traversal techniques.
  • Database Management: Full-fledged database management system with database, table, and field concepts, including vector fields.
  • Hybrid Search: Supports hybrid search with both dense and sparse vectors.
  • Metadata Filtering: Allows filtering based on metadata.
  • Built-in Embedding Support: Natural language in and out search experience with built-in embedding support.
  • Cloud Native Architecture: Features compute storage separation, serverless, and multi-tenancy.
  • Rich Ecosystem Integrations: Integrates with LangChain and LlamaIndex.
  • Multi-Language Clients: Supports Python, JavaScript, Ruby clients, and a REST API interface.
  • Can be run using Docker for easy setup.
  • Experimental option to use as a Python library without a Docker image.
  • Ideal for large language models and applications requiring efficient vector search and retrieval.
  • Use Case: Implement a efficient search system for large language models. Vectordb allows you to store and query embedding vectors, enabling fast and accurate retrieval of relevant documents.

    python

    client.query(
        table_name="MyTable",
        query_text="Celestial bodies and their characteristics",
        limit=2
    )
  • Use Case: Filter search results based on metadata and use hybrid search to combine dense and sparse vectors.

    python

    client.query(
        table_name="MyTable",
        query_field="EmbeddingEuclidean",
        response_fields=["ID", "Doc", "EmbeddingEuclidean"],
        query_vector=[0.35, 0.55, 0.47, 0.94],
        filter="ID < 6",
        limit=10,
        with_distance=True
    )
  • Use Case: Integrate Vectordb with tools like LangChain and LlamaIndex for enhanced functionality.
    • Use Python, JavaScript, or Ruby clients to interact with Vectordb.
    • Utilize the REST API interface for seamless integration.
  • Use Case: Deploy Vectordb in a cloud environment using Epsilla Cloud, a fully managed vector DBaaS.
    • Benefit from serverless architecture, multi-tenancy, and compute storage separation.
  • Step 1: Run the backend in Docker.

    shell

    docker pull epsilla/vectordb
    docker run --pull=always -d -p 8888:8888 -v /data:/data epsilla/vectordb
  • Step 2: Interact using the Python client.

    python

    from pyepsilla import vectordb
    
    client = vectordb.Client(host='localhost', port='8888')
    client.load_db(db_name="MyDB", db_path="/data/epsilla")
    client.use_db(db_name="MyDB")
  • Documentation: Refer to the documentation for detailed guides and tutorials.
  • Community: Join the Discord, follow on Twitter, or subscribe to the Blog and YouTube channel for updates and community support.
  • Performance: Epsilla is 10 times faster and more cost-effective than existing solutions like HNSW, with precision levels over 99.9%.
  • Scalability: Designed for high performance and production-scale similarity search, with cloud-native architecture supporting serverless and multi-tenancy.
  • Database Management: Full-fledged database system with familiar concepts like databases, tables, and fields, including vector fields.
  • Hybrid Search: Supports dense and sparse vector searches, along with built-in embedding and natural language search capabilities.
  • Ecosystem Integrations: Rich integrations with tools like LangChain and LlamaIndex, and clients for Python, JavaScript, and Ruby.
  • Future Potential: Positioned to bridge the gap between information retrieval and memory retention in Large Language Models, enhancing AI and machine learning applications.

Epsilla has the potential to revolutionize vector search and database management, particularly in AI and machine learning contexts. Its high performance, scalability, and cost-effectiveness make it an attractive solution for large-scale applications. The integration with popular AI tools and its cloud-native design further enhance its future prospects.

For further insights and to explore the project further, check out the original epsilla-cloud/vectordb repository.

Content derived from the epsilla-cloud/vectordb repository on GitHub. Original materials are licensed under their respective terms.