    Earn 40,500 ($405.00)

    Time Remaining: due 1 year ago
    Completed

    Debug and Deploy Complex Vector Search FastAPI Server

    luquitared
    Posted 1 year ago
    This Bounty has been completed!

    Bounty Description

    I have a broken FastAPI file here:
    https://drive.google.com/file/d/19MLLJon-dwQUbKu-pNMrwG0_Fv9VRo-J/view?usp=share_link

    I'm looking for someone who can debug the posted code and successfully deploy this application with Docker.

    Here is a summary of what is required for a finished project:
    Create a FastAPI server.

    Make a dummy middleware that returns a user_id. Protect routes with it.
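
    As a rough illustration of the dummy middleware, here is a minimal sketch written as a plain ASGI middleware so it runs without any third-party packages. The names DummyAuthMiddleware, require_user_id, and the fixed "user-123" value are assumptions for illustration and are not taken from the posted code.

```python
import asyncio

DUMMY_USER_ID = "user-123"  # hypothetical placeholder; a real system would verify a token


class DummyAuthMiddleware:
    """Pure ASGI middleware that attaches a fixed user_id to every HTTP request."""

    def __init__(self, app, user_id=DUMMY_USER_ID):
        self.app = app
        self.user_id = user_id

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # Store the id in the per-request state dict carried by the ASGI scope.
            scope.setdefault("state", {})["user_id"] = self.user_id
        await self.app(scope, receive, send)


def require_user_id(scope):
    """Return the user_id set by the middleware, or refuse the request."""
    user_id = scope.get("state", {}).get("user_id")
    if user_id is None:
        raise PermissionError("no authenticated user on this request")
    return user_id
```

    With FastAPI, a class like this would typically be registered with app.add_middleware(DummyAuthMiddleware), and a protected route would then read request.state.user_id. That wiring is an assumption about how the finished server would be laid out.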

    Create separate functions for doing CRUD operations on passed files using S3 and Redis. Always pass user_id to ensure an authenticated user is performing CRUD on the file. Check whether the user has shared the file with other user_ids or whether the file is public.
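
    The access rules described above could be sketched as follows. This is only an illustration: plain dicts stand in for S3 (file bytes) and Redis (metadata), the names FileRecord and FileStore are hypothetical, and the policy that only the owner may share or delete is an assumption the description does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    owner_id: str
    shared_with: set = field(default_factory=set)
    is_public: bool = False


def can_read(user_id, record):
    """Owner, explicitly shared users, and anyone (if public) may read."""
    return record.is_public or user_id == record.owner_id or user_id in record.shared_with


def can_write(user_id, record):
    """Assumed policy: only the owner may change or delete a file."""
    return user_id == record.owner_id


class FileStore:
    def __init__(self):
        self.blobs = {}  # stand-in for S3 objects
        self.meta = {}   # stand-in for Redis metadata

    def create(self, user_id, file_id, data, is_public=False):
        if file_id in self.meta:
            raise FileExistsError(file_id)
        self.meta[file_id] = FileRecord(owner_id=user_id, is_public=is_public)
        self.blobs[file_id] = data

    def read(self, user_id, file_id):
        if not can_read(user_id, self.meta[file_id]):
            raise PermissionError(f"{user_id} may not read {file_id}")
        return self.blobs[file_id]

    def share(self, user_id, file_id, other_user_id):
        record = self.meta[file_id]
        if not can_write(user_id, record):
            raise PermissionError(f"{user_id} may not share {file_id}")
        record.shared_with.add(other_user_id)

    def delete(self, user_id, file_id):
        if not can_write(user_id, self.meta[file_id]):
            raise PermissionError(f"{user_id} may not delete {file_id}")
        del self.meta[file_id]
        del self.blobs[file_id]
```

    Every method takes user_id as its first argument, so no CRUD call can be made without naming the user it is performed for.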

    Create three Milvus vector indexes: one that stores document titles, one that stores the document content as chunks, and one for images that uses CLIP embeddings.
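
    One way to keep the three indexes straight is a small lookup table that the indexing and search code both consult. This is a sketch only: the collection names are hypothetical, the 768 text dimension is a placeholder that depends on whichever text embedding model is chosen, and 512 matches the CLIP ViT-B/32 variant (other CLIP variants differ). No Milvus calls are made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionSpec:
    name: str        # hypothetical Milvus collection name
    vector_dim: int  # embedding size; depends on the model chosen
    holds: str       # what each stored vector represents


COLLECTIONS = {
    "title": CollectionSpec("document_titles", 768, "one vector per document title"),
    "chunk": CollectionSpec("document_chunks", 768, "one vector per text chunk"),
    "image": CollectionSpec("images", 512, "one CLIP vector per image"),
}


def collection_for(kind):
    """Look up the spec for 'title', 'chunk', or 'image'."""
    try:
        return COLLECTIONS[kind]
    except KeyError:
        raise ValueError(f"unknown index kind: {kind!r}") from None
```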

    Make an index_file route that allows users to upload any type of document or file. Pass each file to the S3 + Redis CRUD functions to save the file. Next, create separate functions for processing different types of files to parse the text. Create functions that handle all text file types, including PDF, Word, Markdown, txt, and code, plus a catch-all type, and grab the text. If the input is audio and the user passes the "tts" flag, create a dummy function that will perform speech to text and grab the text. If the input is a link, create a dummy function that will grab the text from the website. Once the text is collected, chunk the text content with a specific chunk size and count the tokens using the GPT-2 tokenizer. If the file is a zip or does not contain text content, simply index the file using the file title only. After processing the chunks, index all the content in the Milvus document content index. If the file is an image, pass it to CLIP and index it in the images index. Allow the user to pass a dict of metadata that can be stored with each object.
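
    The text-processing part of that flow (pick a parser by file type, chunk the text, count tokens, fall back to the title) might look roughly like this. It is a sketch under several assumptions: the function names are hypothetical, the extension list is illustrative, the file name is used as the title, and count_tokens splits on whitespace as a stand-in for the GPT-2 tokenizer so the example runs without extra packages.

```python
from pathlib import Path

TEXT_EXTENSIONS = {".txt", ".md", ".py", ".js", ".json"}  # illustrative, not exhaustive


def count_tokens(text):
    # Stand-in only: whitespace split. The bounty asks for the GPT-2 tokenizer.
    return len(text.split())


def extract_text(filename, data):
    """Return decoded text for plain-text types, or None when no text is available."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return data.decode("utf-8", errors="replace")
    # PDF, Word, audio, and link handlers would be dispatched from here.
    return None


def chunk_text(text, max_tokens=200):
    """Split text into chunks of at most max_tokens whitespace tokens."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = " ".join(words[start:start + max_tokens])
        chunks.append({"text": piece, "tokens": count_tokens(piece)})
    return chunks


def prepare_for_indexing(filename, data, max_tokens=200):
    """Return the items to index: text chunks, or the title alone as a fallback."""
    text = extract_text(filename, data)
    if not text or not text.strip():
        title = Path(filename).name
        return [{"text": title, "tokens": count_tokens(title), "title_only": True}]
    return [dict(chunk, title_only=False) for chunk in chunk_text(text, max_tokens)]
```

    Swapping in a real tokenizer would mean chunking over token ids rather than words, but the shape of the functions can stay the same, which keeps each step composable.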

    Make a search route that allows a user to search their index. Implement optional Milvus filtering (the current code attempts to do filtering with Redis; we can remove that and replace it with Milvus filtering).
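
    Milvus filtering is driven by a boolean expression string, so the optional filters can be turned into such a string before the search call. The helper below is a minimal sketch: build_filter and the field names (user_id, file_type, page) are hypothetical, only equality and "in" clauses are covered, and string values containing quotes or backslashes are rejected rather than escaped.

```python
def _literal(value):
    """Render a Python value as a literal for the filter expression."""
    if isinstance(value, bool):
        raise TypeError("boolean filters are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if '"' in value or "\\" in value:
            raise ValueError("quotes and backslashes are rejected rather than escaped")
        return f'"{value}"'
    raise TypeError(f"unsupported filter value: {value!r}")


def build_filter(user_id, filters=None):
    """Always restrict to the caller's user_id, then AND in any optional filters."""
    clauses = [f"user_id == {_literal(user_id)}"]
    for field_name, value in (filters or {}).items():
        if not field_name.isidentifier():
            raise ValueError(f"bad field name: {field_name!r}")
        if isinstance(value, (list, tuple)):
            items = ", ".join(_literal(v) for v in value)
            clauses.append(f"{field_name} in [{items}]")
        else:
            clauses.append(f"{field_name} == {_literal(value)}")
    return " and ".join(clauses)
```

    The resulting string would then be handed to the Milvus search call as its filter expression (for example the expr argument of a pymilvus Collection.search call), assuming the collection stores those fields as scalar columns.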

    Make all CRUD and text processing functions composable and modular.

    *** NOTE!
    It is very hard to get this set up in your environment. If CLIP is giving you trouble, simply remove it from the project and try a different CLIP package (Hugging Face). If this is still too difficult, we can remove CLIP from the project and talk later about making a microservice that returns the embedding.

    Copyright © 2024 Replit, Inc. All rights reserved.