Skip to content
    Back to all Bounties

    Earn 22,500 ($225.00)

    Time Remainingdue 2 years ago
    Open

    VS Code, Docker, Azure Machine Learning Studio Pipeline to Transcribe Youtube Videos

    YorkvilleDS
    YorkvilleDS
    Posted 2 years ago

    Bounty Description

    I need someone to help me write a docker container which will work locally on MacBook Pro with M1 Chip and can be pushed to Azure Container Registry and work with Azure Machine Learning Studio.

    The functionality in the container is to transcribe text fromYoutube videos and create files stored on my Azure storage resource. So if we supply a list of URLs to the system, it will transcribe them using my NVDA Compute Cluster on Azure.

    I already have this working on my local MacBook. But the reason I want to do it on Azure is the GPUs will run the transcription way faster on Azure - I need to productionize this capability , build efficient pipelines so I can scale in the cloud.
I have a lot of the necessary code built already and will share as part of the build/test process here. Main issue is getting the connectivity between Docker image and Azure Machine Learning Studio to work correctly.

    The challenge I’m having is that I can’t figure out which base image to use and then which versions of the relevant libraries to use so that they all compile and play nicely together. Some combination of the base image I’m using plus the packages is gumming things up.

The compute cluster I have procured on Azure Machine Learning Studio is: GPU - 1 x NVIDIA Tesla K80

    If you are familiar with these platforms, this should be an easy task. If not, it would be a very useful set of capabilities for you to have : )

    Here is the current Dockerfile I’ve been trying to use. It indicates which packages are needed.
    FROM nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04
    #FROM python:3.8

    Update packages and install necessary tools

    RUN apt-get update && apt-get install -y ffmpeg

    RUN apt-get update &&
    apt-get install -y --no-install-recommends
    git
    wget
    build-essential
    ca-certificates &&
    rm -rf /var/lib/apt/lists/*

    RUN pip install git+https://github.com/openai/whisper.git

    Install Python packages

    RUN pip install azureml-sdk numpy scipy scikit-learn sklearn pandas pytube

    You need to be familiar with Visual Studio Code, Python, OpenAI, Azure , Azure Machine Learning Studio, Docker.

    Success criteria will be: ability to run this process from my local machine - 
Build the Docker image on VS Code
Push it to Azure Container Registry
Run ScriptRunConfig calling the script to transcribe the Youtube video. The process MUST run on Azure GPU compute .

    (Note - I have tried using GPT 4 to do this. It is helpful but still gets caught up with errors that are hard to fix. It is taking me too long....)