Earn 22,500 ($225.00)

due 2 years ago

Open

VS Code, Docker, Azure Machine Learning Studio Pipeline to Transcribe Youtube Videos

YorkvilleDS

Posted 2 years ago

Details

Applications

Discussion

Bounty Description

I need someone to help me write a docker container which will work locally on MacBook Pro with M1 Chip and can be pushed to Azure Container Registry and work with Azure Machine Learning Studio. 
The functionality in the container is to transcribe text fromYoutube videos and create files stored on my Azure storage resource. So if we supply a list of URLs to the system, it will transcribe them using my NVDA Compute Cluster on Azure. 
I already have this working on my local MacBook. But the reason I want to do it on Azure is the GPUs will run the transcription way faster on Azure - I need to productionize this capability , build efficient pipelines so I can scale in the cloud. I have a lot of the necessary code built already and will share as part of the build/test process here. Main issue is getting the connectivity between Docker image and Azure Machine Learning Studio to work correctly.

The challenge I’m having is that I can’t figure out which base image to use and then which versions of the relevant libraries to use so that they all compile and play nicely together. Some combination of the base image I’m using plus the packages is gumming things up.  The compute cluster I have procured on Azure Machine Learning Studio is: GPU - 1 x NVIDIA Tesla K80

If you are familiar with these platforms, this should be an easy task. If not, it would be a very useful set of capabilities for you to have : )

Here is the current Dockerfile I’ve been trying to use. It indicates which packages are needed.
FROM nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04
#FROM python:3.8

Update packages and install necessary tools

RUN apt-get update && apt-get install -y ffmpeg

RUN apt-get update &&
apt-get install -y --no-install-recommends
git
wget
build-essential
ca-certificates &&
rm -rf /var/lib/apt/lists/*

RUN pip install git+https://github.com/openai/whisper.git

Install Python packages

RUN pip install azureml-sdk numpy scipy scikit-learn sklearn pandas pytube

You need to be familiar with Visual Studio Code, Python, OpenAI, Azure , Azure Machine Learning Studio, Docker.

Success criteria will be: ability to run this process from my local machine -  Build the Docker image on VS Code Push it to Azure Container Registry Run ScriptRunConfig calling the script to transcribe the Youtube video. The process MUST run on Azure GPU compute .

(Note - I have tried using GPT 4 to do this. It is helpful but still gets caught up with errors that are hard to fix. It is taking me too long....)