
    Earn 10,000 ($100.00)

    Status: In Progress (due 2 years ago)

    AI4ALL: Build a RLHF Interface with Bitcoin Payouts

    Fedi
    Posted 2 years ago

    Bounty Description

    This project, when completed, will be published as free and open-source software under an MIT License.

    Problem Description

    Build a Reinforcement Learning from Human Feedback (RLHF) mechanism with Bitcoin payouts. A language/code model generates 2-3 alternative answers, a human ranks them, and the human is paid for the work of creating the fine-tuning dataset.

    This project can also be submitted to the AI4ALL remote hackathon, running until July 31st, and be eligible for the Overall prize of $10,000 or the Training track prize of $1,000. Sign up here: https://bolt.fun/tournaments/ai4all/overview and hop into the Discord!

    https://discord.gg/T3C4YrAg2d

    Here's how you can apply:

    1. Register your project for the hackathon, making sure it meets the bounty criteria.
    2. Apply for a bounty with a link to your project.
    3. If your project is approved, you claim the bounty.

    Acceptance Criteria

    Feel free to leverage existing frameworks and open-source software. The MVP is described below, but you can go much further with it to strengthen your hackathon submission. For extensive details on how to implement the L402 Bitcoin payment flow, see the specification: https://github.com/lightninglabs/L402/blob/master/protocol-specification.md. We also have a bunch of mentors in the Discord who are happy to help if you run into issues.
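    In short, the L402 flow is: the server answers the first request with HTTP 402 and puts a macaroon plus a Lightning invoice in the WWW-Authenticate header; the client pays the invoice, then retries the request with the macaroon and the payment preimage in the Authorization header. Below is a minimal client-side sketch in Python. The pay_invoice helper is a placeholder for whatever Lightning backend you use (LND, Core Lightning, LNbits, etc.), and the header parsing assumes the macaroon="...", invoice="..." challenge format from the specification linked above.

```python
import re
import requests


def pay_invoice(bolt11: str) -> str:
    """Placeholder: pay the BOLT11 invoice with your Lightning node and
    return the hex-encoded payment preimage (proof of payment)."""
    raise NotImplementedError


def fetch_with_l402(url: str) -> requests.Response:
    # First attempt: a paywalled endpoint should answer 402 Payment Required.
    resp = requests.get(url)
    if resp.status_code != 402:
        return resp  # Either free, or we already presented valid credentials.

    challenge = resp.headers.get("WWW-Authenticate", "")
    # Challenge looks like: L402 macaroon="<base64>", invoice="<bolt11>"
    macaroon = re.search(r'macaroon="([^"]+)"', challenge).group(1)
    invoice = re.search(r'invoice="([^"]+)"', challenge).group(1)

    # Pay the invoice over Lightning; the preimage proves we paid.
    preimage_hex = pay_invoice(invoice)

    # Retry with macaroon:preimage as credentials.
    headers = {"Authorization": f"L402 {macaroon}:{preimage_hex}"}
    return requests.get(url, headers=headers)
```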

    MVP:

    1. An open-source language or code model (preferably ReplitLM) is prompted to perform a task and outputs 2-3 different versions of the answer (see the generation sketch after this list).
    2. The user is presented with the 2-3 versions and selects which one is best according to a rubric of requirements such as correctness. The user is paid 1 satoshi (1/100,000,000th of a Bitcoin) per answer.
    3. The curated training set is saved in a database or other format to be used for fine-tuning the model. (You don't need to actually implement the fine-tuning; just build out the process for paying to collect the dataset. A collection and payout sketch follows this list.)
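    As a sketch of step 1, here is one way to get several candidate answers out of an open-source model with Hugging Face transformers. The model name replit/replit-code-v1-3b is the published ReplitLM checkpoint, but any open model works here; the sampling settings are illustrative, not part of the bounty.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "replit/replit-code-v1-3b"  # ReplitLM; swap in any open-source LM

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True)


def generate_candidates(prompt: str, n: int = 3) -> list[str]:
    """Sample n alternative completions for the same prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,          # sampling so the n candidates differ
        temperature=0.8,
        top_p=0.95,
        max_new_tokens=256,
        num_return_sequences=n,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]


candidates = generate_candidates("Write a Python function that reverses a string.")
```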
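    For steps 2 and 3, here is a sketch of recording the human's choice, paying them 1 satoshi, and appending the example to a JSONL file that can later feed fine-tuning. pay_sats is a placeholder for your Lightning payout mechanism (an invoice the grader submits, a Lightning Address, a keysend, etc.), and the record layout is just one reasonable option, not something the bounty prescribes.

```python
import json
from datetime import datetime, timezone

DATASET_PATH = "rlhf_dataset.jsonl"
REWARD_SATS = 1  # 1 satoshi per graded answer, per the MVP


def pay_sats(destination: str, amount_sats: int) -> None:
    """Placeholder: send amount_sats to the grader's Lightning destination
    via your node or wallet API."""
    raise NotImplementedError


def record_choice(prompt: str, candidates: list[str], best_index: int,
                  grader_destination: str) -> None:
    """Store the human ranking and pay the grader for the work."""
    example = {
        "prompt": prompt,
        "chosen": candidates[best_index],
        "rejected": [c for i, c in enumerate(candidates) if i != best_index],
        "graded_at": datetime.now(timezone.utc).isoformat(),
    }
    # One JSON object per line; easy to reload later for fine-tuning.
    with open(DATASET_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")

    pay_sats(grader_destination, REWARD_SATS)
```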

    Some More Ideas (optional, but they will make for a stronger hackathon submission):

    1. Allow users to grade outputs from other users to build reputation or verify their skills; higher-reputation or verified users get paid more for grading responses than new or low-reputation users (see the payout-weighting sketch after this list).
    2. Actually implement the fine-tuning over the curated dataset (see the fine-tuning sketch after this list).
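    For the first idea, one simple scheme (hypothetical, not specified by the bounty) is to scale the per-answer payout by a 0-1 reputation score, which could be derived from how often a grader agrees with the majority vote of other graders on overlapping examples:

```python
BASE_REWARD_SATS = 1   # what a brand-new grader earns per answer
MAX_MULTIPLIER = 5     # cap so payouts stay predictable


def payout_for(reputation: float) -> int:
    """Scale the per-answer reward by a reputation score in [0.0, 1.0]."""
    reputation = max(0.0, min(1.0, reputation))
    multiplier = 1 + (MAX_MULTIPLIER - 1) * reputation
    return round(BASE_REWARD_SATS * multiplier)
```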
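    For the second idea, the JSONL preference records from the MVP can be turned into training data. The lightest-weight route is plain supervised fine-tuning on the chosen answers, sketched below with Hugging Face datasets and transformers; training a reward model or running DPO (e.g. with the trl library) over the chosen/rejected pairs would be a fuller RLHF pipeline. File names, model choice, and hyperparameters here are assumptions, not requirements.

```python
import json

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "replit/replit-code-v1-3b"  # ideally the model that produced the candidates

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed so batches can be padded
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True)

# Each preference record contributes one prompt + chosen-answer training text.
with open("rlhf_dataset.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
texts = [r["prompt"] + r["chosen"] for r in records]

dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    # mlm=False makes the collator copy input_ids into labels for causal-LM training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```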