
    Earn 15,300 ($153.00)

    Time Remaining: due 2 weeks ago
    In Progress

    Reinforcement Learning Notebook - Change from Model judge to Human judge

    AskProgrammers
    Posted 4 weeks ago

    Bounty Description

    bounty:

    Create a one-click, run-all Colab notebook that implements RULER RFT, modified to use a human instead of a model as the scoring judge.

    The human judge should enter a score for each trajectory on a 0-1 scale.

    This is the notebook to modify:

    https://colab.research.google.com/github/openpipe/art/blob/main/examples/art-e/art-e.ipynb

    Use qwen3:32b as the model being trained.

    Acceptance criteria:

    • runs completely with a single click, end-to-end
    • no dependency or runtime errors
    • human scoring via simple stdin input
    • a smaller qwen3 model may be used for testing (e.g. if qwen3:32b uses too much memory on Colab)
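
    The "human scoring via simple stdin input" criterion could be sketched roughly as below. This is only an illustration, not the ART notebook's API: `parse_score` and `human_judge` are hypothetical names, trajectories are assumed to be already rendered as plain strings, and feeding the returned scores back into the training loop as rewards is left out.

```python
from typing import Callable, List


def parse_score(raw: str) -> float:
    """Parse one typed score and require it to lie in [0, 1]."""
    value = float(raw.strip())  # raises ValueError on non-numeric text
    if not 0.0 <= value <= 1.0:  # also rejects NaN
        raise ValueError(f"score {value} is outside [0, 1]")
    return value


def human_judge(
    trajectories: List[str],
    ask: Callable[[str], str] = input,  # hypothetical hook; defaults to stdin
) -> List[float]:
    """Show each trajectory and collect one human score per trajectory."""
    scores: List[float] = []
    total = len(trajectories)
    for index, text in enumerate(trajectories, start=1):
        print(f"\n--- Trajectory {index}/{total} ---\n{text}")
        while True:  # re-prompt until the input is a valid score
            try:
                scores.append(parse_score(ask("Score (0-1): ")))
                break
            except ValueError as err:
                print(f"Invalid input: {err}")
    return scores
```

    Calling `human_judge(["answer A", "answer B"])` in a notebook cell would prompt once per trajectory. The `ask` parameter exists only so the prompt can be replaced (for example by a scripted list of answers) when testing without a human present.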

    Bonus $ if you make human judging more effective or less time-consuming
    (e.g. a better interface for selecting trajectories).

    Extra bonus $ if your implementation naturally produces a high-quality, clean Direct Preference Optimization (DPO) dataset.
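
    One way the human scores might fall out as a DPO dataset is sketched below. It is an assumption-laden illustration, not part of the bounty or of ART: `build_dpo_pairs` and `min_margin` are hypothetical names, and the `prompt` / `chosen` / `rejected` record layout is simply the one commonly used for preference data. Pairs whose scores are too close are skipped so that only clear preferences are kept.

```python
from itertools import combinations
from typing import Dict, List


def build_dpo_pairs(
    prompt: str,
    completions: List[str],
    scores: List[float],
    min_margin: float = 0.2,  # hypothetical threshold for a "clear" preference
) -> List[Dict[str, str]]:
    """Turn human-scored completions for one prompt into preference pairs."""
    if len(completions) != len(scores):
        raise ValueError("need exactly one score per completion")
    pairs: List[Dict[str, str]] = []
    scored = list(zip(completions, scores))
    for (text_a, score_a), (text_b, score_b) in combinations(scored, 2):
        if abs(score_a - score_b) < min_margin:
            continue  # too close to call; leave it out of the dataset
        if score_a > score_b:
            chosen, rejected = text_a, text_b
        else:
            chosen, rejected = text_b, text_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

    Because every trajectory in a group answers the same prompt, each scoring round can yield several pairs at no extra cost to the judge. The margin trades dataset size for cleanliness.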

    link to tweet:
    https://x.com/uncensored_ai/status/1949163720338239755

    You may be able to one-shot this with a model like o3 pro, which can return ipynb files.

    Copyright © 2025 Replit, Inc. All rights reserved.