    Back to all Bounties

    Earn 39,001 ($390.01)

    Time Remaining: due 6 months ago
    Completed

    Rust Developer to Use Spider.rs Library + Actix

    YigitKonur
    Posted 7 months ago

    Bounty Description

    Problem Description

    We need a high-performance web crawler built in Rust that continuously scrapes specified websites and stores the results as markdown files in S3. The crawler should use the spider.rs library's smart mode for efficient data extraction based on predefined rules and metadata, extracting both HTML and markdown and writing them to S3 with a TTL.
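One piece of the URL-to-S3 flow described above can be sketched as a pure function that maps a crawled page URL to a deterministic markdown object key. This is only an illustration; the key scheme and the function name `markdown_key` are assumptions, not part of the bounty spec (the actual upload would go through the AWS SDK's `put_object`).

```rust
/// Derive a deterministic S3 object key such as
/// "example.com/blog/post-1.md" from a page URL.
/// The scheme (scheme stripped, trailing slash removed, ".md" suffix)
/// is a hypothetical choice for illustration.
fn markdown_key(url: &str) -> String {
    let trimmed = url
        .trim_start_matches("https://")
        .trim_start_matches("http://")
        .trim_end_matches('/');
    format!("{}.md", trimmed)
}

fn main() {
    // "https://example.com/blog/post-1" -> "example.com/blog/post-1.md"
    println!("{}", markdown_key("https://example.com/blog/post-1"));
}
```

A deterministic key has the useful property that re-crawling the same page overwrites the previous object rather than accumulating duplicates, which fits the "continuously writes results to S3" requirement.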

    Acceptance Criteria

    • Build a web crawler using spider.rs library that:
      • Accepts configuration for start URLs and matching rules
      • Uses smart mode for intelligent content extraction
      • Converts scraped content to markdown format
      • Continuously writes results to S3
      • Can be deployed on Cloud Run
      • Handles rate limiting and respects robots.txt
      • Provides logging and error handling
      • Is configurable via environment variables or config files
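The "start URLs and matching rules" criterion above could be modeled roughly as follows. This is a minimal sketch with hypothetical names (`MatchRule`, `CrawlConfig`, `should_crawl`); the bounty does not prescribe a rule format, so a simple substring match stands in for whatever pattern syntax is chosen.

```rust
/// A rule deciding whether a discovered URL should be crawled.
/// Substring matching is an assumption; a real implementation
/// might use globs or regexes instead.
struct MatchRule {
    must_contain: String,
}

struct CrawlConfig {
    start_urls: Vec<String>,
    rules: Vec<MatchRule>,
}

impl CrawlConfig {
    /// A URL is accepted if it matches at least one rule.
    fn should_crawl(&self, url: &str) -> bool {
        self.rules.iter().any(|r| url.contains(r.must_contain.as_str()))
    }
}

fn main() {
    let config = CrawlConfig {
        start_urls: vec!["https://example.com".to_string()],
        rules: vec![MatchRule { must_contain: "/blog/".to_string() }],
    };
    // Only URLs under /blog/ pass the filter.
    println!("{}", config.should_crawl("https://example.com/blog/post-1"));
    println!("{}", config.should_crawl(&config.start_urls[0]));
}
```

Keeping the filter as a pure method makes it easy to unit-test independently of the crawler and the S3 writer.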

    Technical Details

    • Must be written in Rust

    • Required libraries:

      • spider.rs for web crawling / markdown conversion
      • Actix for Rust API
      • AWS SDK for S3 integration
    • Infrastructure:

      • Deployable on Google Cloud Run
      • Uses S3 compatible storage
    • Configuration:

      • Ability to define start URLs
      • Pattern matching rules for URL filtering
      • Customizable crawl intervals
      • S3 credentials and bucket configuration
      • Rate limiting parameters
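The environment-variable configuration listed above might look something like this. The variable names (`CRAWL_START_URLS`, `S3_BUCKET`, `CRAWL_INTERVAL_SECS`), the struct, and the defaults are all assumptions for illustration, not part of the spec.

```rust
use std::env;

/// Hypothetical settings loaded from environment variables.
struct Settings {
    start_urls: Vec<String>,
    s3_bucket: String,
    crawl_interval_secs: u64,
}

impl Settings {
    fn from_env() -> Settings {
        Settings {
            // Comma-separated list of start URLs, e.g.
            // CRAWL_START_URLS="https://a.com,https://b.com"
            start_urls: env::var("CRAWL_START_URLS")
                .unwrap_or_default()
                .split(',')
                .filter(|s| !s.is_empty())
                .map(str::to_string)
                .collect(),
            s3_bucket: env::var("S3_BUCKET").unwrap_or_default(),
            // Default to hourly crawls when unset or unparsable (assumption).
            crawl_interval_secs: env::var("CRAWL_INTERVAL_SECS")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(3600),
        }
    }
}

fn main() {
    let s = Settings::from_env();
    println!(
        "crawling {} start url(s) into bucket '{}' every {}s",
        s.start_urls.len(),
        s.s3_bucket,
        s.crawl_interval_secs
    );
}
```

On Cloud Run these variables would be set on the service definition, so the same binary can be redeployed against different buckets and crawl targets without a rebuild.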

    Please don't apply if you have never written a single line of Rust code before; I could write that with Cursor myself, but I need a production-ready solution.