Earn 39,001 ($390.01)
due 6 months ago
Completed
Rust Developer to Use Spider.rs Library + Actix
YigitKonur
Bounty Description
Problem Description
We need a high-performance web crawler built in Rust that continuously scrapes specified websites and stores the results as markdown files in S3. The crawler should use the spider.rs library's smart mode for efficient content extraction based on predefined rules and metadata, extract both HTML and markdown, and write the output to S3 with a TTL.
Acceptance Criteria
- Build a web crawler using the spider.rs library that:
- Accepts configuration for start URLs and matching rules
- Uses smart mode for intelligent content extraction
- Converts scraped content to markdown format
- Continuously writes results to S3
- Can be deployed on Cloud Run
- Handles rate limiting and respects robots.txt
- Provides logging and error handling
- Is configurable via environment variables or config files
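The criteria above can be sketched as a minimal crawl loop. This is a sketch only: the `spider` crate calls used here (`with_respect_robots_txt`, `with_delay`, `scrape`, `get_pages`) are assumptions to be verified against the crate version pinned in Cargo.toml, smart mode is assumed to come from the crate's `smart` feature flag, and the markdown conversion and S3 upload are left as stubs:

```rust
// Sketch, not a drop-in implementation: verify every builder call
// against the spider crate version you actually pin.
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com") // start URL (placeholder)
        .with_respect_robots_txt(true) // honour robots.txt
        .with_delay(250)               // ms between requests (basic rate limiting)
        .build()
        .expect("valid crawl configuration");

    // `scrape` retains page bodies, unlike `crawl`, which only walks links.
    website.scrape().await;

    if let Some(pages) = website.get_pages() {
        for page in pages.iter() {
            // Here: convert the HTML to markdown and upload to S3
            // (e.g. via the aws-sdk-s3 crate); omitted in this sketch.
            eprintln!("fetched {}", page.get_url());
        }
    }
}
```

For continuous operation, this body would run inside a loop with a `tokio::time::sleep` for the configured crawl interval, with an Actix endpoint alongside it for health checks on Cloud Run.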
Technical Details
- Must be written in Rust
- Required libraries:
  - spider.rs for web crawling / markdown conversion
  - Actix for the Rust API
  - AWS SDK for S3 integration
- Infrastructure:
  - Deployable on Google Cloud Run
  - Uses S3-compatible storage
- Configuration:
  - Ability to define start URLs
  - Pattern-matching rules for URL filtering
  - Customizable crawl intervals
  - S3 credentials and bucket configuration
  - Rate-limiting parameters
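This configuration surface can be captured in a plain struct populated from environment variables. A minimal standard-library-only sketch; the variable names (`START_URLS`, `S3_BUCKET`, etc.) and defaults are illustrative, not mandated by the bounty:

```rust
use std::env;

/// Crawler settings mirroring the configuration list above.
/// All names and defaults here are illustrative assumptions.
#[derive(Debug)]
struct CrawlConfig {
    start_urls: Vec<String>,  // comma-separated in START_URLS
    url_pattern: String,      // pattern for URL filtering
    crawl_interval_secs: u64, // pause between full crawl passes
    s3_bucket: String,        // target bucket for markdown output
    rate_limit_per_sec: u32,  // max requests per second
}

impl CrawlConfig {
    /// Build from any key -> value lookup so the same parsing code
    /// serves both real environment variables and unit tests.
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let required = |k: &str| get(k).ok_or_else(|| format!("missing env var {k}"));
        Ok(Self {
            start_urls: required("START_URLS")?
                .split(',')
                .map(|s| s.trim().to_string())
                .filter(|s| !s.is_empty())
                .collect(),
            url_pattern: get("URL_PATTERN").unwrap_or_default(),
            crawl_interval_secs: get("CRAWL_INTERVAL_SECS")
                .and_then(|v| v.parse().ok())
                .unwrap_or(3600), // default: one pass per hour
            s3_bucket: required("S3_BUCKET")?,
            rate_limit_per_sec: get("RATE_LIMIT_PER_SEC")
                .and_then(|v| v.parse().ok())
                .unwrap_or(2), // conservative default
        })
    }

    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|k| env::var(k).ok())
    }
}
```

S3 credentials themselves are best left to the AWS SDK's default credential chain rather than parsed by hand; failing fast on a missing required variable keeps Cloud Run deployments from starting in a half-configured state.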
Please don't apply if you have never written a line of Rust before. I could produce that with Cursor myself; I need a production-ready solution.