Web Scraping with Rust

Written by Mantas Kemėšius

Rust is increasingly popular for web scraping because of its speed, memory safety, and concurrency capabilities. In this guide, we’ll build a scraper to fetch IMDb’s top ten movies using both blocking and asynchronous IO. Finally, we’ll show how FoxScrape can simplify fetching dynamic or protected content, letting you focus on parsing and data processing.

⚙️ 1. Why Rust for Web Scraping?

Rust offers:

| Feature | Benefit |
| --- | --- |
| Memory Safety | Avoids crashes or memory leaks during scraping |
| Speed | Efficient HTTP requests and HTML parsing |
| Concurrency | Async tasks with Tokio for multiple requests |
| Type Safety | Compile-time checks prevent common runtime errors |

Popular Rust crates for scraping:

  • reqwest – HTTP client (blocking & async)
  • scraper – CSS-selector-based HTML parsing
  • tokio – async runtime
  • serde_json – optional for JSON serialization

🛠️ 2. Setting Up Your Rust Project

Create a new project:

```bash
cargo new imdb_scraper
cd imdb_scraper
```

Add dependencies in Cargo.toml:

```toml
[dependencies]
reqwest = { version = "0.12.12", features = ["blocking"] }
scraper = "0.22.0"
tokio = { version = "1", features = ["full"] } # for async
futures = "0.3" # for try_join_all in the async example below
serde_json = "1.0" # optional for saving JSON
```
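
Equivalently, Cargo can edit the manifest for you (the cargo add subcommand has shipped with Cargo since 1.62):

```bash
# add the same dependencies from the command line
cargo add reqwest --features blocking
cargo add scraper
cargo add tokio --features full
cargo add futures
cargo add serde_json
```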

🏗️ 3. Blocking IO Scraper

For simple single-page scraping, blocking IO works well.

```rust
use reqwest::blocking::get;
use scraper::{Html, Selector};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://www.imdb.com/chart/top/";
    let response = get(url)?.text()?;

    let document = Html::parse_document(&response);
    let selector = Selector::parse("td.titleColumn a")?;

    let titles: Vec<String> = document
        .select(&selector)
        .take(10)
        .map(|x| x.text().collect::<String>())
        .collect();

    println!("Top 10 IMDb Movies:");
    for (i, title) in titles.iter().enumerate() {
        println!("{}. {}", i + 1, title);
    }

    Ok(())
}
```

What’s happening:

  • reqwest::blocking::get fetches HTML synchronously.
  • scraper::Html::parse_document parses HTML into a queryable DOM.
  • Selector::parse with CSS selectors targets movie titles.
  • We collect the top 10 titles and print them.
  • Limitation: works only for static pages; dynamic content or anti-bot measures require extra handling (one simple mitigation, sending a browser-like User-Agent, is sketched below)
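
On that last point: some sites reject requests that carry reqwest's default User-Agent. A minimal sketch of sending a browser-like one instead (the header string is an arbitrary example, not from the original article):

```rust
use reqwest::blocking::Client;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // build a client that sends a browser-like User-Agent on every request
    let client = Client::builder()
        .user_agent("Mozilla/5.0 (compatible; imdb-scraper/0.1)")
        .build()?;

    let body = client
        .get("https://www.imdb.com/chart/top/")
        .send()?
        .text()?;
    println!("fetched {} bytes", body.len());
    Ok(())
}
```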

⚡ 4. Async Scraping with Tokio

Blocking IO is simple but slow for multiple pages. Using Tokio, you can fetch many pages concurrently:

```rust
use reqwest::Client;
use scraper::{Html, Selector};
use futures::future::try_join_all;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let urls = vec![
        "https://www.imdb.com/chart/top/",
        "https://www.imdb.com/chart/moviemeter/",
    ];

    // clone the client into each future (it wraps an Arc internally, so
    // this is cheap) and use `async move` so each future owns its data
    let futures = urls.into_iter().map(|url| {
        let client = client.clone();
        async move {
            let resp = client.get(url).send().await?.text().await?;
            Ok::<String, reqwest::Error>(resp)
        }
    });

    let results = try_join_all(futures).await?;

    for html in results {
        let document = Html::parse_document(&html);
        let selector = Selector::parse("td.titleColumn a")?;
        for movie in document.select(&selector).take(5) {
            println!("{}", movie.text().collect::<String>());
        }
    }

    Ok(())
}
```

Benefits of async:

  • Multiple requests execute concurrently (a bounded-concurrency variant is sketched after this list)
  • Dramatically reduces runtime (e.g., 11s blocking → 334ms async)
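
When the URL list grows, firing every request at once can trip rate limits. A sketch of capping in-flight requests with futures' buffer_unordered (the limit of 5 is an arbitrary choice):

```rust
use futures::stream::{self, StreamExt};
use reqwest::Client;

#[tokio::main]
async fn main() {
    let client = Client::new();
    let urls = vec![
        "https://www.imdb.com/chart/top/",
        "https://www.imdb.com/chart/moviemeter/",
    ];

    // at most 5 requests are in flight at any moment
    let bodies: Vec<Result<String, reqwest::Error>> = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move { client.get(url).send().await?.text().await }
        })
        .buffer_unordered(5)
        .collect()
        .await;

    for body in bodies.into_iter().flatten() {
        println!("fetched {} bytes", body.len());
    }
}
```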

🦊 5. Handling Protected or JS-heavy Pages with FoxScrape

Some pages (like modern e-commerce or IMDb user-specific pages) use JavaScript or anti-scraping measures. Managing proxies, headless browsers, and retries in Rust can be complex. FoxScrape simplifies this:

  • Fetch pages with JavaScript rendered if needed
  • Automatic retries and proxy rotation
  • Returns clean HTML ready for parsing

🔧 Example: Using FoxScrape with Rust

```rust
use reqwest::blocking::get;
use scraper::{Html, Selector};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let api_key = "YOUR_API_KEY";
    let target_url = "https://www.imdb.com/chart/top/";
    let fox_url = format!(
        "https://www.foxscrape.com/api/v1?api_key={}&url={}",
        api_key, target_url
    );

    let resp = get(&fox_url)?.text()?;
    let document = Html::parse_document(&resp);
    let selector = Selector::parse("td.titleColumn a")?;

    let titles: Vec<String> = document
        .select(&selector)
        .take(10)
        .map(|x| x.text().collect::<String>())
        .collect();

    println!("Top 10 IMDb Movies (via FoxScrape):");
    for (i, title) in titles.iter().enumerate() {
        println!("{}. {}", i + 1, title);
    }

    Ok(())
}
```
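
One caveat: the example embeds target_url in the query string as-is, which only works while it contains no reserved characters such as & or ?. A sketch of percent-encoding it first, using the urlencoding crate (an extra dependency, not in the Cargo.toml above):

```rust
// requires: urlencoding = "2" in Cargo.toml
let fox_url = format!(
    "https://www.foxscrape.com/api/v1?api_key={}&url={}",
    api_key,
    urlencoding::encode(target_url)
);
```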

Optional JS rendering:

```rust
let fox_url = format!(
    "https://www.foxscrape.com/api/v1?api_key={}&url={}&render_js=true",
    api_key, target_url
);
```

Why FoxScrape helps:

  • No need to implement async retries manually
  • Handles JavaScript-heavy pages
  • Reduces IP blocking risks
  • Lets you reuse your existing CSS parsing code

💾 6. Exporting Results

Rust makes it simple to save scraped data to JSON or CSV:

```rust
use std::fs::File;
use serde_json::to_writer_pretty;

// inside main(), after `titles` has been collected:
let file = File::create("top_movies.json")?;
to_writer_pretty(file, &titles)?;
```
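
The snippet above covers the JSON case; for CSV, a minimal standard-library sketch (it does no escaping, so it assumes the titles contain no commas or quotes):

```rust
use std::fs;

// also inside main(): write one "rank,title" row per movie
let csv: String = titles
    .iter()
    .enumerate()
    .map(|(i, title)| format!("{},{}\n", i + 1, title))
    .collect();
fs::write("top_movies.csv", csv)?;
```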

Now your data is ready for analysis or further processing.

⚖️ 7. Comparing Methods in Rust

| Method | JS Support | Anti-Bot Handling | Concurrency | Complexity |
| --- | --- | --- | --- | --- |
| Blocking IO + scraper | ❌ | ❌ | ⚡ Low | 🟢 Simple |
| Async + reqwest + scraper | ⚠️ Partial | ⚠️ Partial | ⚡⚡ High | 🟡 Medium |
| FoxScrape API | ✅ | ✅ Automatic | ⚡⚡ High | 🟢 Simple |

🧠 8. Best Practices

  • Respect robots.txt and rate limits
  • Cache responses locally for debugging (a minimal sketch follows this list)
  • Validate extracted data before saving
  • Use FoxScrape for high-volume scraping or JS-heavy pages
  • Leverage Tokio for scalable async requests
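
On the caching point, a minimal sketch (the helper name and cache path are illustrative, not from the article):

```rust
use std::fs;
use std::path::Path;

// fetch a URL once, then reuse the HTML saved on disk while iterating
// on selectors; delete the file to force a fresh download
fn fetch_cached(url: &str, cache_path: &str) -> Result<String, Box<dyn std::error::Error>> {
    if Path::new(cache_path).exists() {
        return Ok(fs::read_to_string(cache_path)?);
    }
    let body = reqwest::blocking::get(url)?.text()?;
    fs::write(cache_path, &body)?;
    Ok(body)
}
```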

🎯 9. Conclusion

In this tutorial, you learned:

  • Blocking scraping with reqwest::blocking for small tasks
  • Asynchronous scraping with Tokio for multiple pages
  • How to parse HTML with CSS selectors using scraper
  • How to simplify protected or dynamic scraping with FoxScrape
  • Exporting structured results for further analysis

Rust provides high performance, safety, and concurrency. Adding FoxScrape removes the complexity of dealing with dynamic pages, anti-bot blocks, and proxies, letting you focus on your parsing logic and data extraction.

🦊 Try FoxScrape with Rust:

```rust
let api_key = "YOUR_API_KEY";
let url = "https://www.imdb.com/chart/top/";
let fox_url = format!("https://www.foxscrape.com/api/v1?api_key={}&url={}", api_key, url);
```

Fetch any page, static or dynamic, and parse it with your existing Rust code: simple, fast, and reliable.