Web Scraping with Elixir

Written by Mantas Kemėšius

Web scraping in Elixir is a bit like having a high-performance data engine at your fingertips. Thanks to Elixir’s concurrency and Crawly, a dedicated web scraping framework, you can build scalable crawlers that fetch, parse, and store data efficiently.

In this tutorial, we’ll go from setting up a simple project to building a multi-page spider, extracting product prices, and finally leveraging FoxScrape to handle tricky pages or anti-scraping protections.

By the end, you’ll know how to:

  • Build spiders in Elixir with Crawly
  • Parse HTML with Floki
  • Handle multi-page crawling and structured data
  • Use FoxScrape to simplify scraping of protected or JS-heavy pages

    🛠️ 1. Why Scrape with Elixir?

    Elixir is built on the Erlang VM, which provides lightweight processes, fault tolerance, and concurrency out of the box. For scraping, this means:

    Feature            Benefit
    Concurrency        Crawl multiple pages at once efficiently
    Fault tolerance    A crashed worker doesn't bring down the whole crawl
    OTP support        Makes building supervised scrapers easier
    Erlang VM speed    Handle thousands of requests with minimal overhead
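
    As a quick illustration of the concurrency benefit above, here is a standalone sketch (not part of the project we build below) that fetches a few pages in parallel with Task.async_stream and HTTPoison:

    ELIXIR
    # Fetch three pages concurrently and print status code + body size for each
    urls = [
      "https://books.toscrape.com/catalogue/page-1.html",
      "https://books.toscrape.com/catalogue/page-2.html",
      "https://books.toscrape.com/catalogue/page-3.html"
    ]

    urls
    |> Task.async_stream(fn url -> HTTPoison.get!(url) end, max_concurrency: 3, timeout: 15_000)
    |> Enum.each(fn {:ok, %HTTPoison.Response{status_code: status, body: body}} ->
      IO.puts("#{status}: #{byte_size(body)} bytes")
    end)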

    Popular scraping libraries in Elixir:

  • Crawly: Full-featured web scraping framework
  • Floki: HTML parser and selector library
  • HTTPoison / Finch: HTTP clients
  • Jason: JSON serialization

    ⚙️ 2. Setting Up Your Project

    Create a new Elixir project with OTP support:

    BASH
    mix new price_spider --sup
    cd price_spider

    Add dependencies to mix.exs:

    ELIXIR
    defp deps do
      [
        {:crawly, "~> 0.13"},
        {:floki, "~> 0.33"},
        {:httpoison, "~> 2.1"},
        {:jason, "~> 1.4"}
      ]
    end

    Install them:

    BASH
    mix deps.get

    This gives you Crawly for crawling, Floki for parsing, and HTTP clients for fetching pages manually or via APIs.

    🕷️ 3. Creating a Spider

    Crawly spiders are just modules implementing the Crawly.Spider behavior. A minimal spider looks like:

    ELIXIR
    defmodule PriceSpider.BasicSpider do
      use Crawly.Spider

      @impl Crawly.Spider
      def base_url(), do: "https://books.toscrape.com"

      @impl Crawly.Spider
      def init() do
        [start_urls: [base_url()]]
      end

      @impl Crawly.Spider
      def parse_item(_response) do
        # No items or follow-up requests yet
        %Crawly.ParsedItem{items: [], requests: []}
      end
    end
  • base_url/0 defines the domain to crawl
  • init/0 sets starting URLs
  • parse_item/1 processes the HTTP response and returns extracted items (and follow-up requests) as a Crawly.ParsedItem

    At this point, running the spider yields no items. Before adding extraction, you can give it a quick test run.
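
    A minimal way to start and stop the spider from IEx (iex -S mix), assuming the default Crawly setup:

    ELIXIR
    # Start the (still empty) spider...
    Crawly.Engine.start_spider(PriceSpider.BasicSpider)

    # ...and stop it again when you're done
    Crawly.Engine.stop_spider(PriceSpider.BasicSpider)

    Now let's extract some data.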

    📄 4. Extracting Data with Floki

    Floki lets you parse HTML and extract elements with CSS selectors.

    Example: scraping book titles and prices:

    ELIXIR
    def parse_item(response) do
      {:ok, document} = Floki.parse_document(response.body)

      items =
        document
        |> Floki.find(".product_pod")
        |> Enum.map(fn product ->
          title = product |> Floki.find("h3 a") |> Floki.attribute("title") |> List.first()
          price = product |> Floki.find(".price_color") |> Floki.text()
          %{title: title, price: price}
        end)

      # Return the items to Crawly; its pipelines take care of storage
      %Crawly.ParsedItem{items: items, requests: []}
    end

    What’s happening:

  • Floki.parse_document/1 parses HTML into a queryable structure
  • Floki.find/2 selects elements by CSS selector
  • We map over the nodes to extract structured data (see the standalone sketch below)
  • The returned %Crawly.ParsedItem{} hands the items and any follow-up requests to Crawly for processing
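
    To see what the individual Floki calls return on their own, here is a small standalone sketch (the HTML fragment is made up for illustration):

    ELIXIR
    html = """
    <article class="product_pod">
      <h3><a title="A Light in the Attic" href="#">A Light in the ...</a></h3>
      <p class="price_color">£51.77</p>
    </article>
    """

    {:ok, document} = Floki.parse_document(html)

    document |> Floki.find("h3 a") |> Floki.attribute("title")
    # => ["A Light in the Attic"]

    document |> Floki.find(".price_color") |> Floki.text()
    # => "£51.77"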

    🔍 5. Handling Multi-Page Crawling

    Crawly makes it easy to follow links and paginate:

    ELIXIR
    def parse_item(response) do
      {:ok, document} = Floki.parse_document(response.body)

      items =
        document
        |> Floki.find(".product_pod")
        |> Enum.map(fn product -> ... end)  # same extraction as in section 4

      # Find the "next" pagination link, if there is one
      next_page =
        document
        |> Floki.find("li.next a")
        |> Floki.attribute("href")
        |> List.first()

      # Turn it into a follow-up request for Crawly to schedule
      requests =
        if next_page do
          [Crawly.Utils.request_from_url("#{base_url()}/#{next_page}")]
        else
          []
        end

      %Crawly.ParsedItem{items: items, requests: requests}
    end

    This setup ensures your spider automatically follows pagination and collects data across multiple pages.

    🦊 6. Introducing FoxScrape for Anti-Bot & JS Pages

    Some websites implement anti-scraping measures:

  • Require JavaScript rendering
  • Block repeated requests from the same IP
  • Return partial or empty HTML

    Manually handling these in Crawly is possible but cumbersome. Instead, FoxScrape can fetch fully rendered pages for you, letting you continue using Crawly and Floki without additional browser automation.

    🔧 Example: Fetching via FoxScrape

    ELIXIR
    defmodule PriceSpider.FoxSpider do
      use Crawly.Spider

      @api_key "YOUR_API_KEY"
      @target_url "https://books.toscrape.com"

      @impl Crawly.Spider
      def base_url(), do: @target_url

      @impl Crawly.Spider
      def init(), do: [start_urls: [@target_url]]

      @impl Crawly.Spider
      def parse_item(_response) do
        # Fetch the page through FoxScrape (URL-encoded so query strings survive)
        fox_url =
          "https://www.foxscrape.com/api/v1?api_key=#{@api_key}&url=#{URI.encode_www_form(@target_url)}"

        {:ok, resp} = HTTPoison.get(fox_url)
        {:ok, document} = Floki.parse_document(resp.body)

        items =
          document
          |> Floki.find(".product_pod")
          |> Enum.map(fn product ->
            title = product |> Floki.find("h3 a") |> Floki.attribute("title") |> List.first()
            price = product |> Floki.find(".price_color") |> Floki.text()
            %{title: title, price: price}
          end)

        # Hand the items back to Crawly's pipelines instead of saving them manually
        %Crawly.ParsedItem{items: items, requests: []}
      end
    end

    You can also enable JS rendering for dynamic pages:

    ELIXIR
    fox_url = "https://www.foxscrape.com/api/v1?api_key=#{@api_key}&url=#{URI.encode_www_form(@target_url)}&render_js=true"

    Why use FoxScrape here:

  • No manual proxy rotation or headless browser setup
  • Automatically retries failed requests
  • Returns clean HTML ready for Floki parsing

    💾 7. Exporting Data

    Crawly supports multiple output formats. For simplicity, save items as JSON Lines through Crawly's item pipelines:

    ELIXIR
    Crawly.Engine.start_spider(PriceSpider.FoxSpider)
    # Items are written by the pipelines configured for Crawly (see the config sketch below)
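
    Note that items are only written if Crawly's item pipelines are configured. A minimal config/config.exs sketch (the middleware list, folder, and extension are assumptions; adjust them to your project):

    ELIXIR
    import Config

    config :crawly,
      middlewares: [
        Crawly.Middlewares.DomainFilter,
        Crawly.Middlewares.UniqueRequest,
        {Crawly.Middlewares.UserAgent, user_agents: ["PriceSpider/1.0"]}
      ],
      pipelines: [
        # Encode each item as a JSON line and append it to a file
        Crawly.Pipelines.JSONEncoder,
        {Crawly.Pipelines.WriteToFile, folder: "./output", extension: "jl"}
      ]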

    Alternatively, you can post-process the JSON or write to CSV using Elixir’s CSV library.
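
    For instance, a rough post-processing sketch that reads the JSON Lines output and writes a naive CSV (no quoting or escaping, and the file paths are assumptions):

    ELIXIR
    # Read each JSON line, pull out the stored fields, and join them as CSV rows
    rows =
      "output/PriceSpider.FoxSpider.jl"   # adjust to wherever your pipeline writes its output
      |> File.stream!()
      |> Stream.map(&Jason.decode!/1)
      |> Enum.map(fn %{"title" => title, "price" => price} -> "#{title},#{price}" end)

    File.write!("prices.csv", Enum.join(["title,price" | rows], "\n"))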


    ⚖️ 8. Comparison: Crawly vs FoxScrape

    Method             JS Support    Anti-Bot Handling    Concurrency    Complexity
    Crawly + Floki     ❌            ⚠️ Partial           ✅ High        🟡 Medium
    FoxScrape API      ✅            ✅ Automatic         ✅ High        🟢 Simple

    FoxScrape essentially offloads the network and JS rendering layer while letting Crawly remain the parsing engine.

    🧠 9. Best Practices

  • Respect robots.txt and site rate limits
  • Don’t overwhelm servers; use Crawly’s built-in throttling (see the config sketch after this list)
  • Use FoxScrape for protected or JS-heavy pages
  • Always validate extracted items before storage
  • Parallelize crawls responsibly with OTP supervision
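
    For throttling, a sketch of the relevant Crawly settings (the values are illustrative):

    ELIXIR
    config :crawly,
      # Limit parallel requests per domain (Crawly's built-in throttling)
      concurrent_requests_per_domain: 2,
      # Stop a spider automatically once it has collected this many items
      closespider_itemcount: 500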

    🎯 10. Conclusion

    In this tutorial, you learned:

  • How to set up Elixir and Crawly for web scraping
  • How to create spiders and parse pages with Floki
  • Techniques for multi-page crawling and structured data extraction
  • How to integrate FoxScrape to handle anti-scraping and dynamic content

    Elixir + Crawly is already a high-performance scraping solution. Adding FoxScrape makes it easier to scale, handle JS-heavy sites, and bypass anti-bot protections, all while keeping your parsing code clean and familiar.

    🦊 Try FoxScrape in Elixir:

    ELIXIR
    api_key = "YOUR_API_KEY"
    url = "https://www.amazon.com/s?k=graphics+cards"

    # URL-encode the target so its own query string survives inside fox_url
    fox_url = "https://www.foxscrape.com/api/v1?api_key=#{api_key}&url=#{URI.encode_www_form(url)}"

    {:ok, resp} = HTTPoison.get(fox_url)
    IO.puts(String.slice(resp.body, 0..500))

    Happy scraping — fast, scalable, and ethical.