Web Scraping with Elixir

Web scraping in Elixir is a bit like having a high-performance data engine at your fingertips. Thanks to Elixir’s concurrency and Crawly, a dedicated web scraping framework, you can build scalable crawlers that fetch, parse, and store data efficiently.
In this tutorial, we’ll go from setting up a simple project to building a multi-page spider, extracting product prices, and finally leveraging FoxScrape to handle tricky pages or anti-scraping protections.
By the end, you’ll know how to:
- Set up an Elixir project with Crawly and Floki
- Build a spider that extracts product titles and prices
- Follow pagination to crawl multiple pages
- Route JS-heavy or bot-protected pages through FoxScrape
- Export scraped items to JSON Lines
🛠️ 1. Why Scrape with Elixir?
Elixir is built on the Erlang VM, which provides lightweight processes, fault tolerance, and concurrency out of the box. For scraping, this means:
| Feature | Benefit |
|---|---|
| Concurrency | Crawl multiple pages at once efficiently |
| Fault-tolerance | A crashed spider doesn’t bring down the rest of your application |
| OTP support | Makes building supervised scrapers easier |
| Erlang VM speed | Handle thousands of requests with minimal overhead |
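To make the concurrency point concrete, here is a minimal sketch that fetches a few pages in parallel with Task.async_stream. It assumes the HTTPoison dependency installed in the next section; the URLs are illustrative.

```elixir
urls = [
  "https://books.toscrape.com/catalogue/page-1.html",
  "https://books.toscrape.com/catalogue/page-2.html",
  "https://books.toscrape.com/catalogue/page-3.html"
]

urls
# Fetch up to 10 pages concurrently; each fetch runs in its own lightweight process
|> Task.async_stream(&HTTPoison.get!/1, max_concurrency: 10, timeout: 15_000)
|> Enum.each(fn {:ok, %HTTPoison.Response{status_code: status, request_url: url}} ->
  IO.puts("#{url} -> #{status}")
end)
```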
Popular scraping libraries in Elixir:
- Crawly: a full crawling framework (spiders, middlewares, item pipelines)
- Floki: HTML parsing with CSS selectors
- HTTPoison: a general-purpose HTTP client
- Jason: JSON encoding and decoding
⚙️ 2. Setting Up Your Project
Create a new Elixir project with OTP support:
```bash
mix new price_spider --sup
cd price_spider
```

Add dependencies to mix.exs:
```elixir
defp deps do
  [
    {:crawly, "~> 0.13"},
    {:floki, "~> 0.33"},
    {:httpoison, "~> 2.1"},
    {:jason, "~> 1.4"}
  ]
end
```

Install them:

```bash
mix deps.get
```

This gives you Crawly for crawling, Floki for parsing, and an HTTP client for fetching pages manually or via APIs.
🕷️ 3. Creating a Spider
Crawly spiders are just modules implementing the Crawly.Spider behaviour. A minimal spider looks like:
```elixir
defmodule PriceSpider.BasicSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://books.toscrape.com"

  @impl Crawly.Spider
  def init() do
    [start_urls: [base_url()]]
  end

  @impl Crawly.Spider
  def parse_item(_response) do
    # No items and no follow-up requests yet
    %{items: [], requests: []}
  end
end
```

- base_url/0 defines the domain to crawl
- init/0 sets starting URLs
- parse_item/1 processes the HTTP response and returns extracted items plus follow-up requests

At this point, running the spider yields empty results; next, let’s extract data.
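Before moving on, you can confirm the spider starts cleanly from IEx:

```elixir
# In the project root, start an IEx session with `iex -S mix`, then:
Crawly.Engine.start_spider(PriceSpider.BasicSpider)
```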
📄 4. Extracting Data with Floki
Floki lets you parse HTML and extract elements with CSS selectors.
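Before wiring Floki into the spider, here is what the core calls look like on a trimmed-down product card. The HTML snippet is illustrative, not fetched from the site:

```elixir
html = """
<div class="product_pod">
  <h3><a title="A Light in the Attic" href="catalogue/a-light-in-the-attic_1000/index.html">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</div>
"""

{:ok, doc} = Floki.parse_document(html)

doc |> Floki.find("h3 a") |> Floki.attribute("title")  # => ["A Light in the Attic"]
doc |> Floki.find(".price_color") |> Floki.text()      # => "£51.77"
```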
Example: scraping book titles and prices:
```elixir
def parse_item(response) do
  html = response.body
  {:ok, document} = Floki.parse_document(html)

  items =
    document
    |> Floki.find(".product_pod")
    |> Enum.map(fn product ->
      title = product |> Floki.find("h3 a") |> Floki.attribute("title") |> List.first()
      price = product |> Floki.find(".price_color") |> Floki.text()
      %{title: title, price: price}
    end)

  # Hand the items to Crawly's item pipelines; no follow-up requests yet
  %{items: items, requests: []}
end
```

✅ What’s happening:
- Floki.parse_document/1 parses HTML into a queryable structure
- Floki.find/2 selects elements by CSS selector
- The map returned under :items is handed to Crawly’s item pipelines for storage

🔍 5. Handling Multi-Page Crawling
Crawly makes it easy to follow links and paginate:
```elixir
def parse_item(response) do
  html = response.body
  {:ok, document} = Floki.parse_document(html)

  items =
    document
    |> Floki.find(".product_pod")
    |> Enum.map(fn product -> ... end)

  # Enqueue the next page, if there is one
  requests =
    document
    |> Floki.find("li.next a")
    |> Floki.attribute("href")
    |> Enum.map(fn href ->
      href
      |> Crawly.Utils.build_absolute_url(response.request_url)
      |> Crawly.Utils.request_from_url()
    end)

  %{items: items, requests: requests}
end
```

This setup ensures your spider automatically follows pagination and collects data across multiple pages.
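If you also want to follow each product’s detail page, not just the pagination link, Crawly.Utils.requests_from_urls/1 turns a list of URLs into requests. A sketch, assuming the same books.toscrape.com listing markup:

```elixir
# Sketch: enqueue every product detail page in addition to the next page
detail_requests =
  document
  |> Floki.find(".product_pod h3 a")
  |> Floki.attribute("href")
  |> Enum.map(&Crawly.Utils.build_absolute_url(&1, response.request_url))
  |> Crawly.Utils.requests_from_urls()

# Then merge them into the returned map:
# %{items: items, requests: requests ++ detail_requests}
```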
🦊 6. Introducing FoxScrape for Anti-Bot & JS Pages
Some websites implement anti-scraping measures:
- JavaScript-rendered content that plain HTTP requests never see
- Rate limiting and IP blocking
- CAPTCHAs and other bot-detection checks
Manually handling these in Crawly is possible but cumbersome. Instead, FoxScrape can fetch fully-rendered pages for you, letting you continue using Crawly and Floki without additional browser automation.
🔧 Example: Fetching via FoxScrape
```elixir
defmodule PriceSpider.FoxSpider do
  use Crawly.Spider

  @api_key "YOUR_API_KEY"
  @target_url "https://books.toscrape.com"

  def base_url(), do: @target_url

  def init(), do: [start_urls: [@target_url]]

  def parse_item(_response) do
    # Fetch the page through FoxScrape instead of using the original response
    fox_url =
      "https://www.foxscrape.com/api/v1?api_key=#{@api_key}&url=#{URI.encode_www_form(@target_url)}"

    {:ok, resp} = HTTPoison.get(fox_url)
    {:ok, document} = Floki.parse_document(resp.body)

    items =
      document
      |> Floki.find(".product_pod")
      |> Enum.map(fn product ->
        title = product |> Floki.find("h3 a") |> Floki.attribute("title") |> List.first()
        price = product |> Floki.find(".price_color") |> Floki.text()
        %{title: title, price: price}
      end)

    %{items: items, requests: []}
  end
end
```
You can also enable JS rendering for dynamic pages:
```elixir
fox_url =
  "https://www.foxscrape.com/api/v1?api_key=#{@api_key}&url=#{URI.encode_www_form(@target_url)}&render_js=true"
```

✅ Why use FoxScrape here:
- Pages arrive fully rendered, so you keep using Floki without running a headless browser
- Anti-bot protections are handled on FoxScrape’s side
- Crawly stays in charge of scheduling, concurrency, and pipelines
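If several spiders call FoxScrape, a small wrapper keeps the URL building in one place. This is only a sketch: the module name is made up, and the api_key and render_js parameters simply mirror the examples above.

```elixir
defmodule PriceSpider.FoxClient do
  @moduledoc "Thin wrapper around FoxScrape requests (illustrative helper, not part of Crawly)."

  @endpoint "https://www.foxscrape.com/api/v1"

  def fetch(url, api_key, opts \\ []) do
    query =
      URI.encode_query(%{
        "api_key" => api_key,
        "url" => url,
        "render_js" => Keyword.get(opts, :render_js, false)
      })

    # Rendered pages can take a while, so allow a generous receive timeout
    HTTPoison.get("#{@endpoint}?#{query}", [], recv_timeout: 60_000)
  end
end
```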
💾 7. Exporting Data
Crawly supports multiple output formats — for simplicity, save items to JSON Lines:
```elixir
Crawly.Engine.start_spider(PriceSpider.FoxSpider)
# Output: data in _build/dev/lib/price_spider/output/items.jsonl
```

Alternatively, you can post-process the JSON or write to CSV using Elixir’s CSV library.
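Where items end up is controlled by Crawly’s item pipelines, configured in config/config.exs. A minimal sketch, assuming you want validated items written out as JSON Lines; the folder, field names, and user agent string are assumptions, so check the docs for the Crawly version you installed:

```elixir
# config/config.exs
import Config

config :crawly,
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    {Crawly.Middlewares.UserAgent, user_agents: ["PriceSpider/1.0"]}
  ],
  pipelines: [
    {Crawly.Pipelines.Validate, fields: [:title, :price]},
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, extension: "jsonl", folder: "./output"}
  ]
```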
⚖️ 8. Comparison: Crawly vs FoxScrape
| Method | JS Support | Anti-Bot Handling | Concurrency | Complexity |
|---|---|---|---|---|
| Crawly + Floki | ❌ | ⚠️ Partial | ✅ High | 🟡 Medium |
| FoxScrape API | ✅ | ✅ Automatic | ✅ High | 🟢 Simple |
FoxScrape essentially offloads the network and JS rendering layer while letting Crawly remain the parsing engine.
🧠 9. Best Practices
- Respect robots.txt and site rate limits
- Throttle concurrency per domain so you don’t overwhelm the target server
- Cache pages locally while developing selectors to avoid repeated fetches
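Crawly exposes knobs for the first two points. A sketch, with option and middleware names as documented for recent Crawly versions; verify them against the version you installed:

```elixir
# Politeness settings, added to the config :crawly block from section 7 (values are illustrative)
config :crawly,
  concurrent_requests_per_domain: 4,
  middlewares: [
    Crawly.Middlewares.RobotsTxt,    # skip URLs disallowed by robots.txt
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest
  ]
```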
🎯 10. Conclusion
In this tutorial, you learned:
- How to set up an Elixir project with Crawly, Floki, and HTTPoison
- How to build a spider and extract titles and prices with CSS selectors
- How to follow pagination across multiple pages
- How to route JS-heavy or bot-protected pages through FoxScrape
- How to export scraped items as JSON Lines
Elixir + Crawly is already a high-performance scraping solution. Adding FoxScrape makes it easier to scale, handle JS-heavy sites, and bypass anti-bot protections — all while keeping your parsing code clean and familiar.
🦊 Try FoxScrape in Elixir:
```elixir
api_key = "YOUR_API_KEY"
url = "https://www.amazon.com/s?k=graphics+cards"
fox_url = "https://www.foxscrape.com/api/v1?api_key=#{api_key}&url=#{URI.encode_www_form(url)}"
{:ok, resp} = HTTPoison.get(fox_url)
IO.puts(String.slice(resp.body, 0..500))
```

Happy scraping — fast, scalable, and ethical.