Web Scraping with Ruby

Web scraping is one of those quiet superpowers every developer eventually picks up. Whether you’re building a price tracker, collecting research data, or just automating boring work, scraping lets you read the web like a machine — and Ruby happens to make that surprisingly elegant.
In this tutorial, we’ll build your skills step by step: from making a single HTTP request and parsing HTML, to automating entire browsers, and finally simplifying it all with FoxScrape, a fast, API-based scraping solution for sites that fight back.
You’ll walk away knowing:
- How to fetch pages with Faraday and parse them with Nokogiri
- How to export scraped data to CSV
- How to drive a real browser with Selenium and Capybara when JavaScript gets in the way
- How to hand the hard parts to FoxScrape when sites block or throttle you

Let’s start from the beginning.
🧰 1. Why Ruby for Web Scraping?
Ruby’s not just for Rails — it’s also great for scripting, data parsing, and automation.
Its expressive syntax and ecosystem of gems make it ideal for building readable, maintainable scrapers.
Here’s what makes Ruby shine for scraping:
| Feature | Why It Matters |
|---|---|
| Nokogiri | Fast, reliable HTML/XML parsing |
| Faraday | Modern, flexible HTTP client |
| Capybara + Selenium | Automate browsers like Chrome or Firefox |
| CSV & JSON | Built-in data export |
| Threads / Parallel gem | Simple concurrency for multiple pages |
Throughout this guide, we’ll use these gems to demonstrate the full scraping workflow — from raw HTML to structured CSV output.
🧑‍💻 2. Setting Up Your Ruby Environment
You’ll need:
- A recent Ruby installation (any 3.x release works)
- Bundler (`gem install bundler`)

Create a new folder for your scraper:

```bash
mkdir ruby-scraper && cd ruby-scraper
bundle init
```

Now open your Gemfile and add the following gems:
1gem "faraday"
2gem "nokogiri"
3gem "selenium-webdriver"
4gem "capybara"
5gem "csv"
6gem "parallel"Then run:
1bundle installThat’s it — you’re ready to start scraping.
🌐 3. Making Your First HTTP Request with Faraday
Let’s warm up by fetching a webpage.
```ruby
require 'faraday'

response = Faraday.get("https://example.com")
puts response.status
puts response.body[0..200]
```

This performs a simple GET request and prints the first 200 characters of the response.
✅ What’s happening here:
- `Faraday.get` fetches the page.
- `response.status` tells you if the request succeeded (200 = OK).
- `response.body` is the raw HTML you’ll parse next.

🧩 4. Parsing HTML with Nokogiri
Nokogiri is the Swiss Army knife of Ruby scraping. It lets you navigate HTML like a tree — select tags, extract text, and manipulate content easily.
Let’s extract links from example.com:
```ruby
require 'nokogiri'
require 'faraday'

html = Faraday.get("https://example.com").body
doc = Nokogiri::HTML(html)

links = doc.css("a").map { |a| a['href'] }.compact
puts links
```

What’s happening:
- `doc.css("a")` finds all `<a>` tags.
- `.map` collects their `href` attributes.
- `.compact` removes nil values.

💡 Pro tip: You can also use `doc.at_css("h1").text` to grab single elements, like titles or headers.
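One gotcha worth knowing: `at_css` returns `nil` when nothing matches, so chaining `.text` onto a missing element raises `NoMethodError`. A minimal sketch of the safe-navigation pattern:

```ruby
require 'faraday'
require 'nokogiri'

doc = Nokogiri::HTML(Faraday.get("https://example.com").body)

# &. short-circuits to nil instead of raising when the element is missing
title   = doc.at_css("h1")&.text
missing = doc.at_css(".no-such-class")&.text

puts title            # => "Example Domain"
puts missing.inspect  # => nil
```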
📚 5. Real-World Example: Scraping Book Titles
Let’s make it real by scraping book data from https://books.toscrape.com, a safe test site built for scraping practice.
```ruby
require 'nokogiri'
require 'faraday'
require 'csv'

url = "https://books.toscrape.com"
html = Faraday.get(url).body
doc = Nokogiri::HTML(html)

books = doc.css(".product_pod").map do |book|
  title = book.at_css("h3 a")["title"]
  price = book.at_css(".price_color").text
  { title: title, price: price }
end

CSV.open("books.csv", "w") do |csv|
  csv << ["Title", "Price"]
  books.each { |b| csv << [b[:title], b[:price]] }
end

puts "✅ Saved #{books.size} books to books.csv"
```

Now you’ve scraped, parsed, and exported structured data: the full basic cycle.
But as you’ll see next, real websites rarely make it this easy.
🧱 6. When the Web Fights Back: Anti-Scraping Measures
Sooner or later, you’ll hit problems like:
- JavaScript-rendered content that never appears in the raw HTML
- Rate limits and IP blocks
- CAPTCHAs and other bot-detection checks
Static scraping tools like Faraday + Nokogiri don’t handle these cases well — they only fetch raw HTML, not JavaScript-rendered pages or protected endpoints.
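Before reaching for heavier tools, it’s sometimes worth sending more realistic request headers; the most naive bot checks only inspect the `User-Agent`. A minimal sketch (this won’t help against JavaScript rendering or real bot detection):

```ruby
require 'faraday'

conn = Faraday.new(url: "https://example.com") do |f|
  # Present browser-like headers; naive filters often check no further
  f.headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
                            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
  f.headers["Accept-Language"] = "en-US,en;q=0.9"
end

puts conn.get("/").status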
When that’s not enough, because the site renders with JavaScript or runs tougher protections, you have two main options:
- Automate a real browser with Selenium and Capybara
- Call a scraping API like FoxScrape and let it handle the hard parts

Let’s explore both paths.
🧭 7. Scraping Dynamic Sites with Selenium & Capybara
When JavaScript gets in the way, you can use browser automation.
Install ChromeDriver or GeckoDriver first, then try this:
```ruby
require 'selenium-webdriver'

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
driver = Selenium::WebDriver.for(:chrome, options: options)

driver.navigate.to "https://quotes.toscrape.com/js/"
sleep 2 # wait for JS to load
puts driver.title
puts driver.page_source[0..300]

driver.quit
```

This launches a headless Chrome browser, loads a JS-heavy site, and prints its HTML.
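The `sleep 2` above works, but it’s a guess: too short and you scrape before the JS finishes, too long and you waste time. Selenium’s `Wait` helper polls instead. A sketch, assuming the `.quote` selector that quotes.toscrape.com uses:

```ruby
require 'selenium-webdriver'

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
driver = Selenium::WebDriver.for(:chrome, options: options)

driver.navigate.to "https://quotes.toscrape.com/js/"

# Poll up to 10 seconds until the JS-rendered quotes are in the DOM
wait = Selenium::WebDriver::Wait.new(timeout: 10)
wait.until { driver.find_element(css: ".quote") }

puts driver.find_elements(css: ".quote").size

driver.quit
```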
✅ Pros: Works anywhere, real browser context
❌ Cons: Slow, memory-hungry, and sometimes brittle
Wouldn’t it be nice if you could get the same results without running a full browser?
🦊 8. Simplifying It All with FoxScrape
Here’s where FoxScrape comes in: a powerful API that fetches fully-rendered pages from any URL, so you don’t have to deal with:
- Running and babysitting headless browsers yourself
- Proxies, blocks, and rate limits
- JavaScript rendering
- Retry logic
With a single HTTP call, you get clean, ready-to-parse HTML.
Let’s adapt your previous Nokogiri scraper to use FoxScrape:
```ruby
require 'faraday'
require 'nokogiri'
require 'cgi'

FOX_API_KEY = "YOUR_API_KEY"
target_url = "https://books.toscrape.com"
# Escape the target URL so it survives inside the query string
fox_url = "https://www.foxscrape.com/api/v1?api_key=#{FOX_API_KEY}&url=#{CGI.escape(target_url)}"

response = Faraday.get(fox_url)
doc = Nokogiri::HTML(response.body)

books = doc.css(".product_pod h3 a").map { |a| a["title"] }
puts "Found #{books.size} books!"
```

💡 You can even enable JS rendering:
```ruby
fox_url = "https://www.foxscrape.com/api/v1?api_key=#{FOX_API_KEY}&url=#{CGI.escape(target_url)}&render_js=true"
```

The beauty?
You still use your familiar parsing code — FoxScrape only replaces the network layer.
Your Nokogiri logic stays exactly the same.
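To make that swap explicit, you can hide the FoxScrape call behind a small helper so the rest of your scraper never mentions it. A sketch (`fetch_html` is a hypothetical name of our own, not part of any library):

```ruby
require 'faraday'
require 'nokogiri'
require 'cgi'

FOX_API_KEY = "YOUR_API_KEY"

# Hypothetical helper: all FoxScrape details live here,
# so the parsing code only ever sees plain HTML
def fetch_html(url, render_js: false)
  fox_url = "https://www.foxscrape.com/api/v1?api_key=#{FOX_API_KEY}" \
            "&url=#{CGI.escape(url)}"
  fox_url += "&render_js=true" if render_js
  Faraday.get(fox_url).body
end

doc = Nokogiri::HTML(fetch_html("https://books.toscrape.com"))
puts doc.css(".product_pod h3 a").map { |a| a["title"] }.first
```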
💾 9. Handling Retries and Saving Data
Sometimes requests fail: networks drop, sites throttle. Add retries and save the raw response so a flaky request doesn’t kill your run:
```ruby
require 'faraday'
require 'faraday/retry' # provided by the faraday-retry gem

FOX_API_KEY = "YOUR_API_KEY"

conn = Faraday.new do |f|
  f.request :retry, max: 3, interval: 1 # up to 3 retries, 1 second apart
  f.adapter Faraday.default_adapter
end

response = conn.get("https://www.foxscrape.com/api/v1", {
  api_key: FOX_API_KEY,
  url: "https://example.com"
})

if response.success?
  File.write("page.html", response.body)
  puts "Saved HTML snapshot."
else
  puts "Error: #{response.status}"
end
```

Note that passing the parameters as a hash lets Faraday URL-encode them for you. The faraday-retry middleware also supports options like `backoff_factor` for exponential delays; see the gem’s README.

⚙️ 10. Putting It All Together — A Mini Project
Let’s combine everything into a small, practical scraper that:
- Fetches five catalogue pages from books.toscrape.com in parallel threads
- Routes each request through FoxScrape
- Parses book titles and prices with Nokogiri
- Writes the combined results to one CSV file
```ruby
require 'faraday'
require 'nokogiri'
require 'csv'
require 'cgi'
require 'parallel'

FOX_API_KEY = "YOUR_API_KEY"
base_url = "https://books.toscrape.com/catalogue/page-"

pages = (1..5).map { |i| "#{base_url}#{i}.html" }

results = Parallel.map(pages, in_threads: 4) do |url|
  fox_url = "https://www.foxscrape.com/api/v1?api_key=#{FOX_API_KEY}&url=#{CGI.escape(url)}"
  html = Faraday.get(fox_url).body
  doc = Nokogiri::HTML(html)

  doc.css(".product_pod").map do |b|
    {
      title: b.at_css("h3 a")["title"],
      price: b.at_css(".price_color").text
    }
  end
end.flatten

CSV.open("books_combined.csv", "w") do |csv|
  csv << ["Title", "Price"]
  results.each { |r| csv << [r[:title], r[:price]] }
end

puts "✅ Scraped #{results.size} books across 5 pages!"
```

That’s a multi-threaded scraper running through a robust API, and it fits on one screen.
⚖️ 11. Choosing the Right Approach
Here’s a quick reference comparing the main scraping methods:
| Method | Handles JS? | Handles Blocks? | Speed | Complexity |
|---|---|---|---|---|
| Nokogiri + Faraday | ❌ | ❌ | ⚡ Fast | 🟢 Simple |
| Selenium / Capybara | ✅ | ⚠️ Partial | 🐢 Slow | 🔴 Complex |
| FoxScrape API | ✅ | ✅ | ⚡⚡ Fast | 🟢 Simple |
If your target site is static — stick with Nokogiri.
If it’s dynamic or protected — FoxScrape is your easiest route.
🧠 12. Best Practices & Ethics
A few golden rules of scraping:
- Check robots.txt and the site’s terms of service before you scrape
- Throttle your requests so you don’t hammer anyone’s servers (see the sketch below)
- Identify yourself with an honest User-Agent and a contact address
- Cache what you’ve already fetched instead of re-requesting it
- Don’t collect personal data you don’t need
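As a concrete example of the throttling and identification rules, here’s a minimal sketch of a polite fetch loop (the User-Agent string and one-second delay are illustrative choices, not requirements):

```ruby
require 'faraday'

urls = [
  "https://books.toscrape.com/catalogue/page-1.html",
  "https://books.toscrape.com/catalogue/page-2.html"
]

urls.each do |url|
  response = Faraday.get(url) do |req|
    # Say who you are; an email gives site owners a way to reach you
    req.headers["User-Agent"] = "my-ruby-scraper/1.0 (contact@example.com)"
  end
  puts "#{url} -> #{response.status}"
  sleep 1 # pace requests instead of firing them back-to-back
end
```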
FoxScrape helps here too — its backend automatically throttles and retries responsibly, so your IPs (and conscience) stay clean.
🎯 13. Conclusion
You’ve now built a complete scraping toolkit in Ruby — from raw HTML fetching to full-scale, parallel data collection.
You learned how to:
- Fetch pages with Faraday and parse them with Nokogiri
- Export structured data to CSV
- Drive a real browser with Selenium when JavaScript gets in the way
- Offload rendering, blocks, and retries to FoxScrape
- Scale up with retry middleware and parallel requests
When scraping goes from “fun experiment” to “daily pipeline,” that’s when FoxScrape truly shines — because you’ll spend less time fighting blocks and more time working with your data.
So go ahead:
Run your first FoxScrape request, grab your results, and watch how easy scraping can be when the hard parts are already handled.
🦊 Try it yourself:
```ruby
require 'faraday'
require 'cgi'

FOX_API_KEY = "YOUR_API_KEY"
url = "https://en.wikipedia.org/wiki/Ruby_(programming_language)"
fox_url = "https://www.foxscrape.com/api/v1?api_key=#{FOX_API_KEY}&url=#{CGI.escape(url)}"
puts Faraday.get(fox_url).body[0..500]
```

Happy scraping: ethically, efficiently, and effortlessly.
Further Reading

- Web Scraping Without Getting Blocked
- Web Scraping With JavaScript and Node.js
- Python Web Scraping: Full Tutorial With Examples