Web Scraping with Scala

Written by Mantas Kemėšius

Web scraping sits at the intersection of curiosity and automation. It’s what happens when developers stop copying data manually and start thinking: couldn’t I just write code for this?

Whether you’re tracking prices, gathering research data, or analyzing trends across multiple websites, scraping is often the most direct way to collect public information at scale. But the web you scrape today is not the same as the web of a decade ago.

Modern sites use JavaScript frameworks, lazy loading, and aggressive bot protection. So to extract data reliably, we need to combine the right techniques and libraries — and, occasionally, a little infrastructure help.

In this article, we’ll explore how to scrape the web effectively using Scala, comparing three major methods:

  • jsoup – for lightweight, static HTML parsing
  • Scala Scraper – a Scala-idiomatic wrapper around jsoup
  • Selenium – for interacting with JavaScript-heavy pages

And finally, we’ll look at when it makes sense to offload complexity using an external API — introducing FoxScrape, a scraping API that handles browser rendering and proxy management for you.

    But let’s not get ahead of ourselves. We’ll start from scratch — building your first scraping project in Scala, one line of code at a time.

    ⚙️ 1. Preparing Your Scala Environment

    Before we scrape anything, we need a clean development setup.

    🧩 Installing Scala and sbt

    Ensure you have:

  • Scala 3.6.3
  • sbt 1.10.7

    Check your versions:

    BASH
    scala -version
    sbt sbtVersion

    If you’re missing either, install via scala-lang.org or use Coursier (cs setup) for an all-in-one toolchain.

    🏗️ Project Structure

    Let’s create a new project:

    BASH
    sbt new scala/scala3.g8
    cd my-scraper

    When prompted, name the project my-scraper. This gives you a minimal template with src/main/scala/Main.scala.

    Now add our dependencies.

    📦 build.sbt

    SCALA
    ThisBuild / scalaVersion := "3.6.3"

    libraryDependencies ++= Seq(
      "org.jsoup" % "jsoup" % "1.18.1",
      "net.ruippeixotog" %% "scala-scraper" % "3.1.0",
      "org.seleniumhq.selenium" % "selenium-java" % "4.25.0",
      "com.lihaoyi" %% "requests" % "0.8.0" // for HTTP calls later
    )

    With this, we can experiment with all three approaches — from static HTML parsing to full browser automation — without switching environments.

    🌐 2. Your First Scrape with jsoup

    The simplest way to scrape is to fetch a page and parse it directly.

    🧩 What jsoup Does

    jsoup is a Java library (fully compatible with Scala) that:

  • Downloads and parses HTML into a traversable DOM.
  • Lets you select elements using familiar CSS selectors.
  • Cleans and normalizes HTML for easier data extraction.

    It’s perfect for static pages — like Wikipedia, blogs, or documentation sites.

    ✏️ Example: Scrape Wikipedia’s Title

    SCALA
    import org.jsoup.Jsoup

    @main def scrapeWikipedia(): Unit =
      val doc = Jsoup.connect("https://en.wikipedia.org/").get()
      println(doc.title())

    Run it:

    PLAIN TEXT
    sbt run

    Output:

    PLAIN TEXT
    Wikipedia, the free encyclopedia

    That’s jsoup in a nutshell — simple, fast, and excellent for HTML that’s already rendered server-side.

    🎯 3. Selecting Elements with jsoup

    Once you have a document, you’ll want to extract specific sections — headlines, links, or data fields.

    🔍 CSS Selectors in jsoup

    jsoup uses CSS-like selectors:

    Selector     Meaning
    #id          element with id
    .class       element with class
    div p        <p> inside <div>
    a[href]      <a> with href attribute

    🧠 Example: Wikipedia’s “In the news” section

    SCALA
    import org.jsoup.Jsoup
    import scala.jdk.CollectionConverters.*

    @main def wikipediaNews(): Unit =
      val doc = Jsoup.connect("https://en.wikipedia.org/").get()

      val news = doc.select("#mp-itn b a")
      for n <- news.asScala do
        println(s"${n.text()} → ${n.absUrl("href")}")

    We grab the bolded links in the “In the news” section using the #mp-itn ID and print their text and URLs.

    This is where scraping starts to feel magical — the ability to programmatically traverse the same structure you inspect in your browser’s “View Source.”


    🧩 4. Extracting Text and Attributes

    jsoup makes it trivial to extract values:

    Method               Description
    .text()              Returns the text content
    .attr("href")        Returns the attribute value
    .children()          Gets child elements
    .first() / .last()   Returns the first / last matching element

    Example:

    SCALA
    // requires: import scala.jdk.CollectionConverters.*
    val headlines = doc.select("#mp-itn li")

    for item <- headlines.asScala do
      Option(item.select("a").first()).foreach { link =>
        val title = link.text()
        val href = link.absUrl("href")
        println(s"$title → $href")
      }

    This gives you structured pairs of text and URLs — perfect for later saving into JSON, CSV, or a database.
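
    For example, here’s a minimal sketch of the CSV step, assuming you first collect the pairs into a rows sequence (the sample row below is just a placeholder):

    SCALA
    import java.nio.file.{Files, Paths}
    import scala.jdk.CollectionConverters.*

    // Placeholder data; in practice, build this Seq inside the scraping loop above
    val rows = Seq(("Example headline", "https://en.wikipedia.org/wiki/Example"))

    // Naive quoting (doesn't escape embedded quotes), fine for quick exports
    val lines = "title,url" +: rows.map((title, href) => s"\"$title\",\"$href\"")
    Files.write(Paths.get("headlines.csv"), lines.asJava)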

    🧰 5. Scraping with Scala Scraper

    While jsoup is powerful, its Java-style syntax isn’t very Scala-idiomatic.

    That’s where the Scala Scraper library comes in — it wraps jsoup in a more functional, expressive API.

    📦 Setup

    We already added it to our build.sbt.

    ✏️ Example

    SCALA
    import net.ruippeixotog.scalascraper.browser.JsoupBrowser
    import net.ruippeixotog.scalascraper.dsl.DSL.*
    import net.ruippeixotog.scalascraper.dsl.DSL.Extract.*

    @main def scrapeWithScalaScraper(): Unit =
      val browser = JsoupBrowser()
      val doc = browser.get("https://en.wikipedia.org/")

      val titles = doc >> elementList("#mp-itn b a")
      titles.foreach { el =>
        println(el.text + " → " + el.attr("href"))
      }

    Notice how concise it feels. The >> operator is part of the DSL — a nice abstraction that lets you extract elements declaratively.

    Scala Scraper is especially helpful when you want to map extracted nodes directly into Scala collections.

    🧮 6. Extracting Structured Data

    Let’s get a little more structured — turning HTML into Scala data.

    SCALA
    import net.ruippeixotog.scalascraper.browser.JsoupBrowser
    import net.ruippeixotog.scalascraper.dsl.DSL.*
    import net.ruippeixotog.scalascraper.dsl.DSL.Extract.*

    case class Headline(title: String, link: String)

    @main def scrapeNewsStructured(): Unit =
      val browser = JsoupBrowser()
      val doc = browser.get("https://en.wikipedia.org/")

      val headlines =
        for el <- doc >> elementList("#mp-itn li a")
        yield Headline(el.text, "https://en.wikipedia.org" + el.attr("href")) // hrefs are site-relative

      headlines.take(5).foreach(println)

    Output:

    PLAIN TEXT
    Headline(Election results announced, https://en.wikipedia.org/wiki/Election_results)
    Headline(New species discovered, https://en.wikipedia.org/wiki/New_species)
    ...

    Clean, typed, and immediately usable for further analysis or storage.

    ⚠️ 7. When Static Scraping Isn’t Enough

    At some point, you’ll try to scrape a modern site — say, a single-page React app or an infinite-scroll product list — and your jsoup code will suddenly return… nothing.

    Why?

    Because jsoup only sees HTML the server sends, not what JavaScript renders afterward.

    Modern websites use client-side frameworks (React, Vue, Angular) to populate content after load. That means your scraper receives an empty skeleton, and your selectors find zero elements.
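
    You can see this for yourself by comparing what jsoup receives with what your browser renders. A minimal sketch, where the URL and selector are hypothetical placeholders:

    SCALA
    import org.jsoup.Jsoup

    @main def inspectRawHtml(): Unit =
      // Hypothetical JS-rendered page and selector, purely for illustration
      val doc = Jsoup.connect("https://example.com/spa-products").get()
      val cards = doc.select(".product-card")
      println(s"Elements found in raw HTML: ${cards.size()}") // typically 0 on an SPA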

    Here’s a quick rule of thumb:

    Page Type    Works with jsoup?   Why
    Wikipedia    ✅                   Static HTML
    Amazon       ⚠️                   Some static, some dynamic
    LinkedIn     ❌                   JS-rendered + login required
    Twitter      ❌                   Fully dynamic SPA

    So when jsoup or Scala Scraper fail, you need to simulate a real browser.


    🧭 8. Enter Selenium: A Full Browser at Your Command

    Selenium automates browsers — allowing you to control Chrome, Firefox, or Edge through code. It’s heavier than jsoup but essential when pages require JavaScript execution.

    ⚙️ Setup

    You’ll need:

  • A browser (Chrome or Firefox)
  • Its matching WebDriver binary (chromedriver, geckodriver)

    Make sure they’re on your PATH. Recent Selenium releases (4.6+) also ship with Selenium Manager, which can download a matching driver automatically if it doesn’t find one.

    ✏️ Example: Wikipedia Again (Headless Mode)

    SCALA
    import org.openqa.selenium.chrome.ChromeDriver
    import org.openqa.selenium.chrome.ChromeOptions

    @main def seleniumWikipedia(): Unit =
      val options = ChromeOptions()
      options.addArguments("--headless=new", "--disable-gpu")
      val driver = ChromeDriver(options)

      driver.get("https://en.wikipedia.org/")
      println(driver.getTitle())

      driver.quit()

    Selenium runs an actual browser instance, loads all scripts, and renders the page. You can then query the DOM (via findElements and By.cssSelector) or even scroll and click.

    For example:

    SCALA
    import org.openqa.selenium.By

    val headlines = driver.findElements(By.cssSelector("#mp-itn b a"))
    headlines.forEach(h => println(h.getText))
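
    On JavaScript-heavy pages the content often isn’t in the DOM the instant the page loads, so you typically wait for it explicitly. A minimal sketch using Selenium’s WebDriverWait (the selector here is a hypothetical placeholder):

    SCALA
    import org.openqa.selenium.By
    import org.openqa.selenium.support.ui.{ExpectedConditions, WebDriverWait}
    import java.time.Duration

    // Wait up to 10 seconds for a dynamically rendered element to appear
    val pageWait = WebDriverWait(driver, Duration.ofSeconds(10))
    val firstCard = pageWait.until(
      ExpectedConditions.presenceOfElementLocated(By.cssSelector(".dynamic-content"))
    )
    println(firstCard.getText)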

    Powerful, but heavier. A dozen Selenium sessions consume gigabytes of memory. That’s fine for testing, but scraping at scale becomes impractical.

    💡 9. Static vs Dynamic vs API Scraping

    At this point, we’ve seen three clear approaches. Each has trade-offs.

    Method          Strengths            Weaknesses             Use When
    jsoup           Lightweight, fast    Fails on JS pages      Simple static sites
    Scala Scraper   Idiomatic, concise   Same JS limits         Data mapping in Scala
    Selenium        Handles dynamic JS   Slow, complex setup    Sites needing rendering

    Most production pipelines combine these:

  • jsoup for simple pages
  • Selenium for the stubborn ones
  • And increasingly, API-based scraping for everything else

    Let’s explore that last option.

    ☁️ 10. Simplifying Scraping with an API (Example: FoxScrape)

    When scraping grows beyond one or two pages, it’s not code complexity that gets you — it’s infrastructure.

    You start juggling:

  • Proxy rotation to avoid IP blocks
  • User-Agent spoofing
  • CAPTCHA solving
  • JavaScript rendering
  • Rate limiting

    Each adds overhead, cost, and maintenance.

    Wouldn’t it be better if you could just say:

    “Here’s the URL I want — please give me the final rendered HTML.”

    That’s the idea behind FoxScrape — a developer-friendly web scraping API.

    Instead of running browsers yourself, you delegate that work to FoxScrape’s infrastructure. It fetches, renders (if needed), and returns clean HTML to your code.

    ⚙️ Integration Example (with Scala)

    Let’s fetch and parse Wikipedia using FoxScrape’s API.

    SCALA
    import requests.*
    import org.jsoup.Jsoup

    @main def foxscrapeExample(): Unit =
      val api = "https://www.foxscrape.com/api/v1"
      val response = requests.get(api, params = Map(
        "url" -> "https://en.wikipedia.org/",
        "render_js" -> "false"
      ))

      val html = response.text()
      val doc = Jsoup.parse(html)
      println("Page title: " + doc.title())

    That’s it — no proxies, no headless Chrome, no extra dependencies.

    Want JavaScript content rendered? Just set "render_js" -> "true".

    You can still use all your jsoup or Scala Scraper logic to parse the returned HTML exactly as before — because FoxScrape’s output is just clean, ready-to-parse markup.
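
    For instance, here’s a minimal sketch that feeds the returned markup into Scala Scraper instead of plain jsoup (reusing the same endpoint and parameters as above):

    SCALA
    import requests.*
    import net.ruippeixotog.scalascraper.browser.JsoupBrowser
    import net.ruippeixotog.scalascraper.dsl.DSL.*
    import net.ruippeixotog.scalascraper.dsl.DSL.Extract.*

    @main def foxscrapeWithScalaScraper(): Unit =
      // Fetch the page through the API, then parse the returned HTML locally
      val response = requests.get("https://www.foxscrape.com/api/v1", params = Map(
        "url" -> "https://en.wikipedia.org/",
        "render_js" -> "false"
      ))

      val doc = JsoupBrowser().parseString(response.text())
      val links = doc >> elementList("#mp-itn b a")
      links.foreach(el => println(el.text))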

    🧱 11. When (and Why) to Use a Scraping API

    FoxScrape and similar APIs shine when you need to scale or when sites are hostile to automated access.

    Typical triggers to switch:

  • Frequent IP bans or 403 errors
  • JavaScript rendering required
  • Pages changing HTML layout frequently
  • High concurrency requirements

    🧮 Comparison

    Feature              jsoup / Scala Scraper   Selenium    FoxScrape API
    JavaScript support   ❌                       ✅           ✅
    Setup                Easy                    Complex     None
    Speed                Fast                    Slow        Fast
    Proxy handling       Manual                  Manual      Automatic
    Scale                High                    Low         Very high
    Cost                 Free                    High CPU    Pay-as-you-go

    FoxScrape essentially gives you Selenium-level scraping with jsoup-level simplicity — an elegant hybrid.

    🧩 12. Best Practices for Robust Scrapers

    Regardless of which tool you use, a few universal best practices keep your scrapers ethical and reliable.

    ⏳ Respect rate limits

    Insert delays between requests or use FoxScrape’s built-in throttling.

    🧱 Handle pagination gracefully

    Scrape multiple pages with small pauses and consistent logic.

    SCALA
    for i <- 1 to 5 do
      val page = s"https://example.com/page/$i"
      // scrape each page here, then pause briefly between requests
      Thread.sleep(1000)

    💾 Cache HTML locally

    During testing, save raw pages to files so you can debug parsing logic offline.
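
    A minimal sketch of that idea: download the page once, save the raw markup next to your project, and parse from disk on later runs (the file name is just an example; Files.writeString/readString need JDK 11+):

    SCALA
    import org.jsoup.Jsoup
    import java.nio.file.{Files, Paths}

    @main def cachedScrape(): Unit =
      val cache = Paths.get("wikipedia-home.html")

      // Download once, then reuse the saved copy on later runs
      if !Files.exists(cache) then
        val html = Jsoup.connect("https://en.wikipedia.org/").get().outerHtml()
        Files.writeString(cache, html)

      val doc = Jsoup.parse(Files.readString(cache), "https://en.wikipedia.org/")
      println(doc.title())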

    ⚖️ Follow site policies

    Always read robots.txt, avoid personal data, and use scraping responsibly.
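
    If you want to check robots.txt programmatically, here’s a deliberately naive sketch that just lists the Disallow rules (a real crawler would use a proper parser and match its own user-agent):

    SCALA
    import requests.*

    // Print the first few Disallow rules so you can compare them with your target paths
    val robots = requests.get("https://en.wikipedia.org/robots.txt").text()
    robots.linesIterator
      .filter(_.trim.startsWith("Disallow:"))
      .take(10)
      .foreach(println)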

    🧭 13. Conclusion: Building Smarter Scrapers

    We’ve traveled from simple static scraping with jsoup to full browser automation with Selenium — and finally to API-driven scraping that abstracts away the headaches.

    To recap:

    Scenario                       Best Tool
    Simple, static HTML            jsoup
    Functional Scala syntax        Scala Scraper
    Dynamic JavaScript content     Selenium or FoxScrape (render_js)
    Scalable, anti-bot scraping    FoxScrape API

    Ultimately, the best approach depends on your use case and scale.

    For quick one-offs, jsoup is perfect. For production-grade scraping that needs reliability, using an API like FoxScrape saves hours of maintenance.

    It’s not about writing more scraping code — it’s about writing less infrastructure code.

    🦊 Learn more and try it at FoxScrape.com — the simplest way to fetch any page, static or dynamic, straight into your Scala project.