Web Scraping with Java Made Easy

Web scraping is one of those essential developer skills that sits somewhere between art and engineering. Whether you’re collecting product data, monitoring competitors, or automating a data feed — understanding how to extract information from the web efficiently can give you a serious edge.
In this guide, we’ll explore how to build web scrapers in Java, step-by-step, using the most popular libraries available — from the simple and elegant Jsoup, to HtmlUnit and Selenium for more dynamic scenarios.
Along the way, we’ll also look at a simpler alternative for those who want to avoid complex setups and anti-bot headaches: using a hosted scraping API like FoxScrape.
🧠 What Is Web Scraping, Really?
At its core, web scraping means programmatically loading a web page and extracting specific data — such as product names, prices, or links — so that it can be reused or analyzed.
Example use cases:
- Collecting product names and prices for comparison or monitoring.
- Tracking competitor listings or content changes.
- Feeding links, articles, or listings into an automated data pipeline.
⚖️ Always remember: scraping should be done ethically.
Respect robots.txt, obey site terms, and don’t overload servers with unnecessary requests.
⚙️ Choosing the Right Tools for Java Web Scraping
Java offers a rich set of tools for different scraping needs. Each tool serves a purpose depending on whether a website is static, dynamic, or heavily reliant on JavaScript.
| Library | Ideal Use Case | Key Strength |
|---|---|---|
| Jsoup | Static pages with structured HTML | Lightweight and elegant HTML parser |
| HtmlUnit | Simulating form interactions or logins | Acts like a lightweight headless browser |
| Selenium | Full JavaScript rendering and browser control | Ideal for dynamic, JS-heavy websites |
We’ll explore all three, starting with the most straightforward: Jsoup.
🧾 Scraping Static Websites with Jsoup
For static HTML pages (sites where the content is available in the HTML itself), Jsoup is the gold standard. It’s fast, simple, and reads almost like natural language.
🧩 Example: Extracting Product Titles
Let’s scrape all product titles from a sample store page.
```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupExample {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://example.com/products").get();
        Elements titles = doc.select(".product-title");
        titles.forEach(t -> System.out.println(t.text()));
    }
}
```
This works beautifully — but only if the website is static.
If the content is rendered by JavaScript, you’ll end up with empty results because Jsoup never executes client-side scripts.
💻 Scraping Forms and Simulating Actions with HtmlUnit
Some sites require interaction — like filling out a search form or logging in before you can access data.
That’s where HtmlUnit comes in handy.
It’s a headless browser written in Java, capable of managing sessions, cookies, and form submissions.
Example: Submitting a Search Form
```java
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;

public class HtmlUnitExample {
    public static void main(String[] args) throws Exception {
        try (final WebClient client = new WebClient(BrowserVersion.CHROME)) {
            HtmlPage page = client.getPage("https://example.com/search");
            HtmlForm form = page.getForms().get(0);
            HtmlTextInput input = form.getInputByName("query");
            input.setValueAttribute("laptops");
            HtmlSubmitInput submit = form.getInputByName("submit");
            HtmlPage result = submit.click();
            System.out.println(result.asText());
        }
    }
}
```
This code performs a real search — just like a browser would — and prints the result.
It’s a great approach for sites with basic interactivity, but it can’t handle modern, JavaScript-heavy frontends.
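If you want to push HtmlUnit a little further before giving up, its WebClient does ship a JavaScript engine that you can switch on through its options. Here's a minimal sketch; it helps with simple scripts, but don't expect it to cope with React- or Vue-style frontends:

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitOptionsExample {
    public static void main(String[] args) throws Exception {
        try (final WebClient client = new WebClient(BrowserVersion.CHROME)) {
            // Turn on HtmlUnit's built-in JavaScript engine
            client.getOptions().setJavaScriptEnabled(true);
            // Keep going even when a page script throws an error
            client.getOptions().setThrowExceptionOnScriptError(false);
            // Skip CSS processing to speed things up
            client.getOptions().setCssEnabled(false);

            HtmlPage page = client.getPage("https://example.com/search");
            System.out.println(page.asText());
        }
    }
}
```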
⚡ Dealing with JavaScript-Heavy Websites
And here’s where many Java developers hit the wall.
Modern websites rely heavily on frameworks like React, Vue, or Angular. These sites load data dynamically, meaning the content doesn’t exist in the raw HTML source — it’s generated later in the browser.
In these cases, Jsoup and HtmlUnit can’t help much.
The Traditional Fix: Selenium
Selenium allows Java to control a real browser — load the page, wait for JS to execute, and then extract the rendered HTML.
```java
import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class SeleniumExample {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com/dynamic");
        String html = driver.getPageSource();
        System.out.println(html);
        driver.quit();
    }
}
```
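In practice you'll also want to wait for the JavaScript-rendered content to actually appear before reading the page source. Here's a small sketch using Selenium 4's WebDriverWait (the .product-title selector is just a placeholder for whatever element signals that rendering is done):

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class SeleniumWaitExample {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/dynamic");
            // Block (up to 10 seconds) until the JS-rendered elements exist in the DOM
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.cssSelector(".product-title")));
            System.out.println(driver.getPageSource());
        } finally {
            driver.quit();
        }
    }
}
```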
This works, but it’s heavy. You’ll need:
- A browser installed on the machine running the scraper.
- A driver binary (such as ChromeDriver) kept in sync with that browser's version.
- Far more memory and CPU than a plain HTTP request would use.
If you only need to retrieve data — not control the browser — this setup can be excessive.
🦊 The Smarter Alternative: Using FoxScrape API
Let’s pause here and think practically.
What if you could:
- Send a single HTTP request and get back fully rendered HTML?
- Skip browser drivers, proxies, and anti-bot workarounds entirely?
- Keep your existing Jsoup parsing code unchanged?
That’s what FoxScrape is built for.
FoxScrape acts as a cloud-based scraping layer — you send a URL, and it returns the rendered HTML or API response, ready to parse with Jsoup or Jackson.
Here’s how the same Selenium task looks with FoxScrape:
```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FoxScrapeExample {
    public static void main(String[] args) throws Exception {
        String foxUrl = "https://www.foxscrape.com/api/v1?url=https://example.com/dynamic&render_js=true";
        Document doc = Jsoup.connect(foxUrl).get();
        System.out.println(doc.title());
    }
}
```
That’s it — one call, one response.
No browser drivers. No proxies. No waiting for rendering manually.
FoxScrape takes care of:
- Rendering JavaScript before returning the HTML.
- Rotating IPs and managing proxies.
- Handling rate limits and common anti-bot measures.
It returns the final rendered HTML, which you can parse using the same Jsoup logic as before.
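For example, reusing the product-title selector from the first Jsoup example against a FoxScrape response could look like this (a sketch; the target URL and CSS class are placeholders):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class FoxScrapeParseExample {
    public static void main(String[] args) throws Exception {
        // FoxScrape renders the JavaScript for us and returns plain HTML
        String foxUrl = "https://www.foxscrape.com/api/v1?url=https://example.com/products&render_js=true";
        Document doc = Jsoup.connect(foxUrl).get();
        // Same Jsoup selector logic as the static example earlier
        Elements titles = doc.select(".product-title");
        titles.forEach(t -> System.out.println(t.text()));
    }
}
```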
This approach is perfect for production-scale scraping or cloud deployments where simplicity and reliability matter more than controlling a local browser.
🔁 Handling Infinite Scroll and AJAX Requests
Infinite scroll pages are another tricky scenario.
When you scroll, the site sends background (AJAX) requests to load new data.
You can handle this in two ways: drive a real browser and scroll programmatically, or skip the browser entirely and call the underlying AJAX endpoint yourself.
Option 1: Scroll programmatically with Selenium
```java
// Assumes an already-initialized WebDriver named "driver"
JavascriptExecutor js = (JavascriptExecutor) driver;
for (int i = 0; i < 5; i++) {
    // Jump to the bottom of the page to trigger the next batch of content
    js.executeScript("window.scrollTo(0, document.body.scrollHeight)");
    // Give the background requests time to finish
    Thread.sleep(2000);
}
```
Option 2: Call the AJAX endpoint directly
Watch the network tab in your browser’s developer tools while scrolling, and you’ll often find a JSON endpoint like:
```
https://api.example.com/products?page=3
```
You can then call this directly:
```java
String jsonUrl = "https://api.example.com/products?page=3";
String response = Jsoup.connect(jsonUrl).ignoreContentType(true).execute().body();
System.out.println(response);
```
If the site hides or dynamically generates this API, you can use FoxScrape to render and extract the full scrolled content without writing scrolling logic:
```
https://www.foxscrape.com/api/v1?url=https://example.com/products&render_js=true
```
🧮 Parsing JSON Data with Jackson
When your scraped data is in JSON format, use a library like Jackson to process it.
```java
import com.fasterxml.jackson.databind.*;

public class JsonParseExample {
    public static void main(String[] args) throws Exception {
        String json = "{\"product\": \"Laptop\", \"price\": 1200}";
        ObjectMapper mapper = new ObjectMapper();
        JsonNode node = mapper.readTree(json);
        System.out.println(node.get("product").asText());
    }
}
```
You can chain this with any request — including one from FoxScrape — to directly parse structured data.
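As a sketch, here's how fetching the hypothetical products endpoint from earlier and walking the JSON with Jackson might fit together (the response is assumed to be an array of product objects):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.jsoup.Jsoup;

public class JsonChainExample {
    public static void main(String[] args) throws Exception {
        // The endpoint is the hypothetical AJAX URL discovered earlier
        String jsonUrl = "https://api.example.com/products?page=3";
        // ignoreContentType lets Jsoup fetch non-HTML responses
        String body = Jsoup.connect(jsonUrl).ignoreContentType(true).execute().body();

        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(body);
        // Assumes each element looks like {"product": ..., "price": ...}
        for (JsonNode item : root) {
            System.out.println(item.path("product").asText() + " - " + item.path("price").asText());
        }
    }
}
```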
💾 Saving and Structuring Your Data
Once you have your parsed data, store it in a format that suits your workflow.
Example: Writing to CSV
```java
import java.io.FileWriter;
import com.opencsv.CSVWriter;

public class CsvWriterExample {
    public static void main(String[] args) throws Exception {
        try (CSVWriter writer = new CSVWriter(new FileWriter("data.csv"))) {
            String[] header = {"Name", "Price"};
            writer.writeNext(header);
            writer.writeNext(new String[]{"Laptop", "1200"});
        }
    }
}
```
For larger projects, consider:
- A relational database (via JDBC) for structured, queryable storage (a minimal sketch follows this list).
- A document store or plain JSON files for nested or irregular data.
- Exporting directly into whatever format your downstream analysis tools expect.
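If you go the database route, a minimal JDBC sketch might look like this. It assumes a SQLite driver such as sqlite-jdbc is on the classpath, and the table and column names are just examples:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DbWriterExample {
    public static void main(String[] args) throws Exception {
        // Assumes the sqlite-jdbc driver is available at runtime
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:scraped.db")) {
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS products (name TEXT, price INTEGER)");
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO products (name, price) VALUES (?, ?)")) {
                ps.setString(1, "Laptop");
                ps.setInt(2, 1200);
                ps.executeUpdate();
            }
        }
    }
}
```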
🧭 Best Practices for Web Scraping
Building a good scraper isn’t just about code — it’s about being efficient and ethical.
Do:
- Respect robots.txt and each site's terms of service.
- Throttle your requests and cache pages you've already fetched (a minimal throttling sketch follows these lists).
- Identify your scraper with an honest User-Agent and contact information.
- Prefer an official API whenever one is available.
Don’t:
- Overload servers with rapid-fire or unnecessary requests.
- Ignore explicit scraping prohibitions or access controls.
- Collect personal or sensitive data you have no right to store.
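A simple way to honor the "don't overload servers" rule is to add a fixed delay between requests and send an honest User-Agent. A minimal sketch (the URLs and contact address are placeholders):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PoliteScraperExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical list of pages to fetch
        String[] urls = {
                "https://example.com/products?page=1",
                "https://example.com/products?page=2"
        };
        for (String url : urls) {
            Document doc = Jsoup.connect(url)
                    // Identify your scraper so site owners can reach you
                    .userAgent("MyScraper/1.0 (contact@example.com)")
                    .get();
            System.out.println(doc.title());
            // Wait between requests instead of hammering the server
            Thread.sleep(2000);
        }
    }
}
```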
🦊 Pro tip: FoxScrape automatically manages rate limiting, IP rotation, and JavaScript rendering, so you can scale safely without managing infrastructure yourself.
🏁 Wrapping It Up
By now, you’ve seen the full range of Java’s web scraping capabilities:
| Use Case | Recommended Tool |
|---|---|
| Static HTML | Jsoup |
| Form Submissions / Light JS | HtmlUnit |
| Full JS Rendering | Selenium |
| Automated Managed Scraping | FoxScrape |
If you enjoy building scrapers manually — Jsoup, HtmlUnit, and Selenium give you full control.
But if your goal is speed, simplicity, and reliability, FoxScrape provides a powerful shortcut: an all-in-one scraping API that handles browsers, proxies, and rendering for you.
In short, use your code for logic, not logistics.
Happy scraping, responsibly and efficiently.
Further Reading
- Web Scraping With JavaScript and Node.js
- Python Web Scraping: Full Tutorial With Examples
- A Complete Guide to Web Scraping in R