Web Scraping with C#

If you’ve ever tried scraping a modern website, you’ve probably experienced a full emotional arc: excitement, frustration, triumph, and then despair when the site suddenly changes structure overnight.
Web scraping used to be simple.
Grab HttpClient, download HTML, parse it with HtmlAgilityPack, export to CSV — done. That was the era of clean HTML and predictable markup.
Fast-forward to today’s web, and everything is dynamic, JavaScript-rendered, geo-targeted, and wrapped in bot protection.
And yet, developers still need data.
We scrape not to annoy, but to understand — to collect prices, compare news sentiment, analyze public records, or monitor competition. Scraping remains one of the most pragmatic ways to automate access to information.
This article is a developer’s guide to building sane, maintainable scrapers in C#, step by step — while understanding why modern scraping is difficult and how to architect your code so it doesn’t crumble under real-world complexity.
We’ll go from fetching and parsing static HTML to structuring and exporting data, handling pagination, and knowing when to offload the heavy infrastructure to a scraping API.
Grab some coffee. Let’s scrape smarter.
🧱 1. Why Scraping is (Still) a Developer’s Superpower
At its core, web scraping is a form of automation.
You’re not “hacking” — you’re structuring what’s already public, making it digestible for analysis.
A few use cases that show up in real engineering teams:
| Use Case | Example | Benefit |
|---|---|---|
| Price Monitoring | Track competitors on e-commerce sites | Dynamic pricing and alerts |
| Market Research | Extract product metadata or reviews | Sentiment analysis |
| Content Aggregation | Collect articles, job listings, or forum posts | Build dashboards or newsletters |
| SEO/Marketing | Audit structured data or meta tags | Improve visibility |
| Public Data | Scrape government or NGO datasets | Research or compliance |
These are all legitimate automation patterns — but they rely on the ability to fetch and parse web data reliably.
The problem? The web keeps fighting back.
⚔️ 2. The New Reality of Web Scraping
When you run this simple C# snippet:
```csharp
using var http = new HttpClient();
var html = await http.GetStringAsync("https://example.com");
Console.WriteLine(html);
```

You’d expect HTML.
But today, you might get an empty JavaScript shell, a Cloudflare challenge page, or a flat 403 Forbidden instead.
Let’s unpack why.
🧩 The Obstacles
| Challenge | Description | Why It Matters |
|---|---|---|
| JavaScript Rendering | Most sites generate content dynamically after load. | You can’t see data in the raw HTML. |
| Anti-Bot Systems | Services like Cloudflare or PerimeterX detect automation. | Requests get blocked or challenged. |
| Rate Limiting | Too many requests from one IP trigger throttling. | Data stops after a few pages. |
| Geolocation Walls | Region-specific content or pricing. | Wrong or missing data. |
| HTML Variability | Different layouts for mobile, AB tests, etc. | Your XPath breaks constantly. |
Developers often try to fix these with custom User-Agent headers, retry loops, rotating proxies, or headless browsers like Selenium.
This all works… until it doesn’t.
Scraping at scale isn’t a code problem — it’s an infrastructure problem.
🧰 3. Your C# Toolset: The Essentials
Before solving infrastructure, let’s build a good scraper foundation.
We’ll use C# — a fantastic language for web tasks because of its async capabilities, ecosystem, and type safety.
Here’s the minimal stack you need:
| Library | Purpose |
|---|---|
| HtmlAgilityPack | Parse HTML with XPath queries |
| CsvHelper | Export structured data easily |
| HttpClient | Make web requests asynchronously |
| System.Text.Json | Handle JSON APIs (bonus for hybrid scraping) |
Install them via:
```bash
dotnet new console -n WebScraperDemo
cd WebScraperDemo
dotnet add package HtmlAgilityPack
dotnet add package CsvHelper
```

We’ll use Books to Scrape (https://books.toscrape.com/) — a static, educational site — as our target dataset.
🧩 4. Building the Base: Fetch and Parse HTML
A minimal scraper looks like this:
```csharp
using HtmlAgilityPack;

var url = "https://books.toscrape.com/";
using var http = new HttpClient();
var html = await http.GetStringAsync(url);

var doc = new HtmlDocument();
doc.LoadHtml(html);

var titles = doc.DocumentNode.SelectNodes("//article[@class='product_pod']//h3/a");

foreach (var t in titles)
{
    Console.WriteLine(t.InnerText.Trim());
}
```

Output:
```
A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
```

Success! You’ve scraped your first data.
Now, let’s turn that text into something useful.
🧩 5. Structuring Your Data
Instead of dumping everything to console, define a model:
```csharp
public sealed class Product
{
    public string Title { get; set; } = "";
    public decimal Price { get; set; }
    public string Url { get; set; } = "";
}
```

Then extract clean values:
```csharp
var products = new List<Product>();

var nodes = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
foreach (var n in nodes)
{
    var a = n.SelectSingleNode(".//h3/a");
    var title = HtmlEntity.DeEntitize(a?.GetAttributeValue("title", "") ?? "").Trim();

    var priceText = n.SelectSingleNode(".//p[@class='price_color']")?.InnerText ?? "£0.00";
    decimal.TryParse(priceText.Replace("£", ""), out var price);

    var href = a?.GetAttributeValue("href", "") ?? "";
    var productUrl = new Uri(new Uri(url), href).ToString();

    products.Add(new Product { Title = title, Price = price, Url = productUrl });
}
```

You now have a strongly typed dataset ready for export.
💾 6. Exporting to CSV
```csharp
using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;
using System.Text;

using var writer = new StreamWriter("products.csv", false, new UTF8Encoding(true));
using var csv = new CsvWriter(writer, new CsvConfiguration(CultureInfo.InvariantCulture)); // dispose to flush buffered rows
csv.WriteRecords(products);

Console.WriteLine("Saved products.csv");
```

Running this gives you:

```
Title,Price,Url
A Light in the Attic,51.77,https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html
Tipping the Velvet,53.74,https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html
```

Simple, clean, portable data.
But again — this works only because Books to Scrape is static HTML.
Try techcrunch.com or twitter.com, and you’ll see the limitations instantly.
🧭 7. Dealing with Pagination and Scale
For multi-page scraping:
```csharp
var allProducts = new List<Product>();
var http = new HttpClient();

for (int page = 1; page <= 5; page++)
{
    var pagedUrl = $"https://books.toscrape.com/catalogue/page-{page}.html";
    var html = await http.GetStringAsync(pagedUrl);

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // parse like before and add to allProducts
}
```

This works — but what happens if one page times out?
Or if the site throttles your IP mid-loop?
You’ll either lose data or crash your scraper. That’s why serious scraping systems include retry logic with backoff, per-page error handling, request throttling, and checkpointing so a single failure doesn’t take down the whole run.
You can implement these manually — but each adds complexity.
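To make that concrete, here’s a minimal sketch of the per-page error handling piece — the helper name and retry counts are illustrative, not a prescribed pattern:

```csharp
using HtmlAgilityPack;

// Hypothetical helper: fetch one page with retries and backoff,
// returning null instead of crashing the whole crawl.
static async Task<HtmlDocument?> TryFetchPageAsync(HttpClient http, string url, int maxAttempts = 3)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            var html = await http.GetStringAsync(url);
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // transient failure: back off a little longer each time, then retry
            await Task.Delay(TimeSpan.FromSeconds(attempt * 2));
        }
        catch (Exception ex)
        {
            // final failure: log and move on instead of aborting the run
            Console.WriteLine($"Giving up on {url}: {ex.Message}");
        }
    }
    return null;
}
```

In the pagination loop, a null result then simply means “skip this page and carry on,” which usually beats losing the entire run to one bad response.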
🧹 8. Cleaning and Normalizing Data
Scraped HTML often contains messy characters, line breaks, and entities.
```csharp
var clean = HtmlEntity.DeEntitize(rawText).Replace("\n", "").Trim();
```

Normalize URLs and numbers early, not after export.
A small investment in cleanup logic saves hours of data repair later.
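As a concrete illustration, a small helper class — the name Normalize and its methods are hypothetical, not from any library — might centralize that cleanup:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static class Normalize
{
    // Decode HTML entities, collapse runs of whitespace, and trim.
    public static string Text(string raw) =>
        Regex.Replace(HtmlEntity.DeEntitize(raw), @"\s+", " ").Trim();

    // Parse "£51.77" into a decimal, independent of the machine's locale.
    public static decimal Price(string raw)
    {
        var cleaned = Text(raw).Replace("£", "");
        return decimal.TryParse(cleaned, NumberStyles.Any, CultureInfo.InvariantCulture, out var value)
            ? value
            : 0m;
    }

    // Resolve a relative href against the page it was found on.
    public static string AbsoluteUrl(string pageUrl, string href) =>
        new Uri(new Uri(pageUrl), href).ToString();
}
```

The parsing loop from earlier then shrinks to three calls, and every future cleanup rule has an obvious home.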
💡 9. The Moment You Hit a Wall
Eventually, every scraper hits that website.
You’ve added retries. You’ve used user-agent headers. You’ve throttled requests.
And yet, half your responses are empty or blocked.
You spend an afternoon debugging network traces and realize:
the page only renders after JavaScript executes.
At that point, you reach for Selenium or Playwright — spinning up full browsers, waiting for page load, grabbing page.Content(), and closing tabs. It works, but it’s heavy, slow, and painful to scale.
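For reference, that browser route looks roughly like this with Microsoft.Playwright — a minimal sketch assuming the package is installed and its browsers have been downloaded:

```csharp
using Microsoft.Playwright;

// Launch a real (headless) browser, let the page execute its JavaScript,
// then hand the rendered HTML to the same parser as before.
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = true });
var page = await browser.NewPageAsync();

await page.GotoAsync("https://example.com", new() { WaitUntil = WaitUntilState.NetworkIdle });
var renderedHtml = await page.ContentAsync();

Console.WriteLine(renderedHtml.Length);
```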
What you really need isn’t more code — it’s a way to delegate infrastructure.
☁️ 10. When to Use a Scraping API
Scraping APIs emerged to solve precisely this:
They run the browser, rotate proxies, spoof headers, and return the HTML you wish HttpClient could.
They’re not magic — they’re specialized infrastructure as a service.
A good scraping API should:

- accept the target page as a simple url parameter
- render JavaScript on demand
- rotate proxies and handle anti-bot challenges for you
- return rendered HTML your existing parser can consume

In other words, it lets you keep your scraper logic, while offloading the plumbing.
🦊 11. Example: Using FoxScrape to Simplify Everything
Let’s replace all the messy parts of our scraper with a single, reliable API call.
FoxScrape is a developer-friendly scraping API built for exactly this:
you give it a URL (and your API key), and it returns clean, optionally rendered HTML — no proxy lists, no CAPTCHA handling, no JS engines on your side.
Same parameters as typical scraping APIs — so you don’t need to rewrite your scraper at all.
Here’s how our improved scraper looks:
```csharp
using HtmlAgilityPack;
using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;
using System.Text;

var apiKey = "YOUR_API_KEY"; // get from https://www.foxscrape.com
var baseUrl = "https://books.toscrape.com/";

var requestUrl = $"https://www.foxscrape.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(baseUrl)}";

using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(20) };
var html = await http.GetStringAsync(requestUrl);

var doc = new HtmlDocument();
doc.LoadHtml(html);

var products = new List<Product>();

// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
var cards = doc.DocumentNode.SelectNodes("//article[@class='product_pod']") ?? Enumerable.Empty<HtmlNode>();

foreach (var card in cards)
{
    var a = card.SelectSingleNode(".//h3/a");
    var title = HtmlEntity.DeEntitize(a?.GetAttributeValue("title", "") ?? "").Trim();
    var href = a?.GetAttributeValue("href", "") ?? "";
    var url = new Uri(new Uri(baseUrl), href).ToString();

    var priceText = HtmlEntity.DeEntitize(card.SelectSingleNode(".//p[@class='price_color']")?.InnerText ?? "£0.00");
    decimal.TryParse(priceText.Replace("£", "").Trim(), NumberStyles.Any, CultureInfo.InvariantCulture, out var price);

    products.Add(new Product { Title = title, Price = price, Url = url });
}

var csvPath = Path.Combine(AppContext.BaseDirectory, "products.csv");
using var writer = new StreamWriter(csvPath, false, new UTF8Encoding(true));
using var csv = new CsvWriter(writer, new CsvConfiguration(CultureInfo.InvariantCulture)); // dispose to flush
csv.WriteRecords(products);

Console.WriteLine($"Saved {products.Count} products to {csvPath}");

// Type declarations must come after top-level statements.
public sealed class Product
{
    public string Title { get; set; } = "";
    public decimal Price { get; set; }
    public string Url { get; set; } = "";
}
```

That’s it.
One request → full HTML → parsed → exported.
Want to render JavaScript?
Just add &render_js=true.
Need to forward headers? Add them to HttpClient.
Need to scale? FoxScrape handles concurrency and rate limits server-side.
No Selenium. No proxies. No tears.
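Putting those knobs together — with the endpoint and parameter names exactly as used in the sketch above — the request might look like this:

```csharp
var apiKey = "YOUR_API_KEY";
var targetUrl = "https://books.toscrape.com/";

// render_js asks the API to execute JavaScript before returning HTML.
var requestUrl =
    $"https://www.foxscrape.com/api/v1?api_key={apiKey}" +
    $"&url={Uri.EscapeDataString(targetUrl)}" +
    "&render_js=true";

using var http = new HttpClient();

// Headers set on HttpClient travel with the outgoing request.
http.DefaultRequestHeaders.Add("Accept-Language", "en-GB");

var renderedHtml = await http.GetStringAsync(requestUrl);
Console.WriteLine(renderedHtml.Length);
```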
🧭 12. Testing, Scaling, and Keeping Your Scrapers Alive
Once your scraper works, the next challenge is keeping it working.
Sites evolve, selectors break, and structures shift subtly.
Some field-tested practices:
🧱 Use Semantic Selectors
Prefer class names or attribute markers over absolute XPath chains.
1"//div[contains(@class,'product')]//a"is more robust than
1"/html/body/div[2]/div[1]/div/a"🕓 Add Retry Logic and Backoff
Even with a scraping API, transient errors happen.
```csharp
string? html = null;

for (int attempt = 1; attempt <= 3; attempt++)
{
    try
    {
        html = await http.GetStringAsync(requestUrl);
        break;
    }
    catch (Exception) when (attempt < 3)
    {
        // transient failure: wait a bit longer each time, then retry;
        // the final failure is allowed to propagate instead of being swallowed
        await Task.Delay(1000 * attempt);
    }
}
```

🧮 Track Selectors in Config Files
Store your XPath expressions in JSON or a config class, so you can tweak them without rebuilding.
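One lightweight way to do that — the file name and JSON shape here are just an example — is a selectors file deserialized with System.Text.Json:

```csharp
using System.Text.Json;
using HtmlAgilityPack;

// selectors.json (example shape):
// {
//   "productCard": "//article[@class='product_pod']",
//   "title": ".//h3/a",
//   "price": ".//p[@class='price_color']"
// }
var selectors = JsonSerializer.Deserialize<Dictionary<string, string>>(
    await File.ReadAllTextAsync("selectors.json"))!;

using var http = new HttpClient();
var doc = new HtmlDocument();
doc.LoadHtml(await http.GetStringAsync("https://books.toscrape.com/"));

// When the markup shifts, you edit the JSON file — not the code.
var cards = doc.DocumentNode.SelectNodes(selectors["productCard"]);
Console.WriteLine($"Found {cards?.Count ?? 0} product cards");
```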
📦 Cache Raw HTML
Save copies of fetched HTML during development to debug parsing logic offline.
It also helps when you want to test changes without burning API calls.
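A minimal disk cache is enough for that — the naming scheme here (a hash of the URL) is just one convenient convention:

```csharp
using System.Security.Cryptography;
using System.Text;

// Fetch through a local cache: hit the network once, then replay from disk.
static async Task<string> GetCachedAsync(HttpClient http, string url, string cacheDir = "html-cache")
{
    Directory.CreateDirectory(cacheDir);

    // Derive a stable file name from the URL.
    var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(url)));
    var path = Path.Combine(cacheDir, hash + ".html");

    if (File.Exists(path))
        return await File.ReadAllTextAsync(path);

    var html = await http.GetStringAsync(url);
    await File.WriteAllTextAsync(path, html);
    return html;
}
```

Swap GetCachedAsync in for GetStringAsync while you iterate on selectors, and delete the cache folder whenever you want fresh data.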
🧩 13. Ethical and Legal Notes
Scraping is powerful — but with great power comes… yes, you know.
Always follow these principles:
| Rule | Description |
|---|---|
| Respect robots.txt | Some sites explicitly disallow automated access. |
| Use rate limiting | Don’t hammer servers — throttle requests. |
| Scrape public data only | Never collect private or copyrighted material. |
| Credit your sources | Especially for academic or journalistic use. |
| Use APIs when available | They’re faster, safer, and more stable. |
FoxScrape helps here too — by rate-limiting requests, managing concurrency, and keeping your traffic “browser-like.”
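If you do fetch sites directly, a simple client-side throttle keeps you on the polite side of that table. A minimal sketch — the one-second pause is an arbitrary example, not a universal rule:

```csharp
using var http = new HttpClient();

for (int page = 1; page <= 5; page++)
{
    var url = $"https://books.toscrape.com/catalogue/page-{page}.html";
    var html = await http.GetStringAsync(url);
    Console.WriteLine($"{url}: {html.Length} bytes");

    // Fixed pause between requests so we never hammer the server.
    await Task.Delay(TimeSpan.FromSeconds(1));
}
```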
🧠 14. A Smarter Way to Scrape
After you’ve written a few scrapers, a realization hits:
the scraping logic itself isn’t the hard part — it’s keeping the pipeline alive.
You can spend weeks perfecting selectors, proxies, and error handling — or you can let a dedicated service manage that, while you stay focused on why you’re scraping in the first place.
That’s why tools like FoxScrape exist — not to replace developers, but to remove friction from data acquisition.
So, instead of spending nights debugging 403 Forbidden, you can spend them building something valuable with the data.
🦊 Learn more at FoxScrape.com.
🧩 15. Final Thoughts
Scraping in 2025 is both an art and a systems problem.
Static HTML scraping still has its place, but the modern web demands browser-grade tooling, clean architecture, and reliable infrastructure.
If you’re serious about scraping: use semantic selectors, add retries and backoff, cache raw HTML while you develop, respect the sites you touch, and offload rendering, proxies, and anti-bot handling to infrastructure built for it.
Your goal isn’t to fight websites — it’s to build insight pipelines.
And once you’re free from the mechanical pain, you can focus on the creative work: what you’ll do with all that beautiful, structured data.