Web Scraping with Golang

Written by Mantas Kemėšius

Go, also known as Golang, is a language built for speed, simplicity, and concurrency. It’s particularly well-suited for tasks like web scraping, where you want your scrapers to run efficiently, potentially in parallel, without the overhead of heavier languages.

In this guide, we’ll start with Go’s standard library to scrape basic HTML pages, then step up to Colly, a popular Go framework for more structured and scalable scrapers. Finally, we’ll explore FoxScrape, an API-based solution for scraping complex or JavaScript-heavy pages without worrying about proxies, headless browsers, or anti-bot measures.

By the end, you’ll have a full toolkit for scraping websites in Go — from simple HTTP requests to production-ready crawlers.

⚙️ 1. Why Go for Web Scraping?

Go’s advantages for scraping include:

| Feature | Benefit |
| --- | --- |
| Concurrency (goroutines) | Run multiple scrapers simultaneously without blocking |
| Simple syntax | Easy to read and maintain scraping logic |
| Speed | Fast HTTP requests and parsing |
| Cross-platform | Works on Windows, macOS, Linux |
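
To see the concurrency benefit concretely, here is a minimal sketch (the URLs are placeholders) that fetches several pages in parallel with goroutines and a sync.WaitGroup:

GO
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	urls := []string{
		"https://example.com",
		"https://example.org",
	}

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) { // each fetch runs in its own goroutine
			defer wg.Done()
			resp, err := http.Get(u)
			if err != nil {
				fmt.Println("error:", err)
				return
			}
			defer resp.Body.Close()
			fmt.Println(u, "->", resp.Status)
		}(u)
	}
	wg.Wait() // block until every fetch has finished
}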

Popular libraries:

  • net/http + golang.org/x/net/html: Low-level, gives full control over requests and parsing.
  • Colly: High-level, handles crawling, selectors, concurrency, and storage.
  • goquery: jQuery-like syntax for HTML parsing; can be combined with Colly.

🛠️ 2. Prerequisites

    You’ll need:

  • Go 1.23.2 or newer
  • Basic knowledge of Go syntax and packages
  • A text editor or IDE (VS Code with Go extension recommended)

📦 3. Setting Up Your Project

    Create a folder and initialize your module:

BASH
mkdir go-scraper && cd go-scraper
go mod init github.com/yourusername/go-scraper

This creates a go.mod file, the heart of Go’s dependency management; a go.sum file will appear as soon as you add your first dependency.

    🏗️ 4. Building a Basic Scraper with net/http

First, let’s scrape static HTML using Go’s standard library plus the supplementary golang.org/x/net/html parser (install it first with go get golang.org/x/net/html).

GO
package main

import (
	"fmt"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	resp, err := http.Get("https://en.wikipedia.org/wiki/Web_scraping")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, attr := range n.Attr {
				if attr.Key == "href" {
					fmt.Println(attr.Val)
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(doc)
}

    What’s happening here:

  • http.Get fetches the page.
  • html.Parse turns the HTML into a tree structure.
  • Recursive function traverses the DOM, extracting all <a> tag href attributes.

This is simple and works for static pages, but JavaScript-driven content won’t show up; we’ll address that later.

    🌐 5. Introducing Colly

    Colly is a higher-level scraping framework for Go. It provides:

  • Automatic request handling
  • Easy HTML selection (OnHTML)
  • Concurrency with goroutines
  • Callbacks for events like OnRequest, OnError

Install Colly:

BASH
go get -u github.com/gocolly/colly/v2

    ✏️ 6. Building a Wikipedia Scraper with Colly

    Rewriting the previous example using Colly:

GO
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("en.wikipedia.org"),
	)

	c.OnHTML("div.mw-parser-output a[href]", func(e *colly.HTMLElement) {
		fmt.Println(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL.String())
	})

	c.Visit("https://en.wikipedia.org/wiki/Web_scraping")
}

    Colly automatically handles request management, concurrency, and parsing. Using OnHTML, you can extract elements using CSS selectors rather than manually traversing the DOM.

    💡 Tip: Use your browser’s developer tools to inspect IDs, classes, or structure before writing selectors.
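
Colly also handles crawling: inside an OnHTML callback you can queue further pages with e.Request.Visit, which resolves relative links, respects AllowedDomains, and skips already-visited URLs. A minimal sketch:

GO
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
	// Follow every link on the page; Colly ignores domains outside
	// AllowedDomains and URLs it has already visited.
	e.Request.Visit(e.Attr("href"))
})

In practice you would also cap the crawl with the colly.MaxDepth option when creating the collector.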

    📊 7. Scraping Table Data

    Let’s extract tabular data and save it to CSV. Example: a sample HTML table from W3Schools.

GO
package main

import (
	"encoding/csv"
	"log"
	"os"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	file, err := os.Create("table.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	writer := csv.NewWriter(file)
	defer writer.Flush()

	writer.Write([]string{"Name", "Country", "Age"})

	c.OnHTML("table tr", func(e *colly.HTMLElement) {
		cells := e.ChildTexts("td")
		if len(cells) >= 3 {
			writer.Write(cells[:3])
		}
	})

	c.Visit("https://www.w3schools.com/html/html_tables.asp")
	log.Println("✅ Saved table to table.csv")
}

    This is straightforward for structured tables and works well with Colly.
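
If you prefer to pick the header row up from the page itself instead of hard-coding it, ChildTexts("th") returns the header cells. A minimal variant of the callback above:

GO
c.OnHTML("table tr", func(e *colly.HTMLElement) {
	// Header rows use <th> cells, data rows use <td> cells.
	if headers := e.ChildTexts("th"); len(headers) > 0 {
		writer.Write(headers)
		return
	}
	if cells := e.ChildTexts("td"); len(cells) > 0 {
		writer.Write(cells)
	}
})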

    ⚡ 8. Scaling Up with Concurrency

    Colly supports parallel scraping out-of-the-box:

GO
c := colly.NewCollector(colly.Async(true))
c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 4})

You can now fetch multiple pages simultaneously without manually spawning goroutines; just remember to call c.Wait() before exiting so the program doesn’t finish while requests are still in flight.
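
Putting the pieces together, here is a minimal sketch that fetches several Wikipedia pages in parallel (the page list is illustrative):

GO
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("en.wikipedia.org"),
		colly.Async(true),
	)
	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 4})

	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println(e.Text)
	})

	pages := []string{
		"https://en.wikipedia.org/wiki/Web_scraping",
		"https://en.wikipedia.org/wiki/Go_(programming_language)",
		"https://en.wikipedia.org/wiki/HTML",
	}
	for _, p := range pages {
		c.Visit(p) // queued and fetched concurrently because Async is on
	}
	c.Wait() // block until every request has completed
}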

    🦊 9. Handling JavaScript and Anti-Scraping with FoxScrape

    At this point, you might hit pages that:

  • Load content dynamically with JS
  • Rate-limit requests or block unknown IPs
  • Require proxy rotation

Running a headless browser manually in Go is possible but cumbersome. Enter FoxScrape, an API that handles all these challenges for you.

    🔧 Example: Fetch a Rendered Page

GO
package main

import (
	"bytes"
	"fmt"
	"net/url"

	"github.com/PuerkitoBio/goquery"
	"github.com/go-resty/resty/v2"
)

func main() {
	apiKey := "YOUR_API_KEY"
	targetURL := "https://books.toscrape.com"
	foxURL := fmt.Sprintf("https://www.foxscrape.com/api/v1?api_key=%s&url=%s",
		apiKey, url.QueryEscape(targetURL))

	client := resty.New()
	resp, err := client.R().Get(foxURL)
	if err != nil {
		panic(err)
	}

	// resty has already read the response body; wrap it in a reader for goquery.
	doc, err := goquery.NewDocumentFromReader(bytes.NewReader(resp.Body()))
	if err != nil {
		panic(err)
	}

	doc.Find(".product_pod h3 a").Each(func(i int, s *goquery.Selection) {
		title := s.Text()
		href, _ := s.Attr("href")
		fmt.Printf("%d: %s → %s\n", i+1, title, href)
	})
}

    Benefits:

  • No manual proxy rotation
  • Option to render JavaScript (render_js=true)
  • Built-in retries and anti-blocking

The parsing logic remains identical; you just replace the network layer with a FoxScrape request.
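
For example, to have FoxScrape render JavaScript before returning the HTML, add the render_js parameter mentioned above to the request URL. A minimal sketch reusing the variables from the previous example; only api_key, url, and render_js come from this article, so check the FoxScrape docs for the full option list:

GO
foxURL := fmt.Sprintf(
	"https://www.foxscrape.com/api/v1?api_key=%s&url=%s&render_js=true",
	apiKey, url.QueryEscape(targetURL))

resp, err := resty.New().R().Get(foxURL)
if err != nil {
	panic(err)
}
// resp.Body() now holds the rendered HTML, ready for goquery as before.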

    📁 10. Saving Results to CSV

    Combine FoxScrape with Go’s encoding/csv to persist data:

GO
file, _ := os.Create("products.csv")
writer := csv.NewWriter(file)
defer writer.Flush()

writer.Write([]string{"Title", "URL"})

doc.Find(".product_pod h3 a").Each(func(i int, s *goquery.Selection) {
	title := s.Text()
	href, _ := s.Attr("href")
	writer.Write([]string{title, href})
})

    Now you have structured data ready for analysis.

    ⚖️ 11. Comparison Table: Methods in Go

| Method | JavaScript Support | Handles Blocks | Speed | Complexity |
| --- | --- | --- | --- | --- |
| net/http + html | ❌ | ❌ | ⚡ Fast | 🟢 Simple |
| Colly + goquery | ⚠️ | ⚠️ Partial | ⚡ Fast | 🟡 Medium |
| FoxScrape API | ✅ | ✅ | ⚡⚡ Fast | 🟢 Simple |

    🧠 12. Best Practices

  • Respect site rate limits and robots.txt
  • Cache pages locally for debugging
  • Use structured parsing (CSS selectors or XPath)
  • For large-scale scraping, consider a queue system or goroutine pools (see the sketch after this list)
  • Use FoxScrape to avoid IP bans or complex headless browser setups
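
A minimal sketch of the goroutine-pool idea mentioned above: a fixed set of workers pulls URLs from a channel that acts as the queue (worker count and URLs are placeholders):

GO
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	jobs := make(chan string) // the URL queue
	var wg sync.WaitGroup

	// Start a fixed pool of workers that pull URLs off the queue.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				resp, err := http.Get(u)
				if err != nil {
					fmt.Println("error:", err)
					continue
				}
				resp.Body.Close()
				fmt.Println(u, "->", resp.Status)
			}
		}()
	}

	for _, u := range []string{"https://example.com", "https://example.org"} {
		jobs <- u
	}
	close(jobs) // no more work; workers exit once the channel drains
	wg.Wait()
}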

🎯 13. Conclusion

    In this guide, you learned how to:

  • Scrape static pages with Go’s standard library
  • Use Colly for higher-level scraping, concurrency, and HTML selection
  • Handle dynamic or protected pages with FoxScrape API
  • Export structured data to CSV for analysis

With Go’s speed and concurrency plus FoxScrape’s reliability, you can scale scraping workflows without the usual headaches: no complex proxy setup, no Selenium maintenance, just clean HTML and structured data ready for your applications.

    🦊 Try FoxScrape for Go: https://www.foxscrape.com — fetch any page, static or dynamic, straight into your Go scraper with minimal setup.