Selenium Web Scraping (Step-by-Step Guide)

Written by Mantas Kemėšius

Web scraping means collecting information from websites automatically using code.

While libraries like BeautifulSoup or requests are perfect for simple pages, they struggle when websites rely on JavaScript to load data dynamically — which is where Selenium shines.

Selenium is a browser automation framework that simulates real user behavior: clicking, scrolling, and interacting with elements — just like a human would. It’s primarily used for testing web apps, but it’s also a powerful tool for scraping modern, dynamic websites.

In this tutorial, you’ll learn:

  • How to set up Selenium for Python
  • How to launch and control a browser
  • How to extract and save data
  • How to handle pagination
  • When to switch to an API-based approach like FoxScrape


    2. Setting Up Selenium

    To get started, you’ll need Python (3.10 or higher) and a compatible browser driver.

    Install Selenium

    BASH
    pip install selenium

    Install a WebDriver

    Each browser needs its own driver:

  • Chrome: chromedriver.chromium.org
  • Firefox: geckodriver
  • Edge: msedgedriver

    Find your Chrome version → download the matching driver → place it in your PATH or specify its path manually.

    Example Setup

    PYTHON
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    import pandas as pd

    service = Service("/path/to/chromedriver")
    driver = webdriver.Chrome(service=service)

    ✅ You can use Firefox or Edge by replacing webdriver.Chrome() accordingly.
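
    For example, a minimal Firefox setup looks like this (a sketch; adjust the geckodriver path to your system, or drop the Service argument if geckodriver is already on your PATH):

    PYTHON
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service as FirefoxService

    # Hypothetical driver path; point this at your local geckodriver binary.
    # This would replace the Chrome setup above if you prefer Firefox.
    firefox_service = FirefoxService("/path/to/geckodriver")
    driver = webdriver.Firefox(service=firefox_service)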


    3. Opening a Website with Selenium

    Let’s load a simple demo site and verify the title:

    PYTHON
    driver.get("https://quotes.toscrape.com/")
    print("Page title:", driver.title)

    Selenium fully renders the page (including JavaScript content).

    You can inspect the raw HTML with:

    PYTHON
    print(driver.page_source[:500])

    This is extremely useful when debugging or confirming whether data is rendered server-side or via JavaScript.
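
    For instance, a quick way to check whether a piece of text exists in the static HTML or only appears after JavaScript runs is to compare a plain requests download with Selenium's rendered source (a sketch; the sample text is simply the opening of the first quote on the demo site):

    PYTHON
    import requests

    # Fetch the raw HTML with no JavaScript execution, then compare with Selenium's view.
    static_html = requests.get("https://quotes.toscrape.com/").text
    rendered_html = driver.page_source

    sample = "The world as we have created it"
    print("In static HTML:  ", sample in static_html)
    print("In rendered HTML:", sample in rendered_html)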


    4. Locating and Extracting Elements

    You can target elements using:

  • By.CLASS_NAME
  • By.CSS_SELECTOR
  • By.XPATH
  • By.ID or By.TAG_NAME

    Example: Extracting Quotes and Authors

    PYTHON
    quotes = driver.find_elements(By.CLASS_NAME, "text")
    authors = driver.find_elements(By.CLASS_NAME, "author")

    for q, a in zip(quotes, authors):
        print(f"{q.text} - {a.text}")

    Using CSS Selectors

    PYTHON
    tags = driver.find_elements(By.CSS_SELECTOR, ".tags .tag")
    for t in tags[:5]:
        print("Tag:", t.text)

    🧭 Tip:

    Use find_elements (plural) to capture multiple results.

    Use DevTools (Right-click → Inspect) to identify class names or XPaths accurately.

    By.CSS_SELECTOR is often the cleanest and most efficient approach.
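
    As a rough comparison (the selectors below assume the quotes.toscrape.com markup, where each quote lives in a div with class "quote"), the same elements can be targeted either way, and the CSS version usually reads more cleanly:

    PYTHON
    # XPath version: more verbose for the same result.
    quotes_xpath = driver.find_elements(By.XPATH, '//div[@class="quote"]/span[@class="text"]')

    # CSS selector version: shorter and easier to maintain.
    quotes_css = driver.find_elements(By.CSS_SELECTOR, ".quote .text")

    print(len(quotes_xpath), len(quotes_css))  # both should match the same elements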


    5. Scraping Multiple Pages

    Let’s extend the example to scrape all pages:

    PYTHON
    import time
    from selenium.common.exceptions import NoSuchElementException

    all_quotes = []

    while True:
        quotes = driver.find_elements(By.CLASS_NAME, "text")
        authors = driver.find_elements(By.CLASS_NAME, "author")

        for q, a in zip(quotes, authors):
            all_quotes.append({"quote": q.text, "author": a.text})

        try:
            next_btn = driver.find_element(By.XPATH, '//li[@class="next"]/a')
            next_btn.click()
            time.sleep(1)
        except NoSuchElementException:
            break

    This code loops until there’s no “Next” button left.

    Always include short delays (time.sleep(1)) to prevent overwhelming the website.
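
    If a fixed sleep feels too brittle, one possible refinement (a sketch that reuses the variables from the loop above) is to wait until the previous page's content goes stale after clicking Next:

    PYTHON
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Inside the try block, instead of time.sleep(1):
    old_first_quote = quotes[0]
    next_btn.click()
    # Wait up to 10 seconds for the old element to detach from the DOM.
    WebDriverWait(driver, 10).until(EC.staleness_of(old_first_quote))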


    6. Saving Data to CSV or DataFrame

    Organize and store your results neatly:

    PYTHON
    df = pd.DataFrame(all_quotes)
    df.to_csv("selenium_quotes.csv", index=False)
    print("Saved", len(df), "quotes.")

    ✅ Clean your text (.strip()) and verify your column names before exporting.
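
    For example, a quick cleanup pass before exporting might look like this (illustrative; the column names match the dictionaries built above):

    PYTHON
    # Strip stray whitespace from every string column and confirm the headers.
    for col in df.columns:
        df[col] = df[col].str.strip()
    print(df.columns.tolist())  # expect ['quote', 'author']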


    7. Advanced Example — Scraping Product Data

    Let’s use a demo e-commerce site, books.toscrape.com, to simulate a real-world case.

    PYTHON
    driver.get("https://books.toscrape.com/")

    books = driver.find_elements(By.CSS_SELECTOR, "article.product_pod")
    records = []

    for book in books:
        title = book.find_element(By.TAG_NAME, "h3").text
        price = book.find_element(By.CLASS_NAME, "price_color").text
        availability = book.find_element(By.CLASS_NAME, "instock").text.strip()
        records.append({"Title": title, "Price": price, "Availability": availability})

    pd.DataFrame(records).to_csv("books.csv", index=False)

    💡 Why Selenium?

    Some e-commerce pages load data dynamically or require interactions (clicks, scrolling).

    Selenium mimics a real browser, ensuring that all content is captured exactly as seen by users.
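
    For pages that lazy-load items as you scroll, a small script call can trigger the extra content (a sketch; books.toscrape.com itself does not need this):

    PYTHON
    import time

    # Scroll to the bottom of the page so any lazy-loaded items are fetched.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # give the new content a moment to render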

    Best practices:

  • Use headless mode for faster scraping.
  • Use WebDriverWait for elements that load slowly.


    8. Adding Delays and Avoiding Detection

    Be polite and responsible when scraping.

    Randomize delays to mimic human browsing:

    PYTHON
    import time, random
    time.sleep(random.uniform(1, 3))

    Custom headers are rarely needed, since Selenium drives a real browser that already sends normal request headers; the one you may want to change is the User-Agent.
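
    If you do want a different User-Agent, one common approach is to set it through Chrome options (a sketch; the UA string below is only an example):

    PYTHON
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Example User-Agent string; replace it with whatever identity you need.
    options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    driver = webdriver.Chrome(service=service, options=options)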

    Avoid overloading the target site — throttle requests and respect robots.txt.
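
    Checking robots.txt can be automated with the standard library (a minimal sketch):

    PYTHON
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://quotes.toscrape.com/robots.txt")
    rp.read()
    # Check whether a generic crawler may fetch the front page.
    print(rp.can_fetch("*", "https://quotes.toscrape.com/"))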


    9. Headless Mode and Options

    Run Chrome invisibly in the background for performance and automation:

    PYTHON
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")

    driver = webdriver.Chrome(service=service, options=options)

    This setup is ideal for cloud servers, CI/CD environments, or batch scraping jobs.


    10. When to Use FoxScrape Instead of Selenium

    For large-scale or cloud-based scraping, Selenium can become heavy.

    FoxScrape API offers a simpler solution — rendered HTML in one API call.

    PYTHON
    import requests

    response = requests.get(
        "https://www.foxscrape.com/api/v1",
        params={"url": "https://books.toscrape.com/", "render_js": "true"}
    )
    print(response.text[:500])

    Why it’s better for scaling:

  • No drivers or browsers needed
  • Handles JavaScript automatically
  • Bypasses most blocking and proxy issues

    Use Selenium for full browser automation, but prefer FoxScrape when you just need clean rendered HTML fast.
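
    Because the API returns ordinary HTML, you can hand the response straight to a parser such as BeautifulSoup (a sketch, assuming the request above succeeded and beautifulsoup4 is installed):

    PYTHON
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(response.text, "html.parser")
    # Reuse the same selectors as in the Selenium example above.
    for book in soup.select("article.product_pod")[:5]:
        print(book.h3.a["title"], book.select_one(".price_color").text)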


    11. Common Errors and Troubleshooting

    Error                      | Cause                 | Fix
    NoSuchElementException     | Wrong selector        | Recheck your XPath or CSS
    TimeoutException           | Element loads slowly  | Use WebDriverWait
    SessionNotCreatedException | Driver mismatch       | Update ChromeDriver
    Access Denied              | Bot protection        | Use proxies or FoxScrape

    Example: Using WebDriverWait

    PYTHON
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "text"))
    )

    12. Conclusion

    Selenium is one of the most powerful tools for web scraping dynamic websites.

    You learned how to:

  • Install and configure Selenium
  • Load and navigate websites
  • Extract data and handle pagination
  • Save results and clean your dataset

    ⚖️ Trade-offs:

    Pros: Full browser automation, handles JavaScript perfectly.

    Cons: Slower and more resource-intensive.

    When you need speed and scale — skip browsers and use FoxScrape instead.