Selenium Web Scraping (Step-by-Step Guide)

Written by Mantas Kemėšius

Web scraping means collecting information from websites automatically using code.

While libraries like BeautifulSoup or requests are perfect for simple pages, they struggle when websites rely on JavaScript to load data dynamically — which is where Selenium shines.

Selenium is a browser automation framework that simulates real user behavior: clicking, scrolling, and interacting with elements — just like a human would. It’s primarily used for testing web apps, but it’s also a powerful tool for scraping modern, dynamic websites.

In this tutorial, you’ll learn:

  • How to set up Selenium for Python
  • How to launch and control a browser
  • How to extract and save data
  • How to handle pagination
  • When to switch to an API-based approach like FoxScrape


    2. Setting Up Selenium

    To get started, you’ll need Python (3.10 or higher) and a compatible browser driver.

    Install Selenium

    BASH
    pip install selenium

    Install a WebDriver

    Each browser needs its own driver:

  • Chrome: chromedriver.chromium.org
  • Firefox: geckodriver
  • Edge: msedgedriver

    Find your Chrome version → download the matching driver → place it in your PATH or specify its path manually.

    Example Setup

    PYTHON
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    import pandas as pd

    service = Service("/path/to/chromedriver")
    driver = webdriver.Chrome(service=service)

    ✅ You can use Firefox or Edge by replacing webdriver.Chrome() accordingly.
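
    For example, a minimal Firefox setup looks like this (a sketch; adjust the geckodriver path to your system, or drop the Service argument if geckodriver is already on your PATH):

    PYTHON
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service as FirefoxService

    # Hypothetical driver path; point this at your local geckodriver binary.
    # This would replace the Chrome setup above if you prefer Firefox.
    firefox_service = FirefoxService("/path/to/geckodriver")
    driver = webdriver.Firefox(service=firefox_service)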


    3. Opening a Website with Selenium

    Let’s load a simple demo site and verify the title:

    PYTHON
    driver.get("https://quotes.toscrape.com/")
    print("Page title:", driver.title)

    Selenium fully renders the page (including JavaScript content).

    You can inspect the raw HTML with:

    PYTHON
    print(driver.page_source[:500])

    This is extremely useful when debugging or confirming whether data is rendered server-side or via JavaScript.
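
    For instance, a quick way to check whether a piece of text exists in the static HTML or only appears after JavaScript runs is to compare a plain requests download with Selenium's rendered source (a sketch; the sample text is simply the opening of the first quote on the demo site):

    PYTHON
    import requests

    # Fetch the raw HTML with no JavaScript execution, then compare with Selenium's view.
    static_html = requests.get("https://quotes.toscrape.com/").text
    rendered_html = driver.page_source

    sample = "The world as we have created it"
    print("In static HTML:  ", sample in static_html)
    print("In rendered HTML:", sample in rendered_html)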


    4. Locating and Extracting Elements

    You can target elements using:

  • By.CLASS_NAME
  • By.CSS_SELECTOR
  • By.XPATH
  • By.ID or By.TAG_NAME

    Example: Extracting Quotes and Authors

    PYTHON
    quotes = driver.find_elements(By.CLASS_NAME, "text")
    authors = driver.find_elements(By.CLASS_NAME, "author")

    for q, a in zip(quotes, authors):
        print(f"{q.text} - {a.text}")

    Using CSS Selectors

    PYTHON
    tags = driver.find_elements(By.CSS_SELECTOR, ".tags .tag")
    for t in tags[:5]:
        print("Tag:", t.text)

    🧭 Tip:

    Use find_elements (plural) to capture multiple results.

    Use DevTools (Right-click → Inspect) to identify class names or XPaths accurately.

    By.CSS_SELECTOR is often the cleanest and most efficient approach.
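
    As a rough comparison (the selectors below assume the quotes.toscrape.com markup, where each quote lives in a div with class "quote"), the same elements can be targeted either way, and the CSS version usually reads more cleanly:

    PYTHON
    # XPath version: more verbose for the same result.
    quotes_xpath = driver.find_elements(By.XPATH, '//div[@class="quote"]/span[@class="text"]')

    # CSS selector version: shorter and easier to maintain.
    quotes_css = driver.find_elements(By.CSS_SELECTOR, ".quote .text")

    print(len(quotes_xpath), len(quotes_css))  # both should match the same elements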


    5. Scraping Multiple Pages

    Let’s extend the example to scrape all pages:

    PYTHON
    import time
    from selenium.common.exceptions import NoSuchElementException

    all_quotes = []

    while True:
        quotes = driver.find_elements(By.CLASS_NAME, "text")
        authors = driver.find_elements(By.CLASS_NAME, "author")

        for q, a in zip(quotes, authors):
            all_quotes.append({"quote": q.text, "author": a.text})

        try:
            next_btn = driver.find_element(By.XPATH, '//li[@class="next"]/a')
            next_btn.click()
            time.sleep(1)
        except NoSuchElementException:
            break

    This code loops until there’s no “Next” button left.

    Always include short delays (time.sleep(1)) to prevent overwhelming the website.
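
    If a fixed sleep feels too brittle, one possible refinement (a sketch that reuses the variables from the loop above) is to wait until the previous page's content goes stale after clicking Next:

    PYTHON
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Inside the try block, instead of time.sleep(1):
    old_first_quote = quotes[0]
    next_btn.click()
    # Wait up to 10 seconds for the old element to detach from the DOM.
    WebDriverWait(driver, 10).until(EC.staleness_of(old_first_quote))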


    6. Saving Data to CSV or DataFrame

    Organize and store your results neatly:

    PYTHON
    df = pd.DataFrame(all_quotes)
    df.to_csv("selenium_quotes.csv", index=False)
    print("Saved", len(df), "quotes.")

    ✅ Clean your text (.strip()) and verify your column names before exporting.
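
    For example, a quick cleanup pass before exporting might look like this (illustrative; the column names match the dictionaries built above):

    PYTHON
    # Strip stray whitespace from every string column and confirm the headers.
    for col in df.columns:
        df[col] = df[col].str.strip()
    print(df.columns.tolist())  # expect ['quote', 'author']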


    7. Advanced Example — Scraping Product Data

    Let’s use a demo e-commerce site, books.toscrape.com, to simulate a real-world case.

    PYTHON
    driver.get("https://books.toscrape.com/")

    books = driver.find_elements(By.CSS_SELECTOR, "article.product_pod")
    records = []

    for book in books:
        title = book.find_element(By.TAG_NAME, "h3").text
        price = book.find_element(By.CLASS_NAME, "price_color").text
        availability = book.find_element(By.CLASS_NAME, "instock").text.strip()
        records.append({"Title": title, "Price": price, "Availability": availability})

    pd.DataFrame(records).to_csv("books.csv", index=False)

    💡 Why Selenium?

    Some e-commerce pages load data dynamically or require interactions (clicks, scrolling).

    Selenium mimics a real browser, ensuring that all content is captured exactly as seen by users.
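
    For pages that lazy-load items as you scroll, a small script call can trigger the extra content (a sketch; books.toscrape.com itself does not need this):

    PYTHON
    import time

    # Scroll to the bottom of the page so any lazy-loaded items are fetched.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # give the new content a moment to render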

    Best practices:

  • Use headless mode for faster scraping.
  • Use WebDriverWait for elements that load slowly.


    8. Adding Delays and Avoiding Detection

    Be polite and responsible when scraping.

    Randomize delays to mimic human browsing:

    PYTHON
    import time, random
    time.sleep(random.uniform(1, 3))

    Custom headers are rarely needed, since Selenium drives a real browser that already sends normal request headers; the one you may want to change is the User-Agent.
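
    If you do want a different User-Agent, one common approach is to set it through Chrome options (a sketch; the UA string below is only an example):

    PYTHON
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Example User-Agent string; replace it with whatever identity you need.
    options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    driver = webdriver.Chrome(service=service, options=options)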

    Avoid overloading the target site — throttle requests and respect robots.txt.
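
    Checking robots.txt can be automated with the standard library (a minimal sketch):

    PYTHON
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://quotes.toscrape.com/robots.txt")
    rp.read()
    # Check whether a generic crawler may fetch the front page.
    print(rp.can_fetch("*", "https://quotes.toscrape.com/"))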


    9. Headless Mode and Options

    Run Chrome invisibly in the background for performance and automation:

    PYTHON
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")

    driver = webdriver.Chrome(service=service, options=options)

    This setup is ideal for cloud servers, CI/CD environments, or batch scraping jobs.


    10. When to Use FoxScrape Instead of Selenium

    For large-scale or cloud-based scraping, Selenium can become heavy.

    FoxScrape API offers a simpler solution — rendered HTML in one API call.

    PYTHON
    import requests

    response = requests.get(
        "https://www.foxscrape.com/api/v1",
        params={"url": "https://books.toscrape.com/", "render_js": "true"}
    )
    print(response.text[:500])

    Why it’s better for scaling:

  • No drivers or browsers needed
  • Handles JavaScript automatically
  • Bypasses most blocking and proxy issues

    Use Selenium for full browser automation, but prefer FoxScrape when you just need clean rendered HTML fast.
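
    Because the API returns ordinary HTML, you can hand the response straight to a parser such as BeautifulSoup (a sketch, assuming the request above succeeded and beautifulsoup4 is installed):

    PYTHON
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(response.text, "html.parser")
    # Reuse the same selectors as in the Selenium example above.
    for book in soup.select("article.product_pod")[:5]:
        print(book.h3.a["title"], book.select_one(".price_color").text)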


    11. Common Errors and Troubleshooting

    Error                      | Cause                 | Fix
    NoSuchElementException     | Wrong selector        | Recheck your XPath or CSS
    TimeoutException           | Element loads slowly  | Use WebDriverWait
    SessionNotCreatedException | Driver mismatch       | Update ChromeDriver
    Access Denied              | Bot protection        | Use proxies or FoxScrape

    Example: Using WebDriverWait

    PYTHON
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "text"))
    )

    12. Conclusion

    Selenium is one of the most powerful tools for web scraping dynamic websites.

    You learned how to:

  • Install and configure Selenium
  • Load and navigate websites
  • Extract data and handle pagination
  • Save results and clean your dataset

    ⚖️ Trade-offs:

    Pros: Full browser automation, handles JavaScript perfectly.

    Cons: Slower and more resource-intensive.

    When you need speed and scale — skip browsers and use FoxScrape instead.