How to Scrape Data from a Website

Web scraping means automatically collecting information from web pages using code — no manual copying, no spreadsheets. Whether you’re tracking prices, analyzing markets, or gathering research data, Python makes it surprisingly easy to turn websites into structured datasets.
In this guide, you’ll learn how to scrape data from a website using Python, step by step.
We’ll explore:
- How websites structure their data in HTML
- Setting up your Python environment
- Scraping static and JavaScript-rendered pages
- Handling common errors and blocks
- Cleaning, structuring, and exporting the results
By the end, you’ll know how to extract data, clean it, and save it — all using Python.
🌍 Why Web Scraping Matters
Web scraping powers countless real-world applications:
- Price tracking and comparison
- Market and competitor analysis
- Gathering datasets for research
- Monitoring listings, reviews, and news
Used responsibly, web scraping helps developers and analysts make data-driven decisions faster and at scale.
⚖️ Important: Always scrape only public, non-sensitive data, and follow the website’s terms of service. Avoid private or restricted information.
🔍 Understanding How Websites Work
Before you can scrape data, you need to know what you’re looking at.
Every website is built from HTML — a structured document containing elements like <div>, <p>, <span>, <table>, and so on. These tags define where data lives.
Here’s a simple example:
```html
<div class="product">
  <h2>Blue T-shirt</h2>
  <span class="price">$15.99</span>
</div>
```
When you scrape data, your goal is to read this structure and extract the parts you need — such as product titles, prices, or links.
To find the right elements:
1. Open the page in your browser and right-click the data you want.
2. Choose “Inspect” to open the developer tools.
3. Note the tag (<div>, <h2>, etc.) and class (e.g., "product").

This inspection process is the secret to writing accurate scrapers.
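For instance, once you’ve noted the tag and class, BeautifulSoup can target them with a CSS selector. A minimal sketch using the example markup above:

```python
from bs4 import BeautifulSoup

html = '<div class="product"><h2>Blue T-shirt</h2><span class="price">$15.99</span></div>'
soup = BeautifulSoup(html, "lxml")

# "div.product span.price" combines the tag and class we noted in Inspect
price = soup.select_one("div.product span.price").text
print(price)  # $15.99
```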
⚙️ Setting Up Your Python Environment
Before you start coding, make sure your environment is ready.
🧰 You’ll need:
- Python 3 installed (python.org)
- pip, Python’s package manager
- A code editor of your choice
📦 Install required packages:
```bash
pip install requests beautifulsoup4 pandas lxml
```
Optional (for advanced scraping):
```bash
pip install selenium
```
🧩 What these tools do:
| Package | Purpose |
|---|---|
| requests | Downloads web pages (HTML). |
| BeautifulSoup | Parses and extracts content from HTML. |
| pandas | Cleans and structures scraped data. |
| selenium | Automates browsers to load dynamic content. |
🧾 Scraping Data from a Static Website
Let’s start with the simplest and most common scenario — scraping a static webpage.
Imagine a product listing page like this:
```html
<div class="product">
  <h2>Blue T-shirt</h2>
  <span class="price">$15.99</span>
</div>
<div class="product">
  <h2>Red Hoodie</h2>
  <span class="price">$29.99</span>
</div>
```
We can extract both titles and prices using requests and BeautifulSoup.
🧑‍💻 Example Code
```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"
html = requests.get(url).text
soup = BeautifulSoup(html, "lxml")

items = soup.find_all("div", class_="product")

for item in items:
    title = item.find("h2").text.strip()
    price = item.find("span", class_="price").text.strip()
    print(title, price)
```
🧩 How It Works:
- requests.get(url) → Fetches the raw HTML from the page.
- BeautifulSoup(html, "lxml") → Parses the HTML.
- find_all("div", class_="product") → Finds all product containers.
- item.find("h2") and .find("span") → Extract the title and price text.

Output:
```
Blue T-shirt $15.99
Red Hoodie $29.99
```
✅ Tip: If your script doesn’t find any results, double-check the class name in your browser’s “Inspect” view — even a small mismatch breaks the selector.
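A quick way to debug this, sketched below, is to print how many containers your selector matched and dump a slice of the raw HTML when the count is zero:

```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/products").text
soup = BeautifulSoup(html, "lxml")

items = soup.find_all("div", class_="product")
print(f"Matched {len(items)} product containers")

if not items:
    # Print the start of what was actually downloaded; often the class name
    # differs from what the browser shows, or the data arrives via JavaScript.
    print(html[:500])
```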
💾 Saving and Structuring the Data
Extracting data is only half the job — you’ll usually want to save it for later use.
```python
import pandas as pd

data = []
for item in items:
    title = item.find("h2").text.strip()
    price = item.find("span", class_="price").text.strip()
    data.append({"Title": title, "Price": price})

df = pd.DataFrame(data)
df.to_csv("products.csv", index=False)

print("Data saved to products.csv")
```
Output file:
```
Title,Price
Blue T-shirt,$15.99
Red Hoodie,$29.99
```
You can also export the data to Excel or JSON:
```python
df.to_excel("products.xlsx", index=False)   # requires the openpyxl package
df.to_json("products.json", orient="records")
```
🧩 Handling Common Issues
When scraping, not everything goes smoothly. Here’s how to fix the usual culprits:
| Problem | Cause | Solution |
|---|---|---|
| Empty data | Page uses JavaScript | Use Selenium or FoxScrape |
| HTTP 403 | Site blocks bots | Add headers or rotate proxies |
| Missing values | Wrong selector | Recheck HTML structure |
| Slow scraping | Too many requests | Add delays or batching |
| Encoding error | Non-UTF-8 content | Set response.encoding = 'utf-8' |
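For the rate-limiting row in particular, a small delay between requests goes a long way. A minimal sketch, assuming a hypothetical list of page URLs:

```python
import time
import requests

urls = [
    "https://example.com/products?page=1",
    "https://example.com/products?page=2",
]

for url in urls:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surfaces HTTP 403/500 instead of failing silently
    # ... parse response.text here ...
    time.sleep(1)  # pause between requests to avoid hammering the server
```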
Adding a Custom User-Agent
Many sites block requests without a browser signature.
```python
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
html = requests.get(url, headers=headers).text
```
This simple trick avoids basic bot blocks.
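If you’re sending many requests, a requests.Session lets you set the header once and also reuses the underlying connection. A small sketch:

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"})

# Every request made through the session now carries the User-Agent header
html = session.get("https://example.com/products", timeout=10).text
```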
⚡ Scraping Dynamic Websites (JavaScript-Rendered Data)
Some sites load data dynamically with JavaScript — meaning the data isn’t present in the initial HTML.
If you inspect the page source and don’t see the data, but it appears in the browser, you’re dealing with a dynamic page.
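One quick way to confirm is to fetch the raw HTML with requests and search it for text you can see in the browser. A rough check, assuming the example product page from earlier:

```python
import requests

html = requests.get("https://example.com/products", timeout=10).text

# If the product markup is missing from the raw response but visible in
# the browser, the page is almost certainly rendered by JavaScript.
print("product" in html)
```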
Option 1: Selenium (Browser Automation)
Selenium opens a real browser window, loads the page, runs scripts, and lets you access the fully rendered HTML.
```python
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get("https://example.com/products")
time.sleep(3)  # wait for JS to load

html = driver.page_source
soup = BeautifulSoup(html, "lxml")

items = soup.find_all("div", class_="product")
for item in items:
    print(item.text)

driver.quit()
```
✅ Pros: Works for most dynamic pages.
⚠️ Cons: Slow, requires browser setup, not ideal for large-scale scraping.
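One common improvement: instead of a fixed time.sleep(3), wait explicitly for the element you need, which is both faster and more reliable. A sketch using Selenium’s built-in WebDriverWait:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/products")

# Block until at least one product container is in the DOM (up to 10 seconds)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.product"))
)

html = driver.page_source
driver.quit()
```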
Option 2: Using FoxScrape API (Simple & Scalable)
If you don’t want to deal with browser automation or proxy headaches, the FoxScrape API is a modern alternative.
It acts like a cloud browser, executes JavaScript, rotates IPs, and returns rendered HTML in one API call.
```python
import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://www.foxscrape.com/api/v1",
    params={
        "url": "https://example.com/products",
        "render_js": "true"
    }
)

html = response.text
soup = BeautifulSoup(html, "lxml")

products = soup.find_all("div", class_="product")
for p in products:
    print(p.text)
```
Why it’s useful:
- Executes JavaScript for you, with no browser to install or manage
- Rotates IPs automatically to avoid blocks
- Returns fully rendered HTML in a single API call
If you’re scraping hundreds of pages or facing anti-bot systems, this approach saves hours of maintenance time.
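For example, scraping many pages becomes a plain loop. The sketch below assumes the (hypothetical) target site paginates with a page query parameter:

```python
import requests
from bs4 import BeautifulSoup

all_products = []
for page in range(1, 6):  # pages 1 through 5 (hypothetical pagination scheme)
    response = requests.get(
        "https://www.foxscrape.com/api/v1",
        params={
            "url": f"https://example.com/products?page={page}",
            "render_js": "true",
        },
    )
    soup = BeautifulSoup(response.text, "lxml")
    all_products.extend(soup.find_all("div", class_="product"))

print(f"Collected {len(all_products)} products")
```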
🧹 Cleaning, Transforming, and Exporting Data
Once your data is loaded into pandas, you can easily clean it:
```python
# Example cleaning operations
df["Price"] = df["Price"].str.replace("$", "", regex=False).astype(float)
df = df.drop_duplicates()
df = df.fillna("N/A")

# Export
df.to_csv("cleaned_products.csv", index=False)
df.to_excel("cleaned_products.xlsx", index=False)
df.to_json("cleaned_products.json", orient="records")
```
This turns raw HTML text into a dataset ready for analysis or visualization.
🧭 Best Practices for Ethical Scraping
Responsible scraping keeps your scripts efficient and compliant.
✅ Do:
- Check the site’s robots.txt before scraping (see the sketch after this list)
- Add time.sleep(1) between requests
- Scrape only public, non-sensitive data
- Identify yourself with a sensible User-Agent

❌ Don’t:
- Hammer a server with rapid-fire requests
- Scrape private, restricted, or personal information
- Ignore a site’s terms of service
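Python’s standard library can even check robots.txt for you. A minimal sketch using urllib.robotparser:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# can_fetch reports whether the rules allow your user agent on this path
if rp.can_fetch("*", "https://example.com/products"):
    print("OK to scrape")
else:
    print("Disallowed by robots.txt")
```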
🦊 Pro Tip: FoxScrape automatically respects rate limits and rotates proxies — a simple way to stay safe while scraping at scale.
🧩 Advanced Example: Scraping and Analyzing Data Together
Here’s a practical mini-project: Scrape a site’s product prices and analyze them with pandas.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://example.com/products"
html = requests.get(url).text
soup = BeautifulSoup(html, "lxml")

data = []
for item in soup.find_all("div", class_="product"):
    title = item.find("h2").text.strip()
    price = float(item.find("span", class_="price").text.strip().replace("$", ""))
    data.append({"Title": title, "Price": price})

df = pd.DataFrame(data)
print(df.describe())
```
Output:
```
        Price
count 12.0000
mean  28.4900
min   10.9900
max   49.9900
```
You’ve just gone from raw HTML to usable statistics — all with under 30 lines of Python.
🏁 Conclusion
Let’s recap the three main approaches:
| Type | Best Tool | Description |
|---|---|---|
| Static pages | BeautifulSoup + requests | Simple, fast, and lightweight |
| JavaScript-rendered | Selenium | Reliable but slower |
| Protected or dynamic | FoxScrape API | Cloud-powered, scalable, effortless |
With these methods, you can extract almost any data — product listings, articles, prices, tables, reviews — from any public website.
The key is to start small, understand your targets, and scale responsibly.
⚡ Next step: Try scraping your favorite site.
For complex pages, skip browser setup — just send the URL to `https://www.foxscrape.com/api/v1?url=<your-site>&render_js=true` and get clean, rendered HTML instantly.
Happy scraping — ethically, efficiently, and with a little help from 🦊 FoxScrape.
Further Reading

- How to Web Scrape a Table in Python (Step-by-Step Guide)
- Web Scraping With JavaScript and Node.js
- Python Web Scraping: Full Tutorial With Examples