How to Web Scrape a Table in Python (Step-by-Step Guide)

Web scraping is one of the most practical skills a Python developer can learn. From price monitoring to academic research, tables are everywhere on the web — and being able to extract them cleanly can save you hours of manual work.
In this guide, you’ll learn how to scrape HTML tables in Python, step by step.
We’ll cover:
- Setting up your Python environment
- Scraping static tables with BeautifulSoup
- Extracting tables automatically with pandas
- Handling dynamic, JavaScript-rendered tables
- Cleaning, exporting, and troubleshooting your data
By the end, you’ll be able to turn any web table — even those hidden behind JavaScript — into a clean, structured dataset.
🧠 What Is Web Scraping (and Why Tables?)
Web scraping means programmatically collecting data from websites.
Tables are particularly useful because they often hold structured information — like financial data, product lists, or rankings.
Common examples include:
- Financial data, such as GDP figures or stock listings
- Product lists and price-comparison data
- Rankings and statistics
⚖️ Always scrape publicly available data and respect each site’s robots.txt.
Responsible scraping is key to maintaining ethical, legal data collection practices.
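If you want to check a site’s rules programmatically, Python’s standard library ships a robots.txt parser. Here’s a minimal sketch (the user-agent string is just an example):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt
rp = RobotFileParser("https://en.wikipedia.org/robots.txt")
rp.read()

# Check whether our (example) user agent may fetch a given page
url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
print(rp.can_fetch("MyScraperBot/1.0", url))  # prints True or False
```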
⚙️ Setting Up Your Python Environment
Before scraping, make sure you have Python 3.10+ installed and a code editor (like VS Code or PyCharm).
Install the following packages via pip:
```bash
pip install requests beautifulsoup4 pandas lxml
```
Optional tools:
- selenium — for pages that render their tables with JavaScript (covered later)
- html5lib — an alternative parser that pandas can use for messy HTML
That’s all you need to start.
🧱 Understanding HTML Tables
HTML tables are made up of nested tags:
- `<table>` — the main container
- `<tr>` — a table row
- `<th>` — a header cell
- `<td>` — a data cell

Here’s a simple example:

```html
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>25</td></tr>
  <tr><td>Bob</td><td>30</td></tr>
</table>
```
Before writing code, it’s always good to inspect the table’s HTML structure in your browser (Right-click → Inspect Element).
You’ll need the table’s class, ID, or other identifiers for accurate extraction.
🥣 Scraping Static Tables with BeautifulSoup
Let’s start with a real example — scraping the Wikipedia list of countries by GDP (nominal).
This is a static page (its data is already present in the HTML), making it ideal for BeautifulSoup.
```python
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
html = requests.get(url).text
soup = BeautifulSoup(html, "lxml")

# Locate the first table with class 'wikitable'
table = soup.find("table", {"class": "wikitable"})
rows = table.find_all("tr")

data = []
for row in rows:
    cols = [td.text.strip() for td in row.find_all(["th", "td"])]
    data.append(cols)

# Display the first 5 rows
for row in data[:5]:
    print(row)
```
Output (truncated):
```text
['Country/Territory', 'GDP(US$million)', 'Year']
['United States', '26,949,643', '2024']
['China', '17,821,771', '2024']
['Germany', '4,684,484', '2024']
['Japan', '4,231,141', '2024']
```
Converting to a DataFrame
With a few lines of pandas, you can turn it into a structured dataset:
```python
import pandas as pd

df = pd.DataFrame(data[1:], columns=data[0])
print(df.head())
```
Output:
```text
  Country/Territory GDP(US$million)  Year
0     United States      26,949,643  2024
1             China      17,821,771  2024
2           Germany       4,684,484  2024
3             Japan       4,231,141  2024
```
That’s the power of BeautifulSoup — flexible, explicit, and reliable for static HTML.
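One caveat: a page can contain several matching tables, and find() only returns the first. If the table you want isn’t the first one, collect them all and pick by index — a quick sketch on the same page:

```python
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
soup = BeautifulSoup(requests.get(url).text, "lxml")

# find() returns only the first match; find_all() returns every one
tables = soup.find_all("table", {"class": "wikitable"})
print(f"Found {len(tables)} wikitables")

# Pick the table you want by position (index 0 here, as an example)
table = tables[0]
```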
🧮 Extracting Tables Automatically with Pandas
For simpler pages, pandas can scrape tables in just one line.
```python
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
tables = pd.read_html(url)

print(f"Found {len(tables)} tables")
df = tables[0]
print(df.head())
```
Pandas uses the lxml or html5lib parsers internally to read <table> elements automatically.
This makes it perfect for fast analysis workflows.
⚠️ Note: pd.read_html() only works for static HTML.
It won’t load content that’s rendered with JavaScript after the page loads.
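When a static page holds many tables, you also don’t have to guess at indices: pd.read_html() accepts a match argument that keeps only tables containing the given text. A short sketch:

```python
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"

# Keep only tables whose text matches "GDP" (accepts a string or regex)
tables = pd.read_html(url, match="GDP")
print(f"Found {len(tables)} matching tables")
print(tables[0].head())
```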
⚡ Scraping Dynamic or JavaScript-Rendered Tables
Here’s where things get tricky.
Many modern websites — especially finance or analytics dashboards — load their tables after the page has loaded, using JavaScript or AJAX.
If you run a simple requests.get() on these pages, you’ll get an empty <table> or no data at all.
There are two main ways to handle this:
🧭 Option 1: Use Selenium (Manual Browser Automation)
Selenium can launch a headless browser (like Chrome or Firefox), render JavaScript, and then let you extract the final HTML.
```python
from selenium import webdriver
from bs4 import BeautifulSoup
import time

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
driver.get("https://example.com/dynamic-table")
time.sleep(3)  # wait for JS to load
html = driver.page_source

soup = BeautifulSoup(html, "lxml")
table = soup.find("table")
print(table.prettify())

driver.quit()
```
This works — but it’s slow, requires local browser drivers, and doesn’t scale easily.
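The fixed time.sleep(3) is part of the problem: it wastes time on fast pages and fails on slow ones. Selenium’s explicit waits are more robust — here’s a sketch that waits up to 10 seconds for a `<table>` to appear:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/dynamic-table")

# Block until a <table> is present in the DOM (or raise after 10 s)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "table"))
)
html = driver.page_source
driver.quit()
```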
🦊 Option 2: Use FoxScrape API (Simpler, Scalable, and Faster)
If you’d rather avoid running browsers and handling proxies, the FoxScrape API gives you a faster alternative.
It runs a headless browser in the cloud, executes JavaScript, rotates IPs, and returns the fully rendered HTML — all from a single HTTP request.
```python
import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://www.foxscrape.com/api/v1",
    params={
        "url": "https://example.com/dynamic-table",
        "render_js": "true"
    }
)

html = response.text
soup = BeautifulSoup(html, "lxml")
table = soup.find("table")

print(table.prettify())
```
You can then use the same BeautifulSoup or pandas logic to parse and clean the data.
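For instance, the rendered HTML can go straight into pandas (recent pandas versions expect a file-like object rather than a raw string, hence StringIO):

```python
from io import StringIO
import pandas as pd

# Parse every <table> element out of the rendered HTML
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df.head())
```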
Why this helps:
- No local browser or driver to install and maintain
- JavaScript is executed in the cloud before the HTML is returned
- IP rotation is handled for you, reducing blocks and CAPTCHAs
- Everything happens in a single HTTP request
For developers scraping large datasets or multiple pages, this approach is significantly faster and more reliable.
🧹 Cleaning and Exporting Your Data
Once you have your data in a pandas DataFrame, you can clean and export it easily.
```python
# Clean column names and fill missing values
df.columns = [c.strip() for c in df.columns]
df = df.fillna("N/A")

# Export to CSV
df.to_csv("gdp_data.csv", index=False)

# Optional: export to JSON or Excel
df.to_json("gdp_data.json", orient="records")
df.to_excel("gdp_data.xlsx", index=False)
```
This lets you take scraped table data directly into your data analysis or visualization pipelines.
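Scraped numbers usually arrive as strings with thousands separators. If you plan to sort or plot the GDP column, convert it to numeric first — a sketch that assumes the column names from the example above:

```python
import pandas as pd

# "26,949,643" (string) -> 26949643 (number); unparseable cells become NaN
df["GDP(US$million)"] = pd.to_numeric(
    df["GDP(US$million)"].str.replace(",", "", regex=False),
    errors="coerce"
)
print(df.sort_values("GDP(US$million)", ascending=False).head())
```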
🐛 Common Errors & Troubleshooting
Here are the most common issues (and fixes):
| Problem | Cause | Solution |
|---|---|---|
| `UnicodeDecodeError` | Encoding mismatch | Set `response.encoding = "utf-8"` |
| Empty table | JavaScript rendering | Use Selenium or FoxScrape |
| Missing headers | Nested HTML | Manually extract `<th>` elements |
| CAPTCHA or 403 | Anti-bot protection | Rotate proxies or use FoxScrape |
| Slow scraping | Too many requests | Add `time.sleep()` or cache results (see the sketch below) |
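For the last two rows in particular, a small helper that spaces out requests and backs off on failures goes a long way. A minimal sketch (the delays and user-agent are just examples):

```python
import time
import requests

def polite_get(url, retries=3, delay=2):
    """Fetch a URL, identifying ourselves and backing off on errors."""
    for attempt in range(retries):
        response = requests.get(url, headers={"User-Agent": "MyScraperBot/1.0"})
        if response.status_code == 200:
            return response
        # Back off: wait longer after each failed attempt
        time.sleep(delay * (attempt + 1))
    response.raise_for_status()  # surface the final error

html = polite_get("https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)").text
```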
🧭 Best Practices & Ethical Guidelines
A good scraper doesn’t just work — it’s also responsible.
Do:
- Check and respect each site’s robots.txt
- Identify your scraper with a clear User-Agent header
- Space out your requests instead of hammering the server

Don’t:
- Scrape personal or private data
- Overload a site with rapid, concurrent requests
- Ignore a site’s terms of service
🦊 FoxScrape already manages rate limiting and proxy rotation, so you can focus on extracting and analyzing data — not fighting anti-bot systems.
🏁 Conclusion
Let’s recap what you’ve learned:
| Goal | Best Tool |
|---|---|
| Static tables | BeautifulSoup |
| Quick one-liner parsing | pandas |
| JavaScript-rendered tables | FoxScrape API |
BeautifulSoup gives you precision and control.
pandas provides speed and simplicity.
And FoxScrape makes complex, dynamic scraping effortless — without browsers, proxies, or sleepless nights.
So next time you need to scrape a table in Python, start simple, then scale smart.
🚀 Try It Yourself
Pick any table online — a Wikipedia list, a financial chart, or a dynamic table — and try scraping it using the methods above.
If it’s static, BeautifulSoup or pandas will do the trick.
If it’s dynamic or protected, send the URL to:
```text
https://www.foxscrape.com/api/v1?url=<your-url>&render_js=true
```
You’ll get the rendered HTML instantly — ready to parse, clean, and export.
Happy scraping — responsibly, efficiently, and with a little help from 🦊 FoxScrape.
Further Reading

- How to Scrape Data from a Website
- Web Scraping with Perl
- Web Scraping Without Getting Blocked