Web Scraping with PHP

Web scraping is one of the most powerful ways to collect structured data from the internet — and PHP remains a surprisingly capable tool for the job.
In this 2025 guide, we’ll explore how to perform web scraping with PHP, starting with native techniques and gradually moving to modern APIs. Along the way, we’ll build a fun mini-project: scraping famous birthdays from popular websites like Wikipedia and IMDb.
We’ll begin by learning core PHP scraping methods (like cURL, regex, DOM, and XPath) and end with a truly scalable solution using FoxScrape — a modern web scraping API that handles JavaScript rendering, proxy rotation, and anti-bot protection automatically.
1. PHP Web Scraping Libraries
Before diving into code, let’s explore some of the most popular PHP scraping tools and frameworks available in 2025.
| Library / Framework | Purpose | Notes |
|---|---|---|
| Guzzle | Modern HTTP client | Great for sending requests and managing concurrency. |
| Goutte / Symfony HttpBrowser | Crawling and DOM parsing | Built on top of BrowserKit + DomCrawler. |
| Simple HTML DOM Parser / DiDOM / phpQuery / hQuery | HTML parsers | Parse HTML easily using CSS selectors. |
| Php-webdriver / Panther / Puphpeteer | Browser automation | Ideal for scraping JavaScript-heavy pages. |
| Roach PHP / PHP-Spider | Full scraping frameworks | Similar to Python’s Scrapy. |
| Embed / Httpful / Chrome PHP | Specialized tools | Handle media embedding, simplified HTTP, or Chrome control. |
| Crawler Detect | Detect bot user-agents | Avoid scraping blocks. |
Each tool fits a different use case — from lightweight scraping to full browser automation.
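Most of these are installed via Composer. If you want to follow along with the examples in this guide, something like the following will do (package names are the usual Packagist ones; pick the ones you need):

```bash
composer require guzzlehttp/guzzle
composer require symfony/dom-crawler symfony/css-selector
composer require symfony/panther
```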
2. Birthday Scraping Mini-Project 🎂
Let’s make this practical.
We’ll build a small PHP project that scrapes lists of famous birthdays from Wikipedia and IMDb.
Our Goal
| Site | Target | Output Example |
|---|---|---|
| Wikipedia | Births section | "November 10 – Miranda Lambert (1983), American singer" |
| IMDb | Actor birthdays | "Josh Peck (1986), Actor, USA" |
3. Raw HTTP Requests (Low-Level Approach)
At its core, scraping is just sending an HTTP request and reading the response.
Here’s how to do that in PHP, first with cURL and then with a raw socket via fsockopen().
```php
<?php
// Using cURL to fetch raw HTML
$url = "https://en.wikipedia.org/wiki/November_10";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

echo substr($response, 0, 500); // Print first 500 chars
?>
```
This gives us raw HTML, exactly what you see when you “view source” in your browser.
It’s not elegant yet, but understanding this low-level approach helps you appreciate what libraries like Guzzle and FoxScrape do for you behind the scenes.
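For completeness, here is a rough sketch of the same request made at the socket level with fsockopen() (assuming the openssl extension is enabled so an ssl:// stream can be opened). It makes it very clear that HTTP is just text over a connection:

```php
<?php
// Raw HTTP request over a TLS socket. HTTP/1.0 keeps the response simple:
// the server closes the connection and skips chunked transfer encoding.
$fp = fsockopen("ssl://en.wikipedia.org", 443, $errno, $errstr, 10);
if (!$fp) {
    die("Connection failed: $errstr ($errno)\n");
}

$request  = "GET /wiki/November_10 HTTP/1.0\r\n";
$request .= "Host: en.wikipedia.org\r\n\r\n";
fwrite($fp, $request);

$response = '';
while (!feof($fp)) {
    $response .= fgets($fp, 1024);
}
fclose($fp);

echo substr($response, 0, 500); // status line + headers + start of the HTML
?>
```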
4. Scraping with Strings & Regex (Wikipedia Example)
Now let’s try to extract some real data — like the “Births” section on Wikipedia.
```php
<?php
$html = file_get_contents("https://en.wikipedia.org/wiki/November_10");

// Match items like: <li>1983 – Miranda Lambert, American singer-songwriter</li>
preg_match_all('/<li>(\d{4}) – (.+?)<\/li>/u', $html, $matches);

foreach ($matches[1] as $i => $year) {
    // strip_tags() removes the inline links Wikipedia wraps around names
    echo $year . " - " . strip_tags($matches[2][$i]) . "\n";
}
?>
```
⚠️ Regex scraping works, but it’s fragile.
If the HTML structure changes, your code breaks. For production scraping, always use DOM parsing.
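To make that concrete, here is a sketch of the same “Births” extraction done over parsed DOM nodes instead of raw HTML. The inline links Wikipedia wraps around names no longer get in the way, because we read each node’s textContent:

```php
<?php
$html = file_get_contents("https://en.wikipedia.org/wiki/November_10");

$dom = new DOMDocument();
@$dom->loadHTML($html); // silence warnings from real-world HTML

foreach ($dom->getElementsByTagName('li') as $li) {
    $text = trim($li->textContent);
    // Entries look like "1983 – Miranda Lambert, American singer-songwriter".
    // Note: the Events and Deaths sections use the same pattern, so a production
    // scraper would also scope the query to the "Births" section (e.g. via XPath).
    if (preg_match('/^(\d{3,4}) – (.+)$/u', $text, $m)) {
        echo "{$m[1]} - {$m[2]}\n";
    }
}
?>
```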
5. Scraping with Guzzle, DOM, and XPath (IMDb Example)
Now, let’s upgrade our scraper using Guzzle and DOMDocument.
```php
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;

$client = new Client();
// IMDb tends to reject requests without a browser-like User-Agent,
// so we send one along with the request.
$response = $client->get('https://www.imdb.com/search/name/?birth_monthday=11-10', [
    'headers' => ['User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36']
]);
$html = (string) $response->getBody();

$dom = new DOMDocument();
@$dom->loadHTML($html); // silence warnings from imperfect HTML
$xpath = new DOMXPath($dom);

$nodes = $xpath->query('//h3[@class="lister-item-header"]/a');

foreach ($nodes as $node) {
    echo $node->textContent . "\n";
}
?>
```
This gives you actor names from IMDb’s “Born Today” page.
🧩 Challenge: IMDb uses JavaScript and may block bots.
To scrape reliably, we need an API that handles rendering, proxy rotation, and anti-bot detection — that’s where FoxScrape shines.
6. Scraping IMDb with FoxScrape API 🚀
Now let’s replace all that complexity with a single API call using FoxScrape.
FoxScrape handles JavaScript rendering, proxy and IP rotation, and anti-bot protection for you, so a single request is all it takes.
Example 1: Simple XPath Scraper
```php
<?php
$endpoint = "https://foxscrape.com/api/v1";
$url = "https://www.imdb.com/search/name/?birth_monthday=11-10";

$payload = [
    "url" => $url,
    "render_js" => true,
    "extract" => [
        "actors" => [
            "selector" => "//h3[@class='lister-item-header']/a",
            "type" => "text"
        ]
    ]
];

$ch = curl_init("$endpoint/scrape");
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

echo $response;
?>
```
✅ Output Example:

```json
{
  "actors": [
    "Josh Peck",
    "Ellen Pompeo",
    "Brittany Murphy"
  ]
}
```
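Since the response is plain JSON, consuming it from PHP takes a single json_decode() call (a brief sketch, reusing $response from the script above):

```php
$data = json_decode($response, true);

foreach ($data['actors'] ?? [] as $actor) {
    echo $actor . "\n";
}
```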
Example 2: AI-Powered Extraction
No need to write selectors — FoxScrape can auto-detect structured data:
```php
$payload = [
    "url" => $url,
    "ai_extract" => true
];
```
🦊 With FoxScrape, you get clean, structured data from a single call, with no proxies, captchas, or HTML parsing to manage.
7. Goutte / Symfony Example
For more control, you can use Symfony’s scraping components like DomCrawler and CssSelector. (Goutte itself has been deprecated in favor of Symfony’s HttpBrowser, which exposes the same API, so the code below translates directly.)
```php
<?php
require 'vendor/autoload.php';
use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://www.imdb.com/search/name/?birth_monthday=11-10');

$crawler->filter('.lister-item-header a')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});
?>
```
Goutte makes your code cleaner, but it still can’t handle dynamic content or anti-bot systems; again, FoxScrape solves both.
8. Headless Browsers (Dynamic Content)
Sites that load data via JavaScript need headless browsers.
You can use Symfony Panther or Puphpeteer:
```php
<?php
require 'vendor/autoload.php';
use Symfony\Component\Panther\Client;

// Requires a local Chrome/Chromium and a matching chromedriver
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'https://www.imdb.com/search/name/?birth_monthday=11-10');

$crawler->filter('.lister-item-header a')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});
?>
```
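If you would rather drive Chrome over the DevTools protocol instead of WebDriver, the Chrome PHP package from the table above (chrome-php/chrome) follows a similar flow. A minimal sketch, assuming a local Chrome/Chromium install:

```php
<?php
require 'vendor/autoload.php';
use HeadlessChromium\BrowserFactory;

$browserFactory = new BrowserFactory();
$browser = $browserFactory->createBrowser();

try {
    $page = $browser->createPage();
    $page->navigate('https://www.imdb.com/search/name/?birth_monthday=11-10')->waitForNavigation();

    // Grab the fully rendered HTML after JavaScript has run
    $html = $page->getHtml();
    echo substr($html, 0, 500);
} finally {
    $browser->close();
}
?>
```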
While this works, headless browsers are slow and resource-heavy, and they still get blocked easily.
That’s why developers increasingly use APIs like FoxScrape to scale safely and efficiently.
9. Summary & Optimization Ideas
We’ve covered a full spectrum — from manual HTTP to full browser scraping.
Here’s how they compare:
| Method | Speed | Handles JS | Avoids Blocks | Best For |
|---|---|---|---|---|
| cURL + Regex | 🚀 Fast | ❌ No | ❌ No | Basic HTML pages |
| Guzzle + DOM | ⚡ Fast | ❌ No | ⚠️ Partial | Static pages |
| Headless Browser | 🐢 Slow | ✅ Yes | ⚠️ Limited | Dynamic pages |
| FoxScrape API | ⚡⚡ Fast | ✅ Yes | ✅ Yes | Scalable, production scraping |
Optimization Tips
- Cache responses locally while developing so you never fetch the same page twice.
- Run independent requests concurrently; Guzzle’s request pooling makes this easy (see the sketch below).
- Reserve headless browsers for pages that genuinely need JavaScript rendering.
- Offload heavily protected or JS-heavy targets to FoxScrape rather than fighting blocks yourself.
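Here is what the concurrency tip looks like in practice with Guzzle’s request pool (a sketch; the list of day pages is just an example):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();

// Example: fetch several "day" pages in parallel instead of one by one
$urls = [
    'https://en.wikipedia.org/wiki/November_10',
    'https://en.wikipedia.org/wiki/November_11',
    'https://en.wikipedia.org/wiki/November_12',
];

$requests = function () use ($urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 3,
    'fulfilled' => function ($response, $index) use ($urls) {
        echo $urls[$index] . ": " . strlen((string) $response->getBody()) . " bytes\n";
    },
    'rejected' => function ($reason, $index) use ($urls) {
        echo $urls[$index] . ": failed\n";
    },
]);

$pool->promise()->wait();
?>
```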
10. Conclusion
PHP offers multiple levels of scraping sophistication — from file_get_contents() to full browser automation.
But as websites grow smarter and anti-bot systems evolve, manual scraping gets harder.
That’s why modern developers use FoxScrape.
🦊 FoxScrape lets you focus on data, not infrastructure.
With built-in IP rotation, JS rendering, and AI-powered extraction, you can scrape at scale with one simple API call.
💡 Try FoxScrape for Free
👉 Scrape smarter, not harder.
Sign up for FoxScrape and start extracting structured data instantly.
```bash
curl -X POST https://foxscrape.com/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.imdb.com/search/name/?birth_monthday=11-10"}'
```
⚡ Start scraping in seconds: no proxies, no headless browsers, no headaches.