Data Extraction

Extract data with CSS or XPath selectors

💡 Important:

This page explains how to use a specific feature of our main web scraping API. If you are not yet familiar with the FoxScrape web scraping API, you can read the documentation here.

Basic usage

If you want to extract data from pages and don't want to parse the HTML on your side, you can add extraction rules to your API call.

The simplest way to write extraction rules is the following format:

JSON

{"key_name": "css_or_xpath_selector"}

For example, to extract the title and subtitle of our blog, use these rules:

JSON

{
  "title": "h1",
  "subtitle": "#subtitle"
}

And this will be the JSON response:

JSON

{
  "title": "The FoxScrape Blog",
  "subtitle": "We help you get better at web-scraping: detailed tutorial, case studies and writing by industry experts"
}

You can also extract an HTML attribute by adding the attribute name, prefixed with @, after the selector.

For example, to extract a link from the page, you can use the following rule:

JSON

{"link": "a@href"}

Important: extraction rules are JSON-formatted. To pass them in a GET request, you need to stringify them.

Here is how to extract the above information in your favorite language:

Node.js

JAVASCRIPT

// Using Axios
const axios = require('axios');

axios.get('https://www.foxscrape.com/api/v1', {
    params: {
        'api_key': 'YOUR_API_KEY',
        'url': 'https://www.foxscrape.com/blog',
        // extraction rules must be passed as a JSON string
        'extract_rules': '{"title":"h1","subtitle":"#subtitle"}',
    }
}).then(function (response) {
    // handle success: response.data holds the response body
    console.log(response.data);
}).catch(function (error) {
    // handle error
    console.error(error.message);
});
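
Writing the rules string by hand gets error-prone as the rules grow. As a minimal sketch, the rules can be kept as a plain object and stringified just before the request is built. The helper name `buildExtractUrl` is hypothetical and only for illustration; the endpoint and parameter names are the ones used in the example above.

```javascript
// Hypothetical helper (not part of the FoxScrape API): builds the GET URL
// for the endpoint and parameters shown in the Axios example above.
function buildExtractUrl(apiKey, targetUrl, rules) {
    const params = new URLSearchParams({
        api_key: apiKey,
        url: targetUrl,
        // JSON.stringify turns the rules object into the string form
        // that can be passed in a GET request
        extract_rules: JSON.stringify(rules),
    });
    // URLSearchParams takes care of URL-encoding each value
    return 'https://www.foxscrape.com/api/v1?' + params.toString();
}

const rules = { title: 'h1', subtitle: '#subtitle' };
const requestUrl = buildExtractUrl('YOUR_API_KEY', 'https://www.foxscrape.com/blog', rules);
console.log(requestUrl);
```

The resulting URL can be passed to any HTTP client. When using Axios with the `params` option as above, only the `JSON.stringify(rules)` step is needed, since Axios encodes the parameters itself.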

Please note that using:

JSON

{
  "title": "h1",
  "link": "a@href"
}

is the same as using:

JSON

{
  "title": {
    "selector": "h1",
    "output": "text",
    "type": "item"
  },
  "link": {
    "selector": "a",
    "output": "@href",
    "type": "item"
  }
}
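
To make the equivalence concrete, here is a rough sketch of how a shorthand rule could be expanded into the full form on the client side. The functions `expandRule` and `expandRules` are hypothetical and do not reflect how the API itself parses rules. The sketch assumes CSS-style shorthand only, and splits on the last `@`, so XPath expressions that contain `@` are not handled.

```javascript
// Hypothetical helper, not part of the FoxScrape API: expands one shorthand
// rule string into the full object form shown above.
function expandRule(rule) {
    if (typeof rule !== 'string') {
        return rule; // already written in the full form
    }
    const at = rule.lastIndexOf('@');
    if (at <= 0) {
        // no attribute suffix: extract the element's text
        return { selector: rule, output: 'text', type: 'item' };
    }
    return {
        selector: rule.slice(0, at),
        output: rule.slice(at), // keeps the leading "@", e.g. "@href"
        type: 'item',
    };
}

// Applies expandRule to every key of a rules object.
function expandRules(rules) {
    return Object.fromEntries(
        Object.entries(rules).map(([key, rule]) => [key, expandRule(rule)])
    );
}

const expanded = expandRules({ title: 'h1', link: 'a@href' });
console.log(JSON.stringify(expanded, null, 2));
```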

The sections below describe each of these options in more detail.