Response Format

By default, the API returns the resource you want to scrape as-is.

The parameters below let you change what is returned and in what format.

Downloading Pictures and Files

The API will transparently download images, PDFs, and any other content that is not HTML.

We recommend downloading files with render_js=false.

There is a 2 MB limit per request.
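As a minimal sketch, a download request could be built in Python as shown below. It assumes the same endpoint as the curl examples in this documentation; the helper name `build_download_url` and the example file URL are hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.foxscrape.com/api/v1"


def build_download_url(api_key: str, file_url: str) -> str:
    """Build a request URL for a non-HTML resource with JS rendering disabled."""
    # render_js=false follows the recommendation above for file downloads.
    params = {"api_key": api_key, "url": file_url, "render_js": "false"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Hypothetical file URL, used only for illustration.
print(build_download_url("YOUR_API_KEY", "https://example.com/logo.png"))
```

The response body of such a request is the raw file, so it should be written to disk in binary mode.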

Data extraction with AI (BETA)

If you want to extract specific information from a webpage using AI, you can use the ai_query and ai_selector parameters.

The ai_query parameter allows you to specify the information you want to extract, while the optional ai_selector parameter lets you focus the AI extraction on a specific part of the page.

ai_query (string, default: "")

The ai_query parameter allows you to specify the information you want to extract from the webpage using natural language. For example:

TEXT

ai_query="price of the product"

This instructs the AI to find and extract the price of the product from the page content.

Cost: The AI extraction parameters (ai_query and ai_extract_rules) cost an additional 5 credits on top of the regular API cost. To speed up your request, we encourage you to set a relevant ai_selector value.

ai_selector (string, default: "")

The ai_selector parameter is optional and allows you to specify a CSS selector to focus the AI extraction on a specific part of the page. This can help improve accuracy and reduce processing time. For example:

TEXT

ai_selector="#product-details"

This tells the AI to only consider the content within the element with the ID "product-details" when extracting the information specified in the ai_query.

Using the ai_selector can help speed up the request by limiting the amount of content the AI needs to process.

Using both parameters together can provide more precise and efficient data extraction:

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&ai_query=price+of+the+product&ai_selector=%23product-details"
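The same request can be assembled programmatically. The following is a small Python sketch, not an official client: the function name `build_ai_query_url` and the target URL are hypothetical, and only the parameters described above are used.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.foxscrape.com/api/v1"


def build_ai_query_url(api_key, target_url, ai_query, ai_selector=None):
    """Build a request URL with ai_query and, optionally, ai_selector."""
    params = {"api_key": api_key, "url": target_url, "ai_query": ai_query}
    if ai_selector:
        # Optional: restricts the AI extraction to one part of the page.
        params["ai_selector"] = ai_selector
    # urlencode escapes spaces as "+" and "#" as "%23", as in the curl example.
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_ai_query_url(
    "YOUR_API_KEY",
    "https://example.com/item",  # hypothetical page
    "price of the product",
    "#product-details",
))
```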

ai_extract_rules (stringified JSON, default: "")

If you want to extract data from pages and don't want to parse the HTML on your side, you can add AI extraction rules to your API call.

The simplest way to use JSON rules is to use the following format:

JSON

{"key_name": "what you want to extract"}

To extract the title and a summary of a blog post, use these rules:

JSON

{
  "title": "title of the blog post",
  "summary": "a short summary of the blog post"
}

And this will be the JSON response:

JSON

{
  "title": "How to web scrape",
  "summary": "We help you get better at web-scraping: detailed tutorial, case studies and writing by industry experts"
}

Important: extraction rules are JSON formatted, and in order to pass them to a GET request, you need to stringify them.
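In Python, stringifying the rules could look like the sketch below. The helper `build_ai_extract_url` and the target URL are hypothetical; `json.dumps` produces the JSON string and `urlencode` percent-encodes it for the query string.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.foxscrape.com/api/v1"


def build_ai_extract_url(api_key: str, target_url: str, rules: dict) -> str:
    """Stringify the rules and attach them as the ai_extract_rules parameter."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "ai_extract_rules": json.dumps(rules),  # dict -> JSON string
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


rules = {
    "title": "title of the blog post",
    "summary": "a short summary of the blog post",
}
# Hypothetical blog post URL, used only for illustration.
url = build_ai_extract_url("YOUR_API_KEY", "https://example.com/blog/post", rules)
print(url)
```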

We've just described the easiest and quickest way to use this feature. You can use more advanced options as described below:

JSON

{
  "name": {
    "description": "the product name",
    "type": "string"
  },
  "categories": {
    "description": "all product categories",
    "type": "list"
  },
  "price": {
    "description": "the product price in dollars",
    "type": "number"
  },
  "in_stock": {
    "description": "whether the product is currently available",
    "type": "boolean"
  },
  "shipping_info": {
    "description": "shipping details including delivery time and cost",
    "type": "item",
    "output": {
      "delivery_time": "estimated delivery in days",
      "shipping_cost": "shipping cost in dollars"
    }
  },
  "size": {
    "description": "product size",
    "type": "list",
    "enum": ["XS", "S", "M", "L", "XL"]
  }
}
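If you want to verify on your side that a result matches the types you declared, a client-side check could look like the sketch below. This is an illustration only: the mapping from rule types to Python types, the helper `check_result`, and the sample result are all assumptions, and the exact shape of the API response for advanced rules is not specified here. The helper handles rules written in the advanced (object) form only.

```python
# Python types assumed to correspond to each rule "type" (an assumption).
EXPECTED_TYPES = {
    "string": str,
    "list": list,
    "number": (int, float),
    "boolean": bool,
    "item": dict,
}


def check_result(rules: dict, result: dict) -> list:
    """Return a list of mismatches between advanced rules and a result."""
    problems = []
    for key, rule in rules.items():
        if key not in result:
            problems.append(f"{key}: missing")
            continue
        value = result[key]
        rule_type = rule.get("type", "string")
        expected = EXPECTED_TYPES[rule_type]
        # bool is a subclass of int in Python, so reject it for "number".
        is_bool_as_number = rule_type == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            problems.append(f"{key}: expected {rule_type}, got {type(value).__name__}")
            continue
        allowed = rule.get("enum")
        if allowed is not None:
            values = value if isinstance(value, list) else [value]
            for item in values:
                if item not in allowed:
                    problems.append(f"{key}: {item!r} not in enum")
    return problems


rules = {
    "price": {"description": "the product price in dollars", "type": "number"},
    "in_stock": {"description": "whether the product is currently available", "type": "boolean"},
    "size": {"description": "product size", "type": "list", "enum": ["XS", "S", "M", "L", "XL"]},
}
# Hypothetical extraction result, made up for illustration.
result = {"price": "19.99", "in_stock": True, "size": ["M", "XXL"]}
print(check_result(rules, result))
```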

See the full documentation for more details.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&ai_extract_rules=%7B%22title%22%3A+%22title+of+the+blog+post%22%2C+%22summary%22%3A+%22a+5+sentences+summary+of+the+blog+post%22%7D"

extract_rules (stringified JSON, default: "")

If you want to extract data from pages and don't want to parse the HTML on your side, you can add extraction rules to your API call.

The simplest way to use JSON rules is to use the following format:

JSON

{"key_name": "css_or_xpath_selector"}

To extract the title and subtitle of our blog, use these rules:

JSON

{
  "title": "h1",
  "subtitle": "#subtitle"
}

And this will be the JSON response:

JSON

{
  "title": "The Foxscrape Blog",
  "subtitle": "We help you get better at web-scraping: detailed tutorial, case studies and writing by industry experts"
}

Important: extraction rules are JSON formatted, and in order to pass them to a GET request, you need to stringify them.

We've just described the easiest and quickest way to use this feature. If you want to read more about it, check out our full guide.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&extract_rules=%7B%22title%22%3A+%22h1%22%2C+%22subtitle%22%3A+%22%23subtitle%22%7D"
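When debugging, it can help to decode a request URL back into readable rules. The following Python sketch reverses the encoding of the extract_rules value from the curl command above; the helper name `read_extract_rules` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def read_extract_rules(request_url: str) -> dict:
    """Decode the extract_rules query parameter back into a dict."""
    # parse_qs undoes the percent-encoding ("+" -> space, "%23" -> "#").
    query = parse_qs(urlsplit(request_url).query)
    return json.loads(query["extract_rules"][0])


request_url = (
    "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL"
    "&extract_rules=%7B%22title%22%3A+%22h1%22%2C+%22subtitle%22%3A+%22%23subtitle%22%7D"
)
print(read_extract_rules(request_url))
```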

json_response (boolean, default: False)

If you plan to integrate Foxscrape with third-party tools that only accept JSON responses, or want to intercept the responses of XHR / Ajax requests, send your API call with json_response=True.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&json_response=True"

The response has the following structure when this parameter is used:

JSON

{
  // Headers sent by the server
  "headers": {
    "Date": "Fri, 16 Apr 2021 15:03:54 GMT",
    "Access-Control-Allow-Credentials": "true"
  },
  // Credit cost of your request
  "cost": 1,
  // Initial status code of the server
  "initial-status-code": 200,
  // Resolved URL (following redirection)
  "resolved-url": "https://httpbin.org/",
  // Type of the response: "html", "json", or "b64_bytes" for files, images, PDFs, ...
  "type": "html",
  // Content of the response. It is base64-encoded if it is a file, image, PDF, ...
  "body": "<html>... </body>",
  // Base64-encoded screenshot of the page, if screenshot=true is used
  "screenshot": "b0918...aef",
  // Cookies sent back by the server
  "cookies": [
    {
      "name": "cookie_name",
      "value": "cookie_value",
      "domain": "test.com"
    }
  ],
  // Results of the JS scenario "evaluate" instructions
  "evaluate_results": [],
  // Content and source of iframes in the page
  "iframes": [
    {
      "content": "<html>... </body>",
      "src": "https://site.com/iframe"
    }
  ],
  // XHR / Ajax requests sent by the browser
  "xhr": [
    {
      // URL
      "url": "https://",
      // Status code of the server
      "status_code": 200,
      // Method of the request
      "method": "POST",
      // Headers of the XHR / Ajax request
      "headers": {
        "pragma": "no-cache"
      },
      // Response of the XHR / Ajax request
      "body": "2d,x"
    }
  ],
  // js_scenario detailed report (only useful if using render_js=True and js_scenario=...)
  "js_scenario_report": {
    "task_executed": 1,
    "task_failure": 0,
    "task_success": 1,
    "tasks": [
      {
        "duration": 3.042,
        "params": 3000,
        "success": true,
        "task": "wait"
      }
    ],
    "total_duration": 3.042
  },
  // Metadata / Schema data
  "metadata": {
    "microdata": {},
    "json-ld": {}
  }
}
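To pick out intercepted XHR / Ajax requests from such a response, a small filter is usually enough. The Python sketch below works on an already parsed response; the helper name `find_xhr` and the sample payload (including its URL) are hypothetical, while the field names come from the structure above.

```python
def find_xhr(payload: dict, url_contains: str = "", method: str = "") -> list:
    """Return the "xhr" entries matching a URL substring and/or HTTP method."""
    matches = []
    for entry in payload.get("xhr", []):
        if url_contains and url_contains not in entry.get("url", ""):
            continue
        if method and entry.get("method", "").upper() != method.upper():
            continue
        matches.append(entry)
    return matches


# Hypothetical parsed response, reduced to the "xhr" field for illustration.
sample = {
    "xhr": [
        {
            "url": "https://example.com/api/items",
            "status_code": 200,
            "method": "POST",
            "headers": {"pragma": "no-cache"},
            "body": "2d,x",
        }
    ]
}

for entry in find_xhr(sample, url_contains="/api/", method="post"):
    print(entry["status_code"], entry["url"])
```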

If the requested content is JSON, the response looks like this:

JSON

{
  // Headers sent by the server
  "headers": {
    "Date": "Fri, 16 Apr 2021 15:13:02 GMT",
    "Access-Control-Allow-Credentials": "true"
  },
  // Credit cost of your request
  "cost": 1,
  // Initial status code of the server
  "initial-status-code": 200,
  // Resolved URL (following redirection)
  "resolved-url": "https://httpbin.org/anything?json",
  // Type of the response: "html" or "json"
  "type": "json",
  // Content of the response
  "body": {
    "args": {}
  },
  // Results of the JS scenario "evaluate" instructions
  "evaluate_results": [],
  // XHR / Ajax requests sent by the browser
  "xhr": [],
  // js_scenario detailed report (only useful if using render_js=True and js_scenario=...)
  "js_scenario_report": {},
  // Metadata / Schema data
  "metadata": {
    "microdata": {},
    "json-ld": {}
  }
}
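Because the "body" field changes shape depending on "type", client code typically branches on it. Here is a minimal Python sketch under that reading of the examples above; the helper name `decode_body` and the sample payloads are hypothetical.

```python
import base64


def decode_body(payload: dict):
    """Return the body in a usable form based on the "type" field."""
    kind = payload.get("type")
    body = payload.get("body")
    if kind == "b64_bytes":
        # Files, images, and PDFs arrive base64-encoded: decode to raw bytes.
        return base64.b64decode(body)
    # "html" bodies are plain strings; "json" bodies are already parsed objects.
    return body


# Hypothetical payloads, reduced to the two relevant fields.
html_payload = {"type": "html", "body": "<html>... </body>"}
json_payload = {"type": "json", "body": {"args": {}}}
file_payload = {"type": "b64_bytes", "body": base64.b64encode(b"%PDF-1.4").decode("ascii")}

for payload in (html_payload, json_payload, file_payload):
    print(payload["type"], type(decode_body(payload)).__name__)
```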

return_page_source (boolean, default: False)

To get the HTML exactly as the server returned it, before the browser executes any JavaScript, use return_page_source=true.

This parameter is unnecessary if JavaScript rendering is disabled.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&return_page_source=True"

return_page_markdown (boolean, default: False)

Use return_page_markdown=true to get the main content of the page as Markdown, a format that is easier for LLMs to read.

Content will be stripped of all HTML tags and unnecessary information.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&return_page_markdown=True"

return_page_text (boolean, default: False)

Use return_page_text=true to get the main content of the page as plain text, a format that is easier for LLMs to read.

Content will be stripped of all HTML tags and unnecessary information.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&return_page_text=True"
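The three return_page_* parameters can be wrapped in one small helper. The Python sketch below is only an illustration: the function name `build_output_url` and the short format names are hypothetical, and it sets a single flag per request because combining these flags is not described here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.foxscrape.com/api/v1"

# Hypothetical short names mapped to the documented parameters.
OUTPUT_FLAGS = {
    "source": "return_page_source",
    "markdown": "return_page_markdown",
    "text": "return_page_text",
}


def build_output_url(api_key: str, target_url: str, output: str) -> str:
    """Build a request URL that asks for one specific output format."""
    if output not in OUTPUT_FLAGS:
        raise ValueError(f"unknown output format: {output!r}")
    params = {"api_key": api_key, "url": target_url, OUTPUT_FLAGS[output]: "True"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_output_url("YOUR_API_KEY", "https://example.com", "markdown"))
```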

scraping_config (string, default: "")

To use a pre-saved scraping configuration, use scraping_config=[Scraping Configuration Name].

This parameter applies a saved set of settings to any website, so you do not have to repeat them in every request. See Preconfigured Requests for more information.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&scraping_config=Configuration-Name"