Response Format
By default, the API transparently returns the resource you asked it to scrape.
The parameters below let you change what is returned and in which format.
Downloading Pictures and Files
The API transparently downloads images, PDFs, and any other non-HTML content.
We recommend downloading files with render_js=false.
There is a 2 MB limit per request.
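As a rough illustration, a file download could be scripted as below. This is a minimal sketch using only the Python standard library; the endpoint is taken from the curl examples in this document, while `build_request_url`, `download_file`, and the example target URL are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.foxscrape.com/api/v1"  # endpoint used in the curl examples


def build_request_url(api_key: str, target_url: str, **params: str) -> str:
    """Build the GET URL; urlencode percent-encodes every value."""
    query = {"api_key": api_key, "url": target_url, **params}
    return f"{API_ENDPOINT}?{urlencode(query)}"


def download_file(api_key: str, target_url: str, destination: str) -> int:
    """Fetch a non-HTML resource with render_js=false and save the raw bytes.

    Returns the number of bytes written. This performs a real network request.
    """
    request_url = build_request_url(api_key, target_url, render_js="false")
    with urlopen(request_url) as response:
        payload = response.read()
    with open(destination, "wb") as handle:
        handle.write(payload)
    return len(payload)


# Placeholder target URL, used only to show the resulting request URL.
example_url = build_request_url(
    "YOUR_API_KEY", "https://example.com/logo.png", render_js="false"
)
print(example_url)
```

Calling `download_file("YOUR_API_KEY", "https://example.com/logo.png", "logo.png")` would then write the response bytes to disk, subject to the size limit stated above.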
Data extraction with AI (BETA)
If you want to extract specific information from a webpage using AI, you can use the ai_query and ai_selector parameters.
The ai_query parameter allows you to specify the information you want to extract, while the optional ai_selector parameter lets you focus the AI extraction on a specific part of the page.
ai_query (string, default: "")
The ai_query parameter allows you to specify the information you want to extract from the webpage using natural language. For example:
ai_query="price of the product"
This instructs the AI to find and extract the price of the product from the page content.
Cost: The AI extraction parameters (ai_query and ai_extract_rules) incur an additional cost of 5 credits on top of the regular API cost. To speed up your request, we encourage you to provide a relevant ai_selector value.
ai_selector (string, default: "")
The ai_selector parameter is optional and allows you to specify a CSS selector to focus the AI extraction on a specific part of the page. This can help improve accuracy and reduce processing time. For example:
ai_selector="#product-details"
This tells the AI to only consider the content within the element with the ID "product-details" when extracting the information specified in the ai_query.
Using the ai_selector can help speed up the request by limiting the amount of content the AI needs to process.
Using both parameters together can provide more precise and efficient data extraction:
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&ai_query=price+of+the+product&ai_selector=%23product-details"
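The encoded values in the query string above do not have to be written by hand. As a small sketch using Python's standard library, `urlencode` produces the same encoding: spaces become `+` and `#` becomes `%23`. Encoding the `#` matters because an unencoded `#` starts the URL fragment, which is never sent to the server.

```python
from urllib.parse import urlencode

ai_params = {
    "ai_query": "price of the product",
    "ai_selector": "#product-details",
}

# urlencode applies quote_plus to each value.
encoded = urlencode(ai_params)
print(encoded)
```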
ai_extract_rules (stringified JSON, default: "")
If you want to extract data from pages and don't want to parse the HTML on your side, you can add AI extraction rules to your API call.
The simplest way to use JSON rules is to use the following format:
{"key_name": "what you want to extract"}
If you wish to extract the title and a summary of one of our blog posts, use these rules:
{
  "title": "title of the blog post",
  "summary": "a short summary of the blog post"
}
And this will be the JSON response:
{
  "title": "How to web scrape",
  "summary": "We help you get better at web-scraping: detailed tutorial, case studies and writing by industry experts"
}
Important: extraction rules are JSON-formatted. To pass them in a GET request, you must stringify them.
That is the simplest way to use this feature. For more control, you can give each key a description and a type, as in the example below:
{
  "name": {
    "description": "the product name",
    "type": "string"
  },
  "categories": {
    "description": "all product categories",
    "type": "list"
  },
  "price": {
    "description": "the product price in dollars",
    "type": "number"
  },
  "in_stock": {
    "description": "whether the product is currently available",
    "type": "boolean"
  },
  "shipping_info": {
    "description": "shipping details including delivery time and cost",
    "type": "item",
    "output": {
      "delivery_time": "estimated delivery in days",
      "shipping_cost": "shipping cost in dollars"
    }
  },
  "size": {
    "description": "product size",
    "type": "list",
    "enum": ["XS", "S", "M", "L", "XL"]
  }
}
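Before sending rules like these, it can help to sanity-check their shape locally. The sketch below is a hypothetical helper, not part of the API. It checks only the structure visible in the example above, and the set of accepted types is an assumption limited to the values that example shows.

```python
# Assumption: only the types that appear in the example above.
ASSUMED_TYPES = {"string", "list", "number", "boolean", "item"}


def check_rules(rules: dict) -> list:
    """Return a list of problems found; an empty list means nothing looked wrong."""
    problems = []
    for key, rule in rules.items():
        if isinstance(rule, str):
            continue  # short form: {"key": "what you want to extract"}
        if not isinstance(rule, dict):
            problems.append(f"{key}: rule must be a string or an object")
            continue
        if "description" not in rule:
            problems.append(f"{key}: missing 'description'")
        if "type" in rule and rule["type"] not in ASSUMED_TYPES:
            problems.append(f"{key}: unknown type {rule['type']!r}")
        if "enum" in rule and not isinstance(rule["enum"], list):
            problems.append(f"{key}: 'enum' must be a list")
        if isinstance(rule.get("output"), dict):
            # Nested rules are checked the same way, with a dotted key prefix.
            problems.extend(f"{key}.{p}" for p in check_rules(rule["output"]))
    return problems


product_rules = {
    "price": {"description": "the product price in dollars", "type": "number"},
    "shipping_info": {
        "description": "shipping details including delivery time and cost",
        "type": "item",
        "output": {"delivery_time": "estimated delivery in days"},
    },
    "size": {"description": "product size", "type": "list", "enum": ["XS", "S"]},
}
print(check_rules(product_rules))
```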
For all available options, see the full documentation.
Cost: The AI extraction parameters (ai_query and ai_extract_rules) incur an additional cost of 5 credits on top of the regular API cost. To speed up your request, we encourage you to provide a relevant ai_selector value.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&ai_extract_rules=%7B%22title%22%3A+%22title+of+the+blog+post%22%2C+%22summary%22%3A+%22a+5+sentences+summary+of+the+blog+post%22%7D"
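Stringifying the rules for a GET request takes two steps: serialize the object to a JSON string, then URL-encode that string. A minimal sketch with Python's standard library, using the title and summary rules from this section:

```python
import json
from urllib.parse import urlencode

rules = {
    "title": "title of the blog post",
    "summary": "a short summary of the blog post",
}

stringified = json.dumps(rules)  # step 1: JSON object -> string
query = urlencode({"ai_extract_rules": stringified})  # step 2: percent-encode
print(query)
```

The resulting `query` can be appended to the request URL after the `api_key` and `url` parameters.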
extract_rules (stringified JSON, default: "")
If you want to extract data from pages and don't want to parse the HTML on your side, you can add extraction rules to your API call.
The simplest way to use JSON rules is to use the following format:
{"key_name": "css_or_xpath_selector"}
If you wish to extract the title and subtitle of our blog, use these rules:
{
  "title": "h1",
  "subtitle": "#subtitle"
}
And this will be the JSON response:
{
  "title": "The Foxscrape Blog",
  "subtitle": "We help you get better at web-scraping: detailed tutorial, case studies and writing by industry experts"
}
Important: extraction rules are JSON-formatted. To pass them in a GET request, you must stringify them.
We've just described the easiest and quickest way to use this feature. If you want to read more about it, check out our full guide.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&extract_rules=%7B%22title%22%3A+%22h1%22%2C+%22subtitle%22%3A+%22%23subtitle%22%7D"
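When a request does not behave as expected, decoding the query string back into rules is a quick way to check what was actually sent. The sketch below reverses the encoding of the `extract_rules` value from the curl call above, using only Python's standard library.

```python
import json
from urllib.parse import parse_qs

query = "extract_rules=%7B%22title%22%3A+%22h1%22%2C+%22subtitle%22%3A+%22%23subtitle%22%7D"

# parse_qs undoes the percent-encoding and turns "+" back into spaces.
stringified = parse_qs(query)["extract_rules"][0]
rules = json.loads(stringified)
print(rules)
```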
json_response (boolean, default: False)
If you plan to integrate Foxscrape with third-party tools that only accept JSON responses, or if you want to intercept the responses of XHR / Ajax requests, send your API call with json_response=True.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&json_response=True"
With this parameter, the response has the following structure:
{
  // Headers sent by the server
  "headers": {
    "Date": "Fri, 16 Apr 2021 15:03:54 GMT",
    "Access-Control-Allow-Credentials": "true"
  },
  // Credit cost of your request
  "cost": 1,
  // Initial status code of the server
  "initial-status-code": 200,
  // Resolved URL (following redirection)
  "resolved-url": "https://httpbin.org/",
  // Type of the response: "html", "json", or "b64_bytes" for file, image, pdf,...
  "type": "html",
  // Content of the answer. Content will be base 64 encoded if it is a file, image, pdf,...
  "body": "<html>... </body>",
  // base 64 encoded screenshot of the page, if screenshot=true is used
  "screenshot": "b0918...aef",
  // Cookies sent back by the server
  "cookies": [
    {
      "name": "cookie_name",
      "value": "cookie_value",
      "domain": "test.com"
    }
  ],
  // Results of the JS scenario "evaluate" instructions
  "evaluate_results": [],
  // Content and source of iframes in the page
  "iframes": [
    {
      "content": "<html>... </body>",
      "src": "https://site.com/iframe"
    }
  ],
  // XHR / Ajax requests sent by the browser
  "xhr": [
    {
      // URL
      "url": "https://",
      // status code of the server
      "status_code": 200,
      // Method of the request
      "method": "POST",
      // Headers of the XHR / Ajax request
      "headers": {
        "pragma": "no-cache"
      },
      // Response of the XHR / Ajax request
      "body": "2d,x"
    }
  ],
  // js_scenario detailed report (only useful if using render_js=True and js_scenario=...)
  "js_scenario_report": {
    "task_executed": 1,
    "task_failure": 0,
    "task_success": 1,
    "tasks": [
      {
        "duration": 3.042,
        "params": 3000,
        "success": true,
        "task": "wait"
      }
    ],
    "total_duration": 3.042
  },
  // Metadata / Schema data
  "metadata": {
    "microdata": {},
    "json-ld": {}
  }
}
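To intercept XHR / Ajax calls, you can filter the "xhr" array of the parsed response. The sketch below assumes the response has already been parsed into a Python dict with the field names shown above. `find_xhr` is a hypothetical helper, and the URLs in the sample data are made up for illustration.

```python
from typing import Optional


def find_xhr(response: dict, method: Optional[str] = None, url_contains: str = "") -> list:
    """Return the XHR entries matching an HTTP method and/or a URL substring."""
    matches = []
    for call in response.get("xhr", []):
        if method is not None and call.get("method", "").upper() != method.upper():
            continue
        if url_contains not in call.get("url", ""):
            continue
        matches.append(call)
    return matches


# Made-up sample in the shape of the documented response.
sample_response = {
    "type": "html",
    "body": "<html>... </body>",
    "xhr": [
        {"url": "https://example.com/api/items", "status_code": 200,
         "method": "POST", "headers": {"pragma": "no-cache"}, "body": "2d,x"},
        {"url": "https://example.com/static/app.js", "status_code": 200,
         "method": "GET", "headers": {}, "body": ""},
    ],
}

for call in find_xhr(sample_response, method="post"):
    print(call["url"], call["status_code"])
```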
If the requested content is JSON, the response will look like this:
{
  // Headers sent by the server
  "headers": {
    "Date": "Fri, 16 Apr 2021 15:13:02 GMT",
    "Access-Control-Allow-Credentials": "true"
  },
  // Credit cost of your request
  "cost": 1,
  // Initial status code of the server
  "initial-status-code": 200,
  // Resolved URL (following redirection)
  "resolved-url": "https://httpbin.org/anything?json",
  // Type of the response: "html" or "json"
  "type": "json",
  // Content of the answer
  "body": {
    "args": {}
  },
  // Results of the JS scenario "evaluate" instructions
  "evaluate_results": [],
  // XHR / Ajax requests sent by the browser
  "xhr": [],
  // js_scenario detailed report (only useful if using render_js=True and js_scenario=...)
  "js_scenario_report": {},
  // Metadata / Schema data
  "metadata": {
    "microdata": {},
    "json-ld": {}
  }
}
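Because "body" changes shape with "type", client code usually branches on that field. The sketch below is one possible way to do it; `decode_body` is a hypothetical helper, and it assumes the "b64_bytes" content uses the standard Base64 alphabet.

```python
import base64


def decode_body(response: dict):
    """Return the body as str ("html"), parsed JSON ("json"), or bytes ("b64_bytes")."""
    body = response.get("body")
    if response.get("type") == "b64_bytes":
        # Files, images, PDFs: assumed to be standard Base64.
        return base64.b64decode(body)
    # "html" bodies are strings; "json" bodies are already parsed objects.
    return body


html_response = {"type": "html", "body": "<html>... </body>"}
json_response = {"type": "json", "body": {"args": {}}}
file_response = {"type": "b64_bytes", "body": base64.b64encode(b"%PDF-1.7").decode("ascii")}

print(type(decode_body(html_response)).__name__)
print(type(decode_body(json_response)).__name__)
print(type(decode_body(file_response)).__name__)
```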
return_page_source (boolean, default: False)
To get the HTML exactly as the server returned it, before the browser executes any JavaScript, use return_page_source=true.
This parameter is unnecessary if JavaScript rendering is disabled.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&return_page_source=True"
return_page_markdown (boolean, default: False)
Use return_page_markdown=true to get the main content of the page as Markdown, a format that is easier for LLMs to read.
Content will be stripped of all HTML tags and unnecessary information.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&return_page_markdown=True"
return_page_text (boolean, default: False)
Use return_page_text=true to get the main content of the page as plain text, a format that is easier for LLMs to read.
Content will be stripped of all HTML tags and unnecessary information.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&return_page_text=True"
scraping_config (string, default: "")
To use a pre-saved scraping configuration, use scraping_config=[Scraping Configuration Name].
This parameter lets you apply a saved set of settings to any website without retyping them on every request. For more information, see Preconfigured Requests.
curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&scraping_config=Configuration-Name"