Headless Browser

render_js (boolean, default: True)

By default, Foxscrape fetches the target URL with a headless browser that executes the JavaScript code on the page. Each such request costs 5 credits.

This is useful for scraping a Single Page Application built with libraries and frameworks such as React, Angular, jQuery, or Vue.

To fetch the URL without a headless browser, add render_js=False to the GET request.

The following is an example with a dummy Single Page Application (SPA):

If you use render_js=True (the default behavior):

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL"

The following result is returned:

<html>
  <head>
    ...
  </head>
  <body>
    <content>
    </content>
    <content>
    </content>
    <content>
    </content>
    <content>
    </content>
    <content>
    </content>
  </body>
</html>

But if you use render_js=False instead:

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&render_js=False"

This is what is returned:

<html>
  <head>
    ...
  </head>
  <body>
  </body>
</html>
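
The two requests above differ only in the render_js query parameter. A minimal Python sketch of how such URLs could be assembled is shown here; build_request_url is a hypothetical helper, https://example.com/spa is a placeholder target, and only the endpoint and parameter names come from the curl examples.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples in this section.
API_ENDPOINT = "https://www.foxscrape.com/api/v1"


def build_request_url(api_key, target_url, **params):
    """Hypothetical helper: assemble the GET URL for one scrape request."""
    query = {"api_key": api_key, "url": target_url, **params}
    # urlencode percent-encodes the target URL so it survives as a single value.
    return f"{API_ENDPOINT}?{urlencode(query)}"


# Headless browser on (the default): render_js is simply omitted.
rendered = build_request_url("YOUR_API_KEY", "https://example.com/spa")

# Headless browser off: render_js is passed explicitly.
raw = build_request_url("YOUR_API_KEY", "https://example.com/spa", render_js="False")

print(rendered)
print(raw)
```

The sketch only builds the URLs; sending them is left to whichever HTTP client you already use.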

js_scenario (stringified JSON, default: {})

To interact with a page before Foxscrape returns its HTML, add a JavaScript scenario to your API call.

For example, to click on a button, use this scenario:

{
  "instructions": [
    {"click": "#buttonId"}
  ]
}

The scraper will load the webpage, click on the button #buttonId, and then return the HTML of the page.

Important: JavaScript scenarios are JSON-formatted. To pass one in a GET request, you need to stringify it and URL-encode the result.
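
As a rough illustration of the stringify step, the sketch below uses Python's standard json and urllib.parse modules to turn the click scenario into a query-string fragment. The variable names are illustrative only.

```python
import json
from urllib.parse import urlencode

# The click scenario from the example above, as a Python dict.
scenario = {"instructions": [{"click": "#buttonId"}]}

# Step 1: stringify the scenario to JSON text.
scenario_json = json.dumps(scenario)

# Step 2: URL-encode it so it can travel in a GET query string.
query = urlencode({"js_scenario": scenario_json})

print(query)
# js_scenario=%7B%22instructions%22%3A+%5B%7B%22click%22%3A+%22%23buttonId%22%7D%5D%7D
```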

You can add multiple instructions to a scenario; they are executed one by one, in order, on our end.

Below is a quick overview of the instructions you can use:

  • {"click": "#button_id"} - Click on an element
  • {"wait": 1000} - Wait for a fixed duration, in ms
  • {"wait_for": "#slow_div"} - Wait for an element to appear
  • {"wait_for_and_click": "#slow_div"} - Wait for an element to appear and then click on it
  • {"scroll_x": 1000} - Scroll the screen along the horizontal axis, in px
  • {"scroll_y": 1000} - Scroll the screen along the vertical axis, in px
  • {"fill": ["#input_1", "value_1"]} - Fill an input
  • {"evaluate": "console.log('toto')"} - Run custom JavaScript code
  • {"infinite_scroll": {...}} - Scroll the page until the end
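
Several of these instructions can be chained in one scenario. The sketch below is one possible combination; the selectors (#search_box, #search_button) and the typed value are hypothetical and not taken from any real page.

```python
import json

# Instructions are executed one by one, in the order they appear in the list.
instructions = [
    {"wait_for": "#search_box"},           # wait for a (hypothetical) input to appear
    {"fill": ["#search_box", "laptops"]},  # type a value into it
    {"click": "#search_button"},           # click a (hypothetical) submit button
    {"wait": 1000},                        # give the results one second to load
    {"scroll_y": 1000},                    # scroll down 1000 px
]

scenario_json = json.dumps({"instructions": instructions})
print(scenario_json)
```

The resulting string still has to be URL-encoded before it is placed in the js_scenario parameter.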

The infinite_scroll instruction accepts the following parameters:

  • max_count - Maximum number of scrolls; 0 for infinite
  • delay - Delay between each scroll, in ms
  • end_click - (optional) Click on a button when the end of the page is reached, usually a "load more" button. Format: {"selector": "#button_id"}

The click scenario shown earlier, stringified and URL-encoded, is passed in the js_scenario parameter:

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&js_scenario=%7B%22instructions%22%3A+%5B%7B%22click%22%3A+%22%23buttonId%22%7D%5D%7D"
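
An infinite_scroll instruction is built the same way. The following sketch assumes the three parameters listed above are passed as one JSON object; the delay value and the #button_id selector are placeholders.

```python
import json
from urllib.parse import urlencode

infinite_scroll = {
    "max_count": 0,                           # 0 means no limit on the number of scrolls
    "delay": 1000,                            # pause between scrolls, in ms (placeholder value)
    "end_click": {"selector": "#button_id"},  # optional "load more" button
}

scenario = {"instructions": [{"infinite_scroll": infinite_scroll}]}

# Stringify, then URL-encode for use in the GET request.
query = urlencode({"js_scenario": json.dumps(scenario)})
print(query)
```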

wait (integer, default: 0)

Some JavaScript-heavy websites need time to fully render. To make Foxscrape wait before it returns the rendered HTML, use the wait parameter with a value in milliseconds between 0 and 35000.

The Foxscrape headless browser will then wait for that duration before returning the page's HTML.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&wait=10000"
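
Since wait only accepts values from 0 to 35000, it can help to check the value on the client before sending a request. This is a minimal sketch; wait_query is a hypothetical helper, and the server's behavior for out-of-range values is not described here.

```python
from urllib.parse import urlencode

MAX_WAIT_MS = 35000  # upper bound stated in the text above


def wait_query(ms: int) -> str:
    """Hypothetical helper: validate a wait value and encode it as a query fragment."""
    if isinstance(ms, bool) or not isinstance(ms, int):
        raise TypeError("wait must be an integer number of milliseconds")
    if not 0 <= ms <= MAX_WAIT_MS:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_MS} ms, got {ms}")
    return urlencode({"wait": ms})


print(wait_query(10000))  # wait=10000
```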

wait_for (string, default: "")

It is sometimes necessary to wait for a particular element to appear in the DOM before Foxscrape returns the HTML content.

The headless browser will wait for the element matching the CSS or XPath selector passed in this parameter before returning the HTML.

For example, to wait for the element <div class="loading-done"></div>, use wait_for=.loading-done in your request.

All selectors beginning with / will be treated as XPath selectors. All other selectors will be treated as CSS selectors.
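
The rule above can be mirrored on the client to see how a given selector would be interpreted. In this sketch, selector_kind is a hypothetical helper and the XPath expression is just one expression that matches the example element.

```python
from urllib.parse import urlencode


def selector_kind(selector: str) -> str:
    """Apply the documented rule: a leading "/" means XPath, anything else is CSS."""
    return "xpath" if selector.startswith("/") else "css"


css_selector = ".loading-done"
xpath_selector = '//div[@class="loading-done"]'  # illustrative XPath for the same element

print(selector_kind(css_selector))    # css
print(selector_kind(xpath_selector))  # xpath

# Either form must be URL-encoded before it goes into the wait_for parameter.
print(urlencode({"wait_for": xpath_selector}))
```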

Note that if you use both wait and wait_for, Foxscrape first executes wait_for, then wait, and finally js_scenario. To control the order of wait and wait_for yourself, use them as instructions inside js_scenario, where instructions are executed in the order in which you specify them.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&wait_for=.loading-done"
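
To reverse the built-in order and run a fixed wait before wait_for, both can be expressed as js_scenario instructions, as the ordering note describes. A minimal sketch, assuming a 2000 ms delay chosen purely for illustration:

```python
import json
from urllib.parse import urlencode

# Inside js_scenario, instructions run in list order:
# here the fixed wait comes first, then the wait for the element.
scenario = {
    "instructions": [
        {"wait": 2000},                # illustrative delay, in ms
        {"wait_for": ".loading-done"},
    ]
}

query = urlencode({"js_scenario": json.dumps(scenario)})
print(query)
```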

wait_browser (string, default: domcontentloaded)

This advanced parameter tells the browser to wait until certain network conditions are met.

It can take four different values:

  • domcontentloaded (default): Wait until the DOM is loaded
  • load: Wait until the page is fully loaded
  • networkidle0: Wait until there are no more than 0 network connections for at least 500 ms
  • networkidle2: Wait until there are no more than 2 network connections for at least 500 ms

For example, to wait until the page is fully loaded before getting the results, you can use wait_browser=load.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&wait_browser=load"
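
Because wait_browser accepts only the four values listed above, a client can reject typos before spending a request. The sketch below is one way to do that; wait_browser_query is a hypothetical helper.

```python
from urllib.parse import urlencode

# The four values documented above.
WAIT_BROWSER_VALUES = ("domcontentloaded", "load", "networkidle0", "networkidle2")


def wait_browser_query(value: str = "domcontentloaded") -> str:
    """Hypothetical helper: validate a wait_browser value and encode it."""
    if value not in WAIT_BROWSER_VALUES:
        raise ValueError(f"wait_browser must be one of {WAIT_BROWSER_VALUES}, got {value!r}")
    return urlencode({"wait_browser": value})


print(wait_browser_query("load"))  # wait_browser=load
```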

block_ads (boolean, default: False)

By default, Foxscrape does not block ads. To block them (for example, to speed up your request), use block_ads=True.

This parameter is unnecessary if JavaScript rendering is disabled.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&block_ads=True"

block_resources (boolean, default: True)

To speed up requests, Foxscrape blocks all images and CSS on the scraped page by default. To load them, use block_resources=False.

This parameter is unnecessary if JavaScript rendering is disabled.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&block_resources=False"
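
Both blocking parameters only matter when JavaScript rendering is enabled, so a client can drop them otherwise. The sketch below illustrates that idea; resource_params is a hypothetical helper, and its defaults (ads not blocked, images and CSS blocked) are assumptions taken from the prose of this section.

```python
from urllib.parse import urlencode


def resource_params(render_js=True, block_ads=False, block_resources=True):
    """Hypothetical helper: return only the parameters that differ from the assumed defaults."""
    if not render_js:
        # Without a headless browser the blocking flags are unnecessary.
        return {"render_js": "False"}
    params = {}
    if block_ads:
        params["block_ads"] = "True"
    if not block_resources:
        params["block_resources"] = "False"
    return params


# Block ads but load images and CSS.
print(urlencode(resource_params(block_ads=True, block_resources=False)))
# block_ads=True&block_resources=False
```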

window_width (integer, default: 1920)

To change the dimensions of the browser's viewport (window) when scraping the target page, use the window_width and window_height parameters.

This parameter only has an effect when render_js=True.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&window_width=1500"

window_height (integer, default: 1080)

To change the dimensions of the browser's viewport (window) when scraping the target page, use the window_width and window_height parameters.

This parameter only has an effect when render_js=True.

curl "https://www.foxscrape.com/api/v1?api_key=YOUR_API_KEY&url=YOUR-URL&window_height=500"
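
The two viewport parameters can also be set together in one request. A minimal sketch, where viewport_query is a hypothetical helper, the defaults mirror the documented 1920 and 1080, and the positive-integer check is an assumption rather than a documented constraint:

```python
from urllib.parse import urlencode


def viewport_query(width: int = 1920, height: int = 1080) -> str:
    """Hypothetical helper: encode both viewport dimensions as a query fragment."""
    for name, value in (("window_width", width), ("window_height", height)):
        # Assumption: dimensions should be positive integers (pixels).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return urlencode({"window_width": width, "window_height": height})


print(viewport_query(1500, 500))  # window_width=1500&window_height=500
```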