AI Extraction
If you want to extract data from pages without parsing the HTML yourself or writing CSS or XPath selectors, you can add AI extraction rules to your API call.
These features let you describe the information you need in natural language. They also let you extract structured data from poorly structured pages using an easy-to-write JSON schema.
Cost: The AI extraction parameters (ai_query and ai_extract_rules) cost an additional 5 credits on top of the regular API cost. To speed up your request, we encourage you to set a relevant ai_selector value.
Short notation
The simplest way to write extraction rules is a JSON object in which each key names a field and each value describes, in natural language, what to extract:
    {"key_name": "what you want to extract"}
To extract the title and a summary of one of our blog posts, you only need these rules:
    {
      "title": "title of the blog post",
      "summary": "a 5 sentences summary of the blog post"
    }
And this will be the JSON response:
    {
      "title": "How to web scrape with FoxScrape",
      "summary": "We help you get better at web-scraping: detailed tutorial, case studies and writing by industry experts"
    }
Important: extraction rules are written as JSON. To pass them in a GET request, you need to stringify them.
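As a minimal sketch of the stringification step, the Python snippet below serializes the rules with json.dumps and lets urlencode percent-encode the result for the query string. The endpoint URL and the api_key and url parameter names are placeholders assumed for illustration; only ai_extract_rules comes from this page.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real API URL.
BASE_URL = "https://api.example.com/v1"

# Short-notation rules from the example above.
rules = {
    "title": "title of the blog post",
    "summary": "a 5 sentences summary of the blog post",
}


def build_request_url(target_url, rules, api_key="YOUR_API_KEY"):
    """Stringify the rules and place them in a GET query string."""
    params = {
        "api_key": api_key,  # assumed parameter name
        "url": target_url,   # assumed parameter name
        "ai_extract_rules": json.dumps(rules),  # JSON object -> string
    }
    # urlencode percent-encodes braces, quotes and spaces for us.
    return f"{BASE_URL}?{urlencode(params)}"


request_url = build_request_url("https://example.com/blog/post", rules)
print(request_url)
```

Most HTTP clients do the percent-encoding themselves when given a dictionary of parameters, so in practice the json.dumps call is usually the only step you have to add.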
Advanced notation
Types
You can force the type of the extracted data by using the type property. Here are the possible values:
- string: the value will be a string (default)
- list: the value will be an array of strings
- number: the value will be a number
- boolean: the value will be a boolean
- item: the value will be an object
In addition to these types, you can use the enum key to restrict the extracted value to a list of allowed values.
Here is an example of each type used to extract information from a product on an ecommerce website:
    {
      "name": {
        "description": "the product name",
        "type": "string"
      },
      "categories": {
        "description": "all product categories",
        "type": "list"
      },
      "price": {
        "description": "the product price in dollars",
        "type": "number"
      },
      "in_stock": {
        "description": "whether the product is currently available",
        "type": "boolean"
      },
      "shipping_info": {
        "description": "shipping details including delivery time and cost",
        "type": "item",
        "output": {
          "delivery_time": "estimated delivery in days",
          "shipping_cost": "shipping cost in dollars"
        }
      },
      "size": {
        "description": "product size",
        "type": "list",
        "enum": ["XS", "S", "M", "L", "XL"]
      }
    }
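Since extraction is driven by natural language, it can be worth checking the returned values against the types you asked for. The Python sketch below is one hedged way to do that on the client side. It assumes an item comes back as an object keyed by the names in output and that enum applies to each element of a list; neither is confirmed by this page. The sample response is made up for illustration, and matches_rule and validate are hypothetical helper names.

```python
def matches_rule(value, rule):
    """Return True if value has the shape requested by the rule's type."""
    if isinstance(rule, str):
        # Short notation: a bare description means the default type, string.
        rule = {"type": "string"}
    kind = rule.get("type", "string")

    if kind == "string":
        ok = isinstance(value, str)
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif kind == "boolean":
        ok = isinstance(value, bool)
    elif kind == "list":
        ok = isinstance(value, list) and all(isinstance(v, str) for v in value)
    elif kind == "item":
        output = rule.get("output", {})
        ok = isinstance(value, dict) and all(
            key in value and matches_rule(value[key], sub)
            for key, sub in output.items()
        )
    else:
        return False  # unknown type name

    allowed = rule.get("enum")
    if ok and allowed is not None:
        # Assumption: for lists, every element must be an allowed value.
        values = value if isinstance(value, list) else [value]
        ok = all(v in allowed for v in values)
    return ok


def validate(response, rules):
    """Map each rule key to True/False depending on the response value."""
    return {key: matches_rule(response.get(key), rule) for key, rule in rules.items()}


rules = {
    "price": {"description": "the product price in dollars", "type": "number"},
    "in_stock": {"description": "whether the product is currently available", "type": "boolean"},
    "shipping_info": {
        "description": "shipping details including delivery time and cost",
        "type": "item",
        "output": {
            "delivery_time": "estimated delivery in days",
            "shipping_cost": "shipping cost in dollars",
        },
    },
    "size": {"description": "product size", "type": "list", "enum": ["XS", "S", "M", "L", "XL"]},
}

# Made-up response, not real API output; "XXL" is outside the enum.
sample = {
    "price": 19.99,
    "in_stock": True,
    "shipping_info": {"delivery_time": "3", "shipping_cost": "4.50"},
    "size": ["S", "M", "XXL"],
}

report = validate(sample, rules)
print(report)
```

A key that reports False is a candidate for a more precise description or a narrower ai_selector.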
Speed up requests
The ai_selector parameter is optional and allows you to specify a CSS selector to focus the AI extraction on a specific part of the page. This can help improve accuracy and reduce processing time. For example:
    ai_selector="#product-details"
This tells the AI to only consider the content within the element with the ID "product-details" when extracting the information specified in the ai_query.
Because the AI then has less content to process, the request can complete faster.
Using both parameters together can provide more precise and efficient data extraction:
    ai_query="price of the product"
    ai_selector="#product-details"
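When the two parameters are sent in a GET request, the "#" in an ID selector needs to be percent-encoded. Otherwise everything after it is treated as a URL fragment and never reaches the server. The Python sketch below shows one way to build such a URL. The endpoint and the api_key and url parameter names are placeholders assumed for illustration; ai_query and ai_selector are the parameters described above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real API URL.
BASE_URL = "https://api.example.com/v1"


def build_query_url(target_url, ai_query, ai_selector=None, api_key="YOUR_API_KEY"):
    """Build a GET URL carrying ai_query and, optionally, ai_selector."""
    params = {
        "api_key": api_key,  # assumed parameter name
        "url": target_url,   # assumed parameter name
        "ai_query": ai_query,
    }
    if ai_selector:
        params["ai_selector"] = ai_selector
    # urlencode turns "#" into "%23", so the selector is not read as a fragment.
    return f"{BASE_URL}?{urlencode(params)}"


query_url = build_query_url(
    "https://example.com/product/42",
    ai_query="price of the product",
    ai_selector="#product-details",
)
print(query_url)
```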