CSS or XPath selectors

You can use extract rules with CSS or XPath selectors. By default, the rules will work without the need to specify the kind of selector you are using.

The rules treat any selector beginning with / as an XPath selector; everything else is treated as a CSS selector.

JSON
{
  "extract_rules": {
    "title": "#title"
  }
}

CSS selector

JSON
{
  "extract_rules": {
    "title": "//h1[@id=\"title\"]"
  }
}

XPath selector

JSON
{
  "extract_rules": {
    "title": "/html/body/h1[@id=\"title\"]"
  }
}

XPath selector
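
The default detection rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the service's actual implementation; the function name infer_selector_type is hypothetical, and the sketch assumes the check looks at nothing but the first character of the selector.

```python
def infer_selector_type(selector: str) -> str:
    """Mimic the documented default: a leading "/" means XPath, anything else CSS.

    Hypothetical helper for illustration; the real detection logic may differ.
    """
    return "xpath" if selector.startswith("/") else "css"


# Try the rule on selectors like the ones used in this section.
for selector in ("#title", '//h1[@id="title"]', './html/body/h1[@id="title"]'):
    print(selector, "->", infer_selector_type(selector))
```

Under this rule the last selector, which starts with a dot rather than a slash, would be classified as CSS even though it is XPath.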

Sometimes you may want to set the selector type explicitly instead of relying on automatic detection, for example if:

  • you use an XPath selector which doesn't begin with /
  • you use a CSS selector which begins with /
  • you simply want to make your code clearer

Then you can use the selector_type property.

JSON
{
  "extract_rules": {
    "title": {
      "selector": "#title",
      "selector_type": "css"
    }
  }
}

CSS selector

JSON
{
  "extract_rules": {
    "title": {
      "selector": "./html/body/h1[@id=\"title\"]",
      "selector_type": "xpath"
    }
  }
}

XPath selector

selector_type: auto | css | xpath (default: auto)
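
If you build the rules in code rather than by hand, a small helper can validate selector_type and let a JSON library handle the quoting of selectors that contain double quotes. The sketch below is one possible approach in Python; make_rule and ALLOWED_SELECTOR_TYPES are hypothetical names, and how the resulting JSON is sent to the API is deliberately left out.

```python
import json

# The three values listed for selector_type in this section.
ALLOWED_SELECTOR_TYPES = {"auto", "css", "xpath"}


def make_rule(selector: str, selector_type: str = "auto") -> dict:
    """Build one extract rule with an explicit selector type (hypothetical helper)."""
    if selector_type not in ALLOWED_SELECTOR_TYPES:
        raise ValueError(f"unsupported selector_type: {selector_type!r}")
    return {"selector": selector, "selector_type": selector_type}


# An XPath selector that does not begin with "/", so its type is set explicitly.
rules = {"title": make_rule('./html/body/h1[@id="title"]', "xpath")}

# json.dumps escapes the inner double quotes, producing valid JSON.
payload = json.dumps({"extract_rules": rules}, indent=2)
print(payload)
```

Serializing with json.dumps avoids hand-escaping the quotes around attribute values such as "title" in the XPath expression.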