Clean Text

By default, FoxScrape returns clean content: it removes trailing spaces and whitespace characters such as '\n' and '\t' from the results. To keep the raw text instead, disable this behavior by setting clean: false in your data extraction rule.

clean: true | false (default: true)
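
As a minimal sketch, a rule with this option can be assembled in Python and serialized to JSON. The helper name build_extract_rules is hypothetical and not part of FoxScrape. How the resulting string is passed to the API is not shown here.

```python
import json


def build_extract_rules(name, selector, clean=True):
    """Return an extract_rules payload as a JSON string (hypothetical helper)."""
    rule = {"selector": selector}
    # clean defaults to true, so it only needs to be spelled out when disabling it.
    if not clean:
        rule["clean"] = False
    return json.dumps({"extract_rules": {name: rule}})


# Build a rule that keeps the raw, uncleaned text.
print(build_extract_rules("first_post_description", ".card > div", clean=False))
```

Leaving clean out of the rule relies on the documented default of true. Setting it explicitly, as the examples in this section do, is equally valid.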

Example

Here is an example of extracting a post description from our blog with clean: true:

JSON

{
  "extract_rules": {
    "first_post_description": {
      "selector": ".card > div",
      "clean": true
    }
  }
}

Applied to FoxScrape's blog page, the rule above would extract:

JSON

{
  "first_post_description": "How to Use a Proxy with Python Requests? - (7min) By Maxine Meurer 13 October 2021 In this tutorial we will see how to use a proxy with the Requests package. We will also discuss on how to choose the right proxy provider.read more"
}

If you use clean: false:

JSON

{
  "extract_rules": {
    "first_post_description": {
      "selector": ".card > div",
      "clean": false
    }
  }
}

You would get this result instead:

JSON

{
  "first_post_description": "\n                How to Use a Proxy with Python Requests? - (7min)\n        \n            \n            \n            By Maxine Meurer\n            \n            \n            13 October 2021\n            \n        \n        In this tutorial we will see how to use a proxy with the Requests package. We will also discuss on how to choose the right proxy provider.\n        read more\n        "
}
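
If you request raw text with clean: false and later want to tidy it yourself, a rough client-side approximation is to collapse every run of whitespace into a single space. This is only a sketch and not FoxScrape's actual cleaning logic. For instance, the cleaned result shown earlier joins "provider." and "read more" without a space, which this approach does not reproduce. The function name collapse_whitespace is hypothetical.

```python
def collapse_whitespace(raw: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    # str.split() with no arguments splits on any whitespace run
    # (spaces, '\n', '\t', ...) and discards leading and trailing empties.
    return " ".join(raw.split())


raw = "\n            By Maxine Meurer\n            \n            13 October 2021\n        "
print(collapse_whitespace(raw))
```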