Spider
A tutorial for using the Tourmaline spider command.
On this page, you'll learn how to:
- Use the tourmaline spider command
- Manipulate the spider to accomplish specific tasks
Basics
A web spider is a tool that starts from a single path and scans that page for links to more paths. It then scans each newly found page for paths in turn, and so on until no unvisited paths remain.
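The crawl loop described above can be sketched as a breadth-first search. This is a minimal, single-threaded illustration and not Tourmaline's implementation. The site is a hypothetical in-memory map of page to links (a real spider would fetch each page over HTTP and parse its links), and treating the start page as depth 0 is an assumption.

```python
from collections import deque

# Hypothetical site: each path maps to the paths linked from that page.
# A real spider would fetch each page over HTTP and parse its links instead.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/", "/contact"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": ["/blog"],
    "/contact": [],
}


def spider(start, links, max_depth=-1):
    """Breadth-first crawl from `start`; max_depth=-1 means no depth limit."""
    seen = {start}              # paths already queued, so none is scanned twice
    queue = deque([(start, 0)])
    found = []
    while queue:
        path, depth = queue.popleft()
        found.append(path)
        # Don't expand links from pages at the depth limit.
        if max_depth != -1 and depth >= max_depth:
            continue
        for link in links.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return found


print(spider("/", SITE))               # every reachable path
print(spider("/", SITE, max_depth=1))  # only the start page and its direct links
```

The seen set is what makes the loop terminate on sites whose pages link back to each other, as "/" and "/about" do here.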
The tourmaline spider command takes the target URL as its only required (positional) argument.
The command also takes various optional arguments:
- -t|--threads: The number of parallel threads to use (defaults to 12).
- -o|--outfile: The path to an output dump file.
- -d|--depth <DEPTH>: The maximum depth for the spider to reach (defaults to -1, meaning no limit).
- -r|--regex <REGEX>: A regex that paths must match to be added to the output (paths that don't match are still added to the queue).
- -i|--ignore <IGNORE_REGEX>: A regex that paths must not match to be added to the output (paths that match are still added to the queue).
- --force-regex: Specifies that any paths not matching the -r <REGEX> will not be added to the queue.
- --force-ignore: Specifies that any paths matching the -i <IGNORE_REGEX> will not be added to the queue.
- -k|--known <KNOWN>: A comma-separated list of known paths for the spider to start with.
- --known-file <KNOWN_FILE>: The path to a file containing known paths.
- -l|--limit <LIMIT>: The maximum number of paths for the spider to return.
- --force-limit: Specifies that only the -l <LIMIT> number of paths should be scanned.
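The interplay between -r/-i and their --force-* counterparts is easy to mix up, so here is a small model of the option descriptions above. It is a sketch, not Tourmaline's source: the classify helper is hypothetical, and it assumes regexes match anywhere in the path (search semantics), which is what a pattern like "/en/" needs in order to match a full path.

```python
import re


def classify(path, regex=None, ignore=None, force_regex=False, force_ignore=False):
    """Return (add_to_output, add_to_queue) for one discovered path.

    Hypothetical model of the documented options, not Tourmaline's code.
    """
    # Assumption: regexes match anywhere in the path (re.search semantics).
    matches = regex is None or re.search(regex, path) is not None
    ignored = ignore is not None and re.search(ignore, path) is not None
    add_to_output = matches and not ignored
    add_to_queue = True
    # -r alone only filters the output; --force-regex also filters the queue.
    if force_regex and not matches:
        add_to_queue = False
    # -i alone only filters the output; --force-ignore also filters the queue.
    if force_ignore and ignored:
        add_to_queue = False
    return add_to_output, add_to_queue


print(classify("/fr/about", regex="/en/"))                    # hidden, still scanned
print(classify("/fr/about", regex="/en/", force_regex=True))  # hidden and not scanned
```

In other words, the plain filters only change what you see, while the --force-* flags also change how much work the spider does.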
Examples
Regexes Example
Let's say we're enumerating example.com. Initially, you might run:
Now imagine that this is the output you started to get:
This means that the site has different pages for every language it supports. This is great for people trying to read what's on the site, but it's a little annoying for us. We can filter out non-English results with:
You can change "/en/" to the code of any other language, not just English.
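To see why "/en/" works as a filter and what swapping in another language code does, here is a sketch using Python's re module. The paths and the language_filter helper are hypothetical, and it assumes Tourmaline matches regexes anywhere in the path, as re.search does.

```python
import re

# Hypothetical paths a spider might discover on a multilingual site.
paths = ["/en/about", "/fr/about", "/de/contact", "/en/blog/post-1", "/es/"]


def language_filter(lang):
    """Build a pattern like "/en/" for a language code (hypothetical helper)."""
    return re.compile("/" + re.escape(lang) + "/")


def keep(paths, lang):
    pattern = language_filter(lang)
    # re.search matches anywhere in the path, not just at the start.
    return [p for p in paths if pattern.search(p)]


print(keep(paths, "en"))  # English pages only
print(keep(paths, "fr"))  # French pages only
```

Keeping both slashes in the pattern avoids accidental matches such as "/tender/" or "/blog/entry", which a bare "en" would catch.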
However, even though our output would now look like this:
the spider is still spending resources scanning the non-English pages, which increases the search time. We can avoid this with:
This makes sure that only English paths are added to the queue.
Ignore Regexes Example
Here, we'll be enumerating the made-up site hackme.com with:
We just want to scout out the pages on the site and see if we get anything interesting. However, upon running the command we get:
Not only is this annoying, but it also makes the search much longer than it needs to be. We can fix this the same way as in the previous example, but this time using an ignore regex:
This ensures that images won't be added to the queue.
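As an illustration of what such an ignore regex might look like, here is a sketch with an assumed pattern for common image extensions. The pattern and the hackme.com paths are hypothetical, and the sketch assumes regexes match anywhere in the path (re.search semantics).

```python
import re

# Assumed ignore pattern for common image extensions (not from the original
# command). IGNORECASE also catches upper-case extensions like ".PNG".
IMAGE_IGNORE = re.compile(r"\.(?:png|jpe?g|gif|svg|webp|ico)$", re.IGNORECASE)

# Hypothetical paths discovered on hackme.com.
found = [
    "/login",
    "/static/img/banner.PNG",
    "/gallery/photo-001.jpg",
    "/admin/panel",
    "/icons/favicon.ico",
    "/images.php",
]

# Paths the ignore regex lets through; with --force-ignore, only these
# would be queued for further scanning.
kept = [p for p in found if not IMAGE_IGNORE.search(p)]
print(kept)
```

Anchoring on the extension with $ keeps pages like "/images.php" that merely mention images. The trade-off is that an image URL with a query string, such as "/img/a.png?v=2", slips through, so you may need to loosen the anchor for sites that version their assets that way.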