Fractle

Robots Exclusion Protocol


Mandelbot implements the Robots Exclusion Protocol. You can control which files are crawled with a robots.txt file, and which files are indexed with Robot Tags.


Crawling vs Indexing

Mandelbot crawls pages by visiting and collecting information from them. It then indexes pages by organizing what it knows about them so they can be found through Fractle.

Mandelbot doesn't have to crawl a page to index it. In fact, pages may appear in Fractle before being crawled based on information from other pages that have been crawled.


Should I use robots.txt or Robot Tags?

You don't have to use either. If you don't use them for other robots and you want Mandelbot to crawl your site, you don't need to do anything. If you already use them for other robots, you may need to update them to allow Mandelbot to crawl your site. If you already use them, or you want to block Mandelbot from some parts of your site, continue reading.

A robots.txt file allows you to control what files Mandelbot requests from your server. If a file is blocked by a robots.txt file, Mandelbot will not request it from your server. However, blocked files may continue to appear in Fractle's index based on information from other pages.
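As a rough illustration of how a robots.txt rule decides whether a file is requested, here is a minimal sketch using Python's standard urllib.robotparser module. It is not Mandelbot's own parser, and its handling of wildcards and rule precedence may differ from Mandelbot's. The robots.txt content, the /private/ path, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is illustrative only.
ROBOTS_TXT = """\
# Keep Mandelbot out of the private area
User-agent: Mandelbot
Disallow: /private/
"""


def may_request(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/index.html",
    ):
        print(url, may_request(ROBOTS_TXT, "Mandelbot", url))
```

A file that fails this check is never requested, but as noted above it may still appear in the index based on information from other pages.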

Robot Tags (Robot Meta Tags or the X-Robots-Tag HTTP Header) allow you to control what files Mandelbot indexes. If a file is blocked by Robot Meta Tags or an X-Robots-Tag HTTP Header, Mandelbot will request it from your server and then exclude it from Fractle's index. Until the file has been crawled and the index updated, it may still appear in Fractle's index based on information from other pages. Mandelbot will continue to request blocked pages as part of future crawls to check for changes to the Robot Tags.
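The check a crawler performs after fetching a page can be sketched as follows. This is a simplified illustration, not Mandelbot's implementation: it assumes the conventional forms, a meta tag named "robots" and an X-Robots-Tag response header, each carrying a comma-separated list of directives, and it ignores robot-specific variants. The function and class names are hypothetical.

```python
from html.parser import HTMLParser


def _split_directives(value: str) -> set[str]:
    """Split a comma-separated directive list into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives |= _split_directives(attr.get("content", ""))


def is_noindex(headers: dict[str, str], html: str) -> bool:
    """Return True if the response headers or the HTML carry noindex."""
    header_directives: set[str] = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives |= _split_directives(value)
    meta = _RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives
```

Note that both inputs only exist once the page has been requested, which is why a file blocked this way is still crawled.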


Avoid Conflicting Directives

Using robots.txt and Robot Tags together can cause conflicts if both act on the same files.

When a file is blocked by robots.txt, it won't be crawled, so Mandelbot will never see or act on any Robot Meta Tags or X-Robots-Tag HTTP Headers that may be present on the file. As a result, a noindex directive on a file blocked by robots.txt has no effect.
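The interaction between the two mechanisms can be summarized as a small decision function. This is only a sketch of the behavior described on this page; the names Outcome and expected_outcome are hypothetical and do not correspond to anything Mandelbot exposes.

```python
from enum import Enum


class Outcome(Enum):
    NOT_CRAWLED_MAY_STILL_APPEAR = "not crawled; may appear based on other pages"
    CRAWLED_AND_EXCLUDED = "crawled; excluded from the index"
    CRAWLED_AND_INDEXABLE = "crawled; eligible for the index"


def expected_outcome(blocked_by_robots_txt: bool, has_noindex: bool) -> Outcome:
    """Describe what happens to a file under each combination of settings."""
    if blocked_by_robots_txt:
        # The file is never requested, so any noindex directive goes unseen.
        return Outcome.NOT_CRAWLED_MAY_STILL_APPEAR
    if has_noindex:
        return Outcome.CRAWLED_AND_EXCLUDED
    return Outcome.CRAWLED_AND_INDEXABLE


if __name__ == "__main__":
    for blocked in (True, False):
        for noindex in (True, False):
            result = expected_outcome(blocked, noindex)
            print(f"blocked={blocked} noindex={noindex}: {result.value}")
```

The first branch is the conflicting case: the robots.txt block takes effect first, and the noindex directive changes nothing.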


Implementation Details

Mandelbot supports disallow directives, allow directives, wildcards, and comments in robots.txt files. For specifications, details, and examples, read how Mandelbot supports robots.txt.

Mandelbot supports the noindex directive in both Robot Meta Tags and in the X-Robots-Tag HTTP Header. For specifications, details, and examples, read how Mandelbot supports Robot Meta Tags and the X-Robots-Tag HTTP Header.