Webmasters versus HTTrack

Guidelines/FAQ to avoid/limit abuse

Xavier has a useful page on the official website with:



Robots.txt is sometimes a point of contention. It is used to restrict (or guide) robotic crawling tools such as search engines. HTTrack is designed to be an offline browser, so to mirror a website intact it needs to access the site the same way a browser would. For this reason HTTrack provides an option to ignore robots.txt directives, although by default they are obeyed.
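As a rough illustration of the "obey by default, optionally ignore" behaviour described above, here is a minimal Python sketch using the standard library's urllib.robotparser. It is not HTTrack's own code: the robots.txt content, the may_fetch helper, the obey_robots parameter and the "ExampleMirror" user agent are all hypothetical names chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(url, user_agent="ExampleMirror", obey_robots=True):
    """Return True if a crawler following this policy would request the URL.

    With obey_robots=True (the default) the robots.txt rules are honoured.
    With obey_robots=False the rules are skipped entirely, which mimics a
    tool that has been told to ignore robots.txt.
    """
    if not obey_robots:
        return True
    parser = RobotFileParser()
    # parse() takes the robots.txt file as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these assumed rules, may_fetch("http://example.com/private/a.html") is refused by default, while the same call with obey_robots=False is allowed. A webmaster reading this should note that robots.txt is advisory: it only restricts tools that choose to honour it.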

Forum discussions