HTTrack Help: Webmasters versus HTTrack

Guidelines/FAQ to avoid/limit abuse

Xavier Roche, the author of HTTrack, maintains a useful page on the official website covering:

Usage guidelines for users - rules to avoid abuse
Information for webmasters - how to limit abuse

http://www.httrack.com/html/abuse.html

robots.txt

The robots.txt file is sometimes a point of contention. It is used to restrict (or guide) automated crawling tools such as search engine spiders. HTTrack is designed to be an offline browser, so to mirror a website intact it needs to access the site in the same way a browser would. This is why HTTrack provides an option to ignore robots.txt directives, although by default the directives are obeyed.
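To make the effect of robots.txt concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. It is not HTTrack's own implementation. The robots.txt content, the user-agent names (ExampleMirrorBot, GenericBrowser), the example.com URLs and the obey_robots switch are all hypothetical, chosen only to show how a crawler that obeys the rules by default might decide whether to fetch a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download
# this file from the root of the site it is about to mirror.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleMirrorBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str,
              obey_robots: bool = True) -> bool:
    """Return True if the URL may be fetched.

    obey_robots is a hypothetical switch: rules are obeyed by default,
    and ignoring them is an explicit choice made by the user.
    """
    if not obey_robots:
        return True
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A generic client is only kept out of /private/.
print(may_fetch(parser, "GenericBrowser", "http://example.com/index.html"))
print(may_fetch(parser, "GenericBrowser", "http://example.com/private/data.html"))

# The named bot is excluded from the whole site unless the rules are ignored.
print(may_fetch(parser, "ExampleMirrorBot", "http://example.com/index.html"))
print(may_fetch(parser, "ExampleMirrorBot", "http://example.com/index.html",
                obey_robots=False))
```

Keeping the override as an explicit parameter that defaults to obeying the rules reflects the behaviour described above: the webmaster's directions are respected unless the user deliberately chooses otherwise.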

Forum discussions

Why do you allow robots.txt to be overridden?
HTTrack Crashed Our Server Last Night
HTTrack vs webmasters
Your software is a danger and a scourge
RISKS?