Webmasters versus HTTrack

Difference (from prior minor revision)

Changed: 7c7

< Sometimes a point of contention. Robots.txt is used to restrict (guide) robotic crawling tools, e.g. search engines. HTTrack is designed to be an offline browser, so to mirror a website intact it needs to access the website in the same way as a browser would. This is why HTTrack provides the option to ignore robots.txt directions.

to

> Sometimes a point of contention. Robots.txt is used to restrict (guide) robotic crawling tools, e.g. search engines. HTTrack is designed to be an offline browser, so to mirror a website intact it needs to access the website in the same way as a browser would. This is why HTTrack provides the option to ignore robots.txt directions, but **by default** the directions are obeyed.


Guidelines/FAQ to avoid/limit abuse

Xavier has a useful page on the official website with:

http://www.httrack.com/html/abuse.html

robots.txt

Sometimes a point of contention. Robots.txt is used to restrict (guide) robotic crawling tools, e.g. search engines. HTTrack is designed to be an offline browser, so to mirror a website intact it needs to access the website in the same way as a browser would. This is why HTTrack provides the option to ignore robots.txt directions, but by default the directions are obeyed.

Forum discussions