Xavier has a useful page on this topic on the official website:
http://www.httrack.com/html/abuse.html
How HTTrack handles robots.txt is sometimes a point of contention. A robots.txt file is used to restrict (or guide) robotic crawling tools such as search engines. HTTrack is designed to be an offline browser, so to mirror a website intact it needs to access the website in the same way a browser would. This is why HTTrack provides an option to ignore robots.txt directions, but by default the directions are obeyed.
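
To make the difference between obeying and ignoring robots.txt concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. It is not HTTrack's own implementation. The robots.txt content, the user agent name "ExampleMirrorBot", the example.com URLs, and the helper is_allowed with its obey_robots parameter are all hypothetical, invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(url: str, user_agent: str = "ExampleMirrorBot",
               obey_robots: bool = True) -> bool:
    """Return True if a crawler may fetch `url` under ROBOTS_TXT.

    `obey_robots` is a hypothetical switch that mimics a tool offering
    an "ignore robots.txt" option; it defaults to obeying the rules.
    """
    if not obey_robots:
        # Rules are ignored, so every URL is treated as fetchable.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/page.html"):
        print(url, is_allowed(url), is_allowed(url, obey_robots=False))
```

With the rules obeyed, anything under /private/ is skipped, so a mirror made this way can be missing pages that a visitor using a normal browser would see. With the rules ignored, those pages are fetched as well, which is the behaviour webmasters sometimes object to.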