URL number sequences

The problem: A site has a series of pages or files whose URLs differ only by a number, and you want to mirror every one within a certain range. For example:

www.domain.com/folder/img[001-999].jpg

You will need to add one URL to the HTTrack project for each file or page you wish to mirror. Creating this list by hand is impractical because it could be very long.

Each of the solutions below produces a text file containing the URLs you want. Use that text file in your HTTrack project (the URL list option in WinHTTrack).

To ensure you get only the pages/files in your URL list, set your filters (scan rules) to the following, or prepend it to your existing filters, changing "www.domain.com" to whatever the actual website is:

-* +www.domain.com/*
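
For the command-line version of HTTrack, the URL list and the filters might be passed as in the sketch below. It assumes the list is saved as list.txt in the current directory. The output directory ./mirror is a hypothetical name, and www.domain.com is the same placeholder as above. The guard skips the run when httrack or the list is missing.

```shell
# Sketch: mirror only the URLs in list.txt (one URL per line).
# ./mirror is a hypothetical output directory; www.domain.com is a placeholder.
if command -v httrack >/dev/null 2>&1 && [ -f list.txt ]; then
  # -%L reads the URL list, -O sets the output path,
  # and the quoted arguments are the scan rules shown above.
  httrack -%L list.txt -O ./mirror "-*" "+www.domain.com/*"
else
  echo "httrack or list.txt not found; nothing mirrored" >&2
fi
```

The filters are quoted so that the shell does not expand the asterisks before HTTrack sees them.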

Solution: URL Generator

There is a tool for Windows called URL Generator (http://www.spadixbd.com/freetools/urlgen.htm). This allows you to create lists of URLs containing simple number ranges.

Please note that the HTTrack Help site does not endorse this product nor guarantee that it will work for you.

Save the URL list to a text file and add it to your project.

Source: http://forum.httrack.com/readmsg/9986/9970/index.html

Solution: Bash shell

> is it possible to download all "id's" - to download
> all pages - increasing id=0001 to id=0002 and so on
> each cycle?

Nope. But you can do this very easily in bash shell:

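# Write http://www.example.com/?id=0 through ?id=99 to list.txt,
# one URL per line. The IDs are not zero-padded.
(i=0
while test "$i" -lt 100; do
  echo "http://www.example.com/?id=${i}"
  i=$((i + 1))   # POSIX arithmetic; the older $[...] form is deprecated in bash
done) > list.txt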

Source: http://forum.httrack.com/readmsg/12917/index.html?pid=12879
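
The loop above prints unpadded numbers (0, 1, 2, ...). When the site uses fixed-width numbers, as in the img[001-999].jpg example at the top of this page, printf can pad them. The following is a minimal sketch under that assumption. The host and path are the placeholder from that example, and the variable names first and last are only illustrative.

```shell
# Sketch: write zero-padded image URLs (img001.jpg .. img999.jpg) to list.txt.
# The host and path are placeholders, not a real site.
first=1
last=999
i=$first
while [ "$i" -le "$last" ]; do
  # %03d pads the counter to three digits: 1 -> 001, 42 -> 042
  printf 'http://www.domain.com/folder/img%03d.jpg\n' "$i"
  i=$((i + 1))
done > list.txt
```

Change first and last to the range you need, and adjust the width in %03d to match the number of digits the site uses (for example %04d for id=0001).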