(Based on Fred Cohen's guide. Updated by Jargoon, edited by Leto.)
Filters give you control over which files HTTrack will or will not download. You can both expand and restrict access to the websites you are trying to mirror.
Whenever you mirror a website, HTTrack tries to download everything inside the starting directory and its sub-directories. It ignores anything outside that domain by default. If you need different behaviour, try filters: they let you add parts of other websites, exclude certain sub-directories of the current website, or fetch only certain kinds of files.
Let's start by learning three important terms:
http://www.httrack.com/html/filters.html) and need to be explicitly added to the project. They are almost always starting points for HTTrack to follow.
-N1001) (used with the command-line), or the many settings available in the WinHTTrack GUI. They control what HTTrack does with the web addresses given.
To include certain kinds of things, use a plus sign (+). To exclude anything you don't need, use a minus sign (-). Asterisks (*) act as wildcards that match any number of characters.
+www.all.net/i_want_this/* -www.all.net/i_dont_want_that/* +*.edu.au/*.jpg
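As a rough illustration of how a filter line like the one above can be read, the Python sketch below splits it into rules and checks which patterns match a sample address. It is not HTTrack's own code: Python's fnmatch module stands in for HTTrack's matcher (whose wildcard syntax is richer), and the function name parse_filters and the sample address are hypothetical.

```python
from fnmatch import fnmatchcase

def parse_filters(line):
    """Split a whitespace-separated filter line into (include, pattern) pairs."""
    rules = []
    for token in line.split():
        sign, pattern = token[0], token[1:]
        if sign not in "+-":
            raise ValueError(f"filter must start with + or -: {token!r}")
        rules.append((sign == "+", pattern))
    return rules

rules = parse_filters(
    "+www.all.net/i_want_this/* -www.all.net/i_dont_want_that/* +*.edu.au/*.jpg"
)

# Hypothetical address, used only for illustration.
address = "www.all.net/i_want_this/page.html"

# Keep the rules whose pattern matches the address (simplified matching:
# fnmatch's "*" matches any run of characters, including "/").
matches = [(include, pattern) for include, pattern in rules
           if fnmatchcase(address, pattern)]
print(matches)  # [(True, 'www.all.net/i_want_this/*')]
```

Here only the first rule matches, and because it is a plus filter the address would be a candidate for download.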
A list of filters is read from least important to most important: later filters take precedence over earlier ones.
In the following example, even though a restriction has been added with the minus filter, all GIF files found will be downloaded because the second filter is overriding the first filter (more specifically, the first filter is not even applied due to the second filter):
However, in this example the two filters work together. The first filter initially allows GIF files from any domain or server, and the second restricts that by denying any GIF files inside an "images" directory. Put another way: a GIF file (on any domain) is downloaded only if it is not in an "images" directory.
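The precedence rule can be sketched in a few lines of Python. This is a simplified model rather than HTTrack's implementation: it assumes fnmatch-style wildcards, the function name is_allowed is hypothetical, the default verdict for addresses that match no filter is an assumption, and the two filter lists are illustrative choices picked to mirror the two cases described above.

```python
from fnmatch import fnmatchcase

def is_allowed(address, filters, default=False):
    """Return the verdict of the last filter that matches the address."""
    verdict = default  # assumed fallback when no filter matches
    for rule in filters:
        include, pattern = rule[0] == "+", rule[1:]
        if fnmatchcase(address, pattern):
            verdict = include  # a later match overrides an earlier one
    return verdict

# Hypothetical filter lists, for illustration only.
overridden = ["-*.gif", "+*.gif"]         # second filter cancels the first
combined = ["+*.gif", "-*/images/*.gif"]  # second filter narrows the first

print(is_allowed("www.example.com/images/logo.gif", overridden))  # True
print(is_allowed("www.example.com/images/logo.gif", combined))    # False
print(is_allowed("www.example.com/art/logo.gif", combined))       # True
```

Reversing the order of the filters in the second list would change the result, since the broad plus filter would then be evaluated last and override the restriction.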