MIME (Multipurpose Internet Mail Extensions) is an Internet standard originally created to send and receive almost any kind of data via e-mail, and now also commonly used by web servers to describe the type of the files they send.
It is organized into types, each with subtypes that specify a particular data format (e.g. type image and subtype gif). There is a huge number of MIME types and subtypes, and the number keeps growing as new standards are created or new file formats become popular.
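As a small illustration of the type/subtype structure, the sketch below splits a MIME string into its two parts and asks Python's standard `mimetypes` module to guess a type from a file name. The helper name `split_mime` and the file name `logo.gif` are made up for this example.

```python
import mimetypes


def split_mime(mime_type: str) -> tuple[str, str]:
    """Split 'type/subtype' into its two parts, ignoring any parameters."""
    # Drop optional parameters such as '; charset=utf-8'.
    essence = mime_type.split(";", 1)[0].strip().lower()
    main_type, _, subtype = essence.partition("/")
    return main_type, subtype


print(split_mime("image/gif"))
print(split_mime("text/html; charset=utf-8"))
# The standard library can guess a MIME type from a file extension.
print(mimetypes.guess_type("logo.gif")[0])
```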
HTTrack works as follows: when a link such as
www.someweb.com/index.cgi?read=20580
is found, HTTrack asks the server for the right filetype but starts the download immediately, without waiting for the answer. When the server answers, the link is modified accordingly, even if the file has already been downloaded. HTTrack uses MIME types as its list of known filetypes. Server-side files, on the other hand, are not sent as MIME types; in fact, those files are never sent to browsers at all, because browsers wouldn't know what to do with their code.
A problem may arise when HTTrack asks the server about filetypes it does not know (such as cgi, asp, php, …): the robot depends on correct answers from the server, and if it receives a wrong type, the local file will be renamed incorrectly. For example, if the server says a file is an image although it is actually an HTML file, HTTrack will rename the local file as an image.
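The renaming problem can be pictured with a minimal sketch. This is not HTTrack's actual code: the function `local_name` and the small `EXTENSION_FOR_TYPE` table are hypothetical, and they only show how a local extension that follows the server's reported type goes wrong when that type is wrong.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny table; not HTTrack's real list of types.
EXTENSION_FOR_TYPE = {
    "text/html": ".html",
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
}


def local_name(url: str, content_type: str) -> str:
    """Pick a local file name from the URL path and the type the server reported."""
    # Accept URLs written without a scheme, as in the text above.
    parts = urlsplit(url if "//" in url else "//" + url)
    stem = posixpath.splitext(posixpath.basename(parts.path))[0] or "index"
    # Ignore parameters such as '; charset=utf-8'.
    mime = content_type.split(";", 1)[0].strip().lower()
    # Types missing from the table get no extension in this sketch.
    return stem + EXTENSION_FOR_TYPE.get(mime, "")


url = "www.someweb.com/index.cgi?read=20580"
print(local_name(url, "text/html"))  # correct answer from the server
print(local_name(url, "image/gif"))  # wrong answer: an HTML page saved as .gif
```

With a correct answer the page is saved with an .html name; with the wrong `image/gif` answer the same page ends up with a .gif name, which is exactly the situation described above.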
-%A <param>
or --assume <param>
: this option forces HTTrack to rename certain server filetypes to a given file type. E.g.: -%A asp=text/html
will force all ASP files to be renamed as .html files. A problem arises when some ASP files actually return other types, such as images. On the other hand, this option should speed up a project, because HTTrack doesn't have to wait to learn the file type of a link (note: HTTrack version 3.3 onward has improvements to filetype handling and should not require these MIME settings). There are several equivalences already built in: -%A php2 php3 php4 php cgi asp jsp pl cfm nsf=text/html
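The idea behind an assume rule can be sketched as a simple lookup from extension to MIME type. The helpers `parse_assume` and `assumed_type` below are hypothetical and are not part of HTTrack; accepting both spaces and commas between extensions is an assumption made for this sketch.

```python
import posixpath
import re
from typing import Optional


def parse_assume(rule: str) -> dict[str, str]:
    """Turn 'asp=text/html' (or several extensions before '=') into a lookup table."""
    extensions, _, mime = rule.partition("=")
    # Assumption: extensions may be separated by spaces or commas.
    names = [e for e in re.split(r"[,\s]+", extensions.strip()) if e]
    return {name.lower(): mime.strip() for name in names}


def assumed_type(path: str, rules: dict[str, str]) -> Optional[str]:
    """Return the forced type for a path, or None if the server must be asked."""
    path = path.split("?", 1)[0]  # the query string is not part of the extension
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return rules.get(ext)


rules = parse_assume("php2 php3 php4 php cgi asp jsp pl cfm nsf=text/html")
print(assumed_type("/index.cgi?read=20580", rules))
print(assumed_type("/logo.gif", rules))
```

When the lookup returns a type, no question to the server is needed, which is where the speed-up comes from; when it returns `None`, the type still has to come from the server.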
-%NN
(delayed type check): the default mode of operation, already explained above, corresponds to option -%N2
. Although it has several advantages (such as transparent handling of redirections), you may want the program to work differently: -%N0
will wait for the file type before starting any download; -%N1
will wait only for known filetypes (html, jpg, …).
-%DN
or --cached-delayed-type-check=N
: by default, when updating a project, HTTrack won't ask the server for the filetypes of already downloaded files (-%D1
). If you suspect that files may differ from the previously downloaded ones, use -%D0
to ask the server again for the filetype of links (maybe index.cgi?read=20580
is not an image anymore).
-%M
or --mime-html
: saves a complete project in a single file with an mht extension. It is a MIME-based format, similar to eml files, that Internet Explorer will load without a problem, as if it were a standard image (copy) of the website. If you intend to open it with Mozilla/Firefox you will need an extension called MAF, which you can find on the Mozilla Archive Format website.
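To give an idea of what "a single MIME file" means, the sketch below builds a tiny multipart/related message with Python's standard `email` package, which is the general shape of an mht archive. It is not what HTTrack itself writes: the page, the `logo.gif` resource, their URLs and the placeholder image bytes are all made up for illustration.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder bytes; not a valid GIF, only here to have a binary payload.
FAKE_GIF = b"GIF89a" + bytes(10)

# One container message holds the page and everything it refers to.
archive = MIMEMultipart("related")
archive["Subject"] = "Example archive"

page = MIMEText('<html><body><img src="logo.gif"></body></html>', "html")
page["Content-Location"] = "http://www.someweb.com/index.html"
archive.attach(page)

logo = MIMEImage(FAKE_GIF, _subtype="gif")
logo["Content-Location"] = "http://www.someweb.com/logo.gif"
archive.attach(logo)

# as_string() gives the text that would be written to a single file.
mht_text = archive.as_string()
print(archive.get_content_type())
print([part.get_content_type() for part in archive.get_payload()])
```

Each part carries its own MIME type, so a reader that understands the format can tell the HTML page from the image inside the single file.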