Flash

How can I get Flash websites?

Flash sites can be very difficult to mirror: they may embed a scripting language (ActionScript), rely on external JavaScript, objects or streaming, and communicate with a database.

Sometimes, Flash is used by authors trying to protect their work. Ask for authorization first.

The website uses server resources

Any Flash site using a database or a server resource cannot be browsed offline.

The website doesn't use server resources

By default HTTrack downloads swf files and explores them. Unfortunately HTTrack will not extract all the hyperlinks in the Flash file. The code from Macromedia will only return html files.

How can I get the missing links?

The best Open Source utility to explore a swf file is SWFRIP.

Open the swf file with SWFRIP.

It will create a subfolder with two files: info.txt and actions.txt.
If the file is compressed or protected, it will let you save it uncompressed.
The file actions.txt will list the URLs called by the file and show the script.
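If SWFRIP is not at hand, the "save it uncompressed" step can be approximated in a few lines. This is only a sketch and not how SWFRIP itself works: it assumes the common zlib-compressed layout, where the file starts with the signature `CWS`, the next five bytes (version and length) are stored as-is, and everything after byte 8 is a zlib stream. The function name `inflate_swf` is made up for this example, and it does nothing about "protected" files.

```python
import zlib


def inflate_swf(data: bytes) -> bytes:
    """Return an uncompressed (FWS) copy of a SWF file's bytes."""
    signature = data[:3]
    if signature == b"FWS":
        # Already uncompressed: nothing to do.
        return data
    if signature != b"CWS":
        # Other layouts are deliberately out of scope for this sketch.
        raise ValueError("not a zlib-compressed SWF file")
    # Bytes 3..7 (version + uncompressed length) are never compressed.
    header_rest = data[3:8]
    body = zlib.decompress(data[8:])
    return b"FWS" + header_rest + body
```

Read the file in binary mode, pass its bytes to `inflate_swf`, and write the result to a new file so the original stays untouched.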

If the URLs are not obscured by functions or by an anti-decompiler utility, the links will appear in the instruction getURL("link","frame").
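Reading a long actions.txt by eye is tedious, so the search for getURL instructions can be scripted. The sketch below assumes the calls appear literally in the form shown above, with the link as a double-quoted first argument. The exact layout of SWFRIP's output is not confirmed here, and the function names are invented for the example. It also splits the links into absolute and relative ones, since the two cases are handled differently.

```python
import re
from urllib.parse import urlsplit

# Matches getURL("link", ...) and captures the first quoted argument.
GETURL = re.compile(r'getURL\(\s*"([^"]*)"')


def find_geturl_links(script_text):
    """Return the distinct links passed to getURL, in order of appearance."""
    links = []
    for match in GETURL.finditer(script_text):
        link = match.group(1)
        if link and link not in links:
            links.append(link)
    return links


def split_links(links):
    """Separate links that name a host (absolute) from the rest (relative)."""
    absolute = [l for l in links if urlsplit(l).netloc]
    relative = [l for l in links if not urlsplit(l).netloc]
    return absolute, relative
```

Links that are assembled at run time by ActionScript will not show up as plain quoted strings and will be missed by this kind of search.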

The missing links are absolute

Add them to the scan rules (Set options > Scan Rules):

+www.website_to_mirror.ext/path/filename1.ext

The missing links are relative

In this case, prepend the full path to each file name and add the results to the scan rules:

+www.website_to_mirror.ext/path/filename2.ext
+www.website_to_mirror.ext/path/filename3.ext

HTTrack will download the files, and you may have to repeat the procedure for any swf files fetched this way.
If all the links are relative, the mirror should be in working order.
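Turning a list of links into scan rules by hand is error-prone when there are many of them. The sketch below shows one way to do it. `base_url` is an assumption you have to supply yourself: it is the address that the relative links are resolved against, which may be the page embedding the Flash file rather than the swf itself. The helper name `scan_rule` is made up for the example.

```python
from urllib.parse import urljoin, urlsplit


def scan_rule(base_url, link):
    """Build a '+host/path' scan rule from an absolute or relative link."""
    # urljoin leaves absolute links alone and resolves relative ones.
    parts = urlsplit(urljoin(base_url, link))
    rule = "+" + parts.netloc + parts.path
    if parts.query:
        rule += "?" + parts.query
    return rule


if __name__ == "__main__":
    base = "http://www.website_to_mirror.ext/path/movie.swf"
    for name in ["filename2.ext", "filename3.ext"]:
        print(scan_rule(base, name))
```

Each printed line can be pasted into the Scan Rules box.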

Fixing missing relative links

The only solution is to modify the swf file with a hexadecimal editor. Ask the author for authorization first.

  1. Save the file with SWFRIP
  2. Open the uncompressed swf file
  3. Locate the links. Each one appears as the full path followed by a 0 byte. The mirrored files (php, asp, cfm, jsp…) will have an html extension
  4. Move the string, together with its terminating 0, so that it matches the name of the mirrored file and becomes a relative link

This method may not work for every file, as the use of ActionScript is becoming more and more common.
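Step 3 can be made easier by listing the offsets of the links before opening the hexadecimal editor. The sketch below only reads the file and changes nothing. It assumes the links are stored as plain printable ASCII beginning with http:// or https:// and ending in a 0 byte, as described above. The function name `locate_links` is invented for the example.

```python
import re

# An absolute link made of printable ASCII, immediately followed by a 0 byte.
LINK = re.compile(rb"https?://[\x21-\x7e]+(?=\x00)")


def locate_links(data: bytes):
    """Return (offset, link) pairs for 0-terminated absolute links in data."""
    return [(m.start(), m.group().decode("ascii")) for m in LINK.finditer(data)]


if __name__ == "__main__":
    import sys

    # Usage: python locate_links.py uncompressed.swf
    with open(sys.argv[1], "rb") as handle:
        for offset, link in locate_links(handle.read()):
            print(f"0x{offset:08x}  {link}")
```

The offsets are printed in hexadecimal so they can be typed directly into the editor's "go to" function. Run it on the uncompressed file, since the strings are not visible in a compressed one.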

Resources

See Dan's site (http://danzcontrib2.free.fr/en/captures.php#flash) for specific Flash capture examples and more tips.