====== WGET ======

===== Mirroring a web site on Linux =====

You almost certainly have wget already. Try ''wget --help'' at the command line. If you get an error message, install wget with your Linux distribution's package manager.

Once you have wget installed correctly, the command line to mirror a web site is:

  wget -m -k -K -E http://www.example.com/

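Each of those four switches does a separate job. As a minimal sketch (Python, with ''http://www.example.com/'' as a placeholder URL and an invented helper name), here is the same command assembled programmatically with one comment per flag:

```python
# Sketch: the mirror command assembled programmatically, one comment
# per switch. "http://www.example.com/" is a placeholder URL.
def mirror_command(url):
    return [
        "wget",
        "-m",  # --mirror: recursive download with timestamping, unlimited depth
        "-k",  # --convert-links: rewrite links in saved pages to the local copies
        "-K",  # --backup-converted: keep a .orig copy of each rewritten file
        "-E",  # --html-extension (later --adjust-extension): save HTML pages with .html
        url,
    ]

print(" ".join(mirror_command("http://www.example.com/")))
# → wget -m -k -K -E http://www.example.com/
```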
See ''man wget'' for an explanation of these and other options.

If this command seems to run forever, there may be parts of the site that generate an infinite series of different URLs. You can combat this in many ways, the simplest being to use the ''-l'' option to specify how many links "deep" the mirror should go before stopping.

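To see why a depth limit tames an endless URL space, here is a toy crawler (Python; the site structure, URLs, and function names are all invented for illustration) over an in-memory "site" whose calendar page always links to a next-month page:

```python
from collections import deque

# Toy stand-in for fetching a page and extracting its links.
# The /cal/ pages generate an endless chain of "next month" URLs,
# the way a real calendar script can trap an unlimited crawl.
def links(url):
    if url.startswith("/cal/"):
        n = int(url.split("/")[-1])
        return ["/cal/%d" % (n + 1)]   # endless next-month link
    return ["/about", "/cal/0"]        # links on the front page

def crawl(start, max_depth):
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue                   # like wget -l: go no deeper
        for nxt in links(url):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return sorted(seen)

print(crawl("/", 3))
# → ['/', '/about', '/cal/0', '/cal/1', '/cal/2']
```

Without the ''depth >= max_depth'' check, the loop never terminates on this site; with it, the crawl is bounded no matter how many URLs the calendar can invent.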
**Note: some web servers may be set up to refuse requests from wget's default user agent. In that case, identify as something else with the ''--user-agent'' option, as in the example below.**

===== Advanced options =====

  # Collect only the specific links listed line by line in
  # the local file "linkfile.txt" (a placeholder name for your URL list).
  # Use a random wait of 0 to 33 seconds between files.
  # When there is a failure, retry for up to 22 times with 48 seconds
  # between each retry.
  # Place all the captured files in the "/download" directory
  # and collect the access results to the local file "log.txt"
  # (the directory and log names are placeholders too).
  # Good for just downloading specific known images or other files.
  wget -t 22 --waitretry=48 --wait=33 --random-wait --user-agent="" \
    -e robots=off -o ./log.txt -P /download -i ./linkfile.txt
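Those numbers set a time budget per URL. As a rough sketch (the function name is invented, and real elapsed time also depends on transfer speed), ''--waitretry'' ramps the pause after each failed try by one second, capped at the given maximum, so the worst case for one URL works out like this:

```python
# Rough worst-case time budget for one URL under -t 22 --waitretry=48
# --wait=33: waitretry pauses 1s after the first failure, 2s after the
# second, and so on, capped at 48s; there are tries-1 gaps between tries.
def worst_case_seconds(tries=22, waitretry=48, wait=33):
    retry_gaps = sum(min(n, waitretry) for n in range(1, tries))
    return retry_gaps + wait  # plus the polite wait before the next file

print(worst_case_seconds())
# → 264
```

So even if a file fails all 22 times, this configuration spends at most a few minutes on it before moving on, rather than hammering the server in a tight loop.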