The Best Way To Download A Website For Offline Use By Using Wget

There are two ways. The first is a single command that runs in the foreground, right in front of you; the second runs in the background in a separate process, so you can log out of your ssh session and the download will keep going.

First make a folder to download the websites to and begin your downloading. (Note: if you download www.SOME_WEBSITE.com, you will end up with a folder like this: ~/websitedl/www.SOME_WEBSITE.com/)

STEP 1:

mkdir ~/websitedl/ 
cd ~/websitedl/

Now choose for Step 2 whether you want to download it simply (1st way) or if you want to get fancy (2nd way).

STEP 2:

1st way:

wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://www.SOME_WEBSITE.com
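
Once this foreground download finishes, you can check the result straight away. A quick sketch, assuming the site's front page was saved as index.html (the actual entry file depends on the site):

cd ~/websitedl/www.SOME_WEBSITE.com/
open index.html     # macOS; on Linux use xdg-open index.html, or open the file in any browser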

2nd way:

TO RUN IN THE BACKGROUND:
nohup wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://www.SOME_WEBSITE.com &
THEN TO VIEW OUTPUT (there will be a nohup.out file in whichever directory you ran the command from):
tail -f nohup.out
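
TO CHECK ON IT OR STOP IT LATER (optional - these are standard process tools, not part of the download itself; replace <PID> with the process ID that ps prints):

ps aux | grep [w]get     # shows the wget process if it is still running
kill <PID>               # stops the background download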

WHAT DO ALL THE SWITCHES MEAN:

--limit-rate=200k limits the download speed to 200 KB/sec

--no-clobber doesn't overwrite any existing files (useful in case the download is interrupted and restarted)

--convert-links converts links so that they work locally, offline, instead of pointing to the website online

--random-wait waits a random amount of time between requests - websites don't like being downloaded at a fixed, machine-like pace

-r recursive - downloads the full website

-p downloads everything, even pictures (same as --page-requisites; grabs the images, CSS, and so on)

-E saves files with the right extension (same as --adjust-extension; without it, many HTML and other files end up with no extension)

-e robots=off ignores robots.txt, so wget doesn't act like an obedient crawler - websites don't like robots/crawlers unless they are Google or another famous search engine

-U mozilla sets the User-Agent string to "mozilla", so the request looks like it comes from a browser rather than from a crawler like wget
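
For reference, here is the same command with the long option names spelled out - nothing new, just easier to read back later (on current wget, -E is --adjust-extension; very old versions called it --html-extension, and -e robots=off is left in its short form):

wget --limit-rate=200k --no-clobber --convert-links --random-wait --recursive --page-requisites --adjust-extension -e robots=off --user-agent=mozilla http://www.SOME_WEBSITE.com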

PURPOSELY DIDN’T INCLUDE THE FOLLOWING:

-o /websitedl/wget1.txt (or --output-file=/websitedl/wget1.txt) logs everything to that file - didn't do this because it gave me no output on the screen and I don't like that.

-b runs it in the background and I can’t see progress… I like “nohup &” better

--domains=steviehoward.com didn't include this because the site is hosted by Google, so wget may need to step into Google's domains to fetch everything

--restrict-file-names=windows modifies filenames so that they will also work on Windows. Seems to work okay without this.
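
If you do want the log file and the Windows-safe filenames anyway, here is a sketch of the background variant with both options added (the log path is just an example; with -o the progress goes into that log rather than nohup.out, so tail that file instead):

nohup wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla --restrict-file-names=windows -o ~/websitedl/wget1.log http://www.SOME_WEBSITE.com &
tail -f ~/websitedl/wget1.log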

Tested with zsh 5.0.5 (x86_64-apple-darwin14.0) on an Apple MacBook Pro (Late 2011) running OS X 10.10.3.

Source: gist.github, @stvhwrd


Why did you start with such a hard way?
I am using Command Prompt on Windows 7, and it says that your command is invalid. So how can we use wget?

You can get wget from this site: https://www.gnu.org/software/wget/

I have downloaded and installed it, but how do I use it?