Website Offline Copy

Ever wanted to have an entire website available as a local copy? I did yesterday.

There's a website with a forum that is about to be closed and contains lots of useful info. So I thought: "Why not save it?"

I gave a few tools a try, but I could not get them to fetch all the files. Then I decided to use good old wget.

I logged in to an old Linux machine and, after reading the man page for a while, used the following command to download the site:

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains example.com \
     --no-parent \
         www.example.com


If part of the site is protected by a login page, you can use cookies to access the content.

First, get the cookies with the following command:

$ wget \
     --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data 'user=yourusername&password=yourpassword' \
     http://authurl


You can find the names of the username and password fields that you need to post, as well as the authentication URL, by examining the login form's HTML.
Then add the --load-cookies cookies.txt and --keep-session-cookies parameters to the download command to use the saved cookies.
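
Putting the two steps together, a sketch of the full cookie-aware download might look like this (cookies.txt and example.com stand in for your own cookie file and domain):

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains example.com \
     --no-parent \
     --load-cookies cookies.txt \
     --keep-session-cookies \
         www.example.com

With --keep-session-cookies, multiple wget runs are treated as a single browser session as far as the site is concerned, which keeps the forum login alive throughout the recursive download.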
