A few Wget tricks

Wget is one of the most popular download managers among Linux users. It is originally a console-based tool, but GUI front ends such as Gwget and Winwget (a Windows version) are also available. In this article I will introduce some useful combinations of Wget parameters for specific purposes.

To download an entire website with all its images, styles and scripts (CSS, JavaScript, etc.), and linked pages, you can use the following command.

$ wget -r -p https://domain.com

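The -r parameter turns on recursive retrieval, so Wget follows the links it finds on each page, and the -p parameter tells Wget to download all the files needed to display each page properly. By default, Wget obeys the site's robots.txt file during recursive downloads. If you do not want Wget to follow the robots file, you can turn that behaviour off as in the example below.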

$ wget -r -p -e robots=off https://domain.com
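
When mirroring a site you do not control, it can help to keep the crawl bounded and gentle on the server. The sketch below wraps the recursive download in a small shell function. The name polite_mirror and the specific values (depth 3, one-second wait, 200k rate limit) are illustrative assumptions rather than recommendations from the Wget manual.

```shell
# Hypothetical helper: recursive download with a bounded depth and throttling.
# The numeric values are arbitrary examples; tune them for the site in question.
polite_mirror() {
  # --no-parent   : never ascend above the starting directory
  # --level=3     : follow links at most 3 levels deep
  # --wait=1      : pause 1 second between retrievals
  # --random-wait : vary that pause so requests are less regular
  # --limit-rate  : cap the download speed
  wget --recursive --page-requisites --no-parent --level=3 \
       --wait=1 --random-wait --limit-rate=200k "$1"
}
```

It could then be called as polite_mirror https://domain.com.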

To download a single complete HTML page together with all its images, styles and links, use the command below.

$ wget -E -H -k -K -p https://www.google.com

According to the Wget manual page, -E (--adjust-extension) means that if a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, the suffix .html will be appended to the local filename.

The -p switch is used to download all the files that are necessary to properly display a given HTML page. This includes such things as inline images, sounds, and referenced style sheets.

  • -k or --convert-links – converts the links in the downloaded document to make them suitable for local viewing
  • -K or --backup-converted – when converting a file, backs up the original version with a ‘.orig’ suffix
  • -H or --span-hosts – enables spanning across hosts when doing recursive retrieving
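
The same switches can be spelled with their long names, which tends to read better in scripts. The following is a minimal sketch under stated assumptions: save_page is a hypothetical helper name, and the default directory saved-page is an arbitrary choice. The --directory-prefix option (-P) is not used in the command above; it only sets where the files are stored.

```shell
# Hypothetical helper: save one page plus its requisites under a chosen directory.
# Usage: save_page URL [DIR]   (DIR defaults to saved-page, an arbitrary choice)
save_page() {
  url=$1
  dir=${2:-saved-page}
  # Long-form equivalents of -E -H -k -K -p, plus -P to pick the target directory
  wget --adjust-extension --span-hosts --convert-links --backup-converted \
       --page-requisites --directory-prefix="$dir" "$url"
}
```

For example, save_page https://www.google.com would place the files under ./saved-page.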

Finally, to fetch only specific types of files from a website, you can use Wget with the following command.

$ wget -nd -r -A jpeg,jpg,gif,png https://www.google.com

  • -nd prevents Wget from recreating the directory hierarchy of the website; in other words, all images will be saved in one directory
  • -r turns on recursive retrieving
  • -A specifies a comma-separated list of file name suffixes or patterns to accept
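
If the list of suffixes changes often, the command can be parameterised. This is only a sketch: fetch_by_type, the depth limit of 2, and the default directory name downloads are assumptions made for illustration.

```shell
# Hypothetical helper: collect files with the given suffixes into one flat directory.
# Usage: fetch_by_type URL SUFFIX_LIST [DIR]
fetch_by_type() {
  url=$1
  suffixes=$2            # comma-separated, e.g. "jpeg,jpg,gif,png"
  dir=${3:-downloads}    # assumed default directory name
  # --level=2 keeps the crawl shallow; the value is an arbitrary example
  wget --no-directories --recursive --level=2 \
       --accept="$suffixes" --directory-prefix="$dir" "$url"
}
```

A call such as fetch_by_type https://www.google.com jpeg,jpg,gif,png would match the command above, apart from the depth limit and the target directory. Note that during a recursive run Wget still downloads HTML pages in order to find links, and deletes them afterwards if they do not match the accept list.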

You can also supply a username and password for websites that require them, as shown below.

$ wget --user=MyUserName --password='MyPassword' http://myWebsite.com/MyMusic.mp3
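
A password typed on the command line can end up in the shell history and may be visible to other users in the process list. One possible alternative, sketched here with a hypothetical helper name fetch_private, is to let Wget prompt for the password with --ask-password, which cannot be combined with --password.

```shell
# Hypothetical helper: download with authentication, prompting for the password.
# Usage: fetch_private USERNAME URL
fetch_private() {
  user=$1
  url=$2
  # --ask-password makes Wget ask for the password interactively
  # instead of reading it from the command line
  wget --user="$user" --ask-password "$url"
}
```

For example: fetch_private MyUserName http://myWebsite.com/MyMusic.mp3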

More information can be found in the Wget manual, which is accessible via this link.