Tardis Beginner Tutorials/10
Tutorial 10: Using Links to browse the internet and wget to download files
The tardis firewall is quite restrictive to protect both tardis from the internet and the internet from potentially malicious tardis users. This includes restricting access to webservers directly, except some selected servers such as the debian mirror sites. However, tardis can also access the university proxy internally, and this is what you need to use to browse the internet from inside tardis.
Links is a very good non-graphical browser. It is just like internet explorer or whatever you normally use, but it runs from the console and doesn't display pictures - just text, and ascii interpretations of page features and layout. This sounds a bit limited, but you'll be amazed how easy it is to browse with it. It is descended from lynx which is also available on tardis - the main differences are that links has a menu, a download manager, supports more page features such as frames, and is generally more fully featured and useful. Run it now by typing links. You will get a daunting black screen, which may initially make you think it is loading - in fact this is the program. Press escape and a menu appears at the top of the screen. Use your arrow keys to move along to Setup, then down to Network Options. You will need to configure links to use the university http proxy. In both proxy fields, type in "wwwcache.cache.ed.ac.uk:3128" and press OK. You may find it surprising and very useful to know that you can use your mouse in links, just by clicking in the right place in putty. So you can simply click the ok button with your mouse, and navigate webpages the same way. Now press g and a dialogue box will pop up. Type in a familiar url, such as google.com - it will load very quickly, as there are no images! You can type things into the input boxes as usual, and click the buttons. There are few sites on the internet that cannot be browsed with links in fact. It also has an internal help system if you ever get lost. You should save your network settings so the proxy is always configured when you run links - press escape to go to the menu again, then setup, and save options. To scroll up and down in a long page, press ^N and ^P. You can also go to a url directly from the commandline by typing links [URL]. That's all there is to it!
The very handy program wget is as old as the stars but incredibly powerful. It is capable of retrieving files from both http and ftp, use proxies, download entire directories recursively, spider links and mirror content intelligently. You can use it to download a single file by using wget [full url of file], and an entire directory tree with the -r option (although you should read the man page for correct usage, as there are many other flags that are needed to get exactly the results you want). Before you can use wget however, you have to make sure your environment has the proxy settings in - you can see your environment by typing env. Now let's export the right settings like we did previously with our EDITOR variable:
export HTTP_PROXY='http://wwwcache.cache.ed.ac.uk:3128' export FTP_PROXY='http://wwwcache.cache.ed.ac.uk:3128'
The "http://" part is important, and is usually the part people forget. With this in your environment wget will know to use the proxy - you can disable it to get local files with wget -Y off [URL]. Case is important here as always - that's an uppercase "Y". You should add these two export commands to your "~/.profile" as i showed you in the bash tutorial. Now let's get a web page and look at it:
We should now have a file in our current directory called index.html. Let's read it:
We see the raw html of the document. We can feed it to links to see it as if it was a normal webpage:
Alternatively we can use html2text to strip out the html code and convert it to a regular text document:
This dumps the output to STDOUT, which is your screen, by default. You can save this to a file by piping stdout to a file:
html2text index.html > tempfile.txt
I won't discuss pipes here, but they are something you ought to know about to use the shell to its full potential. I advise googling for a bash pipes tutorial - why not do so using links from your tardis shell! Finally, simply delete the two files we created:
rm index.html tempfile.txt
...and we're done. For the next tutorial you will see why wget is so useful when we use it to download our first source package and compile it!