Monday, 26 September 2011

Looking for nicely formatted bibtex for your citations...

I've been doing another paper trawl to put together a document for a project I am working on, and came across this nice website http://www.pubzone.org.

Just search for the title of the paper you want and then click on export bibtex. It gives you some of the best-formatted BibTeX I've come across, even going so far as to include the "proceedings" entry that accompanies an "inproceedings" entry.

Tuesday, 20 September 2011

Working with .profile

Most of the magic that goes into making things work nicely lives in the .profile file in your home directory. In this file you can tell Linux how you have arranged your files and how you want to work. Setting this up correctly will make things a breeze. For those who do not know, .profile is a file containing a bunch of shell commands that are executed every time you start a login shell.

Firstly - why .profile? Why not .cshrc or .bashrc or .login or any of the other startup files? Well, the reason is simple: .profile gets executed in such a way that it affects all the programs you run in Ubuntu, whether from a command-line terminal or from the GUI. If you are using a different flavour of Linux then you may have to look at how the startup process runs to find the best place to set environment variables, but .profile should work on most Linux systems.
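If you want to convince yourself of how .profile gets picked up, you can simulate the sourcing step by hand. This is just a sketch - the throwaway file below stands in for your real ~/.profile:

```shell
# A login shell sources ~/.profile; simulate that with a temporary file.
profile=$(mktemp)
echo 'export PROFILE_SOURCED=yes' > "$profile"
. "$profile"                # this is what the login process does with ~/.profile
echo "PROFILE_SOURCED=${PROFILE_SOURCED}"
rm -f "$profile"
```

Remember that after editing the real ~/.profile you need to log out and back in (or run `. ~/.profile` in a terminal) before the changes take effect.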

So, to get things going here is my .profile:

# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
	. "$HOME/.bashrc"
    fi
fi

CHECK_64=$(uname -a | grep x86_64)

if [ -n "${CHECK_64}" ]; then
    export ARC_POSTFIX=64
    export ARC=linux64
else
    export ARC_POSTFIX=32
    export ARC=linux
fi

export C_INCLUDE_PATH="${HOME}"/include
export CPLUS_INCLUDE_PATH="${C_INCLUDE_PATH}"
export INCLUDES=-I"${C_INCLUDE_PATH}"

export RAVL_INSTALL="${HOME}"/
export PROCS=4
export PROJECT_OUT="${HOME}"/.ravl_out

export LIBS=-L"${HOME}"/lib/
export LIBRARY_PATH="${HOME}"/lib:"${PROJECT_OUT}"/lib
export LD_LIBRARY_PATH="${LIBRARY_PATH}"

export PATH=./:"${PROJECT_OUT}"/bin:"${HOME}"/bin:"${PATH}"

export PYTHONPATH="${HOME}"/.ipython/:"${HOME}"/lib/python/

export ASPELL_CONF="master en_GB"
export GREP_OPTIONS=--exclude-dir=.svn
export OSG_FILE_PATH="${HOME}"/share/OpenSceneGraph-Data
export CMAKE_PREFIX_PATH=${HOME}

OK, so that looks pretty complicated; let's break it down and see what we are doing. Firstly, all the stuff up to the first export statement is just the standard content that Ubuntu puts in .profile, so we can ignore that. Let's now look at the export statements and see what they do.

export C_INCLUDE_PATH="${HOME}"/include
export CPLUS_INCLUDE_PATH="${C_INCLUDE_PATH}"
export INCLUDES=-I"${C_INCLUDE_PATH}"

These statements set up the various compilers to automatically look in ~/include for header files. Now anything we put in ~/include will automatically get picked up by pretty much any build system that uses gcc.

export RAVL_INSTALL="${HOME}"/
export PROCS=4
export PROJECT_OUT="${HOME}"/.ravl_out

These next statements are specific to RAVL (a computer vision library we use here at the University of Surrey). RAVL is set up to build to wherever the $PROJECT_OUT environment variable points. Here I am setting it to a hidden directory, and I am setting it here so I can reference that value later.

export LIBS=-L"${HOME}"/lib/
export LIBRARY_PATH="${HOME}"/lib:"${PROJECT_OUT}"/lib
export LD_LIBRARY_PATH="${LIBRARY_PATH}"

These are some of the more important statements in .profile. They tell the compiler and the runtime linker to look in ~/lib for libraries. By setting LD_LIBRARY_PATH we tell the system to look in ~/lib before it looks in /usr/lib. There is a potential security risk here, which is why many websites recommend that you do not use LD_LIBRARY_PATH (the risk being that it is easier for malicious code to get into ~/lib than into /usr/lib). Personally I think the risk is higher if you are constantly copying things into /usr/lib, but it is something to be aware of. Another issue is that some of the startup files in the X11 system (specifically the ones to do with ssh) strip LD_LIBRARY_PATH from the environment at load time, precisely because of this security issue. If you understand and are happy with the risks then you can go here to see how to fix that; otherwise you will just have to start things from a command prompt to be able to run your code nicely and easily.
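A quick way to see the search order you have just set up is to split the variable on its colons (the paths here are the ones from my .profile above, with PROJECT_OUT expanded):

```shell
# Print the directories the dynamic linker will try, in order.
LD_LIBRARY_PATH="${HOME}/lib:${HOME}/.ravl_out/lib"
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
```

The system default directories (/usr/lib and friends) are searched after anything listed here.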

export PATH=./:"${PROJECT_OUT}"/bin:"${HOME}"/bin:"${PATH}"

This line sets the PATH, which is where Linux looks to find commands to execute - useful for running your programs! Here I am setting it up to run stuff I compile with the RAVL QMake build system, and stuff I compile to ~/bin. Note that it is important to include the existing ${PATH} value on the end, otherwise you won't be able to run anything installed on your machine!
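To see the left-to-right precedence in action, here is a small sketch using /tmp/demo_bin as a stand-in for ~/bin or ${PROJECT_OUT}/bin:

```shell
# PATH is searched left to right, so a command in an earlier directory
# shadows one with the same name further along.
mkdir -p /tmp/demo_bin
printf '#!/bin/sh\necho "hello from demo_bin"\n' > /tmp/demo_bin/hello
chmod +x /tmp/demo_bin/hello
PATH="/tmp/demo_bin:$PATH" hello    # prints "hello from demo_bin"
```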

export PYTHONPATH="${HOME}"/.ipython/:"${HOME}"/lib/python/

This line sets the PYTHONPATH. If you are using Python you will find this invaluable, in addition to setting up .pydistutils.cfg, which I will cover later. These files tell Python where to look for stuff you have installed (and, in the case of .pydistutils.cfg, where to install stuff).
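You can check that PYTHONPATH entries end up on Python's module search path with a one-liner (this assumes python3 is installed; /tmp/mypylibs is a made-up path for the demo):

```shell
# Directories listed in PYTHONPATH are added to sys.path.
PYTHONPATH="/tmp/mypylibs" python3 -c 'import sys; print("/tmp/mypylibs" in sys.path)'
# prints "True"
```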

export ASPELL_CONF="master en_GB"
export GREP_OPTIONS=--exclude-dir=.svn
export OSG_FILE_PATH="${HOME}"/share/OpenSceneGraph-Data
export CMAKE_PREFIX_PATH=${HOME}

These final options are just a bunch of application-specific settings: setting the Aspell dictionary, telling grep to ignore Subversion directories, telling OpenSceneGraph where to look for data files, and telling CMake where to look for stuff.

So now you should have all the ingredients to work from your home directory, whether building your own code, 3rd-party apps, or bleeding-edge projects whose source you have downloaded off the internet. Next we will look at building some of our own code and a project from the internet.

Friday, 16 September 2011

Working with Linux: Guiding Principles

In this post I will lay out the basic principles that I have come to rely upon when working in a Linux environment to keep my data and code clean and nicely integrated. They may seem very simple and obvious to those who have worked with Linux for any length of time, but a) I don't see everyone else following them, and b) coming from a Windows background it took me a while to work out what was important and how to manage things.

1st. Do not interfere with the base system. The base system in this case is everything outside of your home directory. And by not interfering I mean do not add or remove files, or edit configs, other than through the distribution's standard tools (and try not to edit configs in /etc at all if you can help it). So for example, using Ubuntu, I will only put programs into /usr/lib using apt-get. This goes for svn builds of up-to-date libraries I am using - no "sudo make install" for me. It took me a while to realise how important this is for the smooth operation of a machine. At first I just shoved my built code and 3rd-party (not yet packaged) stuff into /usr/lib. I soon hit versioning problems and things became a mess. It also makes it hard to tell whether your code will run on another machine or not.

2nd. Document your changes. Rules are made to be broken, particularly the rule above - you will of course encounter some 3rd-party lib that will only work if it is copied to /usr/share/whatever with a link in /etc, and if you want to use it you have to break the 1st rule and put it where it asks to be put. The thing is, a) this should be rare, and b) you should make a note of what you have done.

So how do we implement these policies in practice? Well, for the first rule you create bin, lib, include and share directories in your home directory. With hindsight I think I should have created an opt directory too, so if you are starting from scratch do that. Then everything that would go in /usr/lib goes in ~/lib, and so on for the other dirs. I have tried using /usr/local in a similar manner, but to be honest that worked out as more hassle than it was worth. Also, using your home directory will work on machines where you do not have admin rights, or where your home dir is on a network share and accessed by multiple machines.
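Setting up that layout is a one-liner; this sketch includes the opt directory I wish I had created from the start, plus a src directory for source code:

```shell
# Mirror the parts of /usr you need inside your home directory.
mkdir -p "$HOME"/bin "$HOME"/lib "$HOME"/include "$HOME"/share "$HOME"/opt "$HOME"/src
ls -d "$HOME"/bin "$HOME"/lib "$HOME"/include
```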

As for the second principle, I recommend keeping a text file on Dropbox or some other file-syncing service (like UbuntuOne if you can get it to work). This document should also record any config changes you have to make to get hardware working, etc. That way, when you need to undo these hacks you can easily see all the changes you have made. You will find that updating 3rd-party libs becomes easy, as you can just clean out the old version manually (or check that the automatic cleanup worked - it's quite amazing how many poorly written uninstall scripts don't remove everything they put there[1]).

Also, if you suddenly want to work on another machine (say you got a new laptop or desktop) you know exactly what changes you need to make to get things up and running. Or, when you have a catastrophic hard drive failure (believe me, they happen) you can get back up and running a lot faster. And finally, if you keep a note of all those config hacks you have been accumulating over time, you can try removing them when your distribution upgrades, so you don't accumulate crud in /etc and can take advantage of improved services as they become available. This is particularly applicable to laptops - when you get a new laptop you often end up rewriting /etc to try to get the wireless / mouse / soundcard working. In 12 months that stuff will work out of the box, but unless you go back and restore the config files, your distribution will keep hold of the manual changes you have made, often with negative consequences.

While you are doing this, you might want to keep a list of the packages you have installed on the machine. This way, when you change machines you can do so easily and quickly. In fact, if you can keep much of this documentation as Python scripts you are really onto a winner, because then restoring (or moving) your entire work environment can be as easy as copying over your home directory and running a couple of scripts!
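On Debian or Ubuntu, grabbing that package list is one command (the output path here is just an example - point it at your Dropbox folder or wherever you keep your notes):

```shell
# Snapshot the installed-package list; replay it on a new machine later
# with dpkg --set-selections and apt-get dselect-upgrade.
dpkg --get-selections > /tmp/package-list.txt || true
echo "recorded $(wc -l < /tmp/package-list.txt) packages"
```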

So those are the principles - don't go hacking around in the bowels of your system and keep a record of what you do. Pretty simple eh? Next post I will look at how we actually put these into practice and show you how to organise your work in some sensible directories and how to set up an environment so everything just works using the magic of .profile.

[1] - If you are writing something to install on a user's machine, write out an installation manifest to /var/lib/libmystuff/installation.manifest or somewhere else sensible and write to that every change you make to the user's system. Then people can do a clean uninstall of libmystuff3 even if they have deleted the installation files for libmystuff3 and already got hold of libmystuff4. You could even check for the file and do an automatic cleanup when installing a newer version!
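A minimal sketch of that idea, using /tmp paths so it is self-contained (a real installer would write the manifest under /var/lib/libmystuff and copy files to the proper prefixes):

```shell
# Every file the "installer" copies is also appended to a manifest,
# so a later uninstall can remove exactly those paths.
MANIFEST=/tmp/libmystuff/installation.manifest
mkdir -p /tmp/libmystuff
: > "$MANIFEST"
install_file() {
    cp "$1" "$2" && echo "$2" >> "$MANIFEST"
}
echo "demo payload" > /tmp/payload.txt
install_file /tmp/payload.txt /tmp/libmystuff/payload.txt
cat "$MANIFEST"    # prints "/tmp/libmystuff/payload.txt"
```

Uninstalling is then just a matter of reading the manifest back and removing each listed path.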

Tuesday, 6 September 2011

How to work with Linux

The beautiful thing about Linux is that there are a hundred ways to do everything. The horrible thing about Linux is that there are a hundred ways to do anything!

I have spent the last 5 or 6 years messing around (or "working", as I have to call it in order to get paid) with Linux in various forms - mainly Ubuntu, but quite a bit of its daddy Debian, a bit of SUSE and a smidgen of CentOS.

Anyway, I have come to my own conclusions as to how to work with Ubuntu that give you the power to fine-tune things how you like, to mix "official" code with downloaded source, and to mix both local and server-side resources, all without having to compromise easy updates and portability between machines. The way I have sorted things out also works on machines where you are not an administrator and just have a user account.

Over the next few posts, I am going to share this way of working with you, the internet. Now, I'm sure some of you reading this (if indeed anyone reads this) will just think "Oh, that's obvious, why bother writing that down?". Well, many things are obvious with hindsight. I can tell you that I tried out a lot of other "obvious" ways to get stuff working and they all ended up causing me unnecessary aggravation down the line. Also, some of these tips are quite closely related to the IT setup we have here at the University of Surrey.

So, without further ado, here is a summary of how to work in Linux:

  1. Alter the base system as little as possible and document all changes you make
  2. Use environment variables set in .profile to apply system tweaks
  3. Use bin, lib and include directories in your home directory to manage your own files
  4. Use a src directory to manage your source files
  5. Use share in your home directory to manage things you install but don't build from source
  6. Keep your transient data separate from your "real" data
  7. Use an automatic file syncing service such as Dropbox

I may have to revisit this post to add in anything I have forgotten...

Anyway, I will go over all of those points in more detail in subsequent posts, and having laid down the groundwork I will then do some over-arching posts explaining how I manage things like python, building my own code, etc.