Tuesday 1 November 2011

Setting up Python

So far we have looked at creating a home directory with its own bin, lib and include directories. Now we will cover the last few steps required to get Python working properly.

Assuming Python is already installed on your system, you only need to create one directory and a single file. First create a folder called python in your ~/lib folder. Then, in your home directory, create a file called .pydistutils.cfg with the following contents:

[install]
install_lib = ~/lib/python
install_scripts = ~/bin

This file tells setuptools to put libraries in ~/lib/python and executable commands in ~/bin. Why do we want setuptools to work? Because setuptools is the software suite that includes easy_install, which is the way to install Python packages.
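
If you would rather script the two steps above, a minimal sketch might look like this. TARGET_HOME is a name I have made up for illustration (setuptools knows nothing about it); it defaults to a scratch directory so you can try the script without touching your real home directory.

```shell
# Sketch only: TARGET_HOME is a hypothetical variable, not something
# setuptools reads. It defaults to a scratch directory so nothing in
# your real home directory is overwritten while experimenting.
TARGET_HOME="${TARGET_HOME:-$(mktemp -d)}"

# Step 1: the directory that will hold installed Python libraries.
mkdir -p "${TARGET_HOME}/lib/python"

# Step 2: the config file. Quoting 'EOF' stops the shell from expanding
# anything, so the file contains exactly the text shown in the post.
cat > "${TARGET_HOME}/.pydistutils.cfg" <<'EOF'
[install]
install_lib = ~/lib/python
install_scripts = ~/bin
EOF

echo "Wrote ${TARGET_HOME}/.pydistutils.cfg"
```

Once you are happy with the result, run it again with TARGET_HOME set to your real home directory.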

So try it - run easy_install scikits.learn

So there you go. Now you can install Python packages quickly and easily to your home directory (and to remove a package you can just run easy_install -mXN scikits.learn and then rm -r ~/lib/python/scikits.learn*).

And what about your own python code? Well, the way I work is to put my source code into my ~/src and then make a symbolic link from ~/lib/python to the packages I want to use. That way my source code is in a sensible place for source control and Python can find it. Also if I want to make some major changes to my code I can replace the symlink with a real copy of my source tree and then keep using the old version of the code while I torture the new version. In fact, probably the "proper" way of working would be to use something like distutils to copy your code from your source tree to ~/lib/python, building any C extensions and running any unit tests at the same time.
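
The linking step looks roughly like this. Treat it as a sketch: "mypackage" is a made-up package name and DEMO_HOME is a made-up variable that defaults to a scratch directory, so the commands can be tried safely before you point them at your real ~/src and ~/lib/python.

```shell
# Sketch only: "mypackage" and DEMO_HOME are invented for illustration.
DEMO_HOME="${DEMO_HOME:-$(mktemp -d)}"

# A pretend package living in the source tree.
mkdir -p "${DEMO_HOME}/src/mypackage" "${DEMO_HOME}/lib/python"
touch "${DEMO_HOME}/src/mypackage/__init__.py"

# Remove any old link first (rm -f on a symlink removes only the link,
# never the source tree it points at), then create the new one.
rm -f "${DEMO_HOME}/lib/python/mypackage"
ln -s "${DEMO_HOME}/src/mypackage" "${DEMO_HOME}/lib/python/mypackage"

# The listing shows the link and where it points.
ls -l "${DEMO_HOME}/lib/python"
```

Because ~/lib/python is on PYTHONPATH, Python follows the link and imports straight from the source tree.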

I will leave you with one final thought - I think the "really proper and up-to-date" way to do things is to use virtualenv and pip to create a local virtual environment and populate it with Python packages. However, I haven't had time to look at them yet and I am only writing about things I know. When I've had a chance to look at them I'll update / replace this post with a writeup.

Monday 31 October 2011

~/bin ~/lib and ~/include

I'm taking a brief interlude to explain fully what is going on with this setup. After this I will get going on actually using the system and (quite importantly) setting up Python with easy_install etc. But first a little exposition on our motivation...

One thing that makes Unix different to Windows is that places matter. In fact this is a pretty deep difference whose knock-on effects account for many of the differences between Windows and Unix-style systems. To simplify horribly, it's all to do with how the operating system knows how to find things. Specifically header files and libraries, and even more specifically, libraries that are loaded at run time: dynamically loaded libraries ("dll"s) or shared object libraries ("so"s).

In Unix-based systems, the OS looks for so files and header files in certain "well known" places. Here I am using "well known" in a specific sense: it means that anyone writing software for a Unix system can rely on these folders / files existing. So when Unix needs a configuration file, or a program file or a library, it knows where to look - in one of these "well known" places. An example of this is /etc where all the configuration files live.

In Windows-based systems, the files can be anywhere. There are certain "well known" places (such as c:\windows\system32) but other than that pretty much anything goes. So how does Windows find things? Well, you can put your files anywhere, but you need to record where you put the file in a central location known as The Registry. So when Windows wants a file, it looks it up in the registry and then knows where to go and get it.

Note: anyone with serious in-depth knowledge of current OS internals will be tearing their hair out and screaming in frustration at this out-of-date oversimplification of things, but hopefully they will admit that it is a simple explanation of the basics. If not, feel free to correct me in the comments.

Now, when you are developing software in Windows, you are probably using Microsoft Visual Studio. If not, you are probably using one of a handful of well known IDEs that is at least compatible with MSVS. This means that when you install a library that you want to use in your program, the installer can tell the registry where the library is, and it can tell MSVS where the headers are. Then in your project you select the headers you want to use, specify the library you want and bingo - it all compiles and runs.

Now Unix-based systems are different. For a start, there are more development environments than you can shake a stick at, and none of them use the same configuration. This is partly because of the way Unix works. You see, nothing needs to tell anything where to look, everyone knows that for a header file you look in /usr/include and for a library file you look in /lib or /usr/lib. So there is no need for a common configuration. Also for a distribution of Unix pretty much everything will use the same compiler - gcc. So if you configure gcc via environment variables, that will work whatever IDE you use.

So where am I going with all this? Well, this was all well and good when the computer you used was administered by a sysadmin, who would make sure that all the libraries you needed were installed on the system, but nowadays you are most likely the admin of your computer.

"Great" you say, so I can just put the library I want into /usr/lib and /usr/include and everything will work!

Well, yes and no. What if you can't put things in /usr/lib, or what if you don't want to?

Why wouldn't you be able to? If you are working somewhere where they don't (for security reasons) give you admin rights to the machine, then maybe you can't write to those folders.

Why wouldn't you want to? If you are running a Unix distribution such as Ubuntu, you will have loads of tools for installing libraries on your machine (such as the fantastic Synaptic and Aptitude), and they will automatically put all the right files in the right places. So if you want to install a library and it comes in an Ubuntu package then everything is hunky dory. But what if the library isn't packaged? What if you want a more recent version of a library? Most advice on the internet will tell you to use sudo to put the files in /usr/lib but this is a bad idea. You can put stuff there and it will work, but Ubuntu doesn't know about or understand the changes you have made. So when a new version comes out Ubuntu won't be able to update things properly. It won't be able to clean up properly. In fact a million and one things could go wrong down the line.

So I suggest avoiding the issue entirely.

The way to do this is to create your own "well known" places that you will look after yourself. These will live in your home directory and will act just like the real "well known" places except they will only ever be edited by you and it will be up to you to keep them clean and tidy. They will have precedence over the real places, so if you put some_program in ~/bin and there is already a version in /usr/bin, then the version in ~/bin will get run. This means you can install a library on the machine using Synaptic, and then also use a more recent version for your own code.
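
Here is a small sketch of that precedence rule in action. It uses a scratch directory standing in for ~/bin and a made-up command called some_program, so it is safe to paste into a terminal; MY_BIN is just a name I picked for the demo.

```shell
# Sketch only: MY_BIN stands in for ~/bin, and some_program is a
# made-up command that simply announces itself.
MY_BIN="$(mktemp -d)"

cat > "${MY_BIN}/some_program" <<'EOF'
#!/bin/sh
echo "private version"
EOF
chmod +x "${MY_BIN}/some_program"

# Directories earlier in PATH win, so prepend rather than append.
PATH="${MY_BIN}:${PATH}"

# command -v reports which file the shell would actually run.
command -v some_program
some_program
```

If a some_program also existed in /usr/bin, the copy in the directory listed first would still be the one that runs.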

So how do we do this? If you follow the instructions that I have been giving you then you are already set up and ready to go!

But now you know why you were doing it!

Monday 26 September 2011

Looking for nicely formatted bibtex for your citations...

I've been doing another paper trawl to put together a document for a project I am working on, and came across this nice website http://www.pubzone.org.

Just search for the title of the paper you want and then click on export bibtex. It gives you some of the best formatted bibtex I've come across, even going so far as to include the "proceedings" entry that accompanies an "inproceedings" entry.

Tuesday 20 September 2011

Working with .profile

Most of the magic that goes into making things work nicely goes in the .profile file in your home directory. In this file you can tell Linux how you have arranged your files and how you want to work. Setting this up correctly will make things a breeze. For those who do not know, .profile is a file that contains a bunch of shell script commands that are executed when you log in (strictly speaking, whenever a login shell starts).

Firstly - why .profile? Why not .cshrc or .bashrc or .login or any of the other files? Well, the reason is simple - .profile gets executed in such a way that it affects all programs that you run in Ubuntu, be it from a command-line terminal or from the GUI. If you are using a different flavour of Linux then you may have to look at how the startup process runs to find the best place to set environment variables, but .profile should work with most types of Linux.

So, to get things going here is my .profile:

# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi

CHECK_64=`uname -a | grep x86_64`


if [ -n "${CHECK_64}" ]; then
    export ARC_POSTFIX=64
    export ARC=linux64
else
    export ARC_POSTFIX=32
    export ARC=linux
fi

export C_INCLUDE_PATH="${HOME}"/include
export CPLUS_INCLUDE_PATH="${C_INCLUDE_PATH}"
export INCLUDES=-I"${C_INCLUDE_PATH}"

export RAVL_INSTALL="${HOME}"/
export PROCS=4
export PROJECT_OUT="${HOME}"/.ravl_out

export LIBS=-L"${HOME}"/lib/
export LIBRARY_PATH="${HOME}"/lib:"${PROJECT_OUT}"/lib
export LD_LIBRARY_PATH="${LIBRARY_PATH}"

export PATH=./:"${PROJECT_OUT}"/bin:"${HOME}"/bin:"${PATH}"

export PYTHONPATH="${HOME}"/.ipython/:"${HOME}"/lib/python/

export ASPELL_CONF="master en_GB"
export GREP_OPTIONS=--exclude-dir=.svn
export OSG_FILE_PATH="${HOME}"/share/OpenSceneGraph-Data
export CMAKE_PREFIX_PATH=${HOME}

OK, so that looks pretty complicated, so let's break it down and see what we are doing. Firstly, all the stuff up until the first export statement is just the standard stuff that gets put in .profile by Ubuntu. So we can ignore that. Let's now look at the export statements and see what they do.

export C_INCLUDE_PATH="${HOME}"/include
export CPLUS_INCLUDE_PATH="${C_INCLUDE_PATH}"
export INCLUDES=-I"${C_INCLUDE_PATH}"

These statements are setting up various compilers to automatically look in ~/include for header files. Now anything we put in ~/include will automatically get picked up by pretty much any build system that uses gcc.
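
To convince yourself that gcc really does pick up headers this way, you could try something like the following sketch. Everything in it (INC_DIR, BUILD_DIR, greeting.h) is made up for the demo, and it uses scratch directories rather than ~/include. It assumes gcc is installed and skips the compile step if it is not.

```shell
# Sketch only: scratch directories and a made-up header, not ~/include.
INC_DIR="$(mktemp -d)"     # plays the role of ~/include
BUILD_DIR="$(mktemp -d)"   # somewhere unrelated for the source file

printf '#define GREETING "hello from my own include dir"\n' \
    > "${INC_DIR}/greeting.h"

# The angle-bracket include means the header is only found via the
# compiler's search path, not because it sits next to main.c.
cat > "${BUILD_DIR}/main.c" <<'EOF'
#include <stdio.h>
#include <greeting.h>
int main(void) { puts(GREETING); return 0; }
EOF

# No -I flag anywhere: the environment variable does the work.
export C_INCLUDE_PATH="${INC_DIR}"

if command -v gcc >/dev/null 2>&1; then
    gcc "${BUILD_DIR}/main.c" -o "${BUILD_DIR}/main" && "${BUILD_DIR}/main"
else
    echo "gcc not found, skipping the compile step"
fi
```

CPLUS_INCLUDE_PATH plays the same role for C++ sources compiled with g++.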

export RAVL_INSTALL="${HOME}"/
export PROCS=4
export PROJECT_OUT="${HOME}"/.ravl_out

These next statements are specific to RAVL (a computer vision library we use here at the University of Surrey). RAVL is set up to build to wherever the $PROJECT_OUT environment variable points. Here I am setting it to a hidden directory, and I am setting it here so I can reference that value later.

export LIBS=-L"${HOME}"/lib/
export LIBRARY_PATH="${HOME}"/lib:"${PROJECT_OUT}"/lib
export LD_LIBRARY_PATH="${LIBRARY_PATH}"

These are some of the more important statements in .profile. They tell the compiler and the runtime system to look in ~/lib for libraries: gcc reads LIBRARY_PATH when linking, and the dynamic loader reads LD_LIBRARY_PATH when a program starts. By setting LD_LIBRARY_PATH we are telling the system to look in ~/lib before it looks in /usr/lib. Now there is a potential security risk here, which is why many websites will recommend that you do not use LD_LIBRARY_PATH (the risk is that it is easier for malicious code to get into ~/lib than into /usr/lib). Personally I think the risk is higher if you are constantly copying things into /usr/lib, but this is something to be aware of. Another issue here is that some of the startup files in the X11 system (specifically the ones to do with ssh) strip LD_LIBRARY_PATH from the environment at load time precisely because of this security issue. If you are happy and understand the risks then you can go here to see how to fix that. Otherwise you will just have to start things from a command prompt to be able to run your code nicely and easily.
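
When a library mysteriously isn't found, it helps to see exactly which directories are on one of these search paths and whether they exist. Here is a little helper sketch for that; show_search_path is a name I have made up, not a standard command.

```shell
# show_search_path LIST: print each entry of a colon-separated search
# path on its own line, flagging directories that do not exist.
# (Hypothetical helper, written for this post.)
show_search_path() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
        [ -n "${dir}" ] || continue      # skip empty entries
        if [ -d "${dir}" ]; then
            echo "ok       ${dir}"
        else
            echo "missing  ${dir}"
        fi
    done
}

# Inspect the run-time library path (prints nothing if it is unset).
show_search_path "${LD_LIBRARY_PATH:-}"
```

The same helper works for LIBRARY_PATH, PATH or PYTHONPATH, since they all use the colon-separated format.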

export PATH=./:"${PROJECT_OUT}"/bin:"${HOME}"/bin:"${PATH}"

This line sets the path, which is where Linux looks to find commands to execute; useful for running your programs! Here I am setting it up to run stuff I compile with the RAVL QMake build system, and stuff I compile to ~/bin. Note it is important to include the default ${PATH} variable on the end, otherwise you won't be able to run anything installed on your machine!
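
One thing to watch is that if .profile gets sourced more than once, the same directories pile up in PATH. A possible way around that, sketched here with a made-up helper called path_prepend (it is not part of my actual .profile), is to only add a directory if it is not already there.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already
# somewhere in PATH. (Hypothetical helper name, for illustration.)
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already present, leave PATH alone
        *) PATH="$1:${PATH}" ;;       # keep the old PATH on the end
    esac
}

path_prepend "${HOME}/bin"
export PATH
echo "${PATH}"
```

Wrapping PATH in colons before matching means a directory is only treated as present when a whole entry matches, not when it is merely a prefix of another entry.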

export PYTHONPATH="${HOME}"/.ipython/:"${HOME}"/lib/python/

This line sets the PYTHONPATH. If you are using Python you will find this invaluable in addition to setting up the .pydistutils.cfg which I will cover later. These files tell Python where to look for stuff you have installed (and in the case of .pydistutils.cfg - where to install stuff).

export ASPELL_CONF="master en_GB"
export GREP_OPTIONS=--exclude-dir=.svn
export OSG_FILE_PATH="${HOME}"/share/OpenSceneGraph-Data
export CMAKE_PREFIX_PATH=${HOME}

These final options are just a bunch of application specific settings - setting the Aspell dictionary, telling Grep to ignore Subversion files, telling osg where to look for data files and telling CMake where to look for stuff.

So now you should have all the ingredients to work from your home directory, whether you are building your own code, 3rd party apps, or bleeding edge projects whose source you have downloaded off the internet. Next we will look at building some of our own code and a project from the internet.

Friday 16 September 2011

Working with Linux: Guiding Principles

In this post I will lay out the basic principles that I have come to rely upon when working in a Linux environment to keep my data and code clean and nicely integrated. They may seem very simple and obvious to those who have worked with Linux for any time, but a) I don't see everyone else following them and b) coming from a Windows background it took me a while to work out what was important and how to manage things.

1st. Do not interfere with the base system. The base system in this case is everything outside of your home directory. And by not interfering I mean do not add / remove files or edit configs other than using the distribution's standard UI (and try not to edit configs in /etc at all if you can help it). So for example, using Ubuntu, I will only put programs into /usr/lib/ using apt-get. This goes for svn builds of up-to-date libraries I am using - no "sudo make install" for me. It took me a while to realise how important this is for smooth operation of a machine. At first I just shoved my built code and 3rd party (not yet packaged) stuff into /usr/lib. I soon hit versioning problems and things became a mess. It also makes it hard to tell whether your code will run on another machine or not.

2nd. Document your changes. Rules are made to be broken, particularly the rule above - you will of course encounter some 3rd party lib that will only work if it is copied to /usr/share/whatever with a link in /etc, and if you want to use it you have to break the 1st rule and put it where it asks to be put. The thing is a) this should be rare and b) you should make a note of what you have done.

So how do we implement these policies in practice? Well, for the first rule you create a bin, lib, include and share directory in your home drive. With hindsight I think I should have created an opt directory too, so if you are starting from scratch do that. Then everything that would go in /usr/lib goes in ~/lib and so on for the other dirs. I have tried using /usr/local in a similar manner, but to be honest that worked out as more hassle than it was worth. Also, using your home directory will work on machines where you do not have admin rights or where your home dir is on a network share and accessed by multiple machines.
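
Creating the directories is a one-liner, but here is a sketch wrapped up as a function so it can be tried on a scratch directory first. make_home_tree and SCRATCH are names I have made up for this example.

```shell
# make_home_tree [ROOT]: create the private "well known" directories
# under ROOT (defaults to your home directory). Hypothetical helper.
# The body runs in a subshell so its variables do not leak out.
make_home_tree() (
    root="${1:-$HOME}"
    for d in bin lib include share opt; do
        mkdir -p "${root}/${d}"
    done
)

# Try it on a scratch directory before running it for real.
SCRATCH="$(mktemp -d)"
make_home_tree "${SCRATCH}"
ls "${SCRATCH}"
```

Running make_home_tree with no argument does the same thing in your real home directory; mkdir -p leaves any directories that already exist untouched.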

As for the second principle, I recommend keeping a text file on Dropbox or some other file sharing service (like UbuntuOne if you can get it to work). This document should also record config changes you have to make to get hardware working etc. That way when you need to undo these hacks you can easily see all the changes you have made. When you do this you will find that updating 3rd party libs is easy as you can just clean out the old version manually (or check that the automatic cleanup worked - it's quite amazing how many poorly written uninstall scripts don't remove everything they put there[1]). Also, if you suddenly want to work on another machine (got a new laptop or desktop machine) you know what changes you need to make to get things up and running. Or, when you have a catastrophic hard drive failure (believe me, they happen) you can get back up and running a lot faster. And finally, if you keep a note of all those config hacks you have been accumulating over time, you can try removing them when your distribution upgrades so you don't accumulate crud in /etc and can take advantage of improved services as they become available. This is particularly applicable to laptops - when you get a new laptop you often end up rewriting /etc to try to get wireless / mouse / soundcard working. In 12 months stuff will work out of the box, but unless you go back and restore the config files, your distribution will keep hold of the manual changes you have made, often with negative consequences.
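
To make note-taking painless enough that you actually do it, you could use a tiny helper along these lines. This is only a sketch: note_change and CHANGELOG_FILE are names I have invented, the example note is made up, and the log defaults to a temporary file so you would want to point it at a file in your synced folder.

```shell
# Sketch only: CHANGELOG_FILE and note_change are hypothetical names.
# Point CHANGELOG_FILE at a file in your synced folder for real use.
CHANGELOG_FILE="${CHANGELOG_FILE:-$(mktemp)}"

# note_change TEXT...: append a dated line naming this machine.
note_change() {
    printf '%s  %s  %s\n' \
        "$(date '+%Y-%m-%d %H:%M')" "$(uname -n)" "$*" \
        >> "${CHANGELOG_FILE}"
}

# Example entry (the file name mentioned here is purely illustrative).
note_change "edited /etc/modprobe.d/alsa-base.conf to get the soundcard working"
cat "${CHANGELOG_FILE}"
```

Recording the machine name on every line means one shared file can cover all the machines you work on.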

While you are doing this, you might want to keep a list of the packages you have installed on the machine. This way when you change machines you can do so easily and quickly. In fact, if you can keep much of this documentation as Python scripts you are really onto a winner because then restoring (or moving) your entire work environment can be as easy as copying over your home directory and running a couple of scripts!
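
On Debian-style systems such as Ubuntu, one way to capture the package list is with dpkg. I have written the sketch below as shell rather than Python to match the rest of these posts. PKG_LIST is a made-up variable defaulting to a temporary file, and the restore commands in the comments are the usual recipe as I understand it, so check them against your distribution's documentation before relying on them.

```shell
# Sketch only: PKG_LIST is a hypothetical variable; point it at a file
# in your synced folder for real use.
PKG_LIST="${PKG_LIST:-$(mktemp)}"

if command -v dpkg >/dev/null 2>&1; then
    # One line per package: name followed by its selection state.
    dpkg --get-selections > "${PKG_LIST}"
    echo "Saved $(wc -l < "${PKG_LIST}") package selections to ${PKG_LIST}"
else
    echo "dpkg not found; this sketch only applies to Debian-style systems"
fi

# To restore on a new machine (needs admin rights, so not run here):
#   sudo dpkg --set-selections < "${PKG_LIST}"
#   sudo apt-get dselect-upgrade
```

This is one of the few places where touching the base system is fine, because it goes through the distribution's own package tools.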

So those are the principles - don't go hacking around in the bowels of your system and keep a record of what you do. Pretty simple eh? Next post I will look at how we actually put these into practice and show you how to organise your work in some sensible directories and how to set up an environment so everything just works using the magic of .profile.

[1] - If you are writing something to install on a user's machine, write out an installation manifest to /var/lib/libmystuff/installation.manifest or somewhere else sensible and write to that every change you make to the user's system. Then people can do a clean uninstall of libmystuff3 even if they have deleted the installation files for libmystuff3 and already got hold of libmystuff4. You could even check for the file and do an automatic cleanup when installing a newer version!
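
To show what I mean by a manifest, here is a bare-bones sketch. All the names (install_file, uninstall_all, MANIFEST, DEMO_PREFIX, libmystuff.txt) are invented for the example, and it works on scratch directories rather than /var/lib. A real installer would also need to deal with directories, permissions and failures part-way through.

```shell
# Sketch only: every name here is hypothetical.
MANIFEST="${MANIFEST:-$(mktemp)}"

# install_file SRC DEST: copy a file and record where it went.
install_file() {
    cp "$1" "$2" && printf '%s\n' "$2" >> "${MANIFEST}"
}

# uninstall_all: remove every file listed in the manifest, then
# empty the manifest.
uninstall_all() {
    while IFS= read -r installed; do
        rm -f "${installed}"
    done < "${MANIFEST}"
    : > "${MANIFEST}"
}

# Demo: "install" one file into a scratch prefix.
DEMO_PREFIX="$(mktemp -d)"
mkdir -p "${DEMO_PREFIX}/lib"
echo "pretend library" > "${DEMO_PREFIX}/source.txt"
install_file "${DEMO_PREFIX}/source.txt" "${DEMO_PREFIX}/lib/libmystuff.txt"
cat "${MANIFEST}"
```

Calling uninstall_all afterwards removes exactly what was recorded, even if the original installation files are long gone.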

Tuesday 6 September 2011

How to work with Linux

The beautiful thing about Linux is that there are a hundred ways to do everything. The horrible thing about Linux is that there are a hundred ways to do anything!

I have spent the last 5 or 6 years messing around (or "working" as I have to call it in order to get paid) with Linux in various forms - mainly Ubuntu, but quite a bit of its daddy Debian, a bit of Suse and a smidgen of CentOS.

Anyway, I have come to my own conclusions as to how to work with Ubuntu that give you the power to be able to fine tune things how you like, to mix "official" code with downloaded source and to mix both local and server-side resources without having to compromise easy updates and portability between machines. The way I have sorted things out also works on machines where you are not an administrator and just have a user account.

Anyway, over the next few posts, I am going to share this way of working with you, the internet. Now, I'm sure some of you reading this (if indeed anyone reads this) will just think "Oh that's obvious, why bother writing that down?". Well, many things are obvious with hindsight. I can tell you that I tried out a lot of other "obvious" ways to get stuff working and they all ended up causing me unnecessary aggravation down the line. Also, some of these tips are quite closely related to the IT setup we have here at the University of Surrey.

Anyway, without further ado, here is a summary of how to work in Linux

  1. Alter the base system as little as possible and document all changes you make
  2. Use environment variables set in .profile to apply system tweaks
  3. Use bin lib and include directories in your home directory to manage your own files
  4. Use a src directory to manage your source files
  5. Use share in your home directory to manage things you install but don't build from source
  6. Keep your transient data separate from your "real" data
  7. Use an automatic file syncing service such as Dropbox

I may have to revisit this post to add back in any things I have forgotten...

Anyway, I will go over all of those points in more detail in subsequent posts, and having laid down the groundwork I will then do some over-arching posts explaining how I manage things like python, building my own code, etc.