NP-completemess

Well yesterday I read a blog post by Jeff Atwood over at www.codinghorror.com about NP-completeness. In the comments to his post I felt that Jeff got a somewhat raw deal with the criticism he received. Ok, the details of his post were not exactly accurate to the nth degree, but his post may have identified a problem set that some of his many readers may never otherwise have been aware of. This has got to be a good thing, right?

So anyway, with a little experience on my side with these NP-complete problems maybe I will try to shed some further light on the details. Remember though, this is quite a complicated area and many books have been written about this stuff and many will save a whole chapter just as an introduction. Thus, I will try to keep this as light as possible and hopefully the detractors can fill in their own details where I leave them out.

First, one of the biggest questions in the field of computer science is does P=NP. For those that wish to pursue research in this area, there is the Millenium Prize Fund offering a cool $1million for a proof either way of this question. Those living in the UK will be hoping that it’s paid in dollars with the piss poor exchange rate 😉 .

Now what is P? Well P is the set of problems that can be solved in polynomial time. That is, a problem where an algorithm is known to take O(n<sup>k</sup>), where n is the size of the input and k is some fixed constant; fixed meaning it doesn’t vary with the input size. An example of such an algorithm is Quicksort, whose complexity is O(n<sup>2</sup>).

So what is NP? Well first, NP does NOT stand for Non-Polynomial, if you want to take anything for this just take that, as it will save you embarrassment when you say it is (sorry I’ve seen this so many times). It actually stands for Non-deterministic Polynomial. Now when is a problem in NP?

For a problem to be in NP, the problem must be a decision problem. See, this is where it escalates trying to describe this stuff, as you now have to define a decision problem for all this to make sense. Anyway, a decision problem is a problem where the answer can be given as a simple ‘Yes’ or ‘No’. For example, given two cities is there a direct plane route between these cities. Clearly the answer is Yes or No. For a more computing/maths related example we can say that given a graph G, does there exist a path between two vertices, say u and v, in G.

Now that we know what a decision problem is, we can now describe what the class of problems in NP look like. Mmmmm, if only. In actual fact, I now need to describe what a non-deterministic algorithm is. Sigh. Let’s keep it simple if not 100% accurate. A non-deterministic algorithm has two phases. The first phase guesses an answer (certificate) to our problem and the second phase verifies that this certificate satisfies the constraints of the problem in polynomial-time. For example, consider the problem of matching 3 sets of people (known as 3D matching). Let’s name these people men, women and dogs, with each set having size n. So what we want is a set of (man, woman, dog) triples, such that each man, woman and dog appear in exactly one triple and the matching is of size n, i.e. everyone is matched. So the decision problem here is: given such an instance does there exist a matching of size n? The non-deterministic algorithm is: guess a set of (man,woman,dog) triples, then first verify that our set of triples is of size n, by counting the triples, and then to ensure that it is a matching, we can check that no two triples have someone in common.

Phhhhewww, that was longer than I imagined, see why Jeff’s abbreviated version may be better if not so accurate. I mean I have not even described what an NP-complete problem is yet. We can however describe what the class of problems in NP now are.

The class NP is defined as the collection of all decision problems that are polynomial-time solvable using a non-deterministic algorithm. We note here that many of the problems we see in everyday (programming) life are defined as optimization problems rather than decision problems – an optimization problem is one where we seek to maximise or minimise some property, for example the Travelling Salesman Problem. However, it is known that all optimization problems can be re-written as a decision version of the same problem.

So can we now define what an NP-complete problem is? Mmmm just about. In fact, I will state it here and then describe the one further bit of information we need. A problem X is NP-complete problem if:

X is in NP;
X is NP-hard.

Often you hear people getting confused by NP-complete and NP-hard. They are different things, an NP-hard problem need not be NP-complete, basically NP-hard problems are at least as hard as those problems in NP. Also, if a combinatorial decision problem is NP-complete we say that the optimisation problem is NP-hard.

So what does it mean for a problem to be NP-hard? Well to show that a problem is NP-hard we must show that it is polynomial-time reducible to an NP-complete problem (in fact, to be precise, what we have to show is that it is reducible to ALL NP-complete problems, however due to the transitivity of reductions we need only show it’s reducible to a single NP-complete problem). That is, we need to show that we can reduce an instance of one problem to the instance of another problem, i.e a Yes answer in one problem maps to a Yes answer in the other and vice-versa, similarly with the No answers.

Now if only we had one problem that we know is NP-complete. Well we do! This was shown in Cook’s Theorem. So now all (and I say all very lightly) we have to do to show that our problem is polynomial-time reducible to satisfiability, or some other NP-complete problem – satisfiability (SAT) is the problem Cook showed to be NP-complete. Another (classic) example of an NP-complete problem is the 3D matching problem example given above.

And there you have it, NP-completeness in just over 1000 words, although I missed a few details out I hope this is useful to someone. Incidentally I have meet Stephen Cook (ala Cooks’ theorem) before and I’m pretty sure from experience he would struggle to tell you this in under 1000 words 😉 Actually maybe not………………..

Over and out.

more json and java

My post yesterday was actually intended to be a placeholder where I was planning to include a build of Douglas Crockfords Java JSON library that I have been using on some recent projects. However, I seemed to get carried away while writing and, as you can see, ended rabbiting on more than I intended. The result of this was that by the time I got to the end of it, I could no longer see where my original point fitted in to the discussion.

For the record, there are other Java JSON libraries (http://json-lib.sourceforge.net/, and http://oss.metaparadigm.com/jsonrpc/ to name two) but these are way way way too complicated for a simple application, and/or also have far too many dependences. The files in the jar file included at the end of this post work out the box, no need for other libraries.

So, anyway, today I thought I may as well get round to putting this jar file up on this site . This is not something that you can’t get elsewhere but I wanted it easily available for ME. Also, if like me, you are happy just to get a jar file, and really don’t want the hassle of building the (small) library, then you can just get it from this ere blog; at the time when I built this I could only seem to find the source, hence why I thought I may as well build it myself and host it. The javadoc for the code can be found at http://www.json.org/java/ and the link to the Java jar file I created can be downloaded using this link. If anyone has any issues with me hosting it then let me know and I will take it down, otherwise enjoy it.

Java JSON Library jar file

json is where it’s at

Well people I must say that I have come to LOVE JSON quite a bit in the last few months. Having used in on several projects both in work and in play.

However, not so long ago, I used XML for my configuration and data passing needs. I have always pretty much hated (YES! I said it in public) XML. Everything you do with it just seems loooooooong and verbose; a bit like the applications that (over)use it.

I mean we could just abandon structure altogether in a document and save time by not tagging anything, but where does that get us? Nowhere I imagine. It’s nice to have structure, just not an overwhelming amount that XML provides, especially when you only need it for are some simple config files, or sending messages between processes. XML is especially wasteful of space when you have a rather large file with a structure like below, multiplied by 10,000:

    
       1
       8;3;4;5;6;7;9;10;11;13;17;18;19

The above is data basically representing a list of choices, in strict order, for a person with id 1. So, person 1 prefers person 8 to person 3, person 3 to person 4 etc. To me it was better than what people used before XML, and let me tell you is still used now, which is:

1 3 4 5 6 7 9 10 11 13 17 18 19

Yes this is simple, clean and has no “spare” characters being used. However, when you have a problem with person number 7868, it makes debugging a little harder. It also makes changing person 7868’s prferences also very difficult.

With JSON this can be done as follows:

{ 'id' : 1,
  'prefs' : [1,3,4,5,6,7,9,10,11,13,17,18,19]}

It just sits as a very nice compromise between the latter schools of thought. I realise that this may seem very specific but I’m certain that it’s true in general. Not only is it less verbose, it is also easy to integrate with JavaScript on the client side of any web app, and for me with a little nurturing JavaScript is gonna be where it is ALL at it a few years time.

So the next time you go to use XML in a project, have a think first and see if JSON is your new friend.

regular expressions saved my life – again

Right, so, I talked in my last entry how the wonders of regular expressions had saved my life, and therefore filled the world with utter joy. Once again a few days later I find myself faced with a similar problem, and you guessed it, regular expressions saved my life AGAIN.

The problem is essentially the same as before only this time I had a medium-sized database dump as a CSV file and once again I wanted to fill a Java array with the values from certain columns. For the record, as previously, this stuff was all for some JUnit tests I was running. A simplified example of what I was doing is shown below:

while(itor.hasNext()) {
    Student student = itor.next();
    assertTrue("Student " + student.getId() + " != " + idResults_[count],
                    student.getId() == idResults_[count]);
}

Basically I have a list of Students that I have created whose ids I want to ensure correspond to what I expect them to be. To test this I have a data set of around 250 students (I’m not really that interested in checking the ids, it more a category a student is in but the ids example was easier to show).

In the code above idResults_ corresponds to an int array that I would like to generate from a column in the CSV file. So idResults_ looks something like:

int [] idResults_ = {87868,78757,89987,......};

So how did I generate this array? Well I extended the 5 lines in my last post into a slightly larger Perl script that takes some options and spits out the array initaliser. The actual script can be found HERE. The usage for this script is:

Usage: extract.pl -f  -c  -[hnwisro]
        -h Show this screen
        -n Show the column names in the file
        -w Separate on whitespace (default is a comma)
        -i Don't ignore first line, i.e. it contains the names of the columns
        -s Treat the data as a string, i.e. data in generated array is in 
           double quotes, defaults to an int array
        -r Treat the data as characters, i.e. data in generated array is
           in quotes
        -f   Input file
        -c   column to include in array (can be either a number, 
            zero based, or column name)
        -o   File array is output to (any other content will be
            over-written)

   Outputs a Java/C# array initaliser with values from column  from file
  and send it out to

As you can see I have extended this somewhat from my previous post into a full blow utility (useful probably only to me, but hey who cares). As you can see, instead of creating an int array from the data if you use the -s flag you can create a string array (e.g. {"Harry","Sally", "Billy"}) or a character array using the -r option. Furthermore, you can specify the column to create the array from. This can either be a zero based integer or the id of the column – this presumes that the first line in your file contains that names of the columns (ala CSV file). Also if the first line contains actual data, and you do not want it to be treated as the column names, then you can specify the -i option to choose NOT to ignore the values contained on this line.

Well that’s it. Hope someone else finds the script useful. Over and out.

regular expressions saved my life

A little note to all those programmers who have not taken the time to learn how to use regular expressions: DO IT NOW. I think learning Perl and regular expressions while I was working at Cisco was almost the best thing I took from that job. You never realise how useful it is to know until the day you are presented with a rather large text file and you have to extract some of the data. This is what happened to me last week; and it ain’t the first time either.

The file I was looking at was around 250 lines long and consisted of three tab separated numbers on each line, of which I was interested in only one of these numbers at a time. The idea was that I was trying to generate an array in Java initalised with these numbers, e.g:

int [] = {2,3,4,5,7,8,8,2,3............}

For the record this was just for some testing I was doing. Anyway, doing this without regular expressions would have been a nightmare, as not only did each file have 250 lines but there was 5 files. Instead 5 lines of Perl:

my $ar = "{";
while($text =~ /^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/gm) {
    $ar .=  "$2,";}
chop $ar;
$ar .= "}";

done it in a flash. Now what to those people who do not know how to use regular expressions do in this situation?

They should teach more people this kind of programming at university. I’m all for learning the theory behind things, but there does have to be a better theory/practical split in my opinion. Maybe even a course called “Practical Computer Programming”, where they teach things like regular expressions, debugging, memory management (yes even for Java programmers), design patterns, useful data structures, etc. If anyone wants to pay me a tidy sum to teach it then your wish would be my command 🙂

get a life

Rarely do I feel impassioned about something so much that I just have to write but I have found that something! This has nothing to do with computing so those that normally ignore can now listen.

My issue is with the whole Russell Brand and Jonathan Ross thing that is plastered all over the media. I mean do people not have a life and can they not get a joke? I fuckin hate these kind of people. As much as I have championed the Russell Brand podcast on here before I’m no real lover of his stand-up material so I think I’m pretty unbiased about all this. The facts of this whole situation are, number 1, only two people complained about the show when it was aired or in fact in the first week, that means that all those who listened thought it was fine. This means that the 10,000 people who complained since decided to listen to something that they knew would offend them, yet they still listened. Should we even pander to the opinions of these “demi-gods”? I mean, they had the option of just not listening but chose to listen and be offended. Number 2, they asked Andrew Sachs, the offendee in question, if it was ok to broadcast it and he clearly didn’t say no. Number 3. Gordon Brown wants to get involved. Is the whole country not in enough of a mess without listening to this man tell YOU what to do. So the unemployment rate is rising and what does he want to do, sack two people, good one 🙂

So the crux of this matter is if any of your friends are genuinely offended by this please don’t call them your friends. Don’t fool yourself into being offended just because you don’t like Brand or Ross or are jealous of their money – that would just make you look weak. The offended people are those that want to turn this whole place into a nation of dreary shitheads. Stand up against this and tell the media where to go. If I can think of some way to stand up to these shitheads then I will. In fact, now that I think about it, I actually have a Russell Brand related domain name that maybe can be put to good use (www.fusspotfarm.com). Fusspot Farm, which is probably exactly where we should send these people.

being dynamic is overrated

I was going to post about a little problem that I experienced while undertaking a project I’m involved with but canned it thinking it was of limited interest. That was, however, until I made the same mistake again yesterday, so I thought I would document it to save my own sanity, even if there is no-one else who cares – not that anyone reads this anyway.

So to my faux pas, and believe it or not this cost me a few hours! The problem: well as part of my project (a bike shop website for those who really want to know – www.cyclewize.com) I’m programming in PHP using cakePHP. The advantages/disadvantages of PHP can be debated from dusk till dawn if that is your thing – just remember though, while you are debating all those smart people are just getting on with doing stuff. That said, my problem was, in a sense, due to the use of PHP or dynamic languages in particular.

Anyway, back on track, so the component I was working on simply uploads a set of photos and scales them accordingly. Simple as that. The problem that I kept seeing was that although the photographs were being uploaded ok, i.e no errors were being reported, I was unable to copy the files from their temporary directory over to their new home. Since no errors were being returned I was stuck as to what was happening. I battled away putting print_r’s in here there and everywhere to no avail. Then I just happened to check my call to move_uploaded_file. On inspection, I noticed I had a slight misspelling of a variable name in one of the functions arguments, thus only at that point did the variable come into existence. Therefore what was getting passed to the function was the empty string and hence move_uploaded_file could not copy the file – you would have thought it should have raised an error but no. However, as it turns out, even if it had raised an error it wouldn’t have helped me much. Why? Because straight after the call to move_uploaded_file I was doing a redirect to a new page. Therefore, any errors related to the problem http request were output to a screen which flashed by too quickly for me to notice.

So what is the root of this problem? Well if I was using a complied language it is likely that the compiler would have warned me that the variable I was passing to move_uploaded_file was either not declared or at least never been initialised. I’m pretty sure that a warning message would have been output (which I missed) by PHP’s interpreter but this is no substitute for a compile-time message. As much as I love the freedom of dynamic languages you have to be aware of this significant drawback, and people can go on about how this stuff gets picked up through thorough testing but I’m not sure that’s good enough. Also, the page redirect was a killer, as it made me miss all the output messages. It’s probably worth while not performing redirects if you find something isn’t working, just to ensure that you are seeing all the messages output by the interpreter (you can always switch it back on later). Finally, some sort of lint check should have picked this up. So from now on I think I will be running things through php –l to check that my syntax is good.

There you have it, a nice root cause analysis! I probably missed out the fact that it was my rather inane programming that was the main problem, but bugger it, I’m not really into damaging my ego.

programmer rehab

Faced with the task of learning a new programming language (and its associated libraries) I was left considering what is the best way to do this effectively. I will now describe what I do, and by no means am I advocating its use, be aware of the hidden dangers that lie with it.

My first port of call is normally to try and see if some book on Amazon is the overwhelming favourite of the masses and purchase it. I must admit though, I often get the book, flick through it, and then probably ignore just about everything in it. I do however normally remember the name of the book and the author and when someone asks me if there is a book I recommend I spout this information to them. So the moral of this is probably never to ask me to recommend a book as I’m probably grossly misinformed about what is actually in it.

Anyway, in the first week of learning the language I will try to map the syntax of the language to concepts I know, i.e. object-orientation. This probably presents the first problem: what if the language is not really object-oriented? My solution, make it object-oriented 🙂 This can prove tiresome in certain languages, and JavaScript in particular. However I’m not one for letting this type of hurdle get in the way, and tend to find a way to do this (with JavaScript this was the module pattern described by Douglas Crockford). By the second week I’m normally getting pissed off with my lack of understanding of the language’s features. I think this in part stems from the fact that my style of coding appears to be a copy-paste based system. What I mean by this is that quite often at the start I won’t even bother learning how to say define a class – I will find it out once then just copy-paste the class definition in from another class and change the name. A similar process is then used for other language features (I must point out I’m not copy-pasting the code itself though, that’s just waaay too 80’s). So those first couple of classes and library calls can be like kicking smoking the crack pipe.

After this initial push though, it tends to get easier, I replace the crack induced cold turkey with a slightly more sociable nicotine based habit, but experience constant irritation at the need to stand out in the cold street whenever I want to spark up in public. At this stage I try to abstract most things away from the language itself and think more about how I can isolate change. In other words I tend to have the thought process where I ask myself, if someone asked me to change a requirement, how can I do it with just a change to one line; oh and preferably that one line I would have to change is at the top of the file so I don’t have to search through everything to find it. My motto is that not everyone cares enough to want to learn how something works and is happy if the process is simple and easy. Me, I prefer to waste a mountain of time trying to figure out every detail of how the code fits together – I really need to give up that pursuit, rehab or something. So at this point I start to trace into every language feature, not happy with it just working – tantamount to starting chain smoking. If I get through this stage then I will normally stick with the language, many however have fallen by the wayside after two weeks. More so learning specific frameworks though, as you start thinking “God I could do this far easier if I was writing ALL the code myself”. Remember if you get these thoughts they are normally bullshit though, try to ignore them, as you ARE wrong, you just don’t know it. It’s your mind trying to trick you out of a little hard work. All you will be is a hopeless quiter!

So only after I’ve kicked the crack pipe by weeding myself on to ciggies, and then developed a massive chain smoking habit, do I get the moment of enlightenment where I dump the ciggies and tag myself clean of any irritants (well apart from all those people who tell you that you have learned the wrong language, who all should go get a life). At this point I still occasionally use my copy-paste system described above but it’s becoming easier.

As I said at the beginning of this post I’m not sure this is the best way to go about this process. The main problem that I see is that I don’t learn all the language specific features (maybe this is a good thing). In most languages I work with a standard set of ideas that I apply to the specific language. This always leaves me thinking I could get more out of the language if I only pushed the boundaries a little further. For example, I always think that I’m not quite getting as much out of things like closures, lamdas, etc. I use these things but the way people rave about them I feel there must be more to it.

So that’s it. It usually takes me about a month or so to get comfortable with the language. I then use the language exclusively for around six months and then, for whatever reason, have to shift back to an old favourite which I have forgot how to use. So I dust down the crack pipe and seek out the Rizlas and start all over again……

podcasts

Over the last six months I have taken to the ultimate geek porno, software development podcasts. I line up the podcasts on my cheap-ass USB pen come mp3 player and go out on my bike, cycling to the rhythm of MVC, jQuery, and the Cloud. It has to be said though, I have learned quite a bit from this process. Anyway, I just wanted to document the list of podcasts that I’m currently actively listening to (in no particular order, other than that I prefer the one in first place to the one in second place 😉 ).

Stack Overflow – Hosted by Joel Spolsky and Jeff Atwood, think Laurel and Hardy, only funnier. I have been listening to this since podcast 1 and I’m pretty hooked. Topics vary from software management techniques to good/bad coding principles and practices.
Boagworld – Hosted by Paul Boag and Marcus Lillington. I’ve only been listening to this for maybe a month or so now but I like it. It’s not so much software development but web design but interesting topics and interviews all the same. Also, I would swear that Marcus Lillington is Noel Fielding, he sure sounds like him. This can actually be a bit disconcerting, as sometimes I think “is this just a couple of guys taking the piss out of computer geeks” – now that would be worth listening to.
Open Web Podcast – Hosted by Dion Almaer, John Resig, and Alex Russell. This started off well but there currently has not been a podcast for nearly a month so I’m losing a bit of hope. One thing I have to say about this podcast is that it tended to go into the topics in a fair bit of depth and when out on a cycle this maybe doesn’t sink in so much. You really had to be on the ball and up-to-date with the topic to have a full understanding. Also, I’m into a bit of humor whether intended or not, and this was maybe a little too serious. None-the-less still interesting.
WebDevRadio – Hosted by Michael Kimsal – variety of stuff on this podcast, I don’t tend to listen to it religously as it can be a little off topic to my interests at times.
Hanselminutes – Hosted by Scott Hanselman and Carl Franklin. This can tend to have a strong .NET bias which has put me off a little, but can sometimes have a topic that is of more general interest.

To summarise: a top podcast has to have an interesting people, interesting topics, and humour. Alternatively, it can just have humour, and this is why the Russell Brand podcast is one of the first I listen to each week. He’s not everybody’s cup of tea but I recommend a listen, you may be surprised. The show is at its best when Matt Morgan and Noel Gallagher are both in attendance but is always worth a listen.

the same old mistakes

Well, two days two entries, not bad eh. This one will be short though, because I’m still feeling pretty tired. Which is one of the points I want to make actually. After a year of, how can I say it, managing my own time? I’m now back in the unreal world of real work. I hate getting up in the morning though. I don’t mind so much when the sun is shining but when it’s pissing it down there just feels little incentive. More so in the heart of winter when the sun doesn’t come up until 9am. The pleasures of living in Scotland. I have no issues with getting up at 10am but 9am or before is just a day breaker.

So I hear you say, what is this work I hear you talk about? Well for the next six months I will be working for Glasgow University creating/maintaining/living matching software (maybe I will write a post about what I will be doing – when I finally figure that out). I have actually been working for around a week now, bizarrely enough though I still have not really been in. That is a long story I won’t go into. None-the-less I have been doing a fair amount of work, which leads me to my point.

After a few years of (self-imposed) exile from using Java, the tempestuous love affair has been rekindled in the last week. The majority of the stuff within the department I’m working in is implemented in C# but I was struggling to find a constraint programming toolkit that could be used with C#. However, I must make the same mistakes again and again when I go back to Java. That is, without fail, I get “java.lang.NoClassDefFoundError: AppName” when running the application, despite setting the correct classpath on the command line. And without fail I sit pissing around with the command trying every possible permutation to get it to work. Then, the bell rings, and I remember this happens to me every time, and the solution presents itself (usually about 2 hours wasted).

So the problem: well I like to use cygwin as my terminal in windows, simply because it looks and feels nicer than a dirty old windows command prompt. However, the java command that I pick up on my path in the cygwin installation is the windows version, and as a consequence doesn’t understand the path separator in the classpath parameter, e.g. it understands neither java -classpath .:./lib/log4j.jar AppName nor java -classpath .;./lib/log4j.jar. With the first command the java application just thinks .:./lib/log4j.jar is the full path and on the second it treats the ; as the end of the first command. The solution? Well for me it was just to use windows command prompt, where everything worked fine. For others it will be to use the cygwin version of java (which I assume is available but have not checked). The latter means managing two version of java on my system which seemed enough work to warrant not doing it. Instead I prefer to waste half my day typing ls into the command prompt.