bad software engineering – made easy

Often I find myself suffering from a chronic bout of “project fatigue” – I made this term up off the top of my head so don’t go to your doctor asking about it 😉 . However, it’s not the symptoms of this illness that concern me, it’s the consequences of getting it. So what is this “project fatigue”?

You may be one of the lucky ones never to have experienced this, but I know close friends of mine who also work in the programming trade have suffered. It can set in at any time. It’s characterised by a morning or afternoon of doing anything but typing code into the computer, which can then be followed by days (or weeks) of doing absolutely nothing related to the work you’re supposed to be doing. And it all happens because you got tired/bored/frustrated/stuck with the project you are working on.

When you first start a project (or a new job for that matter) it normally has a euphoric feel to it. You charge in head first, and it’s brilliant: you are learning all these new things. You come in every day and, after the obligatory email and quick favourites check, you are straight on to coding away like a demon (or daemon). You’re thinking “if I’m this productive and can get so much done, I don’t understand why it takes so long to finish things”. What you have forgotten about is THE FATIGUE.

So after maybe a month of working away night and day, it hits. Myself, I normally find it starts when I discover that the way I was doing something doesn’t seem to work. This inevitably means that some of the code you have written is now useless, and other stuff you have done really should be changed to accommodate the new design. To generalise, I would say that the fatigue causes you to do ANYTHING just to get the project done. This may be because you don’t have the time to let the fatigue pass, or because you are just so desperate to get off the project that anything will do.

What this means is that where before you were crafting a code design to die for, you now just make sure that it “works”. This is where it becomes so very easy to do some bad software engineering.

For example, you think “I should really change that class/method to such and such”, but you are also thinking “well, I could just hack round it by doing something not so nice, as that is bound to be quicker”. When you start thinking like this you should ALWAYS make sure that you do the former rather than the latter, as much as it pains you to do so. Why? When you come back to the software for a later version you will be glad you did it, and it never takes as long as you think to do it the correct way.

So how can we avoid the fatigue? I really don’t know; maybe YOU have some suggestions? What I tend to find helps is leaving the code for a day or so and not thinking about it. Why not write a script! Obviously this “leaving it for a day or so” may not sit so well with your employer, but if he only knew that it would mean fewer bugs in the long run then it might be a different story. I dunno, as this line of thought doesn’t always work.

What I think is a far better option is to have a good mentor. I would imagine that about 98% of the fatigue is caused by a lack of motivation to finish or by being stuck on a certain problem. Talking to someone about the work and resolving the points that you are stuck on can be empowering. After a simple chat with someone you can often be even more productive than you were just before the fatigue set in – I often experienced this after a meeting with my supervisor when doing my PhD. It’s important to have the right person doing the mentoring though, as there are people who can make the situation worse by just plain confusing you or demoralising you with their “superior” knowledge.

So in the style of a counselling group, my name is Gregg and I’m suffering from project fatigue. What are your suggestions?

perl is dead, long live…perl?

When I started writing this article it was going to be about the choice of language people use when they create a quick and dirty script for, say, a one-off task. For this type of thing I tend to find myself using Perl, and since I thought Perl was maybe a little long in the tooth, I wondered what the “cool kids” were using these days. However, this got me thinking more about the benefits of such scripts, both to the programmer and their employer.

Often I find myself needing to process/generate a file in some way. For this kind of task I feel that Java, C#, C++ etc. are just waaaay too heavyweight – my default reaction is that you need an interpreted language. For me this is Perl. I know PHP pretty well, but it just never enters my head to use it for a command-line task; Python I kind of know, but not very well; and Ruby again I just don’t know well enough, and like PHP I would probably only consider it for a web app – there’s no real reason why I see Ruby and PHP as web-app rather than command-line options. So what do people use for this kind of stuff? I’m interested.

Anyway, this line of thought led me to the following observation: writing small scripts to do simple file manipulation/generation and system tasks makes you a better programmer. Even if the task you are trying to complete seems small and is probably a one-off, I say “write a script”. Why?

First, this may give you the opportunity to stray away from the heavyweights like Java that I mentioned above and learn something new (the more you know, the better programmer you are. Right?). Your boss may not like the idea, but the thing you are doing is work related, so you can tell him it will save time, which I promise you in the long run it will. Even if you find you run the script only once, the learning process will have stood you in good stead.

Second, it’s an ideal way to break the monotony of your everyday work cycle. You may have been working on a particular project for months and, as happens, you are beginning to hate the thought of even looking at the code, never mind writing more of it. So, if you think of a script that will help you in some way in your day-to-day work, and will benefit both you and the company, then write it. Not only does it break the monotony, but when you finish it there is also a sense of achievement from actually completing something. This sense of purpose is then reflected back into the main project; I mean, you want that sense of achievement again. Right? So everyone wins.

Third, well, surely the above two are more than enough reason to do it. If not, tell me some more 🙂 . You can always turn the script writing into a competition within your work, i.e. who can come up with the best/most useful script! Don’t worry about finding these types of scripts to write – I generally think of a couple every day. Just look around you and you will be amazed at what you find.

Over and out.

practically np

Often I find that people have the opinion that the class of NP-complete problems, as discussed in my previous post, doesn’t in general show up in practice, and that if such problems do show up, they normally come with a restriction that makes them easy to solve. This line of thought is unfortunately untrue 🙁 .

An example of a problem that has significant practical consequences, yet is NP-hard, is that of finding a maximum size stable matching in the context of what is known as the Hospital/Residents problem with ties.

In the Hospital/Residents problem (HR), we seek to assign a set of student doctors (residents) to hospital posts (a hospital may offer more than one post) – this task is carried out in many countries in a student’s first year after graduating from medical school. Each student is asked to rank a set of hospitals in strict preference order, and the hospitals in turn rank the set of students. However, in many schemes, hospitals’ preference lists are allowed to contain ties; that is, a hospital is allowed to be indifferent between a set of students. This generally produces a more realistic set of preference lists, as otherwise a hospital might typically have to rank several thousand students individually. We note that this description of the problem is not complete, but I’m describing a known practical situation here, not the general case of HR with ties, where a student’s preference list may also contain ties.

So what is a stable matching? A matching is stable, in the context above, if there is no student s and hospital h such that s strictly prefers h to the hospital he is currently assigned to (or s is unmatched), and h is either under-subscribed or strictly prefers s to at least one of its assignees. I think an example will illustrate this better:

students       hospitals
s: h h'         h: s (t r)
t: h h'         h': r s t
r: h h'

The above states that student s prefers h to h', and similarly for students t and r. For the hospitals, each is offering 1 post; hospital h is indifferent between t and r and prefers s to both of them, while h' prefers r to s to t. Now the matching {(t,h),(r,h')} is not stable. Why? Because s is unmatched and would rather be assigned to h than be unmatched, and h prefers s to t, so both s and h would be better off if h rejected t and took on s. In this example a stable matching is {(s,h),(r,h')}.
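If it helps to see the check spelled out, below is a minimal sketch of the stability test on the toy instance above. The representation (rank maps, one post per hospital) is my own and isn’t taken from any real matching-scheme code; it simply looks for a student and hospital that would both rather be matched to each other.

import java.util.*;

// Minimal stability check for the toy instance above (my own toy representation).
// Lower rank = more preferred; tied students share a rank (h is indifferent between t and r).
public class StabilityCheck {

    public static void main(String[] args) {
        // Students' strict preference lists: each prefers h to h'.
        Map<String, List<String>> studentPref = Map.of(
                "s", List.of("h", "h'"),
                "t", List.of("h", "h'"),
                "r", List.of("h", "h'"));
        // Hospitals' rankings:  h: s (t r)    h': r s t
        Map<String, Map<String, Integer>> hospitalRank = Map.of(
                "h",  Map.of("s", 0, "t", 1, "r", 1),
                "h'", Map.of("r", 0, "s", 1, "t", 2));
        Map<String, Integer> posts = Map.of("h", 1, "h'", 1);

        // {(t,h),(r,h')} leaves s unmatched and is blocked by (s,h) -> prints false
        System.out.println(isStable(Map.of("t", "h", "r", "h'"), studentPref, hospitalRank, posts));
        // {(s,h),(r,h')} is stable -> prints true
        System.out.println(isStable(Map.of("s", "h", "r", "h'"), studentPref, hospitalRank, posts));
    }

    static boolean isStable(Map<String, String> matching,
                            Map<String, List<String>> studentPref,
                            Map<String, Map<String, Integer>> hospitalRank,
                            Map<String, Integer> posts) {
        for (String student : studentPref.keySet()) {
            String current = matching.get(student);      // null if the student is unmatched
            for (String hospital : studentPref.get(student)) {
                if (hospital.equals(current)) break;     // nothing better left on this student's list
                // The student strictly prefers 'hospital' to their current situation; the pair
                // blocks if the hospital is under-subscribed or strictly prefers this student
                // to one of its current assignees.
                long filled = matching.values().stream().filter(hospital::equals).count();
                boolean underSubscribed = filled < posts.get(hospital);
                boolean prefersStudent = matching.entrySet().stream()
                        .filter(e -> e.getValue().equals(hospital))
                        .anyMatch(e -> hospitalRank.get(hospital).get(student)
                                     < hospitalRank.get(hospital).get(e.getKey()));
                if (underSubscribed || prefersStudent) return false;   // blocking pair found
            }
        }
        return true;
    }
}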

Mmmmm maybe the last bit was a little complicated but I just wanted to give a feel for why we desire a matching to be stable.

As I mentioned earlier, the problem of finding a maximum stable matching in this context is NP-hard. So why is this a problem? Well, obviously we want to match as many students as possible; if we don’t, those students will not get a job, so this is very important to them. Also, we clearly want a matching to be stable, for otherwise you end up with chaos as students and hospitals realise that they would be better off if they just switched. This is exactly what happened in the US prior to the establishment of a system (the NRMP) based on stable matchings way back in the 50s.

However, despite its importance as a problem, we are stuck! If we want to deal with ties (and also if we have no ties but “couples” who want to submit joint preference lists) we are in the realms of using heuristics. What this means is that we are highly unlikely to find a solution that maximises the number of students matched. This may not sound bad to you, but to those involved in the matching scheme it could affect the rest of their lives. That said, even a stable solution of less than optimal size (it’s easy to find such a solution in polynomial time) is more acceptable than a solution that is not stable.

Anyway, there you have it, a practical NP-hard problem – and there are many others. If this was of interest then I can maybe write up something about the Kidney Exchange Problem; this concerns the organisation of kidney transplants from live donors, and it is known that the most important practical variants of this problem are NP-hard. This is in every sense of the word a life-or-death problem.

NP-completeness

Well, yesterday I read a blog post by Jeff Atwood over at www.codinghorror.com about NP-completeness. In the comments to his post I felt that Jeff got a somewhat raw deal from the criticism he received. OK, the details of his post were not exactly accurate to the nth degree, but it may have introduced a class of problems that some of his many readers would never otherwise have been aware of. That has got to be a good thing, right?

So anyway, with a little experience of these NP-complete problems on my side, maybe I can try to shed some further light on the details. Remember though, this is quite a complicated area: many books have been written about this stuff, and many devote a whole chapter just to the introduction. Thus, I will try to keep this as light as possible, and hopefully the detractors can fill in their own details where I leave them out.

First, one of the biggest questions in the field of computer science is: does P = NP? For those who wish to pursue research in this area, there is the Millennium Prize offering a cool $1 million for a proof either way of this question. Those living in the UK will be hoping that it’s paid in dollars with the piss-poor exchange rate 😉 .

Now, what is P? Well, P is the set of problems that can be solved in polynomial time. That is, a problem for which an algorithm is known that takes O(n^k) time, where n is the size of the input and k is some fixed constant; fixed meaning it doesn’t vary with the input size. An example of such an algorithm is Quicksort, whose worst-case complexity is O(n^2).

So what is NP? Well, first, NP does NOT stand for Non-Polynomial. If you take anything from this post, take that, as it will save you embarrassment when you say it does (sorry, I’ve seen this so many times). It actually stands for Non-deterministic Polynomial. Now, when is a problem in NP?

For a problem to be in NP, the problem must be a decision problem. See, this is where describing this stuff escalates, as you now have to define a decision problem for all of this to make sense. Anyway, a decision problem is a problem where the answer can be given as a simple ‘Yes’ or ‘No’. For example: given two cities, is there a direct plane route between them? Clearly the answer is Yes or No. For a more computing/maths-related example: given a graph G, does there exist a path between two vertices, say u and v, in G?

Now that we know what a decision problem is, we can describe what the class of problems in NP looks like. Mmmmm, if only. In actual fact, I now need to describe what a non-deterministic algorithm is. Sigh. Let’s keep it simple, if not 100% accurate. A non-deterministic algorithm has two phases. The first phase guesses an answer (a certificate) to our problem, and the second phase verifies, in polynomial time, that this certificate satisfies the constraints of the problem. For example, consider the problem of matching 3 sets of people (known as 3D matching). Let’s name these sets men, women and dogs, with each set having size n; the instance also specifies which (man, woman, dog) triples are acceptable. What we want is a set of acceptable triples such that each man, woman and dog appears in exactly one triple and the matching has size n, i.e. everyone is matched. So the decision problem here is: given such an instance, does there exist a matching of size n? The non-deterministic algorithm is: guess a set of acceptable (man, woman, dog) triples, then verify that our set of triples has size n by counting them, and ensure that it is a matching by checking that no two triples have someone in common.
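To make the verification phase concrete, here is a rough sketch of the polynomial-time check for a guessed 3D matching certificate. The Triple record and the tiny example are mine, purely for illustration, and I assume the guessed triples have already been drawn from the instance’s acceptable set.

import java.util.*;

// Verify phase only: the guessing is the non-deterministic part, so all we code
// here is the polynomial-time check that a guessed certificate really is a
// matching of size n (no man, woman or dog appears in more than one triple).
record Triple(String man, String woman, String dog) {}

class ThreeDMatchingVerifier {

    static boolean verify(List<Triple> certificate, int n) {
        if (certificate.size() != n) return false;   // everyone must be matched
        Set<String> men = new HashSet<>(), women = new HashSet<>(), dogs = new HashSet<>();
        for (Triple t : certificate) {
            // add() returns false if the element was already present, i.e. someone repeats
            if (!men.add(t.man()) || !women.add(t.woman()) || !dogs.add(t.dog()))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Triple> guess = List.of(
                new Triple("m1", "w1", "d1"),
                new Triple("m2", "w2", "d2"));
        System.out.println(verify(guess, 2));   // true: a valid matching of size 2
    }
}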

Phhhhewww, that was longer than I imagined – see why Jeff’s abbreviated version may be better, if not so accurate? I mean, I have not even described what an NP-complete problem is yet. We can, however, now describe what the class of problems in NP is.

The class NP is defined as the collection of all decision problems that are solvable in polynomial time by a non-deterministic algorithm. We note here that many of the problems we see in everyday (programming) life are defined as optimisation problems rather than decision problems – an optimisation problem is one where we seek to maximise or minimise some property, for example the Travelling Salesman Problem. However, it is known that every optimisation problem can be re-written as a decision version of the same problem.
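To make that optimisation-to-decision switch concrete, here is a small sketch for the Travelling Salesman Problem: instead of “find the shortest tour” we ask “is there a tour of length at most k?”, and a proposed tour then becomes a certificate we can check in polynomial time. The distance matrix and tour below are made up for the example.

// Decision version of TSP: given distances, a proposed tour and a bound k,
// check (in polynomial time) that the tour visits every city exactly once
// and has total length at most k.
class TspDecisionCheck {

    static boolean tourOfLengthAtMostK(int[][] dist, int[] tour, int k) {
        int n = dist.length;
        if (tour.length != n) return false;
        boolean[] seen = new boolean[n];
        int length = 0;
        for (int i = 0; i < n; i++) {
            if (seen[tour[i]]) return false;             // a city visited twice
            seen[tour[i]] = true;
            length += dist[tour[i]][tour[(i + 1) % n]];  // includes the edge back to the start
        }
        return length <= k;
    }

    public static void main(String[] args) {
        int[][] dist = {        // symmetric, made-up distances between 4 cities
                {0, 2, 9, 10},
                {2, 0, 6, 4},
                {9, 6, 0, 8},
                {10, 4, 8, 0}};
        int[] tour = {0, 1, 3, 2};
        // Tour length is 2 + 4 + 8 + 9 = 23, so "is there a tour of length <= 25?" gets a Yes.
        System.out.println(tourOfLengthAtMostK(dist, tour, 25));   // true
    }
}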

So can we now define what an NP-complete problem is? Mmmm, just about. In fact, I will state it here and then describe the one further piece of information we need. A problem X is NP-complete if:

  1. X is in NP;
  2. X is NP-hard.

Often you hear people getting confused between NP-complete and NP-hard. They are different things: an NP-hard problem need not be NP-complete; basically, NP-hard problems are at least as hard as every problem in NP. Also, if a combinatorial decision problem is NP-complete, we say that the corresponding optimisation problem is NP-hard.

So what does it mean for a problem to be NP-hard? Well, to show that a problem is NP-hard we must show that some known NP-complete problem is polynomial-time reducible to it (in fact, to be precise, every problem in NP must be reducible to our problem, however due to the transitivity of reductions it is enough to reduce a single NP-complete problem to it). That is, we need to show that we can transform any instance of the known problem into an instance of our problem in polynomial time, such that a Yes answer in one maps to a Yes answer in the other and vice versa, and similarly with the No answers.

Now, if only we had one problem that we know is NP-complete. Well, we do! This was shown by Cook’s Theorem. So now all (and I say all very lightly) we have to do is reduce satisfiability, or some other known NP-complete problem, to our problem – satisfiability (SAT) is the problem Cook showed to be NP-complete. Another (classic) example of an NP-complete problem is the 3D matching problem given above.

And there you have it, NP-completeness in just over 1000 words. Although I missed a few details out, I hope this is useful to someone. Incidentally, I have met Stephen Cook (of Cook’s Theorem fame) before, and I’m pretty sure from experience that he would struggle to tell you all this in under 1000 words 😉 Actually, maybe not…

Over and out.

more json and java

My post yesterday was actually intended to be a placeholder where I was planning to include a build of Douglas Crockford’s Java JSON library that I have been using on some recent projects. However, I seemed to get carried away while writing and, as you can see, ended up rabbiting on more than I intended. The result was that by the time I got to the end of it, I could no longer see where my original point fitted into the discussion.

For the record, there are other Java JSON libraries (http://json-lib.sourceforge.net/ and http://oss.metaparadigm.com/jsonrpc/ to name two), but these are way, way too complicated for a simple application, and/or have far too many dependencies. The files in the jar included at the end of this post work out of the box, with no need for other libraries.

So, anyway, today I thought I may as well get round to putting this jar file up on this site. This is not something that you can’t get elsewhere, but I wanted it easily available for ME. Also, if, like me, you are happy just to get a jar file and really don’t want the hassle of building the (small) library, then you can just get it from this ’ere blog; at the time I built this I could only seem to find the source, hence why I thought I may as well build it myself and host it. The javadoc for the code can be found at http://www.json.org/java/ and the jar file I created can be downloaded using the link below. If anyone has any issues with me hosting it then let me know and I will take it down; otherwise, enjoy it.

Java JSON Library jar file
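For anyone wondering what working with the library looks like, here is a minimal sketch using the org.json classes documented in the javadoc above (JSONObject and JSONArray). The data is just the person/prefs record from my JSON post, not anything from a real project.

import org.json.JSONArray;
import org.json.JSONObject;

// Parse a small JSON record and build one: the sort of thing I use the jar for.
public class JsonExample {
    public static void main(String[] args) throws Exception {
        String text = "{\"id\": 1, \"prefs\": [8,3,4,5,6,7,9,10,11,13,17,18,19]}";

        // Parse the record and pull values back out.
        JSONObject person = new JSONObject(text);
        int id = person.getInt("id");
        JSONArray prefs = person.getJSONArray("prefs");
        System.out.println("person " + id + " first choice: " + prefs.getInt(0));

        // Building JSON is just as short.
        JSONObject reply = new JSONObject();
        reply.put("id", id);
        reply.put("matched", true);
        System.out.println(reply.toString());
    }
}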

json is where it’s at

Well, people, I must say that I have come to LOVE JSON quite a bit in the last few months, having used it on several projects, both in work and in play.

However, not so long ago I used XML for my configuration and data-passing needs. I have always pretty much hated (YES! I said it in public) XML. Everything you do with it just seems loooooooong and verbose; a bit like the applications that (over)use it.

I mean, we could just abandon structure altogether in a document and save time by not tagging anything, but where does that get us? Nowhere, I imagine. It’s nice to have structure, just not the overwhelming amount that XML provides, especially when all you need it for are some simple config files or sending messages between processes. XML is especially wasteful of space when you have a rather large file with a structure like the one below, multiplied by 10,000:

    <person>
       <id>1</id>
       <pref>8;3;4;5;6;7;9;10;11;13;17;18;19</pref>
    </person>

The above is data representing a list of choices, in strict order, for the person with id 1. So, person 1 prefers person 8 to person 3, person 3 to person 4, and so on. To me XML was better than what people used before it (and, let me tell you, still use now), which is:

1 3 4 5 6 7 9 10 11 13 17 18 19

Yes, this is simple, clean and has no “spare” characters. However, when you have a problem with person number 7868 it makes debugging a little harder. It also makes changing person 7868’s preferences very difficult.

With JSON this can be done as follows:

{ "id" : 1,
  "prefs" : [8,3,4,5,6,7,9,10,11,13,17,18,19] }

It just sits as a very nice compromise between the two schools of thought above. I realise that this example may seem very specific, but I’m certain the point is true in general. Not only is JSON less verbose, it is also easy to integrate with JavaScript on the client side of any web app, and for me, with a little nurturing, JavaScript is gonna be where it is ALL at in a few years’ time.

So the next time you go to use XML in a project, have a think first and see if JSON is your new friend.

regular expressions saved my life – again

Right, so, I talked in my last entry about how the wonders of regular expressions had saved my life, and therefore filled the world with utter joy. Once again, a few days later, I found myself faced with a similar problem and, you guessed it, regular expressions saved my life AGAIN.

The problem is essentially the same as before, only this time I had a medium-sized database dump as a CSV file and once again I wanted to fill a Java array with the values from certain columns. For the record, as previously, this stuff was all for some JUnit tests I was running. A simplified example of what I was doing is shown below:

int count = 0;
while (itor.hasNext()) {
    Student student = itor.next();
    // idResults_ holds the expected ids, generated from a column of the CSV dump
    assertTrue("Student " + student.getId() + " != " + idResults_[count],
               student.getId() == idResults_[count]);
    count++;
}

Basically, I have a list of Students whose ids I want to check against what I expect them to be. To test this I have a data set of around 250 students (I’m not really that interested in checking the ids, it’s more a category each student is in, but the ids example was easier to show).

In the code above idResults_ corresponds to an int array that I would like to generate from a column in the CSV file. So idResults_ looks something like:

int [] idResults_ = {87868,78757,89987,......};

So how did I generate this array? Well, I extended the 5 lines of Perl from my last post into a slightly larger script that takes some options and spits out the array initialiser. The actual script can be found HERE. The usage for this script is:

Usage: extract.pl -f <input_file> -c <id> -[hnwisro]
        -h Show this screen
        -n Show the column names in the file
        -w Separate on whitespace (default is a comma)
        -i Don't ignore the first line, i.e. treat it as data rather than as the column names
        -s Treat the data as a string, i.e. data in generated array is in 
           double quotes, defaults to an int array
        -r Treat the data as characters, i.e. data in generated array is
           in quotes
        -f  <input_file> Input file
        -c  <id> column to include in array (can be either a number, 
            zero based, or column name)
        -o  <output_file> File array is output to (any other content will be
            over-written)
 
   Outputs a Java/C# array initialiser with values from column <id> of file
 <input_file> and sends it to <output_file>

As you can see, I have extended this somewhat from my previous post into a full-blown utility (useful probably only to me, but hey, who cares). Instead of creating an int array from the data, you can use the -s flag to create a string array (e.g. {"Harry","Sally","Billy"}) or the -r option to create a character array. Furthermore, you can specify the column to create the array from; this can be either a zero-based integer or the name of the column – the latter presumes that the first line in your file contains the names of the columns (ala CSV file). Also, if the first line contains actual data and you do not want it to be treated as the column names, then you can specify the -i option so that the values on this line are NOT ignored. An example invocation is shown below.
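For example (the file and column names here are invented, so adjust to taste), pulling a 'category' column out of a CSV dump as a String array would look something like:

perl extract.pl -f students.csv -c category -s -o CategoryArray.txt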

Well that’s it. Hope someone else finds the script useful. Over and out.

regular expressions saved my life

A little note to all those programmers who have not taken the time to learn how to use regular expressions: DO IT NOW. I think learning Perl and regular expressions while I was working at Cisco was almost the best thing I took from that job. You never realise how useful they are to know until the day you are presented with a rather large text file and you have to extract some of the data. This is what happened to me last week; and it ain’t the first time either.

The file I was looking at was around 250 lines long and consisted of three tab-separated numbers on each line, of which I was interested in only one at a time. The idea was that I was trying to generate a Java array initialised with these numbers, e.g.:

int [] = {2,3,4,5,7,8,8,2,3............}

For the record, this was just for some testing I was doing. Anyway, doing this without regular expressions would have been a nightmare, as not only did each file have 250 lines but there were 5 files. Instead, 5 lines of Perl:

# $text holds the slurped contents of one input file
my $ar = "{";
while ($text =~ /^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/gm) {
    $ar .= "$2,";    # grab the second number on each line
}
chop $ar;            # drop the trailing comma
$ar .= "}";

did it in a flash. Now what do those people who do not know how to use regular expressions do in this situation?

They should teach more people this kind of programming at university. I’m all for learning the theory behind things, but there does have to be a better theory/practical split in my opinion. Maybe even a course called “Practical Computer Programming”, where they teach things like regular expressions, debugging, memory management (yes, even for Java programmers), design patterns, useful data structures, etc. If anyone wants to pay me a tidy sum to teach it then your wish would be my command 🙂