perl is dead, long live…perl?

When I started writing this article, it was going to be about the language people choose when they write a quick and dirty script for, say, a one-off task. For this kind of thing I tend to find myself using Perl, and since I thought Perl was maybe a little long in the tooth, I wondered what the “cool kids” were using these days. However, this got me thinking more about the benefits of such scripts, both to the programmer and to their employer.

Often I find myself needing to process or generate a file in some way. For this kind of task I feel that Java, C#, C++ etc. are just waaaay too heavyweight; my default reaction is that you need an interpreted language, and for me that is Perl. I know PHP pretty well, but it just never enters my head to use it for a command-line task. Python I kind of know, but not very well, and Ruby I just don’t know well enough. I would probably only consider Ruby, like PHP, for a web app, although there is no real reason why I see Ruby and PHP only as web-app options and not command-line ones. So what do people use for this kind of stuff? I’m interested.

Anyway, this line of thought led me to the following observation: writing small scripts to do simple file manipulation/generation and system tasks makes you a better programmer. Even if the task you are trying to complete seems small and is probably a one-off, I say “write a script”. Why?

First, it may give you the opportunity to stray away from heavyweights like Java, which I mentioned above, and learn something new (the more you know, the better a programmer you are. Right?). Your boss may not like the idea, but what you are doing is work related, so you can tell them it will save time, which, I promise you, it will in the long run. Even if you only ever run the script once, the learning process will have stood you in good stead.

Second, it’s an ideal way to break the monotony of your everyday work cycle. You may have been working on a particular project for months and, as happens, you are beginning to hate the thought of even looking at the code, never mind writing more of it. So if you think of a script that will help you in your day-to-day work, and will benefit both you and the company, then write it. Not only does it break the monotony, but when you finish it you also get the sense of achievement that comes from actually completing something. That sense of purpose then feeds back into the main project; I mean, you want that sense of achievement again. Right? So everyone wins.

Third? Well, surely the two reasons above are more than enough to do it. If not, tell me some more 🙂. You could even turn the script writing into a competition at work, e.g. who can come up with the best or most useful script! Don’t worry about finding scripts to write; I generally think of a couple every day. Just look around you and you will be amazed at what you find.

Over and out.

regular expressions saved my life – again

Right, so, in my last entry I talked about how the wonders of regular expressions had saved my life, and therefore filled the world with utter joy. Once again, a few days later, I find myself faced with a similar problem, and, you guessed it, regular expressions saved my life AGAIN.

The problem is essentially the same as before, only this time I had a medium-sized database dump as a CSV file, and once again I wanted to fill a Java array with the values from certain columns. For the record, as before, this was all for some JUnit tests I was running. A simplified example of what I was doing is shown below:

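// idResults_ holds the expected ids, in the same order as the students
int count = 0;
while (itor.hasNext()) {
    Student student = itor.next();
    assertTrue("Student " + student.getId() + " != " + idResults_[count],
                    student.getId() == idResults_[count]);
    count++;    // move on to the next expected id
}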

Basically, I have a list of Students that I have created, and I want to ensure that their ids correspond to what I expect them to be. To test this I have a data set of around 250 students (I’m not really that interested in checking the ids; it’s more the category a student is in that matters, but the ids example was easier to show).

In the code above idResults_ corresponds to an int array that I would like to generate from a column in the CSV file. So idResults_ looks something like:

int [] idResults_ = {87868,78757,89987,......};

So how did I generate this array? Well, I extended the five lines in my last post into a slightly larger Perl script that takes some options and spits out the array initialiser. The actual script can be found here. The usage for this script is:

Usage: extract.pl -f <input_file> -c <id> -[hnwisro]
        -h Show this screen
        -n Show the column names in the file
        -w Separate on whitespace (default is a comma)
        -i Don't ignore first line, i.e. it contains data, not the names of the columns
        -s Treat the data as a string, i.e. data in generated array is in 
           double quotes, defaults to an int array
        -r Treat the data as characters, i.e. data in generated array is
           in quotes
        -f  <input_file> Input file
        -c  <id> column to include in array (can be either a number, 
            zero based, or column name)
        -o  <output_file> File array is output to (any other content will be
            over-written)
 
   Outputs a Java/C# array initialiser with values from column <id> of file
 <input_file> and sends it to <output_file>

As you can see, I have extended this somewhat from my previous post into a full-blown utility (useful probably only to me, but hey, who cares). Instead of creating an int array from the data, you can use the -s flag to create a string array (e.g. {"Harry","Sally","Billy"}) or the -r option to create a character array. Furthermore, you can specify the column to create the array from. This can be either a zero-based integer or the name of the column; the latter presumes that the first line in your file contains the names of the columns (à la a CSV header). Also, if the first line contains actual data and you do not want it to be treated as the column names, you can specify the -i option so that the values on that line are NOT ignored.
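The linked script itself is Perl, but the core idea is small enough to sketch in Java, the language the generated arrays end up in. What follows is a minimal sketch, not the author's script: it assumes the file has already been read into a list of lines, the names ArrayInitialiser, extract and Mode are hypothetical, and it only mirrors the -c, -w, -i, -s and -r behaviour described above (no option parsing, no -n, no output file).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the extract.pl idea (not the author's script):
// pull one column out of delimited lines and format it as a Java/C# array initialiser.
class ArrayInitialiser {

    // How each value is written: as-is (int), "double-quoted" (-s) or 'single-quoted' (-r).
    enum Mode { INT, STRING, CHAR }

    // column:     zero-based index ("0", "1", ...) or a header name
    // hasHeader:  true if the first line holds column names (false is the -i case)
    // whitespace: split on runs of whitespace instead of commas (the -w case)
    static String extract(List<String> lines, String column, boolean hasHeader,
                          boolean whitespace, Mode mode) {
        String sep = whitespace ? "\\s+" : ",";

        // Resolve the column: digits mean an index, anything else a header name.
        int index;
        if (column.matches("\\d+")) {
            index = Integer.parseInt(column);
        } else {
            if (!hasHeader) {
                throw new IllegalArgumentException("column names need a header line");
            }
            String[] names = lines.get(0).trim().split(sep, -1);
            index = -1;
            for (int i = 0; i < names.length; i++) {
                if (names[i].trim().equals(column)) {
                    index = i;
                    break;
                }
            }
            if (index < 0) {
                throw new IllegalArgumentException("no such column: " + column);
            }
        }

        // Collect the chosen field from every data line, skipping the header if present.
        List<String> values = new ArrayList<>();
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines, e.g. a trailing newline
            }
            String value = line.split(sep, -1)[index].trim();
            switch (mode) {
                case STRING: value = "\"" + value + "\""; break;
                case CHAR:   value = "'" + value + "'";   break;
                default:     break; // INT: written as-is
            }
            values.add(value);
        }
        // String.join puts commas only between values, so there is no trailing comma to remove.
        return "{" + String.join(",", values) + "}";
    }
}
```

For example, with the lines id,name / 87868,Harry / 78757,Sally, calling extract(lines, "id", true, false, Mode.INT) gives {87868,78757}, and extract(lines, "1", true, false, Mode.STRING) gives {"Harry","Sally"}. Values are not escaped, so a string containing a double quote, or a multi-character value in CHAR mode, would produce an invalid initialiser.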

Well that’s it. Hope someone else finds the script useful. Over and out.

regular expressions saved my life

A little note to all those programmers who have not taken the time to learn how to use regular expressions: DO IT NOW. I think learning Perl and regular expressions while I was working at Cisco was almost the best thing I took from that job. You never realise how useful they are until the day you are presented with a rather large text file and have to extract some of the data. This is what happened to me last week, and it ain’t the first time either.

The file I was looking at was around 250 lines long and consisted of three tab-separated numbers on each line, of which I was interested in only one at a time. The idea was to generate a Java array initialised with these numbers, e.g.:

int [] = {2,3,4,5,7,8,8,2,3............}

For the record, this was just for some testing I was doing. Anyway, doing this without regular expressions would have been a nightmare, as not only did each file have 250 lines, but there were five files. Instead, five lines of Perl:

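# Read the whole input (file named on the command line, or STDIN) into $text
my $text = do { local $/; <> };
my $ar = "{";
# Each line holds three whitespace-separated integers; keep only the second
while ($text =~ /^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/gm) {
    $ar .= "$2,";
}
chop $ar;    # remove the trailing comma
$ar .= "}";
print "$ar\n";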

did it in a flash. Now, what do those people who do not know how to use regular expressions do in this situation?
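Regular expressions are not just a Perl thing, either. As a rough sketch, here is the same loop using Java's java.util.regex; the class SecondColumn and its initialiser method are hypothetical names, and the file is assumed to have been read into one String already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the Perl loop above, translated to java.util.regex.
class SecondColumn {
    // Same pattern as the Perl; MULTILINE makes ^ and $ match at each line, like Perl's /m.
    private static final Pattern ROW =
            Pattern.compile("^\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s*$", Pattern.MULTILINE);

    // text: the whole file's contents, already read into one String (an assumption).
    static String initialiser(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {           // each find() continues after the previous match, like /g
            values.add(m.group(2));  // group 2 is Perl's $2, the middle number
        }
        // Joining avoids the trailing comma that the Perl version chops off.
        return "{" + String.join(",", values) + "}";
    }
}
```

Joining the list instead of appending a comma after every value also means that a file with no matching lines gives {}, whereas in the Perl version chop would remove the opening brace instead.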

They should teach more people this kind of programming at university. I’m all for learning the theory behind things, but there does have to be a better theory/practical split in my opinion. Maybe even a course called “Practical Computer Programming”, where they teach things like regular expressions, debugging, memory management (yes even for Java programmers), design patterns, useful data structures, etc. If anyone wants to pay me a tidy sum to teach it then your wish would be my command 🙂