slu

Søren's Blog

Random Ramblings

Installing Hadoop 1.0.4 on Ubuntu 12.04

You'll need

A Linux Server. I'm using Ubuntu 12.04 LTS 64 bit, but any should do

  • Java JDK 1.6.0_24 64 bit for Linux (jdk-6u24-linux-x64.bin)
    get it here

  • Hadoop 1.0.4 (hadoop-1.0.4-bin.tar.gz)
    get it here

Install your server. I won't describe how to do that.

I'm using a virtual machine on VirtualBox with the following port forwards:

Copy Java and Hadoop to the server:


pscp -l ubuntu -P 9022 hadoop-1.0.4-bin.tar.gz jdk-6u24-linux-x64.bin 127.0.0.1:

or, with OpenSSH scp:

scp -P 9022 hadoop-1.0.4-bin.tar.gz jdk-6u24-linux-x64.bin ubuntu@127.0.0.1:


All files can be downloaded here: https://app.box.com/s/5hbldhot60rtig2m1182



(Screenshots hadoop-01 through hadoop-51)

Installing Redhat Enterprise Linux 6
(Screenshots rhel6-01 through rhel6-18)

Keeping up with the Java Release cycle
Did you know that Java has a regular release cycle? Well, it does, and that means you can plan and prepare for future releases.

If you're developing Applets or Web Start-applications, it's extra important to check if your software works with future releases. You might even want to align your own releases with the coming Java releases.

In the list below, you can see the release dates of Java from version 7 update 40 and up.


  • 7u40 10 Sep 2013 LU

  • 7u45 15 Oct 2013 CPU

  • 7u51 14 Jan 2014 CPU

  • 7u55 15 Apr 2014 CPU

  • 7u60 28 May 2014 LU

  • 7u65 15 Jul 2014 CPU

  • 7u71 14 Oct 2014 CPU

  • 7u75 20 Jan 2015 CPU

  • 7u81 14 Apr 2015 CPU

The designation LU means Limited Update and they are numbered in multiples of 20 (u40, u60, u80, etc.).

The designation CPU means Critical Patch Update and they use odd numbers, which are calculated by adding multiples of five to the prior LU release and when needed adding one to keep the resulting number odd. There are three planned CPU releases after each LU release, e.g. after u40 comes u45, u51 and u55. The free numbers, e.g. u53 can be used for unplanned releases.
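
The numbering rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not an official tool, and the function name cpu_updates_after is made up for this example:

```python
def cpu_updates_after(lu_update, count=3):
    """Return the planned CPU update numbers that follow a Limited Update.

    Hypothetical helper: it adds multiples of five to the LU number and
    adds one whenever the result would be even, as described in the text.
    """
    updates = []
    for step in range(1, count + 1):
        number = lu_update + 5 * step
        if number % 2 == 0:
            number += 1  # CPU releases use odd numbers
        updates.append(number)
    return updates


print(cpu_updates_after(40))  # [45, 51, 55]
print(cpu_updates_after(60))  # [65, 71, 75]
```

The results for u40 and u60 match the CPU releases in the list above.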

The LU releases are concerned with adding new features.

The CPU releases are concerned with security fixes and have previously had some impact on Applets and Web Start-applications, because security has been tightened, resulting in warnings or (worse) applications that would not start.

I've gathered this information from the following, where you can find more information:

You can use the Early Access Releases to test whether your software works with the next release.

You might also want to check out the complete Java version history.

Update 11 Jun 2014: Inserted correct release date of 7u60 and added release 7u81.

Adding more disk space using LVM
I had an old build server that was running low on disk space.

The server was virtual and it was running Ubuntu 10.10 configured with LVM.

LVM means Logical Volume Management, and it's a virtual layer on top of the physical disks (or virtual disks in this case). LVM makes it easy to expand the disk space without the need to move files.

I started by adding an extra disk, which was easy, as this is a virtual machine.

Then I logged on to the machine and issued the following to see what disks were available:

# sudo lshw -short -c disk
H/W path             Device      Class      Description
=======================================================
/0/100/7.1/0.0.0     /dev/cdrom  disk       DVD-RAM writer
/0/100/10/0.0.0      /dev/sda    disk       100GB SCSI Disk
/0/100/10/0.1.0      /dev/sdb    disk       100GB SCSI Disk


The disk I just added is /dev/sdb, which needs to be partitioned:

# sudo fdisk /dev/sdb

(Then select 'm', 'n', 'p', '1', <enter>, <enter>, 't', '8e', 'w')

The keys entered in fdisk mean: show the menu, create a new primary partition number 1 that uses the complete disk (i.e. just accept the suggested values for start and end sectors = 2 x <enter>), set the type to "Linux LVM" (8e) and write the changes to disk.

Create the partition as a Physical Volume (PV) in LVM:

# sudo pvcreate /dev/sdb1


Add the PV to the Volume Group (VG) called 'build' (yours will have another name, probably the hostname of the server):

# vgextend build /dev/sdb1


Extend the Logical Volume (LV) which is available at '/dev/build/root' and is mounted on '/':

# lvextend -L+99G /dev/build/root


Resize the filesystem to use all available space:

# resize2fs /dev/build/root


All the above operations were done while the server was running, i.e. no downtime. I did however reboot to make sure that the server found the new disk, but I'm not sure that was necessary. Anyway the downtime was very short, and there was no need to move or copy files to the new disk.

In other words: LVM FTW.

Finally, you can see the LVM configuration by calling pvdisplay, vgdisplay and lvdisplay.
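
As a recap, here is a small Python sketch that only builds and prints the commands used above, so the sequence can be reviewed before anything is run. It does not touch any disk. The function name lvm_extend_commands is made up for this example, and the device, volume group and size are simply the values from my server:

```python
import shlex


def lvm_extend_commands(partition, volume_group, logical_volume, extra_size):
    """Build (but do not run) the commands for growing a logical volume.

    Hypothetical helper: it only mirrors the manual steps from this post.
    """
    return [
        ["pvcreate", partition],                          # register the partition as a PV
        ["vgextend", volume_group, partition],            # add the PV to the VG
        ["lvextend", "-L+" + extra_size, logical_volume],  # grow the LV
        ["resize2fs", logical_volume],                    # grow the filesystem
    ]


# Print the commands for review; nothing is executed here.
for command in lvm_extend_commands("/dev/sdb1", "build", "/dev/build/root", "99G"):
    print("sudo " + " ".join(shlex.quote(part) for part in command))
```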

Installing Cygwin Perl on Windows 7
(This is part three of my Perl for Java programmers guide)

Below is a simple step-by-step guide on how to install Cygwin Perl on Microsoft Windows 7. The instructions should work on most versions of Windows from XP and up.

The instructions are supported by a series of screenshots. I've made all the (uncropped) screenshots available for easy reference/download.

Start Internet Explorer and open the Cygwin project website:

win7-cygwin-install-01

You might need to scroll down a bit to find the download link for the installation program, simply called setup.exe. Click the link:

win7-cygwin-install-02

A dialog will ask you if you want to run or save the file. Just click Run:

win7-cygwin-install-03_crop

You'll get a warning about running a program from an untrusted source, click Yes:

win7-cygwin-install-04_crop

The installation program will start, click Next:

win7-cygwin-install-05_crop

The default install location is C:\cygwin\. You should not need to change that, unless you have another disk and want to use that. Do not use directory names with spaces like c:\Program Files\. Click Next:

win7-cygwin-install-07_crop

The installation program will download a lot of packages during the installation. The default directory is C:\Users\name\Desktop (where name is your username). I don't like clutter on my desktop, so I change that to C:\Users\name\Downloads. Click Next:

win7-cygwin-install-08_crop

Unless you know you're behind a proxy, just click Next:

win7-cygwin-install-09_crop

Select a download site and click Next:

win7-cygwin-install-10_crop

You'll be presented with the following window, where you can select which packages to install:

win7-cygwin-install-12

Just write perl in the search field and the list becomes a little shorter:

win7-cygwin-install-13

Click on the line that says "Perl () Default" to change it to "Perl () Install" and click Next:

win7-cygwin-install-14

The installer will resolve some dependencies that you'll need, just click Next to continue:

win7-cygwin-install-15_crop

The installation will start:

win7-cygwin-install-16_crop

You might get an error downloading packages:

win7-cygwin-install-17_crop

Go back and select another download site:

win7-cygwin-install-18_crop

I got some postinstall errors, but they were safe to ignore, i.e. just click Next:

win7-cygwin-install-19_crop

By default you'll get some icons in both the start menu and on your desktop, untick any you don't want and click Finish:

win7-cygwin-install-20_crop

The installation is complete, and you'll see new programs installed:

win7-cygwin-install-21_crop

Select the Cygwin Terminal and a Terminal Window will open:

win7-cygwin-install-22_crop

Do a perl --version to check the version of perl:

win7-cygwin-install-23_crop

That's it. You've installed Cygwin Perl.

Installing Strawberry Perl on Windows 7
(This is part two of my Perl for Java programmers guide)

Below is a simple step-by-step guide on how to install Strawberry Perl on Microsoft Windows 7. The instructions should work on most versions of Windows from XP and up.

The instructions are supported by a series of screenshots. I've made all the (uncropped) screenshots available for easy reference/download.

Start Internet Explorer and open the Strawberry Perl website:

win7-strawberry-perl-install-01

Click on one of the recommended download links, based on whether you're using 32 bit or 64 bit Windows. A dialog will ask you if you want to run or save the file. Just click Run:

win7-strawberry-perl-install-02-crop

The download will start:

win7-strawberry-perl-install-03-crop

When the download is complete, the installation will start:

win7-strawberry-perl-install-04-crop

Click Next:

win7-strawberry-perl-install-05-crop

Strawberry Perl is Open Source and you'll need to accept the terms by clicking the checkmark before clicking Next:

win7-strawberry-perl-install-06-crop

The default install location is c:\strawberry\. You should not need to change that, unless you have another disk and want to use that. Do not use directory names with spaces like c:\Program Files\. Click Next:

win7-strawberry-perl-install-07-crop

Now we're ready to install, click Install:

win7-strawberry-perl-install-08-crop

The installation will start:

win7-strawberry-perl-install-09-crop

You might get a warning from Windows, asking you if you want to install third party software, just click Yes:

win7-strawberry-perl-install-10-crop

Now the installation will really start:

win7-strawberry-perl-install-11-crop

After a short time it's ready, click Finish:

win7-strawberry-perl-install-12-crop

The README will open, you might want to read it:

win7-strawberry-perl-install-13

Notice that new programs have been installed. There's a "Perl (command line)" entry on the Start Menu, click on that:

win7-strawberry-perl-install-14-crop

A terminal (or DOS Prompt if you like) will open. Try writing perl --version followed by Return (note there are two dashes before "version"). Perl should print out a short version and copyright statement:

win7-strawberry-perl-install-15-crop

That's it! You now have Perl installed on your Windows computer.

You might want to try the classic Hello World program in Perl.

Hello World in Perl
The mantra There's More than One Way to Do It is well known in the Perl community, and it applies even to something as simple as a "Hello World" program. On Rosetta Code there are three examples of "Hello World" in Perl.

I will show you only two other variations.

The first one is a small script, that will demonstrate some aspects of Perl. Here it is:

#!/usr/bin/env perl

use strict;
use warnings;

sub say_hello {
    my $name = shift;

    die "Name must contain at least two characters" if length($name) < 2;

    print "Hello, $name\n";
}

say_hello $ARGV[0] || 'world';


Save it as hello.pl

The first line is the shebang line, which is a construct used in scripts on Unix (and Linux) systems. It's not strictly needed and Perl will perceive the line as a comment (because it starts with a hash (#)) and ignore it.

The next two lines are called pragmas. They restrict the use of unsafe constructs and enable optional warnings, respectively. You should always use those in your scripts.

Then we define a subroutine, say_hello. The subroutine expects one argument: a name, which must be at least two characters long. If the provided argument is OK, a message is printed. Note that subroutines in Perl do not have named arguments; instead all arguments are provided in the default array, @_. The shift operator removes and returns the first element of an array, and if no array is specified the default array is used. The die keyword raises an exception, which usually makes the program exit with the specified message. The subroutine, and consequently the program, will only die if the length of the name is less than two, because of the statement modifier, i.e. the if-statement after the die command.

Finally we call the subroutine to say hello. The argument to the subroutine is either the first argument given on the command line or, if no such argument is given, the word 'world'.

Below is a screenshot of how it looks on a Windows 7 PC running Strawberry Perl:

win7-perl-hello-world-1

That was the first example of a Hello World program. The second one is not meant to be saved as a script, instead it's called directly on the command line:

perl -E"say'Hello, world'"


The option -E means execute what follows as a script and enable all optional features. The say keyword is an optional feature, only found in newer versions of Perl. It prints its arguments to the screen followed by a new line.

Here's how it looks when run:

win7-perl-hello-world-2_crop

The second example might seem silly, but creating small one-time scripts directly on the command line can be very useful.

The anatomy of a README file
I like to put a README file in the root directory of all my software projects. And as I would never do any kind of software development without using some kind of version control (e.g. Git), the README will also be in the root of the repository.

The README is the first piece of documentation a new developer will see. It does not need to be the only one; in many cases you will have an issue tracker, a wiki or another site with further documentation. You should list those in the README.

Because the README is what the developer sees, it should describe how to build the software and what tools are required (including versions). Describing the directory structure is also a good idea, especially if it's a complex system.

Below is an example of a minimal README, you can use as a template:

System Name
===========

A short description of the system.

Copyright and/or company information.

If relevant name and description of customer.

Links to other documentation, issue tracker, etc.


Requirements
------------

The following tools are required for development:

* tool version 1.2.3
* another thingy version 4.0


Directory layout
----------------

All source files are placed in the src directory. 
Generated files are placed in build. This README, the
Makefile and a few helper scripts are placed in the root.
Documentation can be found in doc.

Development
-----------

You'll need access to the repository, mainline development is 
done here:

* http://subversion-server/svn/project-name/trunk/

Install the required software. Set up your local database...

Build by issuing the following command

* build

Deploy to your local server using:

* deploy

Now the system is deployed at http://localhost:1245/system/

Perl for Java programmers
A co-worker asked if/how/why he should learn Perl. He wanted a complementary language to Java, which is what he mainly uses. Something for quick scripts, one-shot conversions, experimentation etc. Perl fits the bill perfectly.

There are really, in my opinion, only two other options: Python and Ruby. Both are great languages, but I know (and love) Perl, which has a C-like syntax, just like Java. That is why we agreed that Perl was the way to go for him.

So, as a Java programmer, how do you get started with Perl?

You should start by reading the best (free) book about Perl: Modern Perl. It's also available for download as PDF or epub.

To try out Perl, you need to install it.

On Windows you have three possibilities:


  1. Strawberry Perl (see Installing Strawberry Perl on Windows 7)

  2. Perl under Cygwin (see Installing Cygwin Perl on Windows 7)

  3. ActivePerl


If you're already using Cygwin, then use the Perl that comes with that. Otherwise, I would recommend Strawberry Perl: it's up-to-date, easy to install, free and open. I used ActivePerl years ago; it was the first easy-to-use Perl on Windows. It's still a very good product, it has some Windows-specific modules and enhancements, and you can buy support from ActiveState.

I don't have a Mac, but as far as I know, Perl should be available by default. ActivePerl is also available for MacOS.

Linux will usually have Perl installed by default, or available in a package for simple installation. And ActivePerl is also available here.

I will (hopefully) blog a bit more about how to learn Perl as a Java programmer.

If you want to know more, check out O'Reilly's Perl section, especially Programming Perl, 4th Edition. Avoid old books and tutorials: Perl has changed a lot through the years, and although it's still highly backwards compatible, best practice has changed quite a bit.

Partial migration of data

I'm currently involved in updating an in-house developed issue/bug tracking system. It's a simple web application with a SQL database used for persistence. We've decided not to do a big bang migration of all data; instead we're migrating one project/product at a time. This makes the transition easier to manage, as fewer users need to be informed of the change. Also, if the new system has any serious bugs, not that many users will be affected.

The migration of data from the existing production database to the new database is a little more complex when we can't migrate everything at once.

I've created an example to demonstrate how I did the migration. The example below uses MySQL, but the technique can be used with all SQL databases.

To try the code below you'll need a database. Often MySQL is installed with a test database, and you can use that. Or create one like this:


create database catalog default character set utf8;


Then switch to using the database:


use catalog


Now create the first set of tables. They are what's in production right now:


create table countries (
  id int not null auto_increment primary key,
  code varchar(20) not null,
  name varchar(200)
) ENGINE = InnoDB;

create table cars (
  id int not null auto_increment primary key,
  name varchar(50),
  country_id int not null,
  foreign key (country_id) references countries(id) on delete cascade
) ENGINE = InnoDB;



Add some data:

insert into countries (code, name) values ('DK', 'Denmark');
insert into countries (code, name) values ('SE', 'Sweden');
insert into cars (name, country_id) values ('Volvo', (select id from countries where code = 'SE'));
insert into cars (name, country_id) values ('Ellert', (select id from countries where code = 'DK'));



Let's see what's in the database:

mysql> select a.name Car,b.name Country from cars a, countries b where a.country_id = b.id;
+--------+---------+
| Car    | Country |
+--------+---------+
| Ellert | Denmark |
| Volvo  | Sweden  |
+--------+---------+
2 rows in set (0.00 sec)


Now create a new set of tables. This is the new system, to which the data from production will be migrated:

create table countries_copy (
  id int not null auto_increment primary key,
  code varchar(20) not null,
  name varchar(200)
) ENGINE = InnoDB;

create table cars_copy (
  id int not null auto_increment primary key,
  name varchar(50),
  country_id int not null,
  foreign key (country_id) references countries_copy(id) on delete cascade
) ENGINE = InnoDB;



As the two systems will run in parallel, we'll add some data to this as well:

insert into countries_copy (code, name) values ('DE', 'Germany');
insert into cars_copy (name, country_id) values ('Audi', (select id from countries_copy where code = 'DE'));



And let's see that:

mysql> select a.name Car,b.name Country from cars_copy a, countries_copy b where a.country_id = b.id;
+------+---------+
| Car  | Country |
+------+---------+
| Audi | Germany |
+------+---------+
1 row in set (0.00 sec)



Now we'll prepare the new (copy) database for migration. Because both databases are in use, we can't migrate the ids. Instead we'll create an extra column in the countries_copy table to hold the original id:

alter table countries_copy add column id_orig int;



Now we can copy all the countries:

insert into countries_copy (code,name,id_orig) select code,name,id from countries;



If we look at the data, we can see that the entries have new ids, but we also have the original id stored:

mysql> select * from countries_copy;
+----+------+---------+---------+
| id | code | name    | id_orig |
+----+------+---------+---------+
|  1 | DE   | Germany |    NULL |
|  2 | DK   | Denmark |       1 |
|  3 | SE   | Sweden  |       2 |
+----+------+---------+---------+
3 rows in set (0.00 sec)



We can use the link between the original and new id when we copy data to the cars_copy table. We'll only copy the Swedish cars:

insert into cars_copy (name, country_id) select a.name, b.id from cars a, countries_copy b
where a.country_id in (select id from countries where code = 'SE')
and a.country_id = b.id_orig;



The copy now contains data from both systems:

mysql> select a.name Car,b.name Country from cars_copy a, countries_copy b where a.country_id = b.id;
+-------+---------+
| Car   | Country |
+-------+---------+
| Audi  | Germany |
| Volvo | Sweden  |
+-------+---------+
2 rows in set (0.00 sec)



And the relations are still correct.
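
To back up the claim that the technique is not tied to MySQL, here is a minimal sketch of the same steps using Python and an in-memory SQLite database. It is an illustration only: the column types are simplified for SQLite, and the Python wrapper is an assumption made for this example, not part of the real migration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same tables and data as above, with types simplified for SQLite.
conn.executescript("""
    create table countries (id integer primary key, code text not null, name text);
    create table cars (id integer primary key, name text,
                       country_id integer not null references countries(id));
    create table countries_copy (id integer primary key, code text not null, name text);
    create table cars_copy (id integer primary key, name text,
                            country_id integer not null references countries_copy(id));

    insert into countries (code, name) values ('DK', 'Denmark');
    insert into countries (code, name) values ('SE', 'Sweden');
    insert into cars (name, country_id)
        values ('Volvo', (select id from countries where code = 'SE'));
    insert into cars (name, country_id)
        values ('Ellert', (select id from countries where code = 'DK'));
    insert into countries_copy (code, name) values ('DE', 'Germany');
    insert into cars_copy (name, country_id)
        values ('Audi', (select id from countries_copy where code = 'DE'));
""")

# Step 1: copy the countries and remember their original ids.
conn.execute("alter table countries_copy add column id_orig integer")
conn.execute(
    "insert into countries_copy (code, name, id_orig) "
    "select code, name, id from countries"
)

# Step 2: copy only the Swedish cars, mapping old country ids to new ones.
conn.execute(
    """
    insert into cars_copy (name, country_id)
    select a.name, b.id
    from cars a join countries_copy b on a.country_id = b.id_orig
    where a.country_id in (select id from countries where code = ?)
    """,
    ("SE",),
)

rows = conn.execute(
    """
    select a.name, b.name
    from cars_copy a join countries_copy b on a.country_id = b.id
    order by a.name
    """
).fetchall()
print(rows)  # [('Audi', 'Germany'), ('Volvo', 'Sweden')]
```

The result is the same as in the MySQL session: the Audi is still linked to Germany and the migrated Volvo to Sweden.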

Now we can continue to add Danish cars to the original table, cars. But Swedish cars should be added to the new table, cars_copy. If we need to add a new country, this should also be added to the new table, countries_copy.

The above code does not remove or somehow deactivate the migrated data in the old system. This should be done when doing this in real life.