In previous blog posts we presented the fantastic results of Running 2400 Nodes and Starting up 1000 Nodes on Google Compute Engine (GCE). Now we will share the practical experience we gained when performing those tests. The GCE documentation is excellent, so concepts and details will not be repeated here.
Your application must have a class with a main method that the Java runtime will execute to start up the ActorSystem. We used the Akka Microkernel, which provides this main class, but you can easily write your own main class or use akka.Main.
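With akka.Main the main class comes for free: you pass the fully qualified class name of your top-level actor as the program argument and it creates the ActorSystem for you. A minimal sketch (the actor class name and classpath layout below are made up for illustration):

java -cp "config:lib/*" akka.Main com.example.TestAppSupervisor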
The application binaries (classes, dependencies, configuration, start scripts) should be packaged in an archive file to be transferred to GCE. We used the Akka sbt plugin (dist) and a few command lines to create this archive:
VERSION=0.1-SNAPSHOT
sbt clean dist
mv target/dist target/akka-testapp-$VERSION
cp bin/* target/akka-testapp-$VERSION/bin/
cp gce-scripts1/* target/akka-testapp-$VERSION/bin/
tar -cz -C target -f target/akka-testapp-$VERSION.tgz akka-testapp-$VERSION
A better way would be to use the sbt-native-packager, but that choice is not important for GCE.
Sign in and create a GCE project.
Install and set up the GCE command line tool gcutil.
We made one of the instances special in that we used it for collecting logs. For this instance we also used a persistent boot disk. The very first time it was started with:
gcutil addinstance n0001 --persistent_boot_disk --image=projects/debian-cloud/global/images/debian-7-wheezy-v20130816 --nosynchronous_mode --zone=europe-west1-a --machine_type=n1-standard-2 --kernel=projects/google/global/kernels/gce-v20130813
This creates the persistent boot disk with the specified image. Thereafter this instance was started with:
gcutil addinstance n0001 --disk=n0001,boot --nosynchronous_mode --zone=europe-west1-a --machine_type=n1-standard-2 --kernel=projects/google/global/kernels/gce-v20130813
When you have started n0001, you can almost immediately ssh into it with:
gcutil ssh n0001
You can stop the instance with:
gcutil deleteinstance n0001 --zone=europe-west1-a --nodelete_boot_pd --force
We started the other instances without an external IP address, which means that ssh access must go via n0001.
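In other words, reaching such an instance takes two hops, roughly like this (assuming your ssh key is available on n0001, for example via agent forwarding):

gcutil ssh n0001   # the instance with an external IP address
ssh n0017          # from n0001, over the internal network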
In the large cluster tests we placed the application binaries on a persistent disk that was shared among the instances. This worked fine, but it is inconvenient when trying out different changes, because a persistent disk can be attached in write mode to only one instance, and while it is, it cannot be attached to any other instance. Therefore we recommend storing the application binaries in Cloud Storage instead, which will be explained later. Still, it is a good exercise to understand how to use persistent disks.
Create a new persistent disk, in our case named akka-testapp:
gcutil adddisk akka-testapp --zone=europe-west1-a
When starting up the instance we attach this disk in read_write mode:
gcutil addinstance n0001 --disk=n0001,boot --nosynchronous_mode --zone=europe-west1-a --machine_type=n1-standard-2 --kernel=projects/google/global/kernels/gce-v20130813 --disk=akka-testapp,mode=read_write
ssh into it:
gcutil ssh n0001
and mount the disk with:
sudo mkdir /mnt/akka-testapp
sudo /usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" /dev/disk/by-id/google-akka-testapp /mnt/akka-testapp
Now we can transfer the application binaries to n0001:
gcutil push n0001 target/akka-testapp-0.1-SNAPSHOT.tgz .
Untar it and place the contents on the persistent disk, /mnt/akka-testapp/.
We also placed the JDK installation on that disk.
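For reference, on n0001 this amounts to something like the following; the --strip-components layout is our assumption, chosen so that bin/ ends up directly under /mnt/akka-testapp as the startup script below expects:

sudo tar --strip-components=1 -C /mnt/akka-testapp -xzf akka-testapp-0.1-SNAPSHOT.tgz
sudo mkdir -p /mnt/akka-testapp/install
sudo cp jdk-7u40-linux-x64.tar /mnt/akka-testapp/install/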
To start the application we used bash scripts defining the JVM parameters and the main class. We used a slightly different script for the seed nodes, but if you only need to run one ActorSystem per instance that should not be necessary.
Note that the host name (e.g. n0017) assigned when starting the instance should be used in the akka.remote.netty.tcp.hostname and akka.cluster.seed-nodes configuration settings. From a start script the host name can be retrieved with $(hostname).
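As an illustration, a trimmed-down start script could look like this (the heap size, port, seed node addresses, and actor class are assumptions, not our exact settings; the system name Main is what akka.Main names the ActorSystem it creates):

#! /bin/bash
# Illustrative sketch of startTestapp.sh; adjust to your application.
HOST=$(hostname)
java -Xmx1024M \
  -Dakka.remote.netty.tcp.hostname=$HOST \
  -Dakka.remote.netty.tcp.port=2552 \
  -Dakka.cluster.seed-nodes.0=akka.tcp://Main@n0002:2552 \
  -Dakka.cluster.seed-nodes.1=akka.tcp://Main@n0003:2552 \
  -cp "config:lib/*" akka.Main com.example.TestAppSupervisor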
This start script is executed when the instance boots, by means of a startup script passed in when the instance is created. The startup script looks like this:
#! /bin/bash
sudo mkdir /mnt/akka-testapp
sudo /usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" /dev/disk/by-id/google-akka-testapp /mnt/akka-testapp
sudo tar -C /opt -xf /mnt/akka-testapp/install/jdk-7u40-linux-x64.tar
cd /mnt/akka-testapp/
bin/startTestapp.sh
Note how it mounts the persistent disk and at the end calls the start script for the application. This startup script is specified when creating the instance with the --metadata_from_file=startup-script argument.
gcutil addinstance n0017 --nopersistent_boot_disk --nosynchronous_mode --zone=europe-west1-a --machine_type=n1-standard-2 --image=projects/debian-cloud/global/images/debian-7-wheezy-v20130816 --kernel=projects/google/global/kernels/gce-v20130813 --disk=akka-testapp,mode=read_only --external_ip_address=none --metadata_from_file=startup-script:bin/startup.sh
The script for starting the 1500 instances looked like this: startClusterGCE.sh
The script for stopping the instances: stopClusterGCE.sh
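In essence those scripts loop over the addinstance and deleteinstance commands shown above. A simplified, hypothetical sketch (instance names n0002 to n1501 give the 1500 nodes):

#! /bin/bash
# Sketch of startClusterGCE.sh: start the 1500 worker instances.
for i in $(seq -f "n%04g" 2 1501); do
  gcutil addinstance $i --nopersistent_boot_disk --nosynchronous_mode \
    --zone=europe-west1-a --machine_type=n1-standard-2 \
    --image=projects/debian-cloud/global/images/debian-7-wheezy-v20130816 \
    --kernel=projects/google/global/kernels/gce-v20130813 \
    --disk=akka-testapp,mode=read_only --external_ip_address=none \
    --metadata_from_file=startup-script:bin/startup.sh
done

Stopping is the analogous loop:

#! /bin/bash
# Sketch of stopClusterGCE.sh: delete the worker instances.
for i in $(seq -f "n%04g" 2 1501); do
  gcutil deleteinstance $i --zone=europe-west1-a --force --nosynchronous_mode
done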
As noted above, the persistent disk approach is inconvenient when trying out different changes, because a persistent disk can be attached in write mode to only one instance at a time. Therefore we recommend the following approach: store the application binaries in Cloud Storage instead.
For this you need to install another command line tool, called gsutil.
Create a bucket:
gsutil mb -l EU gs://akka-testapp
Package the application:
VERSION=0.1-SNAPSHOT
sbt clean dist
mv target/dist target/akka-testapp-$VERSION
cp bin/* target/akka-testapp-$VERSION/bin/
cp gce-scripts2/* target/akka-testapp-$VERSION/bin/
tar -cz -C target -f target/akka-testapp-$VERSION.tgz akka-testapp-$VERSION
Upload the tarball to Cloud Storage:
gsutil cp target/akka-testapp-$VERSION.tgz gs://akka-testapp
The same thing can be done with the JDK installation.
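For example, assuming a separate bucket for the JVM (matching the gs://jvm path used in the startup script below):

gsutil mb -l EU gs://jvm
gsutil cp jdk-7u40-linux-x64.tar.gz gs://jvm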
GCE instances can automatically obtain the necessary tokens to access Cloud Storage by adding the command line option --service_account_scopes=storage-rw to the addinstance command.
gcutil addinstance n0017 --nopersistent_boot_disk --nosynchronous_mode --zone=europe-west1-a --machine_type=n1-standard-2 --image=projects/debian-cloud/global/images/debian-7-wheezy-v20130816 --kernel=projects/google/global/kernels/gce-v20130813 --external_ip_address=none --metadata_from_file=startup-script:bin/startup.sh --service_account_scopes=storage-rw
The startup script can download and install the application and the Java runtime:
#! /bin/bash
VERSION=0.1-SNAPSHOT
gsutil cp gs://jvm/jdk-7u40-linux-x64.tar.gz - | tar -C /opt -xzf -
cd /tmp
gsutil cp gs://akka-testapp/akka-testapp-$VERSION.tgz - | tar xzf -
gsutil cp -R gs://akka-testapp/patches .
cd akka-testapp-$VERSION
# patch it
cp -a ../patches/* .
exec bash bin/startTestapp.sh
If you start the instances without an external IP address they cannot access Cloud Storage. This problem is solved by installing a Squid proxy on one of the instances and using that proxy from the other instances, as explained in the documentation.
This is what we did. Add a firewall rule:
gcutil addfirewall proxy-wall --allowed="tcp:3128"
ssh to the instance with a public IP address, and then install Squid:
sudo apt-get install squid3
# Enable any machine on the local network to use the Squid3 server
sudo sed -i 's:#\(http_access allow localnet\):\1:' /etc/squid3/squid.conf
sudo sed -i 's:#\(http_access deny to_localhost\):\1:' /etc/squid3/squid.conf
sudo sed -i 's:#\(acl localnet src 10.0.0.0/8.*\):\1:' /etc/squid3/squid.conf
sudo sed -i 's:#\(acl localnet src 172.16.0.0/12.*\):\1:' /etc/squid3/squid.conf
sudo sed -i 's:#\(acl localnet src 192.168.0.0/16.*\):\1:' /etc/squid3/squid.conf
sudo sed -i 's:#\(acl localnet src fc00\:\:/7.*\):\1:' /etc/squid3/squid.conf
sudo sed -i 's:#\(acl localnet src fe80\:\:/10.*\):\1:' /etc/squid3/squid.conf
# Prevent proxy access to metadata server
echo "acl to_metadata dst 169.254.169.254" | sudo tee -a /etc/squid3/squid.conf
echo "http_access deny to_metadata" | sudo tee -a /etc/squid3/squid.conf
sudo service squid3 stop
sudo service squid3 start
Since n0001 had a persistent boot disk this was a one-time installation.
In the startup script of the other instances we patched the configuration for gsutil to proxy through Squid on n0001:
# Proxy through n0001
echo "[Boto]" | sudo tee -a /etc/boto.cfg
echo "proxy = n0001" | sudo tee -a /etc/boto.cfg
echo "proxy_port = 3128" | sudo tee -a /etc/boto.cfg
It is not hard to see that Google Compute Engine is developed by top-notch devops engineers. The command line tools, gcutil and gsutil, are excellent for automating the startup and operation of a cluster with ordinary scripts, and the thorough documentation makes it easy to understand how to use them. Running Akka on Google Compute Engine is a breeze. Try it out!