Sunday, May 12, 2013

Paired programming as a way to share knowledge can sometimes be dangerous...

Paired programming as a way to share knowledge is a great idea - especially if it is for work on legacy code. It can help someone who needs to update code avoid wasting time trying to understand the structure of the code when a person familiar with the code can quickly give a tour and point out all the important bits. 

However, pairing as a way to share knowledge can sometimes be dangerous. I recently had a learning experience in how not to do paired programming.

I was given the task to update a web service that was created by a sibling development team. The story was split into multiple tasks. 

I was paired with a person from the other dev team for one of the tasks. I'll call him Adam. We made our changes, and our unit tests gave coverage for what we were able to test. Everything worked as we expected. 

I was paired with another person for the next task. I'll call him Bob. This is where the paired programming started to transform into pear-shaped programming.

Lesson 1: Strive for consistency, or at least a common vision.

Try to avoid pairing with multiple people on one story as a way to learn about a new system unless each person has a good idea of what the overall user story is covering.

The problem was that Adam and Bob each knew only about the specific task where they paired with me. The second task, where I paired with Bob, depended on code from the first task, where I paired with Adam, and that dependency wasn't obvious while working on the second task. All of the unit tests that we created for both tasks passed, and the bit of manual integration testing we did appeared to pass. However, we missed a bit of code that needed to be updated as part of the task with Bob. Bob probably would have noticed it had he been more aware of the changes made in the task I worked on with Adam.

Lesson 2: Make your pairing partner accountable.

Make sure that the person you are pairing with attends your scrum.  

Bob treated the situation as though he was doing a bit of side work, and would pull other tasks to work on. We should have both been solely focused on the one task until it was accepted as done and ready to ship. Bob might have been more likely to stick with the task and treat it as work that has his name attached to it if he had attended our scrums. I should have said something, but I also treated the situation as though Bob was just helping out instead of being an equal partner.

The result of the misguided pairing was that we shipped a bug to production. We spotted the bug and were able to fix it before it could impact customers, but fixing it took time away from other tasks.

Lesson 3: Do your homework. 

If your name is attached to some work, then make sure you understand the code you are touching well enough to explain it to someone else. Don't assume that the person you are pairing with is not going to miss some bit of code that needs to be updated just because they are familiar with the code.

I should have made sure that I knew how every bit of code worked that I was touching, and how the code was being called. If I had, then I would have caught the missing code change. Instead I accepted quick explanations from people already familiar with the code, and assumed that I was "learning" what was important. What I had done was basically the same as listening to a teacher talk about a topic, but not bothering to do any homework to make sure that I understood what was being said. 

Monday, May 6, 2013

Things I learned while using AWS SQS...

Updated 03-20-2017

Amazon's Simple Queue Service (SQS) provides an easy-to-use mechanism for sending and receiving messages between various applications/processes. Here are a few things that I learned while using the AWS Java SDK to work with SQS.

SQS can be FIFO

It used to be that AWS SQS didn't guarantee FIFO ordering. Now you can create either a standard queue or a FIFO queue. However, there are some differences between standard and FIFO queues to be aware of. The differences are described in the AWS documentation. Here are some of the key differences:

Standard Queues - available in all regions, nearly unlimited transactions per second, messages will be delivered at least once but might be delivered more than once, messages might be delivered out of order.

FIFO Queues - available in US West (Oregon) and US East (Ohio), 300 transactions per second, messages are delivered exactly once, order of messages is preserved (as the queue type suggests).
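Since a standard queue can deliver the same message more than once, a consumer usually has to tolerate repeats. Here is a minimal sketch of that idea, assuming each message carries a unique ID. The class and method names are made up for illustration and are not part of the AWS SDK, and the in-memory set would need to be bounded or persisted in a real service.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical consumer that skips messages it has already handled. */
class DuplicateTolerantConsumer {
    // IDs of messages already processed (kept in memory for this sketch only).
    private final Set<String> seenIds = new HashSet<>();
    private int processed = 0;

    /** Returns true if the message was processed, false if it was a repeat delivery. */
    boolean handle(String messageId, String body) {
        if (!seenIds.add(messageId)) {
            return false; // add() returns false when the ID was already present
        }
        processed++; // real work on `body` would happen here
        return true;
    }

    int processedCount() {
        return processed;
    }

    public static void main(String[] args) {
        DuplicateTolerantConsumer consumer = new DuplicateTolerantConsumer();
        consumer.handle("m1", "first");
        consumer.handle("m1", "first"); // repeat delivery, ignored
        consumer.handle("m2", "second");
        System.out.println(consumer.processedCount()); // prints 2
    }
}
```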

SQS Free Usage Tier

The SQS free usage tier is determined by the number of requests you make per month. The first 1 million requests each month are free. The current fee is $0.50 per million requests after the first million. The cost is pretty low, but it would be easy to start racking up millions of requests. Luckily, there are batch operations, and each batch operation counts as one request.
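To get a feel for how much batching helps, here is a rough estimate using the prices quoted above. The 50 million messages per month and the batch size of 10 are assumptions I picked for the example (10 is the batch limit as far as I know), and the class name is made up.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Rough SQS request-cost estimate using the pricing quoted in this post. */
class SqsCostEstimate {
    // From the text: first 1 million requests are free, then $0.50 per million.
    static final long FREE_REQUESTS = 1_000_000L;
    static final BigDecimal PRICE_PER_MILLION = new BigDecimal("0.50");

    /** Requests needed to send `messages` messages at `batchSize` per request. */
    static long requestsFor(long messages, int batchSize) {
        return (messages + batchSize - 1) / batchSize; // ceiling division
    }

    /** Monthly cost in dollars for the given number of requests. */
    static BigDecimal monthlyCost(long requests) {
        long billable = Math.max(0L, requests - FREE_REQUESTS);
        return PRICE_PER_MILLION
                .multiply(BigDecimal.valueOf(billable))
                .divide(BigDecimal.valueOf(1_000_000L), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        long messages = 50_000_000L; // assumed monthly volume
        System.out.println("unbatched: $" + monthlyCost(requestsFor(messages, 1)));  // $24.50
        System.out.println("batched:   $" + monthlyCost(requestsFor(messages, 10))); // $2.00
    }
}
```

This only counts send requests. Receives and deletes are requests too, so the real bill would be higher.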

Short Polling/Long Polling

You can set a time limit to wait when polling queues for messages. Short polling is when you make a request to receive messages without setting the ReceiveMessageWaitTimeSeconds property for the queue. Setting the ReceiveMessageWaitTimeSeconds property to a value of up to 20 seconds (the maximum wait time) will cause your call to wait up to that long for a message to appear on the queue before returning. If there is a message on the queue, then the call will return immediately with the message. The advantage of long polling is that you will make fewer requests that come back without any messages.

One thing to remember is that if you have only one thread being used to poll multiple queues, then you will have unnecessary wait times when only some of the queues have messages waiting.  A solution to that problem is to use one thread for each queue being polled.
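Here is a minimal sketch of the one-thread-per-queue idea. It does not use the AWS SDK. An in-memory BlockingQueue stands in for each SQS queue, and its timed poll() stands in for a long-polling receive call. The class name, method name, and queue names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PerQueuePollers {
    /** Starts one poller thread per queue and returns the first `expected` messages seen. */
    static List<String> collect(Map<String, BlockingQueue<String>> queues,
                                int expected, long waitSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(queues.size());
        BlockingQueue<String> received = new LinkedBlockingQueue<>();
        queues.forEach((name, queue) -> pool.submit(() -> {
            try {
                while (true) {
                    // Stand-in for a long-polling receive: blocks up to waitSeconds.
                    String body = queue.poll(waitSeconds, TimeUnit.SECONDS);
                    if (body != null) {
                        received.add(name + ":" + body);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdownNow() ends this poller
            }
        }));
        List<String> result = new ArrayList<>();
        try {
            for (int i = 0; i < expected; i++) {
                result.add(received.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting", e);
        } finally {
            pool.shutdownNow(); // interrupt pollers still waiting on empty queues
        }
        return result;
    }

    public static void main(String[] args) {
        BlockingQueue<String> busy = new LinkedBlockingQueue<>(List.of("a", "b"));
        BlockingQueue<String> idle = new LinkedBlockingQueue<>();
        // The idle queue's 20 second wait does not delay the busy queue's messages.
        System.out.println(collect(Map.of("busy", busy, "idle", idle), 2, 20));
    }
}
```

With a single thread looping over both queues, every pass would sit through the full wait on the idle queue before getting back to the busy one.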

Something that seemed a bit contradictory is that queues created through the web console have the ReceiveMessageWaitTimeSeconds set to 0 seconds (meaning it is going to use short polling). However, the FAQ mentions that the AWS SDK uses 20 second wait times by default. I created a queue using the AWS SDK, and the wait time was listed as 0 seconds in the web console. I shouldn't have to specifically set the wait time property to 20 seconds if the default wait time is 20 seconds.  Perhaps the documentation just hasn't been updated yet.

Message Size

A message can be up to 256 KB in size. If you plan on using SQS as a way to manage a data process flow, then you might want to consider how easy it is to reach the 256 KB limit. Avoid putting data into the queue messages. Instead, use the messages as notifications for work that needs to be done, and include information that identifies which data is ready to be processed. This is especially important to remember since the messages in a standard queue can arrive out of order, and you don't want to count on the data embedded in a message being the latest version of the data.
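As a sketch of the notification approach, the message body below carries only a pointer to the data plus a version number, and a helper checks a body against the 256 KB limit before sending. The body format, the class name, and the example location are all made up for illustration, and I'm assuming the limit means 256 * 1024 bytes of UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

class WorkNotification {
    // Limit quoted in this post, assumed to mean 256 * 1024 bytes.
    static final int MAX_MESSAGE_BYTES = 256 * 1024;

    /** Builds a small body that points at the data instead of embedding it. */
    static String notificationFor(String dataLocation, long version) {
        // Hypothetical format; the location is assumed not to contain ';'.
        return "location=" + dataLocation + ";version=" + version;
    }

    /** True if the body is within the size limit when encoded as UTF-8. */
    static boolean fitsInMessage(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length <= MAX_MESSAGE_BYTES;
    }

    public static void main(String[] args) {
        String pointer = notificationFor("reports/2013-05-06.csv", 3); // made-up location
        String embedded = "x".repeat(300_000); // data stuffed directly into the body
        System.out.println(fitsInMessage(pointer));  // true
        System.out.println(fitsInMessage(embedded)); // false
    }
}
```

The consumer would use the location to load the current data and could compare the version to skip notifications that arrive late.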

Message TTL On Queues

Messages have a default life span of 4 days on queues, but can be set to be kept for 1 minute to 2 weeks. 

Amazon May Delete Unused Queues

Amazon's FAQ mentions that queues may be deleted if no activity has occurred for 30 days.

JARs Used By AWS Java SDK

There are certain jar files that you will need to reference when using the AWS Java SDK. They are located in the SDK's "third-party" folder. Here are the jar files I referenced while using the SQS APIs:

  • third-party/commons-logging-1.1.1/commons-logging-1.1.1.jar
  • third-party/httpcomponents-client-4.1.1/httpclient-4.1.1.jar
  • third-party/httpcomponents-client-4.1.1/httpcore-4.1.jar

Elastic Load Balancers in AWS have a pretty confusing message...

I had an issue the other day with an AWS Elastic Load Balancer (ELB) that said the instances I had assigned to the load balancer were "Out of Service". A link displayed as "(why?)" showed the hint text "Instance is in stopped state." This was particularly confusing, because the EC2 console displayed the instances as running.

It turns out that the problem was with the load balancer settings. A search revealed that the error message "Instance is in stopped state." is displayed when the health check fails. In my case, the health check ping target was pointing to the wrong location (a web page that didn't exist).

I wish that the AWS console would have listed a suggestion of "Please confirm that the health check ping target is correct." instead of just listing an invalid assumption that the instance was in a stopped state.  Or, have the "(why?)" anchor display a page of possible troubleshooting steps. One of the suggested steps could still mention the possibility that the instance is stopped.

In the end it was resolved fairly quickly, but it would have been a lot less stressful if the information provided had been more accurate and more helpful.