While refactoring the TFS and Continuous Integration structure of a couple of projects, one thing caught my attention: Health Check builds were taking up most of the build time, which for me meant long build queues to wait in. They were crowding the queue because they took a long time to complete (about 10 to 15 minutes) and ran too often (every 5 minutes when changes were pending).
Strength or kindness?
Since our CI tool has only two build agents, the obvious answer is to increase this number to properly serve the whole company. But does this tell the whole story? More build agents mean more resources (disk space, licenses, etc.), and unfortunately we still live in a world of limited resources. This also means we will rarely have enough resources to properly serve everyone at peak times.
Compare this with the limited space most cities have for their roads. Most of the time traffic is fine, but when everyone goes out at the same time, as in rush hour, there is no room for everyone and we get traffic jams.
Fortunately, it’s easier to step in and positively affect our Continuous Integration systems, so why not try a thing or two as a sign of civility, just like not taking too long to order at a drive-thru?
Health Check builds: how big?
Depending on the Solution Architect who set up the project you can find different things, but in most cases what I see is Health Checks doing everything except deployments. Since we use TDS in our projects, you can have it build a package of the whole deployment, along with some metadata. But is it really necessary? That’s the question I asked myself when I decided to do it differently.
My Health Check ended up being minimalist: only a compilation is executed, while TDS is kept totally off. No packages, nothing. I ended up with builds taking 45 to 90 seconds to finish, much better!
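To put rough numbers on the difference, here is a back-of-the-envelope sketch. It assumes the worst case, where changes are always pending and a build is triggered every 5 minutes, and it uses the two build agents and the build durations mentioned in this post. The function name `agent_load` is just something I made up for the illustration.

```python
def agent_load(build_minutes: float, interval_minutes: float, agents: int) -> float:
    """Fraction of total agent capacity one build definition consumes.

    Assumes a build is triggered every `interval_minutes` (the worst case,
    where changes are always pending) and each build occupies one agent.
    """
    return build_minutes / (interval_minutes * agents)


# Figures from the post: two agents, a build every 5 minutes.
heavy = agent_load(build_minutes=12.5, interval_minutes=5, agents=2)  # midpoint of 10-15 min
light = agent_load(build_minutes=1.5, interval_minutes=5, agents=2)   # upper bound of 45-90 s

print(f"heavy Health Check: {heavy:.0%} of agent capacity")
print(f"light Health Check: {light:.0%} of agent capacity")
```

Under these assumptions, a single busy project with the heavy Health Check asks for more than the two agents can deliver, so the queue only grows, while the light one leaves most of the capacity to everyone else.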
Health Check builds: how often?
Being tiny means it can build more often. In my case I have it building after changes are detected, with a 5-minute latency so that subsequent check-ins don’t cause multiple builds in sequence.
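The effect of that latency can be sketched with a small simulation. This is a simplified model of the idea, not the actual scheduling algorithm of TFS: the first pending check-in schedules a build `latency` minutes later, and every check-in arriving before that build starts is folded into it. The function name `plan_builds` and the sample check-in times are made up for the example.

```python
from typing import List, Tuple


def plan_builds(checkin_minutes: List[int], latency: int) -> List[Tuple[int, List[int]]]:
    """Group check-ins into builds using a fixed latency window.

    Returns a list of (build_start_minute, check-ins included in that build).
    """
    builds: List[Tuple[int, List[int]]] = []
    for t in sorted(checkin_minutes):
        if builds and t <= builds[-1][0]:
            # A build is already scheduled and has not started: join it.
            builds[-1][1].append(t)
        else:
            # Nothing scheduled: this check-in schedules a new build.
            builds.append((t + latency, [t]))
    return builds


# Hypothetical check-in times, in minutes since the start of the day.
plan = plan_builds([0, 1, 3, 12, 14, 30], latency=5)
for start, checkins in plan:
    print(f"build at minute {start} covers check-ins {checkins}")
```

In this sample, six check-ins end up as three builds instead of six.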
Full builds and Health Checks working together
No doubt the resulting Health Check builds are weaker, as they do just a tiny part of the integration process. To fill the gaps, I also set up full builds working in conjunction with them. These are also set to run automatically when changes are pending, but with a much longer latency. Having them build every 4 or 8 hours ensures a full integration is done once or twice in a normal workday.
Let’s also keep in mind that these deployments can and must be manually triggered by developers, regardless of the latency, as soon as they finish the user story they are working on, so they can test what they did on the integration servers before considering their work done.
In short: deployments will be made when developers finish their work, or at minimum once or twice a day if there are pending changes to deploy.
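The rule for full builds can be written down as a tiny decision function. This is only a sketch of the policy described above, not something the build server exposes; the name `should_run_full_build` and its parameters are my own invention, and the 4-hour default is one of the two latencies mentioned.

```python
from datetime import datetime, timedelta


def should_run_full_build(
    now: datetime,
    last_full_build: datetime,
    pending_changes: bool,
    manually_requested: bool,
    latency: timedelta = timedelta(hours=4),
) -> bool:
    """Decide whether a full build (with deployment) should start now."""
    if manually_requested:
        # A developer finished a user story: deploy regardless of latency.
        return True
    # Otherwise only build when there is something new and enough time has passed.
    return pending_changes and (now - last_full_build) >= latency


# Hypothetical timeline: last full build at 08:00.
last = datetime(2024, 1, 1, 8, 0)
print(should_run_full_build(last + timedelta(hours=1), last, True, False))  # too early
print(should_run_full_build(last + timedelta(hours=1), last, True, True))   # manual trigger
print(should_run_full_build(last + timedelta(hours=4), last, True, False))  # latency reached
```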
Full builds with Asynchronous HTTP calls
Another improvement I made was replacing some synchronous HTTP calls in the full builds with asynchronous ones. They were mainly used to wake up instances after deployments and after publishing from CM to CD. In most cases, the build agent doesn’t really need to wait for these calls to respond before moving to the next step, so we can free up its precious time for other teams.
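As an illustration of the idea (not the actual script used in my builds), here is a minimal sketch that sends the wake-up requests on background threads and returns immediately, so the calling step can move on. The names `http_get_status` and `warm_up_async` are hypothetical, and so are any URLs you pass in.

```python
import urllib.request
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List


def http_get_status(url: str, timeout: float = 30.0) -> int:
    """Blocking GET that returns the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def warm_up_async(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_get_status,
    max_workers: int = 4,
) -> List[Future]:
    """Start one request per URL in the background and return right away.

    The returned futures can be ignored, or inspected later if the build
    wants to log which instances actually responded.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    futures = [executor.submit(fetch, url) for url in urls]
    # Do not block here; already submitted requests keep running.
    executor.shutdown(wait=False)
    return futures
```

A build step would call something like `warm_up_async(["https://cm.example.test/", "https://cd.example.test/"])` (made-up addresses) and continue with its next task. One caveat: the Python interpreter still waits for these background threads before the process exits, so the time saved is whatever the script does in the meantime.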
What about your experiences, do you agree or disagree? What other factors were left out of this analysis? Let me know your thoughts!