Our system is like a sports car: many separate components all have to work in harmony to create a functioning whole. But we're a 24/7 service providing software for doctors, so we don't allow our system the luxury of a "pit stop" for scheduled downtime. That takes a lot of thought up front to make sure that each component in the system can be "unplugged" for maintenance without affecting the whole. To achieve this, we've borrowed liberally from the best practices of financial and software companies' datacenters, with some improvements of our own along the way.
What's been most interesting to me about datacenter work is that there are always tradeoffs. There are better and worse answers, but there is never a perfect answer. For example, when we introduced large file storage into our system so that users can send attachments in their secure text messages, we had to rethink a lot of our assumptions to make sure that those files could be synced robustly and securely between our different datacenters. More recently, we've been looking at ways to run more than one firewall server at a time within a single datacenter, so that they can share the load. These small innovations keep the system nimble and responsive as both our user base and our functionality grow.
We recently crossed the mark of processing more than 1,000,000 HL7 messages per day for our various hospital, Practice Management, and EHR interfaces. It's an exciting milestone, and I enjoy thinking about how our infrastructure will change and evolve to support the next million!