Speed and accessibility are the key benefits most often talked about when promoting the TT® platform. But what is less frequently highlighted is the infrastructure and support structure that keep operations moving smoothly. That responsibility falls on Mike Mayhew, CIO at Trading Technologies. In this edition of “Five Questions with…,” we talk with Mike about his team and the behind-the-scenes work that goes into making the TT platform dependable and secure.
Brian Mehta, CMO
What is the scope of your responsibilities as TT’s CIO?
Mike: My responsibilities span global trading platform operations and support, security and corporate IT. Corporate IT is a well-oiled machine, so typically when I talk about my responsibilities, I focus on trading platform operations and support. In other words, my teams are responsible for the overall availability of TT’s production trading environments. In fact, about 99% of our efforts—whether infrastructure design, change management, security audits or day-to-day ops—are focused on ensuring users have a fast, secure and reliable trading experience.
How do you and your teams provide highly available services?
Mike: The foundation starts with an application architecture designed to tolerate failures in the underlying compute, storage and network infrastructure. A highly available system architecture, which includes strategic partitioning of software functions, state recovery and process clustering, can significantly improve system availability and help reduce the underlying hardware infrastructure cost. My teams work closely with our development teams to identify single points of failure in the system, common failure modes of the underlying infrastructure, and optimal design tradeoffs that inform infrastructure design, software deployment and expected application failover behavior.
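To illustrate why process clustering can raise availability, here is a minimal arithmetic sketch. It is not TT's actual model. It assumes that node failures are independent and that the cluster stays up while at least one node is up. The function names and the figures are hypothetical.

```python
def clustered_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that is up while at least one node is up.

    Assumption: node failures are independent of each other.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The cluster is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime per 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = clustered_availability(0.99, n)
        print(f"{n} node(s): {a:.6f} available, "
              f"{annual_downtime_hours(a):.2f} hours down per year")
```

Under these assumptions, two nodes that are each 99% available give a cluster that is 99.99% available, which is roughly 53 minutes of expected downtime per year instead of roughly 88 hours. Real failures are often correlated, so this is an upper bound and not a forecast.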
Monitoring is equally critical in providing highly available systems. Quick detection of system failures or performance degradations minimizes outage duration and may preempt an outage altogether. We strive to automate monitoring and system checkouts as much as possible. For example, we’ve implemented automation that continually checks the production system’s order-routing availability globally. Our goal is to minimize the number of manual checkouts and the time our follow-the-sun ops team spends staring at screens, both of which are error-prone.
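An automated checkout of this kind could be sketched as follows. This is an illustration only, not TT's tooling. The region names, the probe callables and the latency threshold are all hypothetical, and each probe stands in for whatever synthetic order-routing test a real system would run.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    region: str
    ok: bool
    latency_ms: float
    detail: str = ""


def run_checks(probes: Dict[str, Callable[[], None]],
               max_latency_ms: float = 500.0) -> List[CheckResult]:
    """Run one probe per region; a probe signals failure by raising."""
    results = []
    for region, probe in probes.items():
        start = time.perf_counter()
        try:
            probe()
            elapsed = (time.perf_counter() - start) * 1000.0
            # A slow answer counts as a degradation, not a success.
            ok = elapsed <= max_latency_ms
            detail = "" if ok else "slow response"
        except Exception as exc:
            elapsed = (time.perf_counter() - start) * 1000.0
            ok = False
            detail = f"{type(exc).__name__}: {exc}"
        results.append(CheckResult(region, ok, elapsed, detail))
    return results


def failing_regions(results: List[CheckResult]) -> List[str]:
    """Regions that should raise an alert."""
    return [r.region for r in results if not r.ok]


if __name__ == "__main__":
    def unreachable() -> None:
        raise ConnectionError("no route to gateway")

    # Hypothetical regions; the lambda stands in for a healthy probe.
    outcome = run_checks({"region-a": lambda: None, "region-b": unreachable})
    print(failing_regions(outcome))
```

Running checks like these on a schedule and alerting on the failing list is what replaces the manual checkout: people are paged only when a probe fails or slows down.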
Regardless of how well-architected a system might be, things will break unexpectedly and require manual restoration. This is why we have a robust disaster recovery plan. In the worst case, regional infrastructure may fail, forcing a migration to an alternate data center. To restore services quickly, there is no substitute for preparation, so we incorporate regular application layer failover testing between data centers and cloud regions to ensure that our failover designs and processes work as intended.
How do we secure the TT platform?
Mike: There is so much to discuss about our cybersecurity practice at TT that the topic warrants a separate discussion. But suffice it to say, security is a high priority for the whole company and is critical to system availability. Security-centric thinking is part of our culture. I know it sounds a bit cliché, but security is really everyone’s responsibility. The best way to thwart attacks is to make security a company-wide priority and follow industry best practices, and we do so for both our cloud and dedicated infrastructure deployments.
How do the operations and support teams keep pace with continuous software delivery?
Mike: We develop and roll out system enhancements at a rapid pace, which can make it a challenge for staff to stay ahead of the changes.
To combat this, the support team allocates time each day to ensure we are up to speed on new features. We also review all new functionality with software engineering before features are released.
On the operations side, we’ve standardized logging among system components and created unique transaction identifiers that allow system transactions to be traced quickly through log collection tools. We have also developed swimlane diagrams that outline platform transaction flows. Additionally, when a subsystem of the architecture changes, ops, support and engineering always meet for an architectural review.
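The idea of a shared log format plus a unique transaction identifier can be sketched with Python's standard logging module. This is an illustration under stated assumptions, not TT's implementation. The log format, the component names and the helper functions are hypothetical.

```python
import io
import logging
import uuid
from typing import List

# One format shared by every component, so log tools can parse any line.
LOG_FORMAT = ("%(asctime)s %(levelname)s component=%(component)s "
              "txn=%(txn_id)s %(message)s")


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes the shared format to the given stream."""
    logger = logging.getLogger("txn_sketch")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


def new_txn_id() -> str:
    """Generate a unique identifier that follows one transaction."""
    return uuid.uuid4().hex


def for_component(logger: logging.Logger, component: str,
                  txn_id: str) -> logging.LoggerAdapter:
    """Bind the component name and transaction id to every record."""
    return logging.LoggerAdapter(
        logger, {"component": component, "txn_id": txn_id})


def trace(log_text: str, txn_id: str) -> List[str]:
    """Return the log lines that belong to one transaction, in order."""
    needle = f"txn={txn_id} "
    return [line for line in log_text.splitlines() if needle in line]


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    txn = new_txn_id()
    # Hypothetical components handling the same transaction.
    for_component(log, "gateway", txn).info("order received")
    for_component(log, "router", txn).info("order routed")
    print("\n".join(trace(buffer.getvalue(), txn)))
```

Because every component writes the same `txn=` field, following one transaction across components becomes a single search on that identifier in whatever log collection tool is in use.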
What does the next-generation trading infrastructure for TT look like?
Mike: As most readers probably know, TT is built on a hybrid cloud architecture that has at its core a bespoke design for high-performance trade execution, hosted in premium colocation space. As the architecture evolves, I foresee expanded use of cloud-based services. For example, I would like to believe that on-demand infrastructure will eventually be available in premium colocation space close to the exchange matching infrastructure.
But the opportunities to consume performance infrastructure as a service are essentially non-existent today for technical and commercial reasons, so for the foreseeable future, high-performance trading infrastructure will remain dedicated and bespoke. That said, we’re continually investigating ways to improve the management and efficiency of that infrastructure through automation, software-defined networking and technologies like containers.