Deploy Gone Wrong: The Knight Capital Story

Today, I found another amazing story from the software world on Doug Seven’s blog. Doug is a Senior Director at Microsoft Healthcare who writes about many interesting things.

Knight Capital Group was a financial services firm engaged in market making, electronic execution, and institutional sales and trading. In 2012, a deployment error by one of its engineers cost Knight $460 million and pushed the company to the brink of bankruptcy. During the first 45 minutes of trading, Knight’s executions made up more than 50% of the trading volume in some stocks, moving their prices by more than 10%.

The Fatal Deployment Mistake

Knight Capital Group was the largest trader in U.S. equities, with a market share of 17.3% on NYSE and 16.9% on NASDAQ, thanks to its high-frequency trading algorithms. The company’s Electronic Trading Group managed an average daily trading volume of over 3.3 billion trades, worth over $21 billion. On July 31, 2012, Knight had approximately $365 million in cash and equivalents. In short, it was a huge company, and it was doing well.

Knight Capital was preparing for the launch of a new Retail Liquidity Program (RLP) at the NYSE. To support it, the engineers updated their automated, high-speed, algorithmic router that sends orders into the market for execution. However, during the deployment of the new code, one of Knight’s technicians did not copy the new RLP code to one of the eight routing servers.

Unfortunately, the deployment was a manual process, and Knight did not have a second technician review it. No one at Knight realized that the old code had not been removed from the eighth server, nor that the new RLP code had not been added. As a result, the old code was activated the following day. It began routing child orders for execution without tracking the number of shares filled against the parent order, so it kept sending orders, resulting in an immense loss that nearly bankrupted the company.
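A missed server is the kind of gap a simple post-deploy check can catch. As a minimal sketch, assuming each server's deployed artifact can be read back as bytes, the check below compares every server against the intended release and names the ones that differ. The server names, build contents, and function names are hypothetical and are not taken from Knight's systems.

```python
import hashlib


def artifact_digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a deployed artifact."""
    return hashlib.sha256(artifact).hexdigest()


def find_stale_servers(deployed: dict[str, bytes], release: bytes) -> list[str]:
    """List the servers whose deployed artifact differs from the intended release."""
    expected = artifact_digest(release)
    return sorted(
        name
        for name, artifact in deployed.items()
        if artifact_digest(artifact) != expected
    )


# Hypothetical fleet: seven servers received the new build, one was missed.
new_build = b"router build with RLP"
old_build = b"router build without RLP"
fleet = {f"router-{i}": new_build for i in range(1, 8)}
fleet["router-8"] = old_build

stale = find_stale_servers(fleet, new_build)
if stale:
    print("Deployment incomplete, do not enable:", ", ".join(stale))
```

Run as the last step of every deployment, a check like this turns a silent mismatch into a loud failure before the market opens.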

Bridging the Gap with DevOps

Knight Capital’s failure teaches development teams critical lessons. Automated, well-tested deploys are a must. Incremental canary releases are even better, because they limit the risk and allow quick rollbacks when issues emerge. Knight had none of those and relied on a manual process instead. Finally, close collaboration between developers, QA, and ops staff would have surfaced the risks sooner, assuming there was an ops staff at all, because one person is clearly not enough. Silos at Knight led to an oversight that cost hundreds of millions.
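To make the canary idea concrete, here is a rough sketch of an incremental rollout that deploys in growing batches and rolls everything back on the first failed health check. The `deploy`, `healthy`, and `rollback` callbacks, the batch sizes, and the server names are all assumptions made for illustration; a real pipeline would plug in its own tooling.

```python
from typing import Callable, Sequence


def canary_rollout(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    batch_sizes: Sequence[int] = (1, 2, 5),
) -> bool:
    """Deploy in growing batches; roll back every touched server on failure."""
    done: list[str] = []
    remaining = list(servers)
    sizes = list(batch_sizes)
    while remaining:
        # Once the planned batch sizes run out, take whatever is left.
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for server in batch:
            deploy(server)
            done.append(server)
        if not all(healthy(server) for server in batch):
            for server in reversed(done):
                rollback(server)
            return False
    return True


# Hypothetical fleet of eight routers; router-3 fails its health check.
state = {f"router-{i}": "v1" for i in range(1, 9)}
touched: list[str] = []


def deploy(server: str) -> None:
    touched.append(server)
    state[server] = "v2"


def rollback(server: str) -> None:
    state[server] = "v1"


def healthy(server: str) -> bool:
    return server != "router-3"


ok = canary_rollout(list(state), deploy, healthy, rollback)
print("rollout succeeded:", ok, "| servers touched:", touched)
```

In this simulated run the failure surfaces in the second batch, so only three of the eight servers ever receive the new version, and all three are restored to the old one.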

Writing the software is not enough. You must ensure it is delivered and deployed correctly, and invest in the infrastructure to support this, so that your customers get the value you provide and nothing breaks. Otherwise, you might bankrupt your business and take a few others with you.

In most companies, DevOps engineers are responsible for bridging the gap between development and operations teams, ensuring that software is delivered quickly and reliably. Without a DevOps engineer, development and operations teams may work in silos, leading to communication gaps, slower delivery times, and increased risk of errors.

Developers may also end up responsible for tasks outside their area of expertise, such as infrastructure management, which on a project of any real size will likely lead to inefficiencies and errors.

The Role of the DevOps Engineer

Every day is different for DevOps engineers. They work closely with developers and operations staff to ensure software rolls out smoothly and reliably. It’s their job to bridge gaps between those teams and keep everything running efficiently.

A lot of their work involves setting up systems to automate processes that used to be manual. That way, teams can deploy code faster and reduce errors. DevOps engineers enjoy using tools like Ansible, Puppet, and Chef to streamline operations. Of course, things don’t always go as planned, so they also spend plenty of time monitoring alerts and troubleshooting issues, coordinating with other teams to fix any production problems quickly.

While DevOps engineers are comfortable writing code, their primary focus is the operational side. They like being the grease that keeps the gears turning. Their broad knowledge across development and ops allows them to see the big picture and connect the dots between those roles. It’s fast-paced and challenging, but DevOps engineers love being at the heart of it all.

DevOps and SysAdmins — Two Peas in a Pod

DevOps engineers and system administrators are like two peas in a pod, even if their jobs aren’t exactly the same. DevOps engineers are the bridge between the coding whizzes and the infrastructure gurus. Their job is to bring everyone together to deliver software fast and reliably. They automate all the boring stuff, like managing servers and networks.

That frees up the sysadmins to focus on keeping things running smoothly day-to-day. Sysadmins have the thankless task of being everyone’s IT hero — running around putting out fires and keeping the server lights on. Meanwhile, DevOps engineers play around with code and conjure new ways to deploy the latest app release at lightning speed.

Their roles are different, but they’re all on the same team. The developers bring the innovation, the sysadmins bring the stability, and the DevOps engineers bring them together in blissful harmony. One big, happy tech family.

The Evolution of “Dev” in DevOps

The “dev” in DevOps stands for “development” — seems obvious, right? But it’s more than just a job title. It represents a whole new way of thinking.

In the past, developers and IT ops folks worked in their little bubbles, tossing software back and forth like a hot potato. Now, those walls are torn down. Developers and sysadmins work shoulder-to-shoulder, blending together.

Everything is tied together now: coding, testing, deploying, and monitoring. Instead of just writing the code and calling it a day, developers stay engaged throughout the software delivery process, right up to the last step of continuous delivery. They take ownership of what they build, working closely with ops to ensure it runs smoothly once in production.

The “dev” reminds DevOps engineers of their origins — they are innovators, problem-solvers, and creators. But now they get their hands dirty deploying that clever code written by others, automating routine tasks, and monitoring performance.

The Vital Role

The story of Knight Capital Group’s catastrophic failure due to a faulty deployment highlights the importance of strong software delivery practices. While developers focus on writing code, someone has to take care of the operational side of delivering that code.

DevOps engineers play a crucial role in ensuring software reliability by bridging the gaps between development and IT operations. With automated testing, incremental rollouts, and close cross-team collaboration, businesses can avoid costly mistakes like Knight’s.


Originally published on Medium.com