How a Single Uncaught SQLException Grounded a Multibillion-Dollar Airline


I am a Journalist-turned-Software Engineer. I love coding and the associated grind of learning every day. A firm believer in social learning, I owe my dev career to all the tech content creators I have learned from. This is my contribution back to the community.

In January 2023, the U.S. Federal Aviation Administration (FAA) announced new details on the cause of the Notice to Air Missions (NOTAM) system outage, which caused the delay or cancellation of more than 8,400 flights earlier that month.

The FAA announced that a contractor “deleted files while working to correct synchronization between the live primary database and a backup database.”

This reminded me of a case study I had read recently in the book “Release It! Design and Deploy Production-Ready Software” by Michael Nygard.

Interestingly, that case study also involved an airline and a database.

The section "Case Study: The Exception That Grounded An Airline" describes how a tiny programming error can start a snowball rolling downhill.

In the post-mortem analysis of a major outage that occurred at an airline company, it was discovered that the root cause of the problem was a single uncaught SQLException in the code of a session bean.

The incident happened after a routine database failover and maintenance, and it caused all check-in kiosks and IVR servers to stop servicing requests at the same time.

Through investigating the thread dumps, log files, and configurations of the servers, it was determined that the problem was caused by a resource leak in the connection pool of the application server.

The leak came from cleanup code that did not handle a SQLException thrown while closing a JDBC statement. Each time that close call failed, the connection was never handed back to the pool. The pool was eventually exhausted, and every later call to connectionPool.getConnection() blocked.
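To make the failure mode concrete, here is a minimal sketch. It is not the airline's actual code: `TinyPool`, `FakeConnection`, `FakeStatement`, `leakyLookup` and `safeLookup` are hypothetical names I made up, and the fake statement simply assumes that `close()` throws after a failover. The leaky version closes the statement and then the connection inside one `finally` block, so an exception from the first close skips the second. The safer version uses try-with-resources, which still attempts to close the connection even when closing the statement fails.

```java
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

class PoolLeakSketch {

    /** Hypothetical stand-in for a JDBC statement whose close() fails, as assumed after a failover. */
    static final class FakeStatement implements AutoCloseable {
        @Override
        public void close() throws SQLException {
            throw new SQLException("socket closed by database failover");
        }
    }

    /** Hypothetical pooled connection: close() hands it back to the pool. */
    static final class FakeConnection implements AutoCloseable {
        private final TinyPool pool;
        FakeConnection(TinyPool pool) { this.pool = pool; }
        FakeStatement createStatement() { return new FakeStatement(); }
        @Override
        public void close() { pool.release(this); }
    }

    /** Hypothetical fixed-size pool. A real pool would block when empty; this one throws. */
    static final class TinyPool {
        private final Deque<FakeConnection> idle = new ArrayDeque<>();
        TinyPool(int size) {
            for (int i = 0; i < size; i++) idle.push(new FakeConnection(this));
        }
        FakeConnection getConnection() {
            if (idle.isEmpty()) throw new IllegalStateException("pool exhausted");
            return idle.pop();
        }
        void release(FakeConnection c) { idle.push(c); }
        int available() { return idle.size(); }
    }

    /** Leaky cleanup: if stmt.close() throws, conn.close() never runs. */
    static void leakyLookup(TinyPool pool) throws SQLException {
        FakeConnection conn = null;
        FakeStatement stmt = null;
        try {
            conn = pool.getConnection();
            stmt = conn.createStatement();
            // ... run the query here ...
        } finally {
            if (stmt != null) stmt.close(); // throws SQLException
            if (conn != null) conn.close(); // skipped, so the connection leaks
        }
    }

    /** Safer cleanup: try-with-resources closes conn even when stmt.close() throws. */
    static void safeLookup(TinyPool pool) throws SQLException {
        try (FakeConnection conn = pool.getConnection();
             FakeStatement stmt = conn.createStatement()) {
            // ... run the query here ...
        }
    }

    /** Runs `calls` lookups against a fresh pool and reports how many connections remain idle. */
    static int availableAfter(boolean leaky, int poolSize, int calls) {
        TinyPool pool = new TinyPool(poolSize);
        for (int i = 0; i < calls; i++) {
            try {
                if (leaky) leakyLookup(pool); else safeLookup(pool);
            } catch (SQLException e) {
                // The caller sees the failure either way; only the leak differs.
            }
        }
        return pool.available();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + availableAfter(true, 3, 3));
        System.out.println("safe: " + availableAfter(false, 3, 3));
    }
}
```

With a pool of three connections and three failing lookups, the leaky version leaves zero idle connections and the try-with-resources version leaves all three. In both cases the SQLException still reaches the caller, so the fix does not hide the error; it only makes sure the connection goes back to the pool.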

This incident serves as a reminder of the importance of proper handling of exceptions in code, and the potential consequences of a seemingly small oversight.

FAA outage

Book "Release It"

I am Zahiruddin Tavargere (Zahere). A social-learner, here to learn, share and grow with the tech community.
