What most worried you during development of the Apollo software, and how did you and your team solve it?
MHH: The greatest challenge was that our software had to be man-rated, which meant lives were at stake. Failure was not an option. Not only did it have to work; it had to work the first time. Not only did the software itself have to be ultra-reliable, but it also had to be able to detect an error and recover from it in real time. It did not disappoint.
The task at hand included developing and integrating all of the software for the command module, the lunar module, and the systems software shared between, and residing within, both modules. We had to make sure that everything would play together and that there were no integration, communication, or interface conflicts (i.e., data, timing, or priority conflicts). Updates, submitted by hundreds of people, were continuously being made over time and over the many releases for every mission, so we also had to make sure that the software would successfully interface to, and work together with, all the other systems, including the hardware, peopleware, and missionware.
Because of the never-ending focus on making everything as perfect as possible, anything to do with the prevention of errors was top priority, both during development and in real time, where it was necessary to have the flexibility to detect anything unexpected and recover from it at any time in a real mission. To meet the challenge, the software was developed with an ongoing, overarching focus on finding ways to capitalize on the asynchronous and distributed functionality of the system at large in order to perfect the more systems-oriented aspects of the flight software. Such was the case with the flight software’s system-wide snapshot rollback capabilities and priority displays, together with its man-in-the-loop techniques. Our software was designed to be asynchronous so that it would have the flexibility to handle the unpredictable, and so that higher priority jobs could interrupt lower priority jobs based on events as they happened (especially in the case of an emergency).
Each mission was exciting in its own right, but Apollo 11 was special; we had never landed on the moon before. Just as the astronauts were about to land on the moon, everything was going according to plan until something totally unexpected happened. All of a sudden, the on-board flight computer became overtaxed. The software’s priority displays of 1201 and 1202 alarms interrupted the astronauts’ normal mission displays to warn them that there was an emergency, allowing NASA’s Mission Control to understand what was happening, and alerting the astronauts to place the rendezvous radar switch in the right position. The priority displays gave the astronauts a go/no-go decision (to land or not to land).
It quickly became clear that the software was not only informing everyone that there was a hardware-related problem, but that the software was compensating for it. With only minutes to spare, the decision was made to go for the landing. The rest is history. The Apollo 11 crew became the first humans to walk on the moon, and our software became the first software to land on the moon. What happened, and the steps taken by the on-board flight software to “continue on” to landing, are briefly described in my letter to the editor, “Computer Got Loaded”, published in the March 1, 1971 issue of Datamation.
The development and deployment of this functionality would not have been possible without an integrated system-of-systems (and teams) approach to systems reliability and the innovative contributions made by the other groups to support our systems-software team in making this a reality. The hardware team at MIT changed their hardware and the mission planning team in Houston changed their astronaut procedures, both working closely with us to accommodate the priority displays for both the command and lunar modules for any kind of emergency and throughout any mission. In addition, the people at Mission Control were well prepared to know what to do should the astronauts be interrupted with the priority displays.
Since it was not possible (certainly not practical) on Apollo for us to test the software “before the fact” by flying an actual mission, it was necessary for us to test our software by developing a mix of hardware and digital simulations of every Apollo mission (and all aspects of each). These included man-in-the-loop simulations (with real or simulated human interaction) and variations of real or simulated hardware and their integration.
Astronauts who have walked on the moon often describe a certain listlessness once they get home. As an engineer key to that achievement, are you left with a similar feeling? What sort of feeling follows?
MHH: Of course, I would be hard put to even begin to compare feelings of my own to those of an astronaut who walked on the moon! Do you mean by a certain listlessness that I may have experienced a letdown or feeling of depression, because nothing could ever follow that could be as exciting? I do not remember a time, following a major event (like landing on the moon) or a major project (like Apollo), when there was a real chance (or when I took a chance) to reminisce and miss the action. There was always something happening immediately thereafter that seemed to be exciting in its own right.
I have always been more “wrapped up” than not in wasting little time capturing lessons learned from an experience and doing something about them, so that we could apply that knowledge to the adventures to follow. Towards this end, I have found that it helps to focus on learning from the past, not living in it. There was always an adventure to follow that would have its own kind of excitement. I do want to say, however, that what we have been doing over the years with our computer science-related work is much more exciting because of the lessons we learned from Apollo.
Describe your work on Skylab and the Space Shuttle.
MHH: Skylab was a continuation of the Apollo command module on-board flight software, with new software added for new Skylab requirements. We defined systems software requirements for the Skylab and the Space Shuttle on-board flight software as a result of many of the lessons we had learned from Apollo. Among other things, we performed an empirical study of the Apollo on-board flight software development effort, resulting in formalizing lessons learned. Part of the requirements for Skylab and the Space Shuttle originated from this work.
As a pioneer in the field, what would you say is your greatest contribution to the discipline of computer science?
MHH: For whatever success I have experienced in my work, the credit goes not only to those I have learned so much from and have worked with, but also to the errors I have had the opportunity of having had some responsibility in making, without which we would not have been able to learn the things we did. Some were made with great drama and fanfare, and often with a large enough audience to not want such a thing ever to happen again!
Having been through some amazing experiences such as those involved with the Apollo on-board flight software, one could not help but do something about learning from them. With initial funding from NASA and the Department of Defense (including the Air Force, the Navy, and the Army), we performed an empirical study of the Apollo effort. This resulted in a systems theory, based upon a concept of control, that has continued to evolve based on lessons learned from Apollo and later projects. From its axioms, we derived a universal systems language together with its automation and its preventative development paradigm.