The challenge was to understand how various stakeholders use Work Authorization Documents (WADs) for building and maintaining the Space Launch System at Kennedy Space Center, discover pain points, and propose a solution.
We had four months to conduct research on Work Control processes at Kennedy Space Center [KSC]. All work that is done to support NASA launches at KSC is documented in the form of Work Authorization Documents [WADs]. There are over 1,300 WADs released and currently being executed for the Space Launch System [SLS], and that's just for the unmanned flight expected to launch next year. Each WAD represents a portion of the work necessary to assemble and launch a rocket. Many different researchers, engineers, and program managers participate in the production of these documents, and technicians are the last step of the process: they execute them as written. We will not see SLS's first launch until these 1,300 WADs are executed seamlessly.
Our team went to Kennedy Space Center to talk to stakeholders, primarily engineers, program managers, and technicians. Each interacts with WADs daily with different purposes in mind. We did affinity diagrams to understand organizational sentiment towards the documents and identify primary users. We also produced various other documents such as a sequence flow describing how different stakeholders interact to write, approve, execute, and revise WADs. We identified technicians as our primary user and supplemented research to understand their domain more specifically.
I conducted interviews and read literature to understand the technicians' process at Kennedy, at other NASA facilities, and within analogous contexts outside NASA. I also looked at several WADs that had been executed (with time-stamps throughout the process) to create an accurate sequence flow of how a WAD is typically followed, from the technician's point of view. This research supplemented my teammates' research and helped us narrow our scope and produce a functional prototype during the summer of 2018.
74 Documents Reviewed
15 Site Tours
Technicians here are handling hypergolic fuel. This fuel is extremely flammable, corrosive, and toxic, which compounds the safety issues that personnel must be aware of.
WADs are basically instructions. In a lot of ways, they're similar to the instructions you follow when you put a LEGO set together or follow a cooking recipe. As you'd probably imagine, they're a lot more complicated, though. There is much more at stake when you build a rocket than when you cook a quiche. Before a WAD is executed, it has to be seen and approved or modified by a variety of stakeholders, such as safety personnel. The rigor of this process is intended to prevent another disaster such as Challenger. In the interest of mitigating risk, WADs are intended to be thorough, and safety information is easily separated from other information.
WADs are basically a list of operations, broken down into individual steps.
In some instances, work is done in a digital system using iBASEt Solumina. However, our research showed that many of these WADs are still printed on paper and given to technicians. The WADs we had access to were printed, so the information described here is based on the printed structure of WADs.
Operations theoretically contain all the information necessary to execute a WAD. This includes safety notes, engineering notes, and steps. It also includes buyoffs, which are assigned to technicians. A buyoff is a way to assert that the work has been double-checked and verified. Lastly, there are tools and measurements, listed in the preferred units they should be collected in. The intent is to be thorough and unambiguous.
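To make that structure concrete, here is a minimal sketch of how one operation from a printed WAD might be modeled as data. It is an illustration only, not how Solumina or any KSC system stores WADs; every class and field name (`Operation`, `Step`, `Buyoff`, `assigned_to`, `measurement_units`) and the example values are our own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Buyoff:
    """A sign-off asserting that a step was double-checked and verified."""
    assigned_to: str      # hypothetical field: who must sign
    signed: bool = False


@dataclass
class Step:
    text: str
    done: bool = False
    buyoffs: list[Buyoff] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Complete only when the work is done AND every buyoff is signed.
        return self.done and all(b.signed for b in self.buyoffs)


@dataclass
class Operation:
    title: str
    safety_notes: list[str] = field(default_factory=list)
    engineering_notes: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    # Maps a measurement name to the unit it should be recorded in.
    measurement_units: dict[str, str] = field(default_factory=dict)
    steps: list[Step] = field(default_factory=list)

    def open_steps(self) -> list[Step]:
        """Steps that still need work or a signature."""
        return [s for s in self.steps if not s.is_complete()]


# Invented example content, for illustration only.
op = Operation(
    title="Example operation",
    safety_notes=["Wear protective equipment."],
    tools=["Torque wrench"],
    measurement_units={"bolt torque": "in-lb"},
    steps=[
        Step("Torque the bolt.", done=True, buyoffs=[Buyoff("Technician")]),
        Step("Record the torque value."),
    ],
)
```

Keeping buyoffs attached to individual steps mirrors the idea that a step is not finished until someone has verified it, which is where much of the waiting in the process comes from.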
A WAD goes through a long and strenuous process before it is executed. Remember, the primary purpose of this process is to mitigate risk.
Notice that Alterations is highlighted. It's not a step, but it's a common part of work execution and a big pain point. There's more on this topic later.
A WAD goes through the hands of many stakeholders before it actually gets in the hands of a technician who executes it. The chart below describes this. Basically, contract engineers are handed specs from manufacturers such as Boeing. They then write work instructions that are iteratively amended and approved by other actors when they are determined to be relevant. Once this is approved, a program manager will schedule the work to be executed based on numerous factors. Example factors may be dependencies, facility availability, the availability of technicians who are qualified to execute the WAD, or the availability of quality and safety personnel who must be present and be a "second set of eyes."
A closer look at the stakeholders involved in a WAD. Some of the steps are compressed to help visualize this process. Engineers and technicians are highlighted in blue since they are the key stakeholders we're studying.
The SLS program is 75% over budget and almost 2 years behind schedule. The WAD process is a major part of this problem. There are many downsides, and many different perspectives on the downsides, so it's best to represent them that way.
In terms of human resources, there is a combination of many newcomers to Kennedy and many veterans who have been around since the dawn of the Shuttle program or perhaps even the Apollo program. For the newcomers, there's a lot of onboarding and training that needs to be done. The veterans come to the SLS program with a lot of heritage knowledge. That is, they might automatically approach a new problem using old methods or techniques, which can be very problematic. If there is a right way to do a process, it needs to be made explicit for the sake of newcomers as well as veterans.
It's rare for a WAD to be executed seamlessly, as written. There are many factors that may prevent that. These could be human errors, such as a technician making a mistake that needs undoing, or an engineer using ambiguous or incorrect terminology in the WAD. Other times the error is secondhand: parts show up broken or even incorrect. Lastly, there are uncontrollable issues, such as bad weather. Without a contingency plan, these factors make a WAD impossible to execute as written. Technicians can't simply "wing it" in this situation, or otherwise do undocumented work. Far too much is at stake. To proceed, they need what is called an alteration. This is single-handedly the biggest bottleneck in the process.
It's never like this.
It tends to be more like this.
Facilities rarely do work control or processes consistently. Some facilities may use the digital work systems, while others still use paper. When things go wrong, the process is even messier between organizations. Some groups such as the Launch Control Center are structured to handle things efficiently and accurately when things don't go as planned. Other facilities have messy processes. We've even heard of people taking a paper WAD and going on a bike to get signatures in order to proceed with work.
How might the WAD process relate to other domains? The reality is, it's never 1:1. The most directly analogous domain would be manufacturing. In this case, there are processes such as Lean Manufacturing. However, these processes tend to be specialized for mass production. A NASA rocket, by contrast, is only built a handful of times.
One of the main problems with this lack of iteration is that it's hard to predict what might go wrong.
Knowing that many manufacturing processes don't perfectly translate to the way work is done at Kennedy, we looked to other domains and products to see what we might learn. These ranged from cases as predictable as IKEA instructions to those as bizarre as paralegal research. Here's a brief explanation of the particularly strange domains:
After interviewing, touring, and reading as much analogous research as we could get our hands on, we proceeded to do affinity diagramming. For those unfamiliar with the process, we took research notes and clustered them into insights. Those insights were then clustered again into domain-level or organizational notes. Finally, the organizational notes were clustered into high-level themes. Although this process is subjective, it inevitably shows which ideas are strongly backed by research and which are not. We started with 582 primary notes, those directly transcribed from interviews, and worked through this meticulous process. Only after running out of these did we supplement the wall with analogous research notes.
At this point in the project, we were seeking depth, but we still had 50 stories that were all equally weighted. We certainly couldn't ask our client to review all 50 stories and choose their favorite. Our solution was to generate several metrics to narrow this list down. We first looked through the stories individually, because we felt this would help identify what made certain stories interesting. This, we believed, would help us identify better metrics. We also used traditional metrics, as well as ones aligned with our client's goals. A handful of our metrics were also derived directly from primary research.
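One simple way to narrow a list like this is a weighted score per story. The sketch below shows the general idea only; the metric names, weights, story labels, and scores are all invented for illustration and are not the metrics or results from our study.

```python
def rank_stories(scores, weights, top_n):
    """Return the top_n story names by weighted sum of their metric scores."""
    totals = {
        story: sum(weights[metric] * value for metric, value in metrics.items())
        for story, metrics in scores.items()
    }
    # Highest total first; ties keep their original insertion order.
    return sorted(totals, key=totals.get, reverse=True)[:top_n]


# Hypothetical metrics and weights (assumptions, not our real ones).
weights = {"research_support": 5, "client_alignment": 3, "feasibility": 2}

# Hypothetical 1-5 scores for three placeholder stories.
scores = {
    "story_a": {"research_support": 4, "client_alignment": 5, "feasibility": 2},
    "story_b": {"research_support": 3, "client_alignment": 2, "feasibility": 5},
    "story_c": {"research_support": 5, "client_alignment": 4, "feasibility": 4},
}

shortlist = rank_stories(scores, weights, top_n=2)
print(shortlist)
```

A scheme like this makes the trade-offs explicit: changing a weight and re-running the ranking shows quickly how sensitive the shortlist is to any one metric.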
The client overwhelmingly liked the concept of using git-like "branches" to keep track of WAD versions. At a more granular level, parts and procedures could be modularized and branched.
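As a rough illustration of what git-like branching could mean for a WAD, the sketch below keeps every revision of a step list and lets named branches point at revisions. It is a toy model under our own assumptions, not a proposed implementation and not how git itself works internally; `WadHistory`, `Revision`, and the branch and step names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    rev_id: int
    parent: Optional[int]      # None for the first revision
    steps: tuple[str, ...]


class WadHistory:
    """Toy version history: branches are names pointing at revisions."""

    def __init__(self, steps):
        self._revisions = {0: Revision(0, None, tuple(steps))}
        self._branches = {"main": 0}

    def branch(self, name, source="main"):
        # A new branch starts at the same revision as its source branch.
        self._branches[name] = self._branches[source]

    def commit(self, branch, steps):
        # Record a new revision whose parent is the branch's current one.
        rev_id = len(self._revisions)
        parent = self._branches[branch]
        self._revisions[rev_id] = Revision(rev_id, parent, tuple(steps))
        self._branches[branch] = rev_id
        return rev_id

    def steps(self, branch):
        return self._revisions[self._branches[branch]].steps

    def lineage(self, branch):
        # Walk parent links back to the first revision, newest first.
        rev = self._branches[branch]
        chain = []
        while rev is not None:
            chain.append(rev)
            rev = self._revisions[rev].parent
        return chain


# Invented example: an altered copy of the work lives on its own branch.
history = WadHistory(["Torque bolt A"])
history.branch("alteration-1")
history.commit("alteration-1", ["Torque bolt A", "Inspect seal"])
```

The point of the model is that the original instructions on `main` stay untouched while the altered version keeps a traceable link back to them.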
The idea of capturing tribal knowledge was really exciting to our clients. They also mentioned the problem of data bloat with respect to this story. In this case, our feedback was more pointed towards pitfalls to avoid:
It took us another month of exploring concepts through design before we saw a concept that demonstrated true value, as well as room to scale up. That concept was capturing tribal knowledge.
While there is plenty of thorough documentation at KSC describing what to build and what procedures to follow, there's little organizational insight into how technicians perform the work described.
The primary product objective is for technicians to capture and reference tips for how to execute processes, or work with certain materials.
An easy way to understand this concept and the design opportunity is to substitute the word craftsmanship for the phrase tribal knowledge. As people become proficient in their work, they tend to have tacit knowledge about how to perform procedures. Take cooking, for example: following a recipe isn't the same thing as having good knife-chopping skills, but both types of knowledge can be transferred. This analogy extends to our product. Our goal was to be thoughtful about where tacit knowledge might be applied in work procedures, and to allow veterans to capture that knowledge and newer technicians to reference it.
In order to test the legitimacy of our solution, we put it into several everyday contexts. In one scenario, we focused on a technician who used to work on the Space Shuttle. His experience working on a part contradicts what the instructions tell him. His knowledge is either superior to the instructions or obsolete. Either way, it's important for the organization to understand the discrepancy and document it.
Capturing tribal knowledge largely depends on what types of devices and tools technicians bring with them to perform work. We wanted our design to be independent of this, so we focused on creating a cross-platform app that would allow technicians to reference tribal knowledge. For our scenario to work, we imagined that a threshold amount of tribal knowledge had already been captured. We then designed for the moment of referencing it.
This concludes the research segment of the project. Over the summer, we spent about a month developing this idea into a functioning prototype. You can read more about that work in the following case study.