A story from my engineering days at Cisco about how a lack of trust led to a default assumption that “they screwed up.”
They Screwed Up: They Followed Our Specification
Here is a story from when I was working at Cisco about how a lack of trust slowed down the solution to a problem.
We had been shipping a successful new mid-range router for about a year when we were suddenly faced with a line stop. The boxes were failing final test. The product team was working on new versions and now faced a fire drill: find and fix a problem on a box already in volume production.
A quick investigation showed that a cost-reduction effort had phased in a new memory part from a different supplier. The design engineers, in the thick of a new design, were frustrated that they had to go back and undo what they believed to be a short-sighted decision by the supply base team. “It’s typical. They try to shave a few cents off a part and buy one that doesn’t meet spec.”
I was the acting manager for the component engineering team, and we immediately requested some of the new parts to test and start a root cause analysis. A few days later, David, the senior component engineer, called me: “Murphy, these parts meet spec.” David was a conscientious engineer, but I knew that the design engineers would not believe him. So I said, “Pull your notes together and let’s sit down this afternoon with Chuck and Tom in the lab.”
We met and walked them through the testing we had done. They were not satisfied, so we ran more tests on additional new parts, which also passed. Then David replaced a memory part in an existing box with a part from the new supplier, and the box, which had been working, now failed the bootup diagnostic.
So we had a part that passed the spec but caused the design to fail. Tom said, “They screwed up: they followed our specification.”
Based on the nature of the system failure, David suspected the problem had to do with one key performance parameter for the memory. He went back and tested several of the older parts. They were well inside the specified limit for this parameter, so the design had been depending on margin that the specification did not guarantee.
At this point, the design team went back and studied their system timing and realized that both the production design and the next-generation designs they were working on had a timing problem. They changed the production design so that it worked with any part that met the specification, and they incorporated the same fix into the new designs.
I think that because the supply base folks had been less involved with the product team up front, they were viewed as less trustworthy and more likely to make a mistake. They thought less like engineers and more like cost accountants, they were measured differently than the design engineers, and most of their conversations with the design engineers involved either criticism of sourcing decisions or requests for work on additional sources.
Three takeaways
- Be methodical in your troubleshooting, especially when there is a lot of pressure to find an answer. Document the steps so that others can follow your tests and reproduce them (this is less work than proving you reached the correct answer).
- Always seek common ground, especially when others on the team have different skill sets and frames of reference. Communication problems, primarily due to a lack of shared context, cause more trouble than technical errors do.
- Have a plan for how you will troubleshoot your contribution to the design. If the problem is serious, consider starting to work on this plan before you are presented with solid evidence that you need to re-verify your work.