A problem-solving pattern
You have a network problem. Where do you start? This tip, excerpted from an article on InformIT by Pat Eyler, author of Networking Linux: A Practical Guide to TCP/IP, offers a set of guidelines for approaching and solving networking problems in a logical manner.
A pattern is just that: It is not a firm set of rules--it's a set of guidelines. If you follow a troubleshooting method consistently, it will help you to find solutions more easily. You will be able to zero in on the root cause of the issue and quickly resolve it. One nice thing about this pattern is that it is neither Linux- nor TCP/IP-specific. You can apply it to a variety of problems--I make no promises about in-law problems, though.
To put this pattern in context, each of its nine steps is described in its own section.
Step 1: Clearly Describe the Symptoms
There's no good way to attack a problem until you know what the problem really is. Far too often, system and network administrators hear a rather poor (if not outright misleading) description of the problem. It's then your job to dig in and find out what's really going on.
Step 2: Understand the Environment
When you have a clear description of the symptoms, you must understand the environment in which the problem occurs to troubleshoot it effectively. Gaining this understanding is really a twofold job: It requires both identifying the pieces involved in the problem and understanding how those pieces should act when they are not experiencing the problem.
Step 3: List Hypotheses
Having made a list of the affected systems (in Step 2), we can begin to list potential causes of the problem. It's safe to brainstorm at this stage because we will be narrowing our search later. In fact, it is better to be overly creative here and end up with extra hypotheses than to miss the actual cause and chase blind leads.
Step 4: Prioritize Hypotheses and Narrow Focus
This is the step where we stop making work for ourselves and start making our jobs easier. Although we've just made a list of things that could be the problem, we don't want to research every item on the list if we don't have to. Instead, we can prioritize the potential causes and chase down the most likely ones first. Eventually, we'll either solve the problem or run out of possible causes (in which case we need to go back to Step 3).
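One way to picture this prioritization is the minimal Python sketch below. It assumes each hypothesis carries a rough likelihood estimate and a rough cost to test; the `Hypothesis` class, its fields, and the example entries are hypothetical illustrations, not part of the original pattern.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    likelihood: float  # rough guess between 0 and 1 (assumed scale)
    test_cost: int     # rough minutes needed to test (assumed unit)


def prioritize(hypotheses):
    """Return hypotheses most likely first; cheaper tests break ties."""
    return sorted(hypotheses, key=lambda h: (-h.likelihood, h.test_cost))


# Hypothetical list from Step 3 for a host that cannot reach a web server.
candidates = [
    Hypothesis("DNS server misconfigured", 0.3, 5),
    Hypothesis("Cable unplugged", 0.5, 1),
    Hypothesis("Firewall rule blocks port 80", 0.5, 10),
]

for h in prioritize(candidates):
    print(h.description)
```

Sorting on likelihood first and test cost second is only one reasonable choice; a quick, cheap check is sometimes worth running even when it is not the most likely cause.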
Step 5: Create a Plan of Attack
Now that you've identified the most likely causes of the problem, it's time to disprove each of the possible causes in turn. As each of the potential causes is eliminated, you narrow your search further. Eventually you will reach a cause that you can't disprove, and your most recent attempt will have corrected the problem.
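The elimination described here can be sketched as a simple loop. This is only an illustration under stated assumptions: `first_not_disproved` and the `is_disproved` callback are hypothetical names, and in practice "disproving" a cause means running a real test rather than looking up a value.

```python
def first_not_disproved(hypotheses, is_disproved):
    """Walk a prioritized list; return the first cause we fail to disprove.

    Returns (cause, eliminated). cause is None when every hypothesis was
    disproved, which signals a return to Step 3 for new hypotheses.
    """
    eliminated = []
    for hypothesis in hypotheses:
        if is_disproved(hypothesis):
            eliminated.append(hypothesis)
            continue
        return hypothesis, eliminated
    return None, eliminated


# Hypothetical prioritized list and hypothetical test outcomes.
ranked = ["cable unplugged", "DNS misconfigured", "firewall blocks port 80"]
outcomes = {
    "cable unplugged": True,         # disproved: the link light is on
    "DNS misconfigured": False,      # could not be disproved
    "firewall blocks port 80": True,
}

cause, eliminated = first_not_disproved(ranked, lambda h: outcomes[h])
print(cause, eliminated)
```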
Step 6: Act on Your Plan
With a plan in place and reviewed by those with a stake in solving the problem, you're prepared to act.
While you're acting on the plan, take good notes and make sure that you keep copies of configuration files that you're changing. Nothing is worse than finishing off a series of tests, finding that they didn't solve the problem, and then discovering that you introduced a new problem and can't easily back out your changes. It can also be disheartening to have insufficient or misleading information to report at the conclusion of your test.
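As one possible way to keep copies of configuration files before changing them, the sketch below makes a timestamped backup. The function name `backup_config` and the backup naming scheme are assumptions chosen for illustration; any consistent scheme that lets you back out a change would serve.

```python
import shutil
import time
from pathlib import Path


def backup_config(path, backup_dir):
    """Copy a configuration file into backup_dir with a timestamp suffix."""
    src = Path(path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    # copy2 copies the contents and tries to preserve file metadata as well.
    shutil.copy2(src, dest)
    return dest
```

A call might look like `backup_config("/etc/hosts", "/root/config-backups")`, where both paths are examples only. Recording the returned path in your notes makes it easy to restore the original file if a test introduces a new problem.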
Step 7: Test Results
You'll never know whether your test has done anything without checking to see if the problem still exists. You'll also never know whether you've introduced new problems with your changes if you don't test. Testing gives you confidence that all is as it should be.
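A small sketch can make both halves of this check concrete: whether the original problem is gone, and whether anything that used to work is now broken. It assumes you recorded the same set of named checks before and after the change; the check names and the `compare_results` function are hypothetical.

```python
def compare_results(before, after):
    """Compare named check results (True = passing) from before and after a change."""
    fixed = sorted(
        name for name in before if not before[name] and after.get(name, False)
    )
    regressed = sorted(
        name for name in before if before[name] and not after.get(name, False)
    )
    still_failing = sorted(
        name for name in before if not before[name] and not after.get(name, False)
    )
    return {"fixed": fixed, "regressed": regressed, "still_failing": still_failing}


# Hypothetical results recorded before and after a configuration change.
before = {"ping gateway": True, "resolve hostname": False, "fetch web page": False}
after = {"ping gateway": True, "resolve hostname": True, "fetch web page": False}

print(compare_results(before, after))
```

A check missing from the `after` results is treated as failing here, on the assumption that a test you could not rerun should not count as a pass.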
Step 8: Apply Results of Testing to Hypotheses
This is the pay-off step. If your testing has isolated and solved the problem, you're almost done. All that remains is to make the changes introduced in your test a permanent part of the network. If you haven't solved the problem yet, this is where you sit down with your results and your list of hypotheses to see what you've learned.
Step 9: Iterate as Needed
Most often, you won't need to go all the way back to Step 1 or 2. Instead, you'll be able to go back to Step 4 to reprioritize and refocus. You might find that the things you learned in your most recent test point you in a slightly different direction. It is also possible that you will find another possibility; in this case, you can jump back to Step 3 and add it to your list.
If you've completely run out of possible causes or found additional information, you might even want to go all the way back to Step 1 and restate the problem just to make sure that you've not missed the mark completely.
This was first published in July 2001