Network management troubleshooting
09 Jan 2004 | AlterPoint, Inc.
This text is excerpted from the eBook Tips and Tricks Guide to Network Configuration
Management, Chapter 3: Network Management Troubleshooting
The book from which this chapter is excerpted presents tips and tricks for six network
configuration management topics. For ease of use, the questions and their solutions are
divided into sections based on topic, and each question is numbered based on the topic,
including Topic 1: Change Management Best Practices, Topic 2: Network Management Security,
Topic 3: Network Management Troubleshooting, Topic 4: Change Management Techniques, Topic 5:
Selecting and Deploying a Network Device Management Solution, and Topic 6: Enterprise
Network Device Management.
To download/read the eBook in its entirety, visit: http://www.alterpoint.com/ebook
Topic 3: Network Management Troubleshooting
Q 3.1: What is the first step toward fixing a router that
isn't working?
A: The first question you should ask is "What changed?"
Very few network devices go belly up on their own; you'll find
that it usually requires human involvement to really screw things
up. Assuming that you've eliminated some kind of hardware failure
as the cause of the problem, the culprit is most likely a recent change
made to the device's configuration. Of course, if the hardware
is at fault, you simply need to replace the hardware and restore your
configuration from a backup.
Restoring from a backup (you do have a backup of the router's
configuration, don't you?) is a good first step even if the
hardware is fine. Ideally, the backup configuration will resolve the
problem, and you can use a tool to compare the old and new configurations
to determine the differences. That's not exactly troubleshooting
the problem, but unless you're working in a lab, your goal should
be to restore the device to operation first, and figure out what caused
the problem later.
>> One change at a time, please! The
idea of using a known-good backup to recover from a device failure
only works if you tend to make a small number of changes at
a time, let them settle to ensure that they're working
properly, then immediately make a backup. If you're in
the habit of making a raft of changes at once, you'll have
a much more difficult time tracking down the change that caused
the problem.
If you don't actually have a recent backup, shame on you! Hopefully
you have change management documentation that describes the changes
that have been made to the router in recent memory. Start examining
those changes to see which ones might apply to the problem you're
having. If necessary, manually undo each change, one at a time, until
the problem goes away.
Other changes might involve a device operating system (OS) upgrade
or patch. In such cases, you should never make a change without understanding
how you can roll back to the prior (working) version of the OS. If
necessary, keep a spare router on hand in case the OS upgrade or patch
kills your production unit. The goal, in any event, is to not worry
so much about troubleshooting the current problem, and to simply fall
back to the last configuration that worked.
Keep in mind that not all changes need to involve the router's
configuration files or OS. For example, perhaps your company recently
hired someone to straighten out that rat's nest of a wiring closet,
and that person accidentally plugged the router into the wrong subnet
when he or she put the closet back together. The wiring closet change
should have been documented as a network change, and would tip you
off that you need to check out the router's interfaces to see
what they're plugged into.
>> There's no such thing as a
minor change! Every single change to your network devices should
go through your change management process. No change is too
minor. We've all heard the story about the technician who
blew dust out of a router's cooling fan. He blew hard enough
to stop the fan, causing the router to overheat and restart
itself at seemingly random intervals. Had that simple maintenance
action (cleaning out the router) been logged as a change,
a senior administrator might have guessed that the problem was
in the cooling fan, and checked that out first for a speedier
resolution to the problem.
Of course, if you don't have a change management program in place
or, at least, a backup of the router's configuration, you're
out of easy options. You'll need to start troubleshooting the
problem the hard way, which might eventually involve completely reloading
the router's factory configuration and rebuilding your configuration
from scratch. Such drastic measures highlight the importance of both
backups and a solid change management methodology.
Q 3.2: How can change management contribute to improved network
performance?
A: Managing large networks is a complex, difficult task. Suppose
you took a job at a large corporation with tens of thousands of
users spread across dozens of offices. Your job, you're told, is
to find out why network performance is slow. Where do you start?
You could whip out your network analysis tools and start analyzing
bandwidth utilization, broadcast traffic, router load, switch bandwidth,
firewall utilization, and so forth, but doing so would require tons
of time and might never point to a real performance bottleneck.
If you do find a bottleneck, all you could really do is start shooting
in the dark, making device configuration changes in an attempt to
fix the bottleneck. More often than not, that practice simply reveals
additional bottlenecks, creating an unending process of network
configuration changes that never really improve performance. If
you're after actual results, your best starting place is gathering
some basic performance trend information and analyzing the network's
change-management log.
If you can pin down a rough point in time when performance started
to become less than optimal, you can start analyzing the changes
that were made to the network's infrastructure devices around that
time. You might discover, for example, a switch to a less-efficient
routing protocol, or you might find that the routers connecting
the various offices are providing packet filtering services. You
might discover incorrectly configured multicast boundaries that
are resulting in excess WAN traffic. Regardless, the configuration
history can point to potential problems that contribute to the network's
current condition. Discovering those problems empirically could
take weeks or more, but finding them in the configuration history
can be much, much easier.
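To make the idea concrete, here is a minimal Python sketch of pulling the changes logged around the time performance started to slip. The log entries, device names, and the seven-day window are all made up for illustration; a real change-management log would come from your own tool or documentation.

```python
from datetime import datetime, timedelta

# Hypothetical change-log entries: (timestamp, device, description).
CHANGE_LOG = [
    (datetime(2003, 1, 14, 22, 0), "Router2", "Added packet filtering on WAN interface"),
    (datetime(2003, 2, 3, 23, 30), "Router5", "Switched routing protocol"),
    (datetime(2003, 2, 5, 1, 15), "Switch7", "Changed multicast boundary"),
    (datetime(2003, 3, 20, 21, 0), "Router1", "Updated SNMP community string"),
]

def changes_near(log, slowdown_start, window_days=7):
    """Return entries logged within window_days of slowdown_start, oldest first."""
    window = timedelta(days=window_days)
    hits = [entry for entry in log if abs(entry[0] - slowdown_start) <= window]
    return sorted(hits, key=lambda entry: entry[0])

# Assume trend data suggests the slowdown began around 6 February 2003.
suspects = changes_near(CHANGE_LOG, datetime(2003, 2, 6))
for when, device, what in suspects:
    print(f"{when:%Y-%m-%d %H:%M}  {device}: {what}")
```

The window size is a judgment call; widen it if the point at which performance degraded is only a rough guess.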
The fact is that modern networks are becoming too large and too
complex to manage as a single unit. Instead, you have to manage
them in bits and pieces, and you have to manage them in small chunks
of time. For example, suppose your company is getting ready to make
a whole series of network device reconfigurations designed to improve
performance or simply designed to increase network addressing capacity.
Before making the changes, you can take a complete set of performance
measurements. By taking another set of measurements after the changes
are complete, you can determine the performance impact of the change,
and relate those changes to specific configuration changes from
the configuration history. You're not attempting to manage the network's
overall performance. Instead, you're simply trying to manage the
performance delta, or difference between the two configurations.
Some administrators refer to this process as managing in increments,
and it's an effective way to keep on top of large, complex networks.
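As a rough illustration of managing the delta rather than the whole network, the following Python sketch subtracts a "before" measurement set from an "after" set. The metric names and numbers are invented for the example; substitute whatever your monitoring tools actually report.

```python
# Hypothetical measurements taken before and after one set of changes.
before = {"wan_utilization_pct": 72.0, "avg_latency_ms": 48.0, "router_cpu_pct": 65.0}
after = {"wan_utilization_pct": 58.0, "avg_latency_ms": 41.0, "router_cpu_pct": 70.0}

def performance_delta(before, after):
    """Return after-minus-before for every metric present in both sets."""
    return {name: after[name] - before[name] for name in before if name in after}

delta = performance_delta(before, after)
for name, change in sorted(delta.items()):
    # A signed number makes it obvious which direction each metric moved.
    print(f"{name}: {change:+.1f}")
```

Saving each delta alongside the corresponding entry in the configuration history is what lets you tie a performance change back to a specific set of configuration changes.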
Of course, managing in increments is only possible if you have
a solid change-management process in place. The change-management
process provides some important capabilities:
- Change management provides a logical
checkpoint, allowing you the opportunity to take performance
measurements before and after a discrete set of changes
- Change management provides a history,
enabling you to compare before and after configurations and
relate them to measured performance changes
- Change management provides a rollback
mechanism, making it easier to revert to a previous configuration
if the performance of a new configuration isn’t what you
desired.
Ideally, you'll have access to software that can help gather and
maintain device configuration information for historical and analytical
purposes. That software might even allow you to store performance
measurements so that you can save a performance baseline with each
set of changes, defining a point in time at which that performance
was measured and relating it to the device configuration that resulted
in the performance.
Q 3.3: What are some industry best practices for troubleshooting
network devices?
A: Network devices have been around a long time, and the technology
industry has developed several best practices that make troubleshooting
easier and often let you avoid the need to troubleshoot altogether.
As author Scott M. Ballew states in his book Managing IP Networks
with Cisco Routers (O’Reilly and Associates), “The
best way to handle network problems is to avoid them.”
Here are some additional tips I've picked up over the years:
- Create detailed documentation of
your network’s physical connections. One of the most common
reasons for network downtime is swapped cables, and a detailed
map of which wires go where can be a huge benefit during troubleshooting.
Given the alternative (tugging on wires until you figure
out where they go), making documentation is a great investment
of time
- As I’ve described in other
tips, document every change you make to network devices’
configurations, and have backup configurations ready in case
a change backfires
- Your first troubleshooting step
should often be to simply undo whatever it was you did last.
Backup configuration files can make doing so very easy and will
let you review the problem-causing changes at your leisure
- Make as few changes as possible
at a time; that way, if problems occur, you’ll have fewer
changes to sort through to find the cause. How long you wait
between changes is a matter of personal taste; I like to wait
at least 1 week so that my network can experience the full range
of a week’s workload before I certify the change as a
success. Of course, in a busy network environment that uses
the latest technologies, limiting your workload can be difficult
or impossible, making third-party change-management tools all
the more valuable.
Experienced administrators have learned these tips through trial
and error. You likely have a few other common practices you follow
in your environment to keep things running smoothly.
Q 3.4: How can I determine whether a new product or a consultant
makes changes to our network devices?
A: Large companies are likely to have any number of consultants
and contractors running around on different projects at any given
time. Some of them might have the authority to make changes to your
network devices, probably with the understanding that they document
any changes they make. However, there’s always a change or
two that gets made right before the weekend that doesn’t make
it into the documentation.
In addition, it’s possible for new software applications
to make changes to your network devices. Suppose you’re evaluating
a new network performance monitoring solution that needs to query
information from your routers. Or perhaps you’re installing
an enterprise management solution that needs credentials to access
your managed network devices. In these cases, the software might
make minor configuration changes to your devices without your knowledge.
That’s not necessarily a bad thing; the changes made by these
software packages are usually minor and simply make it easier for
the software to do its job. But you still need to know about those
changes in order to control your device change management process.
So what can you do?
Unfortunately, very few network devices are designed to automatically
notify an administrator when their configurations are changed. After
all, only an administrator should have the credentials to make a
change, so the devices quite reasonably assume that the administrator
made any changes and doesn’t need to be notified.
Manually Detecting Changes
Most higher-end network devices allow you to use Trivial File Transfer
Protocol (TFTP) to transfer the devices’ configuration files
to a TFTP server (I explained how to set up a TFTP server in tip
4.2). If you regularly dump your devices’ configurations to
TFTP and save the files, you have a baseline from which to check
for changes to the devices’ configuration. For example, suppose
you downloaded a router’s configuration into a file you named
Router5Feb03.txt. A contractor recently finished installing a new
enterprise management solution, and you want to see if any changes
were made to Router5. Just follow these steps:
1. Enter telnet routername to Telnet to the router that you want
to back up (for this example, I’ll assume you’re using
a Cisco router; change the following commands
as necessary if you’re using a different device). Obviously,
you could also use the router’s IP address instead of a name.
2. Log on to the router.
3. Enter enable and provide the correct password. Doing
so enters privileged mode and lets you access the router’s
configuration.
4. Enter write network, then enter the IP address of your TFTP
server.
5. Enter the name of the configuration file (I’ll use Router5Mar03.txt
for this example).
6. Press Enter to confirm the write. Ensure that the router responds
with an [OK] prompt after writing the configuration.
7. Enter exit to log out of the router.
Now you’ve got two text files, one with the old configuration
and one with the new configuration. You simply need to compare the
two. Assuming you’re running on a UNIX computer, enter the
following:
diff -abls Router5Feb03.txt Router5Mar03.txt
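If you would rather script the comparison than run it by hand, a similar check can be sketched in Python with the standard difflib module. This is only an illustration: the configuration fragments below are made up, and unlike the -b switch above, this sketch does not ignore whitespace differences.

```python
import difflib

def config_diff(old_text, new_text,
                old_name="Router5Feb03.txt", new_name="Router5Mar03.txt"):
    """Return unified-diff lines between two configuration texts."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm="",
    ))

# Made-up configuration fragments, for illustration only.
old_cfg = "hostname Router5\nsnmp-server community public RO\n"
new_cfg = ("hostname Router5\nsnmp-server community public RO\n"
           "snmp-server community mgmt RW\n")

changes = config_diff(old_cfg, new_cfg)
print("\n".join(changes) if changes else "No changes found")
```

Lines that start with a plus sign exist only in the newer file; lines that start with a minus sign exist only in the older one.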
If you’re using Windows, you can use a graphical version
of Diff, called CSDiff, which I mentioned first in tip 4.2. It’s
available from Component Software and makes it much easier to spot changes between versions
of a text file. Best of all, it’s a free tool. Figure 3.1
shows how CSDiff highlights the differences between two text files.
Figure 3.1: Using CSDiff to analyze the differences in a router configuration file
Unfortunately, watching for changes manually is a lot of work.
You have to regularly monitor for changes on each and every network
device or you could easily miss one. Because the whole point of
this exercise is to pick up changes that you didn’t know were
being made, you need to have a change detection system that’s
a bit more automated.
Proactive Change Notification
Enter device change management software. Most of the big players in
this field, including AlterPoint DeviceAuthority, Tripwire, and Cisco’s
CiscoWorks, can immediately notify you via email when a network device’s
configuration changes. These solutions run on a server, and periodically
(usually daily, although you can configure more frequent intervals)
download your devices’ configuration files. They then perform
an internal comparison—not unlike the manual Diff I used earlier—to
compare the most recent configuration with the last one they downloaded.
If they spot any changes, they generate an email to an administrator.
>> Software management solutions often
use a more sophisticated comparison than a simple Diff. Instead,
they create a cryptographic checksum of each version of a configuration
file. The checksum can only be the same if no changes were made
to the file; if any changes occur, the checksum is different,
and the software knows to investigate more closely to determine
exactly which changes occurred.
Using a checksum—rather than a line-by-line comparison—allows
these software packages to accurately and quickly compare configuration
files that might include thousands of lines of text.
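Here is a small Python sketch of the checksum idea. The choice of SHA-256 is my assumption for the example; the products mentioned above may well use a different algorithm, and the configuration text is invented.

```python
import hashlib

def config_checksum(text):
    """Return the SHA-256 hex digest of a configuration's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def has_changed(previous_text, current_text):
    """Compare digests first; a detailed diff is only needed when they differ."""
    return config_checksum(previous_text) != config_checksum(current_text)

# Made-up configuration text, for illustration only.
yesterday = "hostname Router5\nip route 0.0.0.0 0.0.0.0 10.1.1.1\n"
today = "hostname Router5\nip route 0.0.0.0 0.0.0.0 10.1.1.254\n"

print(has_changed(yesterday, yesterday))
print(has_changed(yesterday, today))
```

Storing only the digest of the last known configuration is enough to detect that something changed; you still need the saved file itself to find out what changed.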
Ideally, your change management software should allow you to configure
daily reports. That way, you’ll be able to carefully review
changes on a day-to-day basis rather than waiting a week or more
and having to review dozens of potential changes. For example, as
Figure 3.2 shows, DeviceAuthority provides a great deal of flexibility
in scheduling reports. You can also configure reports to be emailed
to multiple recipients. For example, I like to receive a copy of
the report myself, and I have another copy sent to my Help desk
manager for archival. Whenever we’re conducting a process
audit, a third copy is emailed to an auditor, who compares the report
to our official change log to verify our compliance with our internal
change management process.
Figure 3.2: Creating a daily schedule keeps you on top of unexpected device changes and is a
useful tool for auditing your change management process.
Although these change management software solutions involve additional
expense and require effort to deploy, they provide a much better
means of keeping tabs on your network devices than a manual process.
Automation on the Cheap
If you’re completely unable to implement a change management
software solution, you’re not completely out of luck. You can
still automate parts of the manual detection process and provide some
basic functionality for keeping track of unexpected changes to network
devices. Basically, you need to break down the process into its component
steps, and come up with a means of automating each step:
- Commanding devices to dump their
configuration files via TFTP. If you have any devices that don’t
support TFTP, you’re going to have a hard time automating
a means of retrieving their configuration settings. Software
solutions can pull configuration data from just about any kind
of managed device, so if you have a lot of non-TFTP devices,
you have one more argument for purchasing a software package.
- Comparing new and old configuration
files.
- Emailing the results.
Each of these tasks can be performed on Windows- or UNIX-based
computers, although the exact techniques obviously differ. Because
Windows is the most common desktop OS, I’ll focus on techniques
for Windows. Where possible, I’ll mention UNIX alternatives.
Automating the Configuration File Dump
You need to be able to script a Telnet session to automatically log
onto your devices and command a TFTP dump. Unfortunately, Windows’
built-in Telnet client doesn’t support scripting. However, you
can get a scriptable Telnet client, called Cybersource Scriptable
Telnet, from http://www.cyber.com.au/cyber/product/cybertel.
Another scriptable client, which I prefer, is the ZOC Terminal Emulator
and Telnet/SSH Client available from http://www.emtec.com. ZOC understands
a superset of the REXX scripting language, which makes it a pretty
powerful automation tool. Use the scriptable Telnet client
of your choice to create a batch file. For example, suppose you
decide to use the ZOC client, and you create a script named GetRouter5.zrx.
This REXX script logs onto a particular router and commands it to
write its configuration to a TFTP server. You’d then create
a batch file (I’ll use Router5.bat as the filename) that contains
the following text:
ZOC /RUN:SCRIPT\GetRouter5.zrx /U
Note that the /U parameter places ZOC into unattended mode, forcing
it to take the default settings for any prompts rather than hanging
and waiting for a reply.
After the batch file is ready, use Windows’ Task Scheduler
to schedule the batch file to run once a day, say at around 1:00
AM. On UNIX systems, you can use cron to set up a similar automation,
using a scriptable Telnet client for UNIX. So every morning at 1:00
AM, this batch file will run and command the router to dump its
configuration to your TFTP server.
>> If you have multiple devices (and
who doesn’t?), simply create a Telnet script for each
one. Include multiple lines in your batch file, with each line
executing the Telnet client and one Telnet script. The batch
file will then run through each device in turn, commanding them
to dump their configuration to TFTP.
Automating the File Comparison
You don’t want a fancy GUI to automate file comparison, so CSDiff
isn’t really appropriate. Instead, you want a basic command-line
Diff (like the UNIX guys have) that will output differences to a file.
You can get one from MKS.
The syntax to use is: diff -ir -c folder1 folder2
The cool part about this utility is that it can compare all of
the files in a folder. So suppose you’ve stored your most
recent configuration files in a folder named Old, and you’ve
had your devices TFTP their current configurations to a folder named
Current. You could execute the following command:
diff -ir -c Old Current > changed.txt
This command will compare each and every file in the two folders
and write the results to a file named Changed.txt. The results will
include each changed line, plus an additional three lines before
and after the change to help you locate the change’s context.
If you’re using this technique, it’s important that
your devices dump their configurations to the same filename each
time. Put the diff command in a new batch file (probably on your TFTP
server, where the files are located) and schedule it to run
by using Task Scheduler. If you set it to run at about 3:00 AM,
that should give your first batch file time to complete.
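If you don't have a command-line Diff handy, the folder comparison can be approximated in Python. Treat this as a sketch under a few assumptions: it only compares files that exist under the same name in both folders, it is case-sensitive (unlike the -i switch), and the function names are my own.

```python
import difflib
from pathlib import Path

def compare_folders(old_dir, current_dir):
    """Return context-diff lines for each same-named file that differs."""
    report = []
    old_dir, current_dir = Path(old_dir), Path(current_dir)
    for old_file in sorted(old_dir.iterdir()):
        new_file = current_dir / old_file.name
        if not old_file.is_file() or not new_file.is_file():
            continue  # this sketch skips files missing from either folder
        report.extend(difflib.context_diff(
            old_file.read_text().splitlines(),
            new_file.read_text().splitlines(),
            fromfile=str(old_file), tofile=str(new_file),
            n=3, lineterm="",  # three lines of context around each change
        ))
    return report

def write_report(old_dir, current_dir, report_path="changed.txt"):
    """Write the combined differences to report_path and return the lines."""
    lines = compare_folders(old_dir, current_dir)
    Path(report_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

Calling write_report("Old", "Current") from a scheduled script would produce a changed.txt file comparable to the one described above; an empty file means nothing changed.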
Emailing the File Comparison Results
You’re ready to email Changed.txt, the file that contains any
changes found in your device configuration files. You’ll need
a command-line email utility, such as BySoft’s Command Line
E-mailer at http://www.bysoft.se. Create a third batch file with this
command:
clemail -quiet
-from changes@domain.com
-to recipient@domain.com
-subject "Report"
-bodyfile changed.txt
-smtpserver mail.domain.com
-smtpport 25
Of course, you’ll need to type all of that on a single
line. Schedule the batch file to run
at about 4:00 AM, after the second file finishes running, and you
should have an email waiting in your
mailbox when you get to work.
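The same step can be sketched with Python's standard library instead of a separate utility. The addresses and server name are the placeholders used above, and the function names are my own; note that this sketch does no authentication, which many mail servers require.

```python
import smtplib
from email.message import EmailMessage

def build_report(body, sender="changes@domain.com",
                 recipient="recipient@domain.com", subject="Report"):
    """Build a plain-text email whose body is the comparison output."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_report(msg, server="mail.domain.com", port=25):
    """Hand the message to an SMTP server (placeholder host and port)."""
    with smtplib.SMTP(server, port) as smtp:
        smtp.send_message(msg)

# Building the message needs no network access; sending it does.
report = build_report("No changes found\n")
print(report["Subject"])
```

In a scheduled job you would read changed.txt, pass its contents to build_report, and then call send_report with your own server details.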
So there you have it, a no-cost (or low-cost, depending on how
much you pay for the various utilities
you’ll need) solution for automatically detecting changes to
network device configurations and
emailing those changes to you in a daily report. It’s a lot of
work to set up, and you’ll need
to fine-tune it to work in your environment. After a while, I
suspect you’ll start looking at those
change management solutions with a new appreciation for the work
that they do!
Q 3.5: Troubleshooting network devices is complicated. Is
there a general framework that can make it easier?
A: There’s no industry-standard framework to make network
device troubleshooting easier,
but there are several resources that can help you develop a
framework that works in your environment:
As I’ve mentioned in previous tips, the best place to
start troubleshooting network
devices is to look at what has recently changed. You can usually
trace most device problems to a
recent configuration change that’s not working out as well
as you’d hoped; network
change management software or even simple text file comparisons of
device configurations can help
highlight recent changes and let you quickly focus your
troubleshooting efforts.
Q 3.6: What is the best way to start troubleshooting router
problems?
A: That’s a tall order! Routers are complex, powerful
computers in their own right, and can have
several problems: routing tables can be wrong, CPU utilization can
be high, network interfaces might be down,
passwords can be lost, or the router might simply crash.
The best way to start, no matter what the problem, is with a
step-by-step troubleshooting flowchart. Most
routers’ documentation includes basic troubleshooting
flowcharts, which are designed to help narrow the
problem as much as possible.
Most manufacturers, including Cisco, Nortel, and 3Com, offer flowcharts
for their devices and provide them for download from their Web sites.
For example, Cisco 7304 router troubleshooting is available at http://www.cisco.com/pcgi-bin/tsa7304/trouble.pl?tree=7304.
You start by selecting from a basic menu of problems (for example,
high CPU utilization, interface issues, IOS upgrade, line card issues,
password recovery, power, PXF feature support, router crash, and
startup). Suppose you were to select interface issues from the main
menu; the troubleshooter would walk you through a variety of questions
to narrow the problem:
- Are you using an ATM interface?
- What is the output of show interfaces pos?
- What encapsulation method (such as Frame Relay or PPP) are you using?
At the end, the troubleshooter displays a recommended solution. This might include links
to other portions of the
troubleshooting tree to eliminate or confirm potential causes of the problem.
Cisco also offers these flowcharts in PDF format so that you don’t
need Internet access to use them. For the 7304 router, you can download
PDF flowcharts by going to http://www.cisco.com/pcgi-bin/tsa7304/flows.pl?tree=7304,
then clicking Flow Charts in the left-hand menu.
>> Cisco offers flowcharts for most of its network
devices, and you can access
all of them from the support section of Cisco’s Web
site.
Q 3.7: We have a number of junior administrators, so we need
to make network device troubleshooting more of
a science and less of an art. What can we do?
A: You can create a sound troubleshooting methodology. To do so,
simply answer this question: “How do you find a wolf in Siberia?”
Sounds frivolous, but it’s a similar task to network device
troubleshooting, which can often seem to an inexperienced administrator
like looking for a needle in a haystack. The answer provides the
solution: Build a wolf-proof fence down the middle of Siberia, and
look for the wolf on one side. If he’s not there, divide what’s
left in half again, and repeat. Technically, the technique is referred
to as a binary search.
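For readers who think in code, the wolf-fence idea looks roughly like this in Python. The path elements are hypothetical, and the sketch assumes a single fault somewhere along an ordered path, with a test that can tell you whether everything up to a given point works.

```python
def first_failure(path, works_up_to):
    """Binary-search an ordered path for the first element that fails.

    works_up_to(i) must return True when every element path[0..i] works.
    Assumes exactly one fault exists somewhere along the path.
    """
    lo, hi = 0, len(path) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if works_up_to(mid):
            lo = mid + 1   # the wolf is on the far side of the fence
        else:
            hi = mid       # the wolf is on the near side, or right at the fence
    return path[lo], tests

# Hypothetical path between two computers; the fault is planted for the demo.
path = ["laptop link", "Office 3 router", "WAN link",
        "Office 1 router", "desktop link"]
fault_index = path.index("WAN link")

culprit, tests_run = first_failure(path, lambda i: i < fault_index)
print(culprit, tests_run)
```

Each test halves the remaining search area, which is why the number of tests grows far more slowly than the number of suspects.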
An Example Problem
Consider the network diagram that Figure 3.3 shows. Imagine that the
client using the laptop computer isn’t
able to communicate with the desktop computer in Office 1.
Figure 3.3: Sample troubleshooting problem.
This is a simplistic example, but it will serve to illustrate a
troubleshooting methodology, which can be
used for any problem, no matter how complex.
Identifying the Problem Domain
The first step is to simply make a list of everything that could be
causing the problem. Experienced administrators
do this in their head, but it’s worth writing down the list if
you’re just getting the hang of
troubleshooting. In this case, the list might include:
- Laptop unplugged
- Laptop network stack failure
- Desktop unplugged
- Desktop network stack failure
- Router in Office 3 failed
- Router in Office 1 failed
- WAN link failed
- DNS server not working
- Bad routes in Office 1 router
- Bad routes in Office 3 router
It’s important to make this list because doing so will rule
out elements that might seem to be
problems—such as the router in Office 2—that obviously
aren’t. Of course, the ability to generate
a list such as this example list requires a thorough understanding
of how the network is built (having
documentation such as the network diagram is invaluable) and a
thorough knowledge of how the network operates.
For example, if you don’t know how computers resolve names to
IP addresses, you might not suspect the DNS
server.
Breaking the Testable Systems in Half
Next, develop some logical means of dividing the land in half. In
this case, about half the potential problems
seem to be router-related, and the other half are client-related;
breaking the list along those lines creates a
basically even set of possibilities.
| Router Problems | Client Problems |
| Bad routes in Office 1 router | Laptop unplugged |
| Bad routes in Office 3 router | Desktop unplugged |
| Office 1 router failed | Stack failure in laptop |
| Office 3 router failed | Stack failure in desktop |
| Bad WAN line | DNS server failed |
Figure 3.4 illustrates how this process effectively divides your suspect subsystems into
two logical halves.
Figure 3.4: Dividing the suspect subsystems in half.
Now you need to build your wolf-proof fence down the middle by conducting a test.
Performing Tests
The only useful troubleshooting tests are those that allow you to definitively eliminate
some potential problem. For example,
suppose you determine that the laptop computer also can’t connect to a server in
Office 2. What have you proven? Well,
nothing, really. You can’t even say for sure that the Office 3 router is OK, although
it’s now less likely that
it has failed or has a bad route. In other words, you haven’t built a wolf-proof fence
at all.
Suppose, however, that you are able to connect to computers on the Office 3 network from
the laptop, and connect to
computers on the Office 1 network from the desktop. That’s a definitive test: you can
eliminate half of your suspect
systems from the list because you’ve proven that they work.
>> Stuck for tests? Go one-by-one. If you can’t
readily think of a test that will
result in your wolf-proof fence, you can just eliminate suspects
from the list one subsystem at a time. For example,
you can check the connections on both computers and confirm
that they can ping their gateways to verify that their
stacks are functioning. You can use nslookup to test the DNS
server(s) to eliminate them from the list. However,
efficient troubleshooting requires you to be able to divide
the list in such a way that one or two tests can eliminate
half the list. That type of efficiency comes primarily with
strong knowledge of how the network works and with good
old experience.
Divide, Conquer, Repeat
With half the list out of the way, you can start working on the other half. Figure 3.5
illustrates the systems you’ve
eliminated, including DNS servers at each office (shown in the diagram as Server1B and
Server3B), the client computers, and
their network connections.
Figure 3.5: Half the suspect systems eliminated, with just the green-colored
half to go.
Additional tests at this point could involve logging on to one of the two routers and
attempting to ping the other one.
That test, if it worked, would eliminate the WAN links as a potential suspect and let you
know that at least the routers’
external interfaces are up and running. You’d be down to a quarter of your original
list, and the odds would start
looking good for a bad route in one of the routers. Manually checking the routing tables
would let you know whether that was
the problem.
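The bookkeeping behind this elimination can be sketched in a few lines of Python. The suspect names come from the list earlier in this tip; the test names and the mapping of which suspects each passing test clears are my own assumptions for illustration, so adjust them to what your tests really prove.

```python
SUSPECTS = [
    "Laptop unplugged", "Laptop network stack failure",
    "Desktop unplugged", "Desktop network stack failure",
    "Router in Office 3 failed", "Router in Office 1 failed",
    "WAN link failed", "DNS server not working",
    "Bad routes in Office 1 router", "Bad routes in Office 3 router",
]

# What each definitive test clears when it passes (assumed mapping).
CLEARED_BY = {
    "local connectivity by name": {
        "Laptop unplugged", "Laptop network stack failure",
        "Desktop unplugged", "Desktop network stack failure",
        "DNS server not working",
    },
    "router-to-router ping": {
        "Router in Office 3 failed", "Router in Office 1 failed",
        "WAN link failed",
    },
}

def remaining(suspects, passed_tests):
    """Return the suspects not cleared by any of the tests that passed."""
    cleared = set()
    for test in passed_tests:
        cleared |= CLEARED_BY[test]
    return [s for s in suspects if s not in cleared]

after_first = remaining(SUSPECTS, ["local connectivity by name"])
after_second = remaining(SUSPECTS, ["local connectivity by name",
                                    "router-to-router ping"])
print(len(after_first), after_second)
```

Writing down what each test actually proves, as the mapping does here, is a good habit for junior administrators because it exposes tests that don't clear anything.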
Shortcuts
In some cases, you might be able to go after the entire list of
suspect systems with one good test. For example, running tracert
from the laptop to the desktop will help you eliminate most, if
not all, of the suspect systems. If DNS has failed, tracert will
tell you so. If it’s a local connectivity issue, you’ll
see that in the results. If a router has a bad route, you’ll
see that in the results, too. A WAN failure won’t be distinguishable
from a failed router interface, but you’ll at least have narrowed
the list to two possible candidates.
>> Know your tools! Another trick to
performing this methodology is having thorough knowledge of
the troubleshooting tools at your disposal. Knowing what ping,
pathping, and tracert can do, for example, will enable you to
select the most effective test for eliminating a particular
subsystem.
Selecting the right testing tools can make all the difference, particularly
with regard to efficiency. For
example, if you were following the troubleshooting path I’ve
been using, you might have spent an hour or so
figuring out that a bad route was at fault. Tracert, however, could
have brought you to this conclusion in 5
minutes or so. However, you would have found the problem either way,
eventually, proving that the methodology is
useful even to an administrator without years of experience.
Now It’s a Science
Where do most new administrators get caught up? First, they might
not completely understand how the network
functions, so they ignore suspect subsystems and spend their time
troubleshooting only part of the problem.
Second, they often don’t perform conclusive tests—they
might incorrectly eliminate a suspect
subsystem, and waste time looking for wolves in the wrong part of
Siberia.
It’s a simple methodology, one that experienced
administrators follow almost without thinking about
it—which makes it difficult to teach to newer personnel. To
summarize:
- Identify the actual problem
- List suspect subsystems
- Break the list into halves so that one half can be eliminated by one or two conclusive
tests
- Perform conclusive tests to focus on one half or the other; repeat the process by
splitting what’s left in half
- Ensure that all tests can conclusively eliminate something; essentially, all tests must
prove that something is either working or not with no room for question
This tried-and-true methodology becomes instinctive through
experience, but for less experienced technical
professionals, it can make the daunting task of network
troubleshooting more approachable, methodical, and efficient.
Copyright 2003 Realtimepublishers.com, Inc.