Classification of Faults: Software & Hardware


August 02, 2017

Classifying problems

Faults need to be classified according to their criticality, that is, the impact a specific problem may have on business operations. Questions that need to be answered may include:

· How critical is this problem?

· What is the impact on the overall operations of a business?

· Should the contingency and disaster recovery plan be enacted?

· Does the business have the expertise to deal with the problem and provide a satisfactory solution?

Problems regarded as non-critical (low criticality) do not represent a threat to the daily operations of a business. Operations will continue with some level of disruption, which may affect a standalone system, a series of systems or an entire network. An example of a non-critical problem would be an Internet server going down due to a hardware failure. This is certainly non-routine, but assuming that the business does not rely on the Internet for its core operations, business can continue, only without Internet access.

Problems regarded as critical have the potential to seriously impair the function of a business. These faults will generally require IT personnel to enact a contingency and disaster recovery plan. Businesses that are not prepared for these types of faults, and that have not formulated a sound contingency and disaster recovery strategy, may suffer serious consequences, including a total halt of business operations and loss of revenue. An example of this type of fault would be an inaccessible database server holding inventory, ordering and sales data, without which business cannot proceed.

Quite often, IT support managers and supervisors are responsible for assessing the criticality of faults. Many companies have different scales for representing criticality. The following is a suggestion of how this could be implemented:

Table 1: Sample scale for representing criticality of faults

Criticality Level or Risk	Definition	Disaster Recovery
1	High potential impact to a large number of users. Involves network/system down time.	Enact Disaster Recovery Plan.
2	High potential impact to a large number of users or a business-critical service. May result in some down time.	May require enacting Disaster Recovery Plan.
3	Medium potential impact to a smaller number of users or a business service. Resolution may require some down time.	Disaster Recovery Plan enactment not warranted. Remedial action required.
4	Lower potential service or user impact. Change may require some down time.	Disaster Recovery Plan enactment not warranted. Remedial action required.
5	No user or service impact. No down time.	Disaster Recovery Plan enactment not warranted. Remedial action optional.
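As a rough illustration, the scale in Table 1 could be encoded as a simple lookup so that a help desk tool returns the same response for the same level every time. This is only a sketch: the names `DrAction`, `CRITICALITY_SCALE`, `recovery_action` and `requires_escalation` are hypothetical and not part of any standard, and the mapping simply transcribes the sample table above.

```python
from enum import Enum


class DrAction(Enum):
    """Responses from the 'Disaster Recovery' column of Table 1."""
    ENACT = "Enact Disaster Recovery Plan"
    MAY_ENACT = "May require enacting Disaster Recovery Plan"
    REMEDIAL_REQUIRED = "Plan not warranted; remedial action required"
    REMEDIAL_OPTIONAL = "Plan not warranted; remedial action optional"


# Criticality level (1 = most severe) -> response, transcribed from Table 1.
CRITICALITY_SCALE = {
    1: DrAction.ENACT,
    2: DrAction.MAY_ENACT,
    3: DrAction.REMEDIAL_REQUIRED,
    4: DrAction.REMEDIAL_REQUIRED,
    5: DrAction.REMEDIAL_OPTIONAL,
}


def recovery_action(level: int) -> DrAction:
    """Return the Table 1 response for a criticality level (1-5)."""
    if level not in CRITICALITY_SCALE:
        raise ValueError(f"criticality level must be 1-5, got {level!r}")
    return CRITICALITY_SCALE[level]


def requires_escalation(level: int) -> bool:
    """Hypothetical helper: True when the plan is, or may need to be, enacted."""
    return recovery_action(level) in (DrAction.ENACT, DrAction.MAY_ENACT)
```

For example, `recovery_action(3)` returns `DrAction.REMEDIAL_REQUIRED`, and `requires_escalation(2)` is true because a level 2 fault may require enacting the plan. A business using a different scale would only need to change the dictionary.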

Hardware faults

Apart from classifying faults as critical or non-critical, you will need other classifications to aid the troubleshooting process. One typical classification is whether the source of the fault is a hardware device or component, or software (system or application).

Hardware faults are often reasonably easy to troubleshoot, as the symptoms of the fault are fairly obvious. For example, if the power supply unit of a computer fails, the computer will not power up. Sometimes, though, hardware faults can be difficult to diagnose because the fault and its symptoms only appear intermittently, that is, the fault is not present all the time. For example, some hardware components only develop faults under certain conditions, such as when the temperature of the device reaches a certain threshold.
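One way to investigate an intermittent, temperature-related fault is to log readings over time and compare how often the fault appears below and above a suspected threshold. The sketch below assumes you already have such a log. The function name `fault_rates_by_temperature`, the sample readings and the 70 °C threshold are all made up for illustration.

```python
def fault_rates_by_temperature(samples, threshold_c):
    """Return (rate_below, rate_at_or_above) for a list of
    (temperature_celsius, fault_observed) readings."""
    below = [faulted for temp, faulted in samples if temp < threshold_c]
    above = [faulted for temp, faulted in samples if temp >= threshold_c]

    def rate(flags):
        # Fraction of readings where the fault was observed; 0.0 if no readings.
        return sum(flags) / len(flags) if flags else 0.0

    return rate(below), rate(above)


# Hypothetical log: (temperature in Celsius, was the fault observed?)
samples = [
    (48.0, False),
    (52.5, False),
    (71.0, True),
    (69.5, False),
    (74.2, True),
    (50.1, False),
]

cool_rate, hot_rate = fault_rates_by_temperature(samples, threshold_c=70.0)
print(f"fault rate below threshold: {cool_rate:.0%}")
print(f"fault rate at or above threshold: {hot_rate:.0%}")
```

A large gap between the two rates suggests the fault is heat-related, although a handful of readings like these would not be enough to be sure.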

Hardware faults can sometimes be rectified fairly quickly by replacing the failed component. Technicians will usually have common Field-Replaceable Units (FRUs) available. FRUs are simply common components that can be replaced in the field with reasonable ease. Examples of FRUs may include:

· Hard Disk Drives

· Floppy Disk Drives

· Optical Drives (CD, CD-R, DVD, etc.)

· Memory (RAM)

· Sound Cards

· Video Cards

· Keyboard & Mouse

· Network Interface Cards

· Network Patch Leads

Software faults

As you might have guessed, software faults are those faults that are caused by a software component. The software component may be part of the system software or may be application software.

Software faults can sometimes be tricky to troubleshoot. Even when the source of the problem is known to be software, it is not always clear which software component is actually causing the fault.

System Software Faults – these are faults caused by system software. Generally speaking, the operating system is regarded as system software. However, some application software also installs system components it needs to run, and these can become (and quite frequently are) the source of faults. System software faults can be caused by:

· Corruption of software components

· Incorrect system configuration

· Documented and undocumented bugs

· Compatibility issues (hardware and software)

System software faults can have system-wide implications, which might hinder the operations of the whole system.

Application Software Faults – these faults are rooted in application software components. Generally, they only affect the application in question, and the rest of the system operates normally. As with system software faults, the source can be tracked down to one or more of the following:

· Corruption of software components

· Incorrect application configuration

· Documented and undocumented bugs

· Compatibility issues (hardware and software)
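The distinction between system and application software faults, together with the four common causes listed above, could be recorded in a fault log along the following lines. This is a minimal sketch, not a prescribed format: `SoftwareLayer`, `Cause`, `SoftwareFault` and `expected_scope` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SoftwareLayer(Enum):
    SYSTEM = "system"
    APPLICATION = "application"


class Cause(Enum):
    # The four causes listed for both system and application faults.
    CORRUPTION = "corruption of software components"
    MISCONFIGURATION = "incorrect configuration"
    BUG = "documented or undocumented bug"
    COMPATIBILITY = "hardware or software compatibility issue"


@dataclass(frozen=True)
class SoftwareFault:
    component: str
    layer: SoftwareLayer
    cause: Cause

    def expected_scope(self) -> str:
        """System faults can affect the whole system; application
        faults generally affect only the application itself."""
        if self.layer is SoftwareLayer.SYSTEM:
            return "potentially system-wide"
        return f"limited to {self.component}"


# Hypothetical example entries for a fault log.
driver_fault = SoftwareFault("video driver", SoftwareLayer.SYSTEM, Cause.COMPATIBILITY)
app_fault = SoftwareFault("spreadsheet", SoftwareLayer.APPLICATION, Cause.MISCONFIGURATION)

print(driver_fault.expected_scope())
print(app_fault.expected_scope())
```

Recording the layer and the cause separately keeps the two questions apart during troubleshooting: where the fault lives, and why it occurred.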

Security-related faults

Security-related faults develop in systems and might have their source in hardware, software, configuration or design.

More often than not, security-related faults are the consequence of:

· Other faults (for instance, a hardware fault in a firewall device might expose systems that the firewall would normally protect)

· Improper configuration

· Unpatched software bugs

· System design flaws

· Undiscovered security holes or backdoors

Generally, the occurrence of any of the above issues will result in security being compromised, possibly exposing confidential and private information. Rectifying this type of fault usually requires engaging personnel with expertise in the area.

Security faults are more precisely called vulnerabilities. They are sometimes loosely referred to as ‘exploits’, since a security fault does not in itself represent a real threat unless someone malicious discovers it and chooses to exploit it. It is imperative that proactive action be taken to minimise the effect of security compromises.

Boot time faults

Boot time faults are faults that occur during the start-up sequence of a computer system. They are critical in that they can halt the boot sequence, and possibly the system altogether, rendering it unusable.

Boot time faults can have their source in software or hardware. Software causes are usually improper configuration, missing system files or incompatibilities (often after new software has been deployed). Hardware causes are usually failure of the boot device (typically a hard disk drive) or of another major component such as RAM or the video card. Failed hardware peripherals might affect booting, but will not necessarily halt the system or make it unbootable.