Uptime Magazine Cover Article

It’s a privilege to work with the good people at Reliability Web in producing a cover article for the latest Uptime magazine. I hope you enjoy the read! You can download a copy from here.

Let me know what you think!

Changing the Status-Quo

Many people fall into a routine and are afraid to change or step outside the norm. Yet they also like following trends, and it takes bravery to start one. The rapidly expanding and evolving technology landscape requires Reliability Engineers everywhere to evolve with it, which leaves no room for routine. Just as not everything can be fixed with the tools in your shed, not every problem can be solved with the resources in your Reliability toolbox. Sometimes you need to venture out for something new. Is it daunting? Yes. Worth it? Definitely.

Two important questions

  1. What can we do about this problem, and how can we stop it from recurring?
  2. What else can we do?

I pose both, one after the other, at every problem-related meeting. The first question usually has people thinking inside the box, but the second one shoves them out of it and turns them towards a wider world of possibility. Inside the box, you can only “react” to problems when they find you. Outside the box, you can be proactive and ambush your issues.

How do you know if you need to look for alternatives?

An aging plant will always find new ways to break, and “the way we’ve always done it” won’t always be a sufficient solution. Reliability Engineering should not be about repeating procedures; it should be about solving problems.

You probably need alternative solutions if you observe any of the following:

  • Recurring problems
  • Previously unseen problems arising
  • Your peers keep suggesting the same solutions
  • A need to save resources
  • A need to further increase plant output

How do you start looking for alternatives?

  • Consulting the internet – By far the most obvious. It’s okay not to have all the answers, especially when they could be one search away.
  • Asking “dumb” questions – Yes, they may be embarrassing to ask, but you won’t get any answers by holding them in. What matters more anyway? Solving the problem, or avoiding a small moment of vulnerability?
  • Joining an industry group – One of the best ways to access valuable human resources. These are communities, and most communities are willing to help their own. Someone may have experienced the very same issue and gained useful wisdom from it. Put yourself in a position where you are likely to meet that person.
  • Using Bissett’s Formula – Partially discussed in the previous blogpost. Make sure your plant is smooth, clean, cool, dry, correctly lubricated, and not overloaded.

My advice on how to implement these alternatives is the same as I have preached many times before: get your team actively involved. Convince them that change is necessary and worth it. Make sure they share your enthusiasm for the new and exciting, and together you can make revolutionary improvements to your plant.


Do you want to know more about what it takes to be an extraordinary reliability engineer who can effectively implement change in your place of work? Try our new Extraordinary Reliability Engineer course taught by highly experienced reliability engineer Peter Horsburgh. You can easily register on Eventbrite here.

Curing Problems

In my last blogpost I talked about introducing important habits to your workplace. This week I will explain further why this will help you solve your problems. Think of it this way: your plant and machines are sick, and if you arm your colleagues with knowledge and good habits, they become the immune system that eradicates “diseases” and “heals” what is broken or “infected”.

The human body is large, and infections can hide, but less so when armies of leukocytes (white blood cells) are stationed everywhere. Infections can also be tough to beat, which is why so many leukocytes are used. And some infections are so unfamiliar that leukocytes don’t know how to deal with them at first and must learn how. That is what you want your colleagues to be: an army of leukocytes learning to search for, recognise, and solve problems, keeping the plant healthy together.

If you explain a concept in an overly complex way, it will pass through your colleagues like a ‘ghost’. You need to find a simple, memorable way to explain it; all great leaders have this skill. A man called Wayne Bissett once told me you should run your plant “smooth, clean, cool, and dry”. Short. Sweet. Rolls off the tongue. People love it. If we run our plants smooth, clean, cool, and dry, they won’t vibrate themselves to bits, they won’t build up foreign contaminants, they won’t overheat, they won’t corrode from water ingress, and they will live long lives. Solutions to four major issues, summed up in just four words. Simple, snappy phrases like this help experienced people make clearer sense of what they already know, and explain it to those who are new.

Another thing I like about that phrase is how well it relates to RCA. It tells us exactly what a healthy plant should look like, and prompts a question whenever we search for problems: is the plant’s smoothness, cleanliness, coolness, or dryness being disrupted, and if so, by what? See how well that question fits many contexts? See how these four simple words branch off into something more? That’s what you want: a core idea that people can use as a launching point. If our leukocyte workers notice any of the four signs of health being disrupted, they know there is a disease that needs attacking.
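To show how the four words can serve as an RCA launching point, here is a minimal Python sketch that turns each disrupted condition into a question for the team. The condition-to-failure mapping follows Wayne Bissett’s phrase as explained above; the `HEALTHY_CONDITIONS` table and the `rca_prompts` function are hypothetical names for illustration, not an existing tool.

```python
from typing import Dict, List

# Hypothetical lookup: each healthy condition mapped to the failure it guards
# against, as described for "smooth, clean, cool, and dry".
HEALTHY_CONDITIONS: Dict[str, str] = {
    "smooth": "vibration shaking the machine to bits",
    "clean": "build-up of foreign contaminants",
    "cool": "overheating",
    "dry": "corrosion from water ingress",
}


def rca_prompts(disrupted: List[str]) -> List[str]:
    """Return one RCA question for each condition observed to be disrupted."""
    prompts = []
    for condition in disrupted:
        risk = HEALTHY_CONDITIONS.get(condition)
        if risk is None:
            raise ValueError(f"unknown condition: {condition!r}")
        prompts.append(
            f"The plant is not running {condition}. "
            f"What is disrupting it, and is it leading to {risk}?"
        )
    return prompts


if __name__ == "__main__":
    # Example: an inspection finds dirty oil and signs of water.
    for question in rca_prompts(["clean", "dry"]):
        print(question)
```

The point of the sketch is the shape, not the code: a short, fixed vocabulary of healthy states that anyone can check against and turn into a "by what?" question.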

Allow me to demonstrate how to communicate issues and ideas to your colleagues in a few sentences. Imagine you have an oil breather being used in a situation it is not suited for. According to our RCA, it lets in too many contaminants. Let’s call it Breather X. We conclude that Breather X is not working and replace it with Breather Y, which will allow our plant to run cleaner. Simple enough to explain, right? I once heard someone say that people who truly understand something can simplify it. If you can’t explain something coherently, it may be a sign that you don’t understand the subject well enough, so make sure you do before you pass that knowledge on.

In summary, to cure defects, we need to effectively summarise and standardise what we do. Gaining a shared understanding is the first step towards implementing improvements across the plant. Make sure those standards are easy to understand, so they can be properly practiced. I will never stop stressing the importance and effectiveness of consistency and communication in the workplace.


If you want to know more about curing problems and other reliability-related subjects, why not register for our Extraordinary Reliability Engineers course? It will offer you all the knowledge and wisdom Peter Horsburgh wished he had long ago. If you’re interested, register for a free webinar at our Eventbrite here.

Learning a Natural Habit

In the previous blogpost I introduced you to Root Cause Analysis and the 5 Whys. These processes are most effective when your whole team uses them. I touched on this briefly in the last blogpost and will elaborate on it here. How do you encourage your workplace to adopt new habits like the 5 Whys and RCA?

One of the ways I did it was by teaching our trusts and trades how to do the 5 Whys, so as to implement it at every level. We chose the 5 Whys because it’s a simple, easy-to-understand process that perfectly encapsulates RCA. We tailored our explanation to all three maintenance groups, ensuring they understood it in terms and analogies that were relevant to them.

To elaborate, we ran a workshop for each group, in which the participants found the cause of a problem by working through the 5 Whys as a team. They started out in groups alongside the instructors, who offered hints and guidance, and were then split into pairs to work independently. Finally, we checked whether their answers would lead them to a solution to the problem.

From there, we asked supervisors to check whether the 5 Whys process had been completed on breakdown work orders. If someone discovered something during the process that no one else knew about, our new protocol required further investigation. In team meetings, people shared their 5 Whys process and results so that everyone could understand the problem from every angle known so far, giving them all an equal, confident understanding. This meant that machines were actually fixed rather than “held together” by means that only hid or treated the symptoms.
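The supervisor check described above can be sketched as a simple filter over work orders. This is a minimal Python sketch under assumed names: the `WorkOrder` record, its `order_type` values, and treating a 5 Whys as "completed" when at least one why and a root cause are recorded are all hypothetical choices, not the data model of any particular maintenance system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkOrder:
    """Hypothetical work order record, for illustration only."""
    order_id: str
    order_type: str  # assumed values, e.g. "breakdown" or "planned"
    whys: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None


def missing_five_whys(orders: List[WorkOrder]) -> List[str]:
    """Return IDs of breakdown orders with no recorded whys or no root cause."""
    return [
        wo.order_id
        for wo in orders
        if wo.order_type == "breakdown" and (not wo.whys or wo.root_cause is None)
    ]


if __name__ == "__main__":
    orders = [
        WorkOrder("WO-101", "breakdown", ["Pump tripped on overload"], "Impeller fouled"),
        WorkOrder("WO-102", "breakdown"),
        WorkOrder("WO-103", "planned"),
    ]
    print(missing_five_whys(orders))  # breakdown orders a supervisor should follow up
```

Planned work is deliberately skipped, since the check in the text applies to breakdown work orders.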

Through this method, we made RCA and the 5 Whys the norm by introducing them as a simple yet integral part of the workplace. In summary, teach everyone the habit until they are confident with it, then actively encourage them to use it, then make it a required part of workplace processes. Note that they should find the habit helpful before you make it compulsory; otherwise they will see it as a waste of time that does not improve their work. After all, this is about people adopting the same habit so they can work together more smoothly.

Want to learn more about building useful habits and implementing them in your workplace? Peter Horsburgh teaches that and more in his Extraordinary Reliability Engineer course. If you’re interested, you can register at our Eventbrite page here.

How to be a Reliability Detective

Gilbert the Reliability Detective
The Pain

So you have a problem, a recurring issue, and you don’t understand why it’s happening. Maybe your machinery isn’t as durable or efficient as that of other plants. Maybe your co-workers keep complaining after you think you’ve solved it. Something is wrong, something is causing it, and you don’t know what. No matter how far you stretch your brain, a reasonable conclusion never surfaces. Well, detectives never get far on sheer guesswork. They need clues and evidence, and so do you. But how do you find them? There are two strategies, Root Cause Analysis and the 5 Whys, both of which I will briefly outline here.

Root Cause Analysis

Root Cause Analysis (RCA) is a process used to determine the source of a problem. As I say in my book, it’s incredibly useful in Reliability because it helps you fix the issue at its source, rather than applying Band-Aid “solutions” that only cover the symptoms. As useful as this process is, many overlook it. They guess the reason behind a problem, or formulate a solution without thinking about the core issue, and work from there. Sometimes these guesses are spot on, or near the mark, but not always, and you shouldn’t rely on them. If you’re wrong, problems will persist. Instead, find your clues. Analyse data and trends. Monitor machine behaviour. Talk to your colleagues, ask for their observations and for suggestions on what to do. You’ll find things you can link together that lead you to the root cause. Don’t know RCA yet? Learn it and introduce it to your colleagues. You can also buy RCA tools online. Here are some to start with.

Asking Why

One excellent strategy to arrive at your root cause is to ask “why?” five times. Like RCA, this is a habit you need to introduce to your workplace. So how does it work?

  1. When you encounter a problem, ask why it occurred.
  2. Once you know, ask why that is the case.
  3. Repeat the above steps three to five times, jotting down any ideas that come to mind.
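The steps above amount to a short loop: keep asking why about the latest answer until the team runs out of answers or reaches five whys. Below is a minimal Python sketch, assuming the answers come from a hypothetical `ask` callback (in practice, your team around a whiteboard); the oil-sample chain in the example is illustrative, loosely based on the Breather X story, not real RCA data.

```python
from typing import Callable, List, Optional


def five_whys(problem: str,
              ask: Callable[[str], Optional[str]],
              max_whys: int = 5) -> List[str]:
    """Ask "why?" about the latest answer until the team runs out of answers.

    `ask` is a hypothetical callback: given a statement, it returns the
    answer to "why did this happen?", or None when nobody can go deeper.
    The returned chain starts with the problem; its last entry is the
    candidate root cause.
    """
    chain = [problem]
    for _ in range(max_whys):
        answer = ask(chain[-1])
        if not answer:  # no further explanation: stop digging
            break
        chain.append(answer)
    return chain


if __name__ == "__main__":
    # Illustrative answers only.
    answers = {
        "Oil samples show high contamination":
            "Dirt is entering through the breather",
        "Dirt is entering through the breather":
            "Breather X lets in too many contaminants",
        "Breather X lets in too many contaminants":
            "Breather X is not suited to this application",
    }
    chain = five_whys("Oil samples show high contamination", answers.get)
    print("Problem:", chain[0])
    for number, answer in enumerate(chain[1:], start=1):
        print(f"Why {number}: {answer}")
```

Treat the last entry as a candidate to verify against your clues and data, not as a proven cause.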

Allow me to reiterate the importance of educating your colleagues on this process. A team of people asking why casts a wider net, and catches your problems more effectively than you can alone. How do you introduce this process in the workplace? I personally recommend a hands-on workshop. Tailor the examples to each group you educate so they can best understand it. Ask the supervisors of those groups to check whether the 5 Whys process has been used on breakdown work orders. This is a simple RCA process, and if you employ it well, you’ll soon discover how valuable it truly is.

So there are your two strategies to ease the pain of not knowing the issue. Hopefully there will now be less guesswork and more clues that connect. RCA and the 5 Whys are your spyglass and your fingerprint-dusting kit. So go forth, Reliability Detectives, and find your culprit (I mean, root cause)!

If you would like to know more about how to solve an issue at your plant, there’s a course for that. The Extraordinary Reliability Engineer course is available for registration on Eventbrite now.

Chronic Issues – Plotting Trends

Problems always leave traces.

I’ve discussed chronic issues multiple times in previous blog posts. As a refresher, chronic issues are large problems that manifest from numerous small and easily missed issues. There are two basic steps to identifying your chronic problems. You need to find these smaller issues and look for trends between them. Allow me to elaborate on both these steps.

Finding the Dots

As I said in a previous post, lots of little issues are symptoms of a larger problem: the unknown chronic disease. You’ll need a fine, widely cast net to catch these smaller issues. If you miss them, or deem them too inconsequential to deal with, they will continue to build until you have a real mess. It’s like letting hairs wash down the shower drain, then, when it gets blocked, having to fish out the gunky wad months later. But what exactly do you look for? Well, do you ever find yourself encountering several small issues that constantly interrupt your progress? They’re like bricks slowly building into a barrier between you and your goal. It’s frustrating, I know, but on a sunnier note, you’ve found your dots.

Connecting the Dots

Now we get to the fun part: plotting trends. As part of your loss elimination process, routinely check for chronic issues across the site. You can do this monthly, quarterly, or annually, so the trends have time to develop. (Side note: if someone comes to you outside the meeting room to discuss a potential chronic problem, pay attention. These issues aren’t always easy to find, so listen, and analyse information when it’s freely served to you.) I recommend grouping common types of failure together (e.g. electrical, mechanical) across sites, which makes it easier to spot which ones have the greatest negative impact. My favourite way to do this is to create 3D plots of all the groups together. I can add to them as I gain more data, highlighting any rising trends. If something is getting worse, like a rapidly increasing cost, you know where to act. When spotting chronic issues, plotting trends is essential. Because each of these issues is small on its own, they slip past a Pareto analysis undetected, so you must go through the whole plant with a fine-toothed comb. Do not allow the little things to grow big. Do not allow the wall to build. Once you’ve found your chronic issue, you’ll know what you need to fix. Hooray!
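The text above describes 3D plots; as a simpler numeric stand-in, the sketch below groups failure costs by category and period and flags categories whose cost is trending upward, using an ordinary least-squares slope. The `(period, category, cost)` record layout, the function names, and the example figures are hypothetical, made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (period, failure category, cost).
Record = Tuple[str, str, float]


def totals_by_category(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Sum cost per failure category per period."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for period, category, cost in records:
        totals[category][period] += cost
    return totals


def slope(values: List[float]) -> float:
    """Least-squares slope of values plotted against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def rising_categories(records: Iterable[Record],
                      periods: List[str]) -> List[Tuple[str, float]]:
    """Return (category, slope) for upward-trending categories, worst first."""
    trends = []
    for category, by_period in totals_by_category(records).items():
        series = [by_period.get(p, 0.0) for p in periods]  # missing period = 0
        s = slope(series)
        if s > 0:
            trends.append((category, s))
    return sorted(trends, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    records = [  # made-up quarterly costs
        ("Q1", "electrical", 1200.0), ("Q2", "electrical", 1500.0),
        ("Q3", "electrical", 2100.0), ("Q1", "mechanical", 3000.0),
        ("Q2", "mechanical", 2800.0), ("Q3", "mechanical", 2900.0),
    ]
    print(rising_categories(records, ["Q1", "Q2", "Q3"]))
```

Here mechanical failures cost more overall, but electrical failures are the ones getting worse, which is exactly the kind of signal a single snapshot can hide.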

While some problems make themselves known like a slap in the face, chronic issues are more “passive-aggressive”. Hints suggest something is wrong, but an obvious answer refuses to present itself, and that can be agonising. Knowing how to identify these problems is key since, as we all know, you can’t fix a problem if you don’t know it exists.

Want more information on dealing with chronic issues or other hurdles you face as a reliability engineer? Peter Horsburgh’s Extraordinary Reliability Engineer course could be for you. If you are interested, register at Eventbrite here.

One-on-one coaching sessions – now available

Having been a Reliability Engineer myself, I know how hard it can be to find someone who can answer a specific question, and bringing in a consultant for a single question can be hard to justify. Rather than trawling through the internet for the answer and wasting a whole heap of time, now you can just ask me!

The coaching session is designed to get answers to your questions fast. They are delivered online, so no travel is required. You can share your screen and explain the question or problem. Included is a recording of the session so you can refer to it again and again.

The coaching works in two stages. First, you ask a question or explain the problem. If I can answer it there and then, I will; otherwise, I will take some time to formulate an answer or solution. Second, I explain the answer in a follow-up session of up to 90 minutes.

Coaching slots are available now; you can find more information on Reliability Extranet here.

The First Habit – Identifying Problems

Steps to success: Find a problem – Solve it.

Not sure what the five habits are? Read this blogpost to find out.

As I’ve said before, you can’t fix a problem if you don’t know it exists. That gives us the first step towards success in Reliability – Identifying problems. How do you do that? I’m here to dish out the answers. I’ll tell you where to look, what to find, and how to understand them in this very blogpost.

What’s the problem?

First off, finding problems is the foundational step to having your plant work to its greatest capacity. Look for the largest problems first. The Pareto principle, named after the economist Vilfredo Pareto, holds that roughly 20% of your problems waste about 80% of your time. Since you should always aim to get the greatest production out of the smallest investment, the most common way to locate an issue is to look for high or increasing trends in cost and downtime. This has helped me track down larger issues in the past, and our team was able to fix them before they caused major damage.
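A Pareto analysis of this kind can be sketched in a few lines: sort problems by downtime and keep the largest ones until they account for about 80% of the total. This is a minimal Python sketch; the 80% threshold is a rule-of-thumb parameter, and the machine names and downtime hours in the example are made up.

```python
from typing import Dict, List


def pareto_vital_few(downtime: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return the smallest set of problems, largest first, that covers
    `threshold` (e.g. 80%) of total downtime."""
    total = sum(downtime.values())
    if total <= 0:
        return []
    vital: List[str] = []
    cumulative = 0.0
    for problem, hours in sorted(downtime.items(), key=lambda kv: kv[1], reverse=True):
        vital.append(problem)
        cumulative += hours
        if cumulative / total >= threshold:
            break
    return vital


if __name__ == "__main__":
    downtime_hours = {  # made-up figures
        "Conveyor 3": 160, "Pump 7": 90, "Crusher 1": 20,
        "Screen 2": 12, "Feeder 4": 10, "Fan 6": 8,
    }
    print(pareto_vital_few(downtime_hours))
```

Real data rarely splits at exactly 20/80; the useful output is the short list at the top, which is where to start.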

How do you know if you have a problem?

While it is tempting to fix the first obvious issue you see and be done with it, that mindset distracts you from the wider scope. First, you need to understand if and where you have a problem. As a Reliability Engineer, you should always be searching for issues even if they aren’t immediately apparent. “But Peter!” you cry. “Where do I start?” Yes, I was rather vague when talking about trends in cost and downtime, but for a detailed list, check out the ‘5 Habits’ book.

How do you know what kind of problem it is?

Identify what part of the plant needs the most improvement, and then work on that. There are two common ways problems present themselves:

The chronic problem.

Symptoms include numerous smaller problems adding up to a bigger one. If you don’t realise that all the little problems you’re having are part of a big one, you’ll end up with an even bigger problem, such as a systemic shutdown.

The ‘Big Bangs’.

It’s almost as literal as it sounds. These problems are fast, obvious, destructive, and unexpected. They require instant action.

Here are some strategies to combat these problems. I identify the three worst problem machines by checking records for downtime and lost production capacity, then begin with the worst. The team should hold regular meetings to discuss defect elimination or continuous improvement. You need time and room for this work; other things can make you lose focus and drop the ball on identifying problems, and if that happens, you will be back where you started. And if nothing is going wrong, keep improving the plant to form the habit. If you aren’t struggling to stay afloat, it’s time to swim and get ahead.
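Ranking the three worst problem machines from downtime and lost production records might look like the minimal Python sketch below. Assumptions to flag: the `(machine, downtime hours, lost production units)` event layout is hypothetical, and ranking by lost production first with downtime as a tie-breaker is one possible choice, not the only sensible weighting. The example events are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event layout: (machine, downtime hours, lost production units).
Event = Tuple[str, float, float]


def worst_machines(events: Iterable[Event],
                   top_n: int = 3) -> List[Tuple[str, float, float]]:
    """Total downtime and lost production per machine; return the worst top_n.

    Machines are ranked by lost production, with downtime breaking ties.
    """
    downtime: Dict[str, float] = defaultdict(float)
    lost: Dict[str, float] = defaultdict(float)
    for machine, hours, units in events:
        downtime[machine] += hours
        lost[machine] += units
    ranked = sorted(downtime, key=lambda m: (lost[m], downtime[m]), reverse=True)
    return [(m, downtime[m], lost[m]) for m in ranked[:top_n]]


if __name__ == "__main__":
    events = [  # made-up breakdown records
        ("Conveyor 3", 6, 400), ("Pump 7", 10, 250), ("Conveyor 3", 4, 300),
        ("Crusher 1", 2, 500), ("Fan 6", 1, 20),
    ]
    for machine, hours, units in worst_machines(events):
        print(f"{machine}: {hours} h down, {units} units lost")
```

Note that Pump 7 has as much downtime as Conveyor 3 but ranks third here; whether downtime or lost output should lead depends on what actually costs your plant most.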

Congrats! You’re about to take your first step towards improvement as a Reliability Engineer. If you know where, how, and what problems to search for, you’re doing just fine. But as mentioned before, this is only the first step. For more information, look out for future blogposts and check out the Extraordinary Reliability Engineer course available for registration on Eventbrite to learn more about future steps.