Automation transforms human work rather than simply substituting for it

There’s a dominant narrative in the tech world that automation equals efficiency because software can just take over what humans do. That’s massively oversimplified, and can be dangerous if you’re making decisions based on it. The idea that machines can simply “replace” human functions by doing the same tasks faster or with fewer errors misses the point. When you bring automation into a system, you don’t just duplicate the work in digital form. You reshape how the system operates. You change the scope and structure of human involvement.

This misconception is what researchers Sidney Dekker and David Woods call the "substitution myth." Their 2002 study makes it clear: allocating tasks based on the idea that "humans are better at this, machines at that" ignores how those allocations actually transform the job environment. That transformation brings new, often unpredictable tasks. These aren't just more of the same, they're different in kind. System resilience depends on how well people understand and adapt to this shifting environment, and if you cut them out of the system thinking it's all been solved by automation, you're setting yourself up for blind spots you won't notice until things go wrong.

For anyone leading a software-driven organization, the takeaway is simple but important: automation doesn't eliminate the need for people, it changes what they need to focus on. That shift in focus can go unnoticed until there's a failure. If you're not designing with this reality in mind, you're not designing for real-world operations.

Automation’s dual role in incident response

Automation’s role in software operations isn’t straightforward. Sometimes it prevents failures or surfaces them early. Other times, it causes incidents and makes them worse. This is what makes automation a complicated partner: it doesn’t always work in one direction. It needs oversight. A system that auto-corrects or scales is great until it reacts the wrong way and locks your team out of the very tools they need to recover.

The 2021 Facebook outage is a painful example. An automated command severed the backbone connections linking Facebook’s data centers globally. That alone triggered a DNS failure that made Facebook’s services unreachable. It didn’t stop there. The automation locked employees out of data centers too, putting secure access systems out of reach and delaying physical recovery. The system worked exactly as programmed; it just didn’t account for that scenario. And that’s the problem.

If you’re leading operations or product at an enterprise scale, this should matter to you. It’s tempting to treat automation as a risk reducer. Often, it is. But without feedback loops and visibility built in, automation becomes brittle in crisis. It can create dead ends instead of options. Smart leaders recognize the duality here: automation boosts performance only when human response is part of the design.
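To make that concrete, here is a minimal Python sketch, with hypothetical names and thresholds throughout, of one way to keep human response in the design: automated remediation that proposes rather than acts once the potential blast radius crosses a threshold, and that always leaves a rollback path open.

```python
# Minimal sketch (hypothetical names and thresholds): an auto-remediation
# wrapper that keeps humans in the loop instead of acting unconditionally.

from dataclasses import dataclass
from typing import Callable


@dataclass
class RemediationAction:
    name: str
    blast_radius: int              # hypothetical measure: hosts/regions affected
    execute: Callable[[], None]
    rollback: Callable[[], None]


MAX_AUTONOMOUS_BLAST_RADIUS = 5    # assumption: tune per environment


def run_with_escape_hatch(action: RemediationAction,
                          confirm: Callable[[str], bool]) -> bool:
    """Apply an automated fix, but defer to a human above a risk threshold."""
    if action.blast_radius > MAX_AUTONOMOUS_BLAST_RADIUS:
        # Wide-impact actions are proposed, not taken: the operator decides.
        if not confirm(f"{action.name} touches {action.blast_radius} targets. Proceed?"):
            return False
    try:
        action.execute()
        return True
    except Exception:
        # Never leave the system in a half-applied state the team can't see.
        action.rollback()
        raise
```

The specific threshold and confirmation channel are placeholders; the point is that the escape hatch is designed in from the start, not bolted on after the first bad incident.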

Designing for ideal outcomes can overlook critical failure modes

Most automation is built on assumptions, primarily the assumption that things will work as planned. Design teams often focus on optimizing for expected outcomes like speed, accuracy, and reduced manual workload. That’s fine when everything is going right. The problem is, that kind of design thinking tends to ignore the fact that systems don’t always operate under expected conditions. And when automation goes off course, it doesn’t wind down gracefully. It amplifies the error.

This is particularly true in deployment environments like CI/CD pipelines. These systems are meant to push changes fast and automatically. But if a misconfiguration slips in, and it will, it doesn’t just cause a small issue. The change is deployed automatically and system-wide. Without clear indicators or fail-safes, entire platforms can be brought down in seconds, and no one sees it coming until it’s already in motion.
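What a fail-safe can look like here is concrete. Below is a minimal Python sketch, assuming nothing about your specific tooling (the deploy, error_rate, and rollback callbacks are placeholders you would wire to your own pipeline): stage the change on a small slice of the fleet, watch a health signal, and only promote it system-wide if it stays inside an error budget.

```python
# A sketch of a canary gate, under the assumption that your pipeline exposes
# deploy/health/rollback hooks. All callbacks here are hypothetical.

import time
from typing import Callable, Sequence


def deploy_with_canary(version: str,
                       canary: Sequence[str],
                       full_fleet: Sequence[str],
                       deploy: Callable[[Sequence[str], str], None],
                       error_rate: Callable[[Sequence[str]], float],
                       rollback: Callable[[Sequence[str]], None],
                       error_budget: float = 0.01,
                       bake_seconds: int = 300) -> bool:
    """Promote a release only after it survives a canary bake period."""
    deploy(canary, version)            # push to a small slice first
    time.sleep(bake_seconds)           # let real traffic exercise the change
    if error_rate(canary) > error_budget:
        rollback(canary)               # contain the bad change automatically
        return False                   # and stop: a human decides what happens next
    deploy(full_fleet, version)        # only now does it go system-wide
    return True
```

The exact mechanism matters far less than the property: a bad change gets contained automatically, and a person, not the pipeline, decides the next step.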

If you’re an executive approving new tooling, you need to ask: “Has failure been modeled?” Not just hypothetically, but in terms of real impact and team response. Assume the automation will fail eventually; that’s not pessimism, it’s operational realism. Design for how the team will respond when that happens. Don’t just measure potential efficiency gains, measure resilience.

Automation may lead to deskilling and reduced situational awareness

Automation can reduce routine work, but it comes with trade-offs. As systems handle more processes without human input, operators observe more and do less. That may sound efficient, but long-term, it weakens understanding. When a person monitors a system passively instead of working with it directly, they lose the tactile, real-time feedback that builds expertise. So when failures happen, they’re less equipped to intervene.

Systems don’t just need oversight, they need informed oversight. That comes from people having exposure to how those systems behave under different conditions, not just when they’re running smoothly. Without that exposure, knowledge becomes concentrated in the hands of a few experts. When one of them is unavailable, the knowledge gap becomes operational risk. And it slows recovery.

If you’re running global infrastructure teams, this should be at the top of your list. Relying on automation without serious investment in skill-building leads to fragile operations. Operators can’t respond effectively to what they don’t fully understand. If your team is staring at dashboards with no clarity on what they’re seeing, the system isn’t really safe, it’s just quiet. Retain competence by rotating people through hands-on work and ensuring tools support active learning. The more people understand your systems end to end, the stronger your incident response becomes.

Automation failures generate new, unanticipated human tasks

When automation breaks, it rarely hands you a clean list of tasks to fix it. Instead, it creates new challenges, often under pressure and usually undocumented. The work humans have to do in those moments is more complex than what normal operations demand. It’s not just about correcting a bad input, it’s about navigating an unfamiliar situation that the automation itself created.

Most automated systems are designed around predictable workflows. When things go wrong, those systems don’t offer a graceful fallback. They create confusion. The people responding are forced to work out how and why several automated steps combined into a failure. That added cognitive load slows everything down and raises the risk of error during recovery. It’s also more expensive, in both time and effort.

If you lead operations or engineering, don’t treat automation as a guaranteed simplification. Plan for the edge cases. Have your teams rehearse how the automation breaks. Build tools that surface context quickly. Otherwise, your most experienced engineers will spend more time reverse-engineering what the automation did than solving the problem. Design automation that isn’t just autonomous, but also interruptible, inspectable, and reversible when needed.
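As a rough illustration of what “interruptible, inspectable, and reversible” can mean in code, the Python sketch below (names and structure are assumptions, not a prescribed design) runs automation as explicit steps: an operator can halt it mid-run, every action is logged, and each step declares how to undo itself so a failed run can be unwound.

```python
# Sketch of an automation runner built for interruption, inspection, and
# reversal. All names here are illustrative assumptions.

import logging
from dataclasses import dataclass
from threading import Event
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("automation")


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    undo: Callable[[], None]     # every step declares how to reverse itself


def run_automation(steps: List[Step], stop: Event) -> None:
    done: List[Step] = []
    for step in steps:
        if stop.is_set():        # interruptible: an operator can halt the run
            log.info("Stopped before %s; unwinding %d steps", step.name, len(done))
            break
        log.info("Running step: %s", step.name)   # inspectable: every action is visible
        try:
            step.run()
            done.append(step)
        except Exception:
            log.exception("Step %s failed; reversing completed steps", step.name)
            break
    else:
        return                   # every step succeeded, nothing to undo
    for completed in reversed(done):              # reversible: unwind in reverse order
        log.info("Undoing step: %s", completed.name)
        completed.undo()
```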

A lack of transparency in automated systems prevents effective debugging

As automation grows more complex, visibility into what it’s doing becomes a serious problem. To diagnose failures in these systems, teams need to understand not only the application logic but also the logic driving the automation: how it responds to inputs, what triggers it, and why it took a particular action. That’s no small task, especially at scale.

This kind of complexity often produces “expert silos.” A single engineer, usually the one who built or operates the system, knows what’s going on. Everyone else stares at unfamiliar logs or dashboards without really understanding them. When that engineer is out, on vacation, or otherwise unavailable, the team is stuck. That’s operational fragility hiding inside technical design complexity.

If you oversee enterprise-level systems, your automation needs observability built in from day one. Not just telemetry, but explainability. Make sure the system can explain to humans what it’s doing and why. And invest in cross-training. Don’t let critical system knowledge live in one person’s head. A resilient organization doesn’t just document, it builds shared understanding across teams. The risk isn’t only downtime, it’s slower decision-making whenever key people are absent.
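Explainability can start as simply as making every automated action emit a structured decision record. The Python sketch below uses hypothetical field names; the idea is that an on-call engineer can read what the automation saw, which rule it applied, and what it did, without reverse-engineering its internals.

```python
# Sketch of a structured decision record emitted by automation. Field names
# and the example autoscaler are hypothetical.

import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class DecisionRecord:
    actor: str                 # which automated component acted
    trigger: str               # what signal or event started this
    observed: Dict[str, Any]   # the inputs the automation saw
    rule: str                  # the rule or policy it applied
    action: str                # what it actually did
    timestamp: float = field(default_factory=time.time)

    def emit(self) -> None:
        # In practice this would go to your logging/observability pipeline.
        print(json.dumps(asdict(self)))


# Example: a hypothetical autoscaler explaining a scale-down decision.
DecisionRecord(
    actor="autoscaler",
    trigger="cpu_below_threshold_for_10m",
    observed={"avg_cpu": 0.12, "instances": 8},
    rule="scale_down_when_cpu_under_20pct_for_10m",
    action="terminated 2 instances (8 -> 6)",
).emit()
```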

Joint cognitive systems (JCS) offer more resilient automation integration

If we want automation to be more effective, we need to stop designing it to just “do a job” in isolation. The team isn’t just the humans, and it isn’t just the systems. The best-performing operations are the ones that integrate both as a single unit. That’s where the concept of joint cognitive systems comes in. It’s a framework in which humans and machines work together through shared context, mutual goals, and clear coordination, not isolated task execution.

Gary Klein and his research team laid out the principles. They include mutual predictability (knowing what the other party is likely to do), mutual directability (being able to redirect the other party’s actions when needed), and common ground (a shared understanding of what’s happening and what matters). These aren’t abstract ideas, they’re requirements for resilient operations. Without that kind of alignment, your automation decisions can block or mislead your human teams when things get hard.

If you’re leading a company that relies increasingly on machines to manage infrastructure, data, deployment, or monitoring, pay attention to how those machines coordinate with your people. Do they show what they’re doing? Do they respond to changing goals on the fly? Do they help your engineers understand the system state, or make it harder? These are questions about system usability, yes, but more importantly, about system trust. If you want teams to truly move fast and recover fast, you have to build technology that supports and enhances how real people do hard work under pressure.

Neglecting human expertise in AI design exacerbates operational risks

There’s a growing belief that “better AI” will solve most software operations challenges. That’s wrong. It only solves the right problems if the people who design and implement AI deeply understand how human teams actually operate. If that understanding is missing, the AI systems not only fail more often, they fail bigger, and faster.

Human expertise is not a plug-in. It emerges through experience, context, accumulated judgment, and the ability to adapt under stress. Most AI systems aren’t designed to support that. They’re not built to augment the situational awareness of engineers. They’re not built to explain themselves or accept correction when they’re wrong. And if they are missing that, they will push teams into high-risk states just when steadiness is needed most.

If you’re assuming a software platform will eventually run itself, you’re headed for expensive problems. Autonomous systems will still require human clarity, intervention paths, and shared operational models. If AI tools can’t contribute to team coordination or decision-making in real time, then those tools won’t scale value, or resilience. They’ll just shift the load.

Build systems that respect how human expertise works. Train AI and automation to work with, not around, people. The faster you align autonomous action with practical human coordination, the more durable your infrastructure becomes.

In conclusion

If you’re betting on automation to unlock scale, speed, or efficiency, you’re not wrong. But if you’re assuming it reduces complexity on its own, you’re missing the bigger picture. Automation changes how your teams work, where they focus, and how they respond under pressure. It’s not just about what gets automated, it’s about what gets redefined, and who still needs to think clearly when things go sideways.

Systems don’t run themselves. At least not in a way that’s safe, sustainable, or scalable without human expertise tightly built into the loop. Whether it’s AI, CI/CD, or operational tooling, automation won’t solve human blind spots, it will expand them unless you address how people and systems align in real environments.

Leaders need to go beyond performance metrics and think in terms of resilience. That includes investing in tooling that’s transparent, teams that are cross-functional, and systems that assume the unexpected. Automation isn’t just a product of software, it’s a reflection of how you build teams, coordinate knowledge, and plan for pressure.

The payoff is big if you get this right. But it starts with a clear view of what automation really does, and what your people still need to do when it matters most.

Alexander Procter

December 16, 2025

11 Min