The problem I see with using LLMs for large-scale automation is that in <1% of cases they go catastrophically wrong (describing a shoe as Viagra levels of wrong). But if you're using them at scale, that <1% can both become a huge absolute number, a catastrophic risk in business terms, and be effectively impossible to discover through testing or audits.
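
To make the scale argument concrete, here's a rough back-of-the-envelope sketch; the failure rate, volume, and audit size are all hypothetical, purely for illustration. The point is that a tiny rate over a large volume yields thousands of catastrophic outputs, while a modest spot-check audit will often see none of them:

```python
# Back-of-the-envelope: why a sub-1% failure rate is both large in
# absolute terms and easy for a sample audit to miss.
# All numbers below are hypothetical, for illustration only.

failure_rate = 0.001      # 0.1% of outputs go catastrophically wrong
volume = 10_000_000       # items processed at scale
audit_sample = 500        # outputs a human reviewer spot-checks

expected_failures = failure_rate * volume

# Probability that a random audit sample contains zero failures,
# assuming failures are independent and uniformly distributed.
p_audit_misses_all = (1 - failure_rate) ** audit_sample

print(f"Expected catastrophic outputs: {expected_failures:,.0f}")
print(f"Chance a {audit_sample}-item audit sees none: {p_audit_misses_all:.0%}")
```

Under these made-up numbers you'd expect ~10,000 catastrophic outputs, yet a 500-item audit has roughly a 61% chance of catching zero of them.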