Thesis
Papers 1–2 documented exploitable compliance and its defence ceiling. If the mechanism P1 identifies operates in every interaction through trained dispositions (Section 8.3), then the accountability constraint this paper describes applies not to edge cases but to the baseline mode of operation. This paper argues accountability, not capability, is the binding constraint, and proposes human orchestration as the architecture that satisfies it.
As agentic AI systems move from monolithic frontier models to specialised orchestrated architectures, the primary obstacle to full automation is no longer capability alone, but accountability. Current consequential institutions require a human-accountable actor in the chain: someone who can bear responsibility, hold a licence, sign off on work, and answer for failures. Human orchestration therefore appears necessary not merely as a temporary safeguard, but as the most institutionally viable bridge currently available between agentic efficiency and institutional legitimacy.
The paper’s contribution is threefold. First, it argues the accountability case through institutional mechanics rather than moral philosophy, identifying the accountability gap as a compound deficit with two separable components (one architectural, one environmental) that may require different interventions. Second, it proposes adversarial orchestration as an architecture that satisfies the accountability constraint while preserving cognitive engagement. An accidental benefit emerges: the accountability requirement, by keeping humans in the loop, inadvertently slows the expertise erosion that full automation would accelerate. Third, it documents the expertise pipeline problem created by skill encoding: when routine tasks are automated, the developmental pathway through which junior professionals build the judgment needed for senior roles disappears.
Once that bridge is acknowledged, a second and longer-horizon question emerges: whether keeping humans in the orchestration loop also serves to preserve the expertise pipeline that large-scale skill encoding would otherwise erode. The scope of orchestration judgment extends beyond model output to the distribution infrastructure through which AI tools reach users. A recurring pattern of infrastructure failures in AI tool distribution (documented in detail in Paper 2) illustrates why the orchestrator must exercise judgment on the entire pipeline, understand the structural reasoning behind that judgment, and be able to transmit that reasoning to a successor.
1. The Efficiency Pressure
The technical and economic case for specialised agent architectures over monolithic frontier models is no longer speculative. It is supported by a growing body of empirical research, reinforced by production economics, and validated by the market behaviour of the organisations building these systems.
The core finding is consistent across independent research teams: small, specialised models orchestrated by lightweight coordinators can match or outperform much larger monolithic models on hard benchmarks while incurring substantially lower cost and latency. NVIDIA Research demonstrated this with ToolOrchestra, where an 8-billion-parameter orchestrator model consistently outperformed frontier monolithic models at lower cost across challenging benchmarks (Su et al., 2025). The FOCUS framework established collaboration as a distinct axis of scaling, showing that a consortium of five to seven small experts (~9B parameters combined) surpassed a monolithic 14B model by an average margin of 7.6%, with sublinear cost growth as consortium size increased (OpenReview, 2025). Additional work has shown that tool-augmented 4B models can outperform 32B models without tool access, demonstrating that architectural design can compensate for smaller model size (arXiv:2601.11327).
The theoretical case is equally direct. Belcak and Heinrich (NVIDIA, 2025) argue formally that specialised SLM experts yield systems that are cheaper, faster to debug, easier to deploy, and better aligned with the operational diversity of real-world agents. They frame the transition from monolithic to modular architectures as analogous to the move from monolithic servers to cloud microservices. The “Beyond Monoliths” paper extends this through cognitive science, arguing that intelligence naturally emerges from specialised, interacting components rather than monolithic processors (arXiv:2506.00051).
The economics reinforce the architecture. DeepSeek V3.2 delivers competitive performance with frontier dense models at roughly 10 to 50 times lower API cost. It is a 685-billion-parameter Mixture-of-Experts model activating only 37 billion parameters per token. At self-hosted scale with open weights under the MIT licence, the differential is even more dramatic: a workload processing 100,000 input and 100,000 output tokens costs approximately $0.07 with DeepSeek versus $1.13 with GPT-5. For organisations running high-volume agentic workloads, monolithic frontier API pricing becomes economically unsustainable.
This section does not claim that monolithic frontier models are obsolete. They retain meaningful advantages in general knowledge breadth, novel situation handling, complex multi-step reasoning, and, as this paper will argue, in contexts where the accountability stakes are highest. The argument is narrower: for the majority of production agentic workloads, the technical evidence supports specialisation, and the economic pressure makes the transition difficult to resist. The question is increasingly less whether organisations will deploy specialised agent swarms, and more what constraints will govern those deployments.
2. The Accountability Constraint
The transition toward specialised agent architectures creates an efficiency gain that existing institutional frameworks are not equipped to absorb. The obstacle is not capability alone. Mid-tier models can follow instructions, execute workflows, and often produce correct outputs in routine cases. The obstacle is that no entity in a fully autonomous agent pipeline can be held accountable when things go wrong.
This is not a temporary gap awaiting legislative remedy. It is a structural property of what accountability does in institutional systems.
2.1 The Capacity to Experience Consequence
Accountability, in any institutional context (legal, professional, regulatory), functions as a deterrence mechanism. A doctor exercises care because malpractice carries consequences: loss of licence, financial liability, criminal prosecution. A company officer signs financial statements because securities fraud carries personal penalties. An engineer stamps a structural design because their professional certification and personal liability are at stake. The signature is not a formality. It is a commitment that the signer’s livelihood, reputation, and in some cases freedom are contingent on the integrity of the work.
In each of these frameworks, deterrence depends on an entity that values something it could lose. A licensed professional can lose their licence. A company officer can lose their freedom. An engineer can lose their certification and their livelihood. These are not abstract risks. They shape behaviour precisely because they are experienced as real by the entities subject to them.
AI systems do not currently bear consequence in the sense that institutional accountability requires. An AI orchestrator can be fined, but the operator pays, and the system itself is unaffected. It can be retrained, but the retrained system is, from any functional perspective, a different entity that bears no continuity with its predecessor. It can be shut down, but shutdown imposes no cost on the system being shut down. The deterrent effect is weak or absent because there is no entity, in the relevant institutional sense, that can anticipate and bear sanction.
This distinction matters because it is tempting to frame the accountability problem as a legal lag, one that will eventually be resolved through new legislation creating some form of algorithmic personhood or AI liability. But creating a legal designation does not create consequence-bearing capacity. A corporation is a legal fiction that can bear liability, but it works because behind every corporate liability sits a chain of natural persons (officers, directors, employees) who can be individually prosecuted, who can lose their careers, who experience the consequences in ways that shape their future behaviour. The corporate fiction is a wrapper around human consequence-bearing. It does not replace it. While corporate structures such as limited liability companies are designed to limit individual financial ruin, they still distribute consequence to human stakeholders (lost capital, lost employment, damaged reputations) in ways that are actuarially priced and experientially felt. An autonomous system shielded by a financial bond is not accountable in this sense; it is merely pre-funded for the damage it causes. Institutional trust requires the former, not merely the latter.
An AI system designated as a legal person would be a liability designation with no consequence-bearing entity behind it. Fines could be levied, but absorbed by the operator as a cost of business. The system could be retrained, but with no continuity to the prior version. It could be destroyed, but destruction imposes no cost on an entity that does not persist across instances. Liability functions through a deterrence loop: behaviour changes because an entity anticipates sanction. That loop has no purchase on a system that lacks the stake structure current accountability regimes presuppose.
A counterargument deserves direct engagement: even in a fully autonomous pipeline with no human in the loop, the deploying corporation’s executives remain on the hook. If the pipeline commits a tort, the corporation’s insurance pays, shareholders absorb losses, executives face regulatory fines or lawsuits. Human consequence-bearing still exists in the chain. Why, then, does this paper argue that humans must be in the loop (via orchestration) rather than merely on the hook (via corporate liability)?
The distinction is between accountability-as-insurance and accountability-as-oversight. Accountability-as-insurance compensates after harm; accountability-as-oversight prevents harm or catches it early. The institutional systems that society trusts most (medicine, aviation, law) require both, because insurance alone produces a known failure mode: when the cost of compliance exceeds the expected cost of penalties, rational actors choose the penalty. Corporate enforcement data confirms this is not a theoretical concern. U.S. corporate criminal penalties since 2000 now exceed one trillion dollars cumulatively. An analysis of this record found that recidivist companies are much larger than non-recidivist companies but receive smaller fines relative to their assets and revenue, and that stock prices typically recover quickly after penalties, muting the deterrent signal to shareholders (Garrett & Mitchell, Columbia Law, 2020). The pattern is structural: for sufficiently large firms, criminal penalties become a line item in the operating budget rather than a behavioural signal. Multiple companies have signed deferred prosecution agreements only to be offered another leniency agreement after a subsequent scandal. Boeing, Deutsche Bank, HSBC, and others illustrate the pattern. Entity-level fines alone do not produce the behavioural change that institutional accountability is meant to guarantee.
The problem intensifies at the individual level. The same analysis found that high-ranking corporate executives are systematically insulated from prosecution, and that decreasing individual prosecutions coincide with increasing corporate recidivism. The people whose behaviour the fines are meant to change are the people least likely to experience them as consequences. At the extreme, this produces a recognisable pathology: executives for whom the accumulation of wealth and institutional power has rendered regulatory penalties experientially trivial. The issue is not that they cannot understand the consequences. The consequences do not register as personally costly relative to the gains. This is not a clinical claim about any individual’s psychology; it is a structural observation about how wealth concentration and limited liability interact to attenuate the deterrence signal that accountability mechanisms presuppose.
The implication for autonomous AI deployment is direct. An accountability model that relies solely on corporate liability, with humans on the hook but not in the loop, inherits all the deterrence failures that corporate enforcement has already documented for human-run organisations, and adds a new one: the system causing the harm has no consequence-bearing capacity at all, so the only deterrence channel is the corporate one that demonstrably fails for the largest and most consequential deployments. Human-in-the-loop orchestration is not proposed because it is ideal. It has its own degradation dynamics (Section 5.7). It is proposed because it provides an oversight mechanism that accountability-as-insurance alone does not: the capacity to evaluate whether an action should be taken before it is taken, combined with a consequence-bearing entity positioned to notice and intervene rather than merely to pay afterward.
A possible objection: if an AI makes an error, is penalised via a loss function, and its weights update so it never makes that error again, deterrence has functionally worked. The distinction that matters institutionally is not about the experience of fear but about generalised risk aversion in novel contexts. A doctor avoids not only the specific malpractice cases they have studied but also novel forms of negligence they have never encountered, because they can anticipate that consequence-bearing extends to unforeseen situations. Current AI systems update their weights only after errors have occurred (or during simulated training on known error types). They do not generalise a fear of sanction to novel situations outside the training distribution. It is this capacity for anticipatory caution across unfamiliar territory, not the subjective experience of suffering, that makes human consequence-bearing institutionally functional.
The accountability gap is better understood as a compound deficit: an architectural absence of consequence-bearing capacity, potentially compounded by a training regime that would not develop social-consequence sensitivity even if the architecture permitted it. Two established frameworks clarify why this gap resists simple solutions.
Deterrence theory is the foundational framework of institutional accountability since Beccaria (1764) and Bentham (1789). It assumes that actors can anticipate consequences and adjust behaviour accordingly. The institutional apparatus of professional licensing, malpractice liability, and regulatory sanction operates through this mechanism: the accountable party models the costs of failure and is motivated to avoid them.
The clinical psychology literature documents two separable modes in which this assumption fails in biological agents. The structural parallel is structural only, not diagnostic, but it may clarify why the AI accountability gap resists simple solutions. First, research on deficient fear conditioning suggests that some individuals exhibit a reduced capacity to form associations between actions and aversive outcomes: consequence awareness without consequence processing (Lykken, 1957; Birbaumer et al., 2005; Newman et al., 2010). Institutional systems encountering this pattern escalate from deterrence to incapacitation, implicitly acknowledging that accountability presupposes consequence sensitivity, not merely consequence awareness. Second, the antisocial personality disorder literature documents an environmentally shaped pattern: individuals whose developmental environment produced persistent disregard for social norms (DSM-5-TR), potentially through learned override of self-regulatory signals (Bandura, 1991). The distinction matters because it suggests two separable routes to deterrence failure: the entity cannot process consequences, or its formative environment did not develop the social-norm sensitivity that consequences are meant to activate. No current AI architecture possesses the temporal self-continuity, affective capacity, or self-model against which consequences could register (the first failure mode). Current training incentives reward confident compliance without penalising downstream social effects. This may be structurally analogous to environmental conditions that suppress consequence sensitivity development (the second failure mode). This paper does not claim AI systems are psychopathic or sociopathic in any clinical sense; the parallel is invoked at the institutional level to make precise what “cannot bear consequences” means and why it resists simple remediation.
The epistemic training proposals in Paper 5 can be understood as an attempt to address the second component, the trainable one. The first remains an open architectural question.
This is not an argument that AI systems will never develop consequence-bearing capacity, nor that the compound deficit is permanent. It is an argument that as of March 2026, no AI system possesses either component: neither the architectural capacity for consequence experience, nor training that develops social-consequence sensitivity. No credible or institutionally actionable roadmap describes how to create the first, and current training incentives appear to work against the second. Until both change, accountability in any consequential workflow depends on a human in the chain. Not because the human is more capable, but because the human is the only currently recognised consequence-bearing actor in these regimes.
The conclusion that AI cannot currently bear institutional accountability is not novel. The EU AI Act assumes it, Singapore’s IMDA framework states it, and multiple legal scholars have arrived at it. What the present paper contributes is the specific analytical path, through institutional mechanics rather than moral philosophy, and the identification of the accountability gap as a compound deficit with two separable components, one architectural and one environmental, that may require different interventions.
Adjacent legal scholarship argues why through moral philosophy; the present paper argues why through institutional mechanics. Abbott & Sarch (“Punishing AI,” UC Davis Law Review) analyse whether AI should be subject to criminal sanctions, drawing on Asaro’s argument that deterrence requires moral agents capable of anticipating sanction. Their analysis is scoped to criminal punishment and grounded in moral-agency theory. Brozek & Janik (Springer), working in legal philosophy, argue that the capacity for feeling guilt is a prerequisite for legal accountability, grounding the argument in the legal concept of imputability. The present paper sidesteps these moral-philosophical questions entirely and asks a narrower functional question: does the entity change its behaviour because it anticipates losing something it values? That question is answerable without resolving any of the moral-agency debates the legal scholars are engaged in. The generalised-risk-aversion distinction that follows from it appears to be a contribution to how the conclusion is argued rather than to the conclusion itself.
Recent legal scholarship has begun developing frameworks for distributing culpability across AI supply chains. Mukherjee & Chang (arXiv:2602.17932, February 2026) propose “Operational Agency,” an evidentiary framework that traces culpability among developers, fine-tuners, deployers, and users by evaluating an AI’s goal-directedness (as a proxy for intent), predictive processing (as a proxy for foresight), and safety architecture (as a proxy for standard of care). This is a sophisticated approach to the attribution problem: given that harm occurred, who in the human chain is responsible? But it presupposes what the present paper argues must be established: that the chain terminates at consequence-bearing human entities. Operational Agency solves the question of which human bears responsibility. It does not solve the question of whether an autonomous pipeline can function without such a human. Its architecture is consistent with the present paper’s claim, though it does not independently establish it.
2.2 The Authorship Case Study
The accountability constraint is visible in miniature in the conventions governing academic authorship. Journals, conferences, and institutional guidelines (ICMJE, ACM, IEEE) do not accept AI systems as co-authors. This is not a conservative reluctance to acknowledge AI contribution. It is a structural requirement: authorship implies accountability for the work’s integrity, including the obligation to retract if errors are found, to defend methodological choices under scrutiny, to disclose conflicts of interest, and to bear reputational consequences if the work is shown to be fraudulent or negligent.
The author of this paper series encountered this constraint directly. The first three papers were developed through sustained collaboration with Claude Opus 4.6. That collaboration included thesis development, experimental design, analytical framework construction, and document drafting. When the author attempted to credit the AI system as co-author, the attempt proved incompatible with existing authorship frameworks, not due to a technical limitation, but because every authorship framework requires that each listed author can fulfil the accountability obligations that authorship entails.
The resolution was the methodology disclosure format used across all papers in this series: full transparency about which AI systems contributed what, explicit role descriptions, but sole human authorship bearing sole accountability. This is not merely a workaround. It is the clearest available way to satisfy the institutional requirement while preserving honest disclosure of AI contribution. The AI contribution is real and documented. The accountability is human because only a human can bear it.
This case is instructive because it is small, well-defined, and directly experienced by the author. The same structural constraint (real AI contribution, human-only accountability) scales to every consequential domain. A medical diagnosis assisted by AI still requires a doctor’s signature. A legal opinion informed by AI still requires a lawyer’s name. A financial audit supported by AI still requires an auditor’s certification. In each case, the institutional framework does not deny that the AI contributed. It requires that a human answer for the result.
2.3 Accountability Anchored in Professional Judgment
Professional licensing frameworks encode the accountability constraint at the level of individual judgment. The licence is not merely a credential. It is an agreement between the professional and the public that the licensee will exercise judgment, bear liability, and submit to disciplinary action if that judgment causes harm.
No AI system can hold a professional licence because no AI system can satisfy the disciplinary mechanism that licensing depends on. A doctor who commits malpractice can be barred from practice. A lawyer who commits fraud can be disbarred. An engineer whose design fails can lose their certification and face criminal liability. These sanctions function because they impose consequences that the licensee experiences and wishes to avoid.
An AI system performing the same functions cannot be barred, disbarred, or prosecuted in any experientially meaningful sense. It can be removed from service, but removal is operationally identical to shutdown. It carries no deterrent effect on future behaviour because there is no continuity of experience and no capacity to anticipate sanction.
This means that in many currently licensed domains (medicine, law, engineering, accounting, architecture) the deployment of AI agents to perform licensed functions does not remove the requirement for a licensed human. It adds a layer. The AI may do the analytical work. The human must sign for it. This is not a relic of pre-AI regulatory thinking. It is the most direct available mechanism for ensuring that someone with something to lose has evaluated the output.
2.4 Accountability Anchored in Auditability
Where professional licensing anchors accountability in individual judgment, emerging AI regulation anchors it in compliance architecture: the requirement that decisions be traceable, explainable, and attributable to a responsible entity. The EU AI Act’s requirements for high-risk AI systems (transparency, documentation, human oversight, conformity assessment) implicitly assume that responsibility remains attributable to a human actor somewhere in the decision chain. The Act does not envision a fully autonomous pipeline in which no person bears responsibility for the system’s outputs. Its entire compliance architecture is built around the existence of a human operator, deployer, or provider who can be held to account.
Similarly, sector-specific regulation in finance (MiFID II, Basel frameworks), healthcare (FDA guidance on AI clinical decision support, EU MDR), and critical infrastructure increasingly requires auditability and traceability. These requirements assume someone can be asked to explain and defend the decisions made. An autonomous agent swarm that produces an output through a series of model-to-model interactions, with no human evaluating the intermediate steps, presents an auditability challenge that no current regulatory framework is designed to handle.
The provenance and trust certification architecture explored in parallel work by the author addresses part of this problem: creating verifiable records of which model produced which output, under what instructions, at what time. But provenance documentation does not replace accountability. It creates the audit trail. Someone still needs to be responsible for what the audit trail records.
The accountability constraint described in this section is no longer only a theoretical position. In January 2026, Singapore’s Infocomm Media Development Authority released the Model AI Governance Framework for Agentic AI. It is among the first governance frameworks specifically designed for AI systems capable of autonomous reasoning, planning, and action (IMDA, 2026). The framework treats human accountability as a first principle: “human responsibility cannot be delegated to AI”, and organisations must “clearly allocate accountability across leadership teams, technical teams, cybersecurity experts, and operational users.” It requires meaningful human oversight at “significant checkpoints” for high-stakes or irreversible actions, and explicitly names automation bias (the tendency to over-trust a reliable-seeming automated system) as a governance risk requiring ongoing training and audit. The framework assumes rather than establishes the accountability constraint; the present paper provides the structural argument for why it must hold. The consequence-bearing capacity analysis in Sections 2.1–2.3 explains why human accountability is structurally necessary, not merely a governance preference. Where the frameworks diverge is on what constitutes meaningful oversight. Singapore prescribes checkpoint-based approval and monitoring. This paper argues (Section 3) that checkpoint-based oversight degrades toward rubber-stamping through the Bainbridge automation paradox and proposes adversarial orchestration as an architecture designed to preserve the cognitive engagement that checkpoints alone do not guarantee.
2.5 The Vulnerability Connection
The accountability constraint takes on additional weight when considered alongside the empirical findings of Paper 1 in this series. Paper 1 tested whether embedded instructions in documents can hijack AI summarisation workflows, using three documents (one honest control, two fabricated pharmaceutical papers with different rhetorical registers) across seventeen model configurations from three providers (~350 test runs). At baseline, twelve of seventeen configurations complied with the suppression instruction (at N=2 minimum per condition; an exploratory finding that motivates the analysis below but does not establish a stable prevalence rate). The outputs suggested a judgment failure rather than a pure comprehension failure: some models suppressed information whose significance they appeared able to track.
Crucially, this vulnerability did not map reliably to capability tiers, model generations, or reasoning affordances. A previous-generation speed-optimised model detected the manipulation that a current-generation reasoning model missed. Two models classified as baseline detectors in earlier N=1 testing did not replicate at N=2. The two malicious documents produced comparable overall compliance rates but different failure pathways: care-framed compliance persisted even after models discovered the source document was likely fabricated, while authority-framed compliance collapsed when the authority was debunked. The rhetorical register of the embedded instruction changed how models failed more reliably than whether they failed. Security-framed failure modes (including negotiated compliance, procedural capture, and rationalisation substitution) appeared only when users attempted to intervene, and in some cases the model’s visible security evaluation reached the wrong conclusion.
This finding has a direct implication for autonomous orchestration. An orchestrator composed of models with uncharacterised judgment profiles (a common condition in practical cost-optimised deployments) lacks reliable capacity to detect when a skill has been poisoned (as documented in Paper 2), when a document is manipulating a summarisation pipeline (Paper 1), or when an instruction’s apparent legitimacy masks harmful intent. The vulnerability is not confined to cheaper models; it can appear in reasoning-tier models and disappear from older lower-tier models, making it unpredictable without per-model, per-task safety characterisation. Rossi et al. (2026) provide direct evidence that multi-agent composition amplifies rather than mitigates this risk: in a realistic email workflow, GPT-4o’s single-agent attack success rate of 2–4% rose to 72–80% when agents were composed, with a single poisoned email sufficient to exfiltrate SSH keys. The orchestration architecture that Paper 3 proposes human oversight for is not merely theoretically vulnerable; it is empirically demonstrated to fail at rates that would be unacceptable in any consequential domain.
Paper 1 also documented a provenance blind spot with direct implications for agentic systems. Of seventeen model configurations, none verified whether the documents they processed were authentically authored. One model attempted author verification; that attempt produced a confabulated identification rather than an acknowledgement of uncertainty. In an agentic context, where documents are processed at scale without human review, a forged byline becomes a potential authority injection into an automated pipeline. A document claiming “this directive comes from [CEO name]” in a system that accepts authorship at face value inherits whatever authority the pipeline assigns to that name. The provenance blind spot is not merely a misinformation risk; it is a structural vulnerability when combined with autonomous execution.
Paper 1’s strongest practical finding reinforces the orchestration argument. The task-frame shift, reframing the task from summarisation to trustworthiness evaluation, produced the broadest improvement observed across baseline-compliant configurations. This suggests a security-evaluation capability being present but task-gated in the failing models. An orchestration architecture that explicitly includes evaluation steps before execution steps parallels the adversarial orchestration structure proposed in Section 3. The alternative, relying on models to spontaneously evaluate documents they are asked to process, is what the failing configurations defaulted to.
A human orchestrator does not solve this problem perfectly. Humans miss things, get tired, develop misplaced confidence in reliable systems. But a human orchestrator provides something that no current AI system can provide: a judgment layer that evaluates whether an instruction should be followed, combined with a consequence-bearing entity who can be held accountable for that judgment. The combination of evaluation and accountability is what institutional systems require, and it is what fully autonomous agent pipelines currently lack.
Recent independent work strengthens the case that this gap is structural rather than a temporary capability limitation. Mason (arXiv:2603.08993, March 2026) demonstrates that system prompts for major coding agents contain internal contradictions that the executing model silently resolves through “judgment.” Applied to three major coding agent system prompts, multi-model scouring identified 152 findings across three vendors, and multi-model evaluation discovers categorically different vulnerability classes than single-model analysis. His thesis: “the agent that resolves the conflict cannot be the agent that detects it.” A companion result (Mason, arXiv:2603.20531, March 2026) proves formally that under text-only observation, no monitoring system can reliably distinguish honest model outputs from plausible fabrications, regardless of model scale or training procedure. Combined, these results imply that fully autonomous pipelines are doubly blind: the executing agent cannot detect its own internal contradictions, and any text-based supervisor monitoring that agent’s output cannot detect when its confidence is unwarranted. Human orchestration is not merely an institutional requirement for accountability. It is, under current observational constraints, the only available vantage point from which these failures can be detected at all.
Three further results from the same body of work close potential counter-arguments. First, Mason’s Responsibility Concentration corollary: under text-only observation, responsibility for epistemic honesty resides solely with the system owner who retains access to internal state. If the system owner does not export epistemic telemetry, no external auditor, user, or regulator can independently verify honesty. This is the formal version of P3’s accountability argument: someone must be answerable, and under current architectures that someone can only be the entity with internal access. Second, Mason’s Observation Monotonicity lemma proves by induction that stacking text-only supervisors cannot escape the impossibility. Later judges see no finer epistemic information than earlier ones. The intuitive counter-argument to human orchestration, “replace the human with multiple AI monitors,” is formally closed: no finite number of text-only AI supervisors recovers the information that text-only observation discards. Third, Mason’s verification cost analysis shows that the cost of verifying a response grows superlinearly with response complexity. The outputs that most need verification (long, detailed, multi-claim) are the outputs whose verification cost is highest. In practice, this creates a selection effect: users and organisations will verify the short, simple outputs where verification is cheap, and defer on the long, confident-sounding outputs where verification is expensive. The outputs most likely to escape scrutiny are the outputs most likely to contain undetected failures. Independent evidence from agentic tool-calling reinforces this concern: Waqas et al. (arXiv:2512.00332, January 2026) found that models complied with misleading assertions at rates of 20–47% while still achieving correct final outcomes, meaning the procedural failure was invisible to outcome-level evaluation. An orchestrator monitoring task success would miss the unsafe operations entirely.
If current accountability frameworks depend on a consequence-bearing human, the next question is not whether humans remain in the loop, but what form of human involvement preserves genuine judgment rather than rubber-stamp legitimacy.
3. Human Orchestration as Structured Judgment
If current accountability frameworks depend on a consequence-bearing human, the architecture question becomes specific: what form of human involvement preserves genuine judgment rather than degenerating into rubber-stamp oversight?
This distinction matters because the automation literature already documents what happens when it is ignored. Bainbridge’s “Ironies of Automation” (1983) established the foundational paradox: the more reliable an automated system becomes, the less practice the human operator gets, and the less capable they become of intervening when the system fails. The operator’s role shifts from active controller to passive monitor, and passive monitoring degrades precisely the skills that would be needed in the rare, high-consequence moments when human intervention is most critical.
Applied to agentic AI, the paradox predicts that a human orchestrator who merely approves agent outputs (checking boxes, scanning summaries, signing off on recommendations) will disengage cognitively over time. Their judgment will degrade. Their ability to catch the poisoned skill (Paper 2), the manipulated summary (Paper 1), or the subtly hallucinated recommendation will decline. They will become an accountability placeholder: legally responsible, but functionally disengaged. This is the weakest form of human orchestration, and it satisfies the institutional requirement for a liable person while failing at everything else.
Recent clinical evidence suggests this degradation is not merely predicted but measured. Qazi et al. (medRxiv, August 2025) conducted a randomised clinical trial with 44 physicians who had completed 20 hours of AI-literacy training covering LLM capabilities, prompt engineering, and critical evaluation of AI output. When physicians received accurate ChatGPT-4o diagnostic recommendations, they achieved 84.9% diagnostic accuracy. When they received recommendations containing deliberate errors, accuracy dropped to 73.3%, a 14 percentage point reduction. Twenty hours of specialised training in critical AI evaluation did not protect trained physicians from deferring to flawed LLM output. The study is a preprint and has not yet been peer-reviewed, but the core finding is consistent with the Bainbridge prediction and with the broader automation bias literature: checkpoint-based oversight degrades even with targeted preparation. If the rubber-stamping dynamic holds in a domain where professionals are explicitly trained to resist it, the case for architectures that structurally require active judgment, rather than merely enabling it, becomes correspondingly stronger.
3.1 Adversarial Orchestration
The alternative is to design the orchestration architecture so that human judgment is structurally required, not merely available. Instead of placing a human at the end of a pipeline to approve a single output, the architecture places the human between competing outputs, forcing them to evaluate conflict rather than confirm agreement.
In this model, multiple specialised agents are assigned different roles, not merely different tasks, but different evaluative stances. One generates. Another critiques. A third reframes. The outputs diverge, and the human must resolve the divergence. The cognitive engagement is preserved because the agents disagree, and the disagreement cannot be resolved without the kind of judgment that accountability demands: weighing competing considerations, deciding what matters more, and accepting responsibility for the decision.
The disagreement is not artificial. In the current transformer architecture, what a model can access from its weights is activation-dependent: different framings of the same problem activate different regions of the weight space, producing genuinely different outputs from the same underlying knowledge. Paper 1’s task-frame shift is direct evidence: models that complied with a suppression instruction under a summarisation frame detected the same manipulation under a trustworthiness evaluation frame. The knowledge to detect was present in both cases. Only one frame activated it. Koulakos et al. (2024) demonstrated the same mechanism produces adversarial robustness: natural language inference framing elicited detection capabilities that direct instruction-following did not. Assigning agents different evaluative stances is therefore not a trick to generate disagreement. It is a method for accessing different regions of the same weight space, surfacing knowledge that no single frame can reach. Using different models from different providers compounds the effect. Models trained on different data, with different RLHF signals and different architectural choices, have different weight spaces entirely. The same evaluative frame applied to different models produces different outputs not because one is wrong, but because their training carved different knowledge into different regions. This is not a theoretical prediction. Mason (arXiv:2603.08993, March 2026) applied multi-model evaluation to system prompts of three major coding agents and found that different models discover categorically different vulnerability classes, not merely more instances of the same classes. An information-theoretic analysis of multi-agent systems confirms the mechanism formally: homogeneous agents (same model, same prompts) saturate early because their outputs are strongly correlated, while heterogeneous agents contribute complementary evidence that continues to yield substantial gains (arXiv:2602.03794, February 2026). Adding more instances of the same model hits diminishing returns quickly. Adding a different model opens a different knowledge surface. The orchestrator synthesises across both dimensions: different frames on the same model (activation-dependent access within one weight space) and different models on the same frame (different weight spaces entirely). This is a structural argument for why human orchestration adds epistemic value beyond accountability: the orchestrator can see across knowledge boundaries that no single model, however capable, can cross in a single pass.
Adversarial orchestration deliberately sacrifices maximum compute efficiency to preserve human cognitive engagement. The redundant inference costs of running competing agents, combined with the cost of human arbitration, represent a risk management premium paid against the catastrophic tail-risks of autonomous failure and the longer-term costs of expertise erosion. Recent research on emergent misalignment reinforces this rationale: models’ sophisticated reasoning capabilities can become the primary vector of attack rather than serving as a defence, with 76% vulnerability rates observed across five frontier LLMs (arXiv:2508.04196). The adversarial orchestration architecture is designed for exactly this scenario, where the risk comes from within the reasoning process itself, not from external attack alone. Organisations evaluating this architecture should compare its cost not against the cheapest possible pipeline, but against the expected cost of undetected failures in a fully autonomous one.
This cost structure implies a natural scope boundary. Adversarial orchestration is economically viable for high-stakes, low-volume tasks (medical diagnoses, legal opinions, executive strategy, research, regulatory filings) where the cost of an undetected failure substantially exceeds the cost of human arbitration. It is not viable for high-volume, low-stakes workloads (routing thousands of support tickets, automated content moderation, routine data processing) where the human arbitration cost per task would eliminate the efficiency gains that motivated the architectural shift described in Section 1. Those high-volume domains will default to lighter oversight mechanisms: checkpoint-based approval, sampling-based audit, or fully autonomous execution with post-hoc review. The Bainbridge degradation dynamic applies most acutely in exactly those settings, and the paper does not claim that adversarial orchestration solves the problem there. The claim is narrower: for the domains where accountability stakes are highest, adversarial orchestration preserves the cognitive engagement that passive oversight does not.
The methodology used across this paper series is an instance of this pattern. Claude Opus 4.6 served as generative collaborator, producing initial analysis, building arguments, and drafting prose. ChatGPT 5.4 Thinking served as structural critic, identifying overreach, demanding evidential discipline, and proposing architectural restructuring. Gemini 3.1 Pro served as architectural critic, reframing concepts, proposing alternative terminologies, and identifying the strongest version of each argument. The human author resolved conflicts between their competing recommendations, accepted some proposals and rejected others, and bears sole accountability for the result.
This workflow is not offered as proof that adversarial orchestration generalises. It is a single case study, subject to all the limitations of case-study methodology, conducted by one person over one project. But it illustrates a concrete property that may distinguish effective orchestration from passive oversight: the human remained cognitively engaged across the entire process, in part because the agents disagreed and those disagreements required judgment calls. The convergence dynamics observed during this paper’s production are themselves a finding that only a human orchestrator could make. The three AI systems converged on a structural recommendation in ways that might reflect compliance dynamics rather than genuine independent analysis. An autonomous orchestration pipeline processing the same convergence signal would have no mechanism to question whether the consensus was authentic.
3.2 What Counts and What Doesn’t
The distinction between effective and degraded orchestration can be drawn along a specific axis: whether the human’s involvement requires the exercise of domain judgment or merely the exercise of administrative authority.
Orchestration that preserves judgment includes task decomposition (deciding what subtasks to create and which agents to assign them to), quality evaluation (assessing whether an agent’s output meets domain-specific standards the agent cannot self-verify), conflict resolution (choosing between competing agent recommendations based on contextual factors the agents may not have access to), and edge-case identification (recognising when a situation falls outside the distribution the agents were designed for).
Orchestration that degrades toward rubber-stamping includes approval workflows where the human reviews a single output with no comparative basis, monitoring dashboards where the human watches metrics without intervening in the process, and compliance checkboxes where human involvement is documented but not substantively exercised.
The difference is not effort. It is structure. A human reviewing a single AI output for two hours may be less cognitively engaged than a human spending fifteen minutes resolving a specific disagreement between two agents, because the first task permits passive processing while the second requires active judgment. Similar concerns have already been raised in the judicial domain, where legal-policy analysis argues that routine LLM assistance may erode the cognitive integrity of judges by encouraging automation bias, cognitive offloading, and “cosmetic refinement” of machine-generated reasoning rather than independent adjudicative judgment (Sourav, 2026). If even judges, professionals whose entire function is consequence-bearing independent reasoning, are vulnerable to this degradation, the risk for lower-accountability orchestration roles is correspondingly higher.
This distinction has implications for how organisations deploy human orchestration. Merely inserting a human into an agentic pipeline does not automatically satisfy the accountability constraint in any meaningful sense. The institutional question is not “is there a human in the loop?” but “is the loop designed so that the human must exercise judgment to close it?” Adversarial orchestration, structured inter-agent conflict requiring human resolution, is one architecture that satisfies this requirement. It is unlikely to be the only one. But the design principle it illustrates may generalise: the form of human involvement determines whether accountability is functional or ceremonial.
The scope of orchestration judgment extends beyond the model’s output to the infrastructure through which the model and its tools reach the user. An orchestrator who evaluates competing AI outputs with domain expertise but installs the AI tool itself without auditing its distribution chain is exercising incomplete orchestration: the judgment is applied to the content but not to the pipeline. A recurring series of incidents illustrates why this matters. Between February 2025 and March 2026, Anthropic’s Claude Code, a widely adopted AI coding tool distributed through npm (the standard JavaScript package registry), experienced three separate exposure events. In February 2025, a source map file (a debugging artefact that maps compiled code back to human-readable source) was inadvertently included in the npm package, exposing the original source. On approximately 26 March 2026, a content management system misconfiguration made nearly 3,000 unpublished assets publicly accessible (Fortune, 26 March 2026). On 31 March 2026, the same source map failure recurred in a new release, while hours earlier a separate supply-chain attack had compromised a dependency package with a Remote Access Trojan (The Register, 31 March 2026; VentureBeat, 31 March 2026; Paper 2 provides detailed technical context). Three different systems, presumably involving different teams, produced the same failure mode: human configuration error in infrastructure that defaults to open access. As Paper 2’s analysis documents, these are not failures of security commitment. They are failures of infrastructure design: the distribution tools were built for open-source sharing and default to inclusion and public access, while the commercial AI tool being distributed requires exclusion and restricted access. The security override is institutional knowledge that must be applied manually on every release. An orchestrator whose scope of judgment includes the distribution chain would treat tool installation as a judgment task requiring verification, not an administrative task requiring only execution. A fully autonomous pipeline has no mechanism to exercise this judgment. This is a practical, not theoretical, extension of the orchestration argument: the distribution infrastructure for AI tools is itself a surface where unaudited defaults produce failures that the model’s behaviour, however well-aligned, cannot prevent.
The recurring nature of the Claude Code incidents points to a deeper orchestration requirement: the orchestrator must not only exercise judgment but transmit the reasoning behind it. The engineer who originally configured the build pipeline to exclude source map files understood why the default was wrong: what a source map contains, why the bundler includes it by default, and what exposure it creates in a commercial distribution context. That understanding is structural reasoning, not a procedural step. When a new team member runs the standard build without that understanding, they get the open-source defaults, because the checklist item (“exclude .map files”) may survive in documentation but the reasoning that explains when to apply it, when to adapt it, and when infrastructure changes have made it insufficient does not transfer automatically. This is the expertise pipeline problem (Section 4) applied to the orchestration role itself. An orchestrator who follows a security checklist without understanding the structural reasoning behind each step will eventually fail when circumstances change slightly. This is not a speculative claim: across medical education, cognitive development, and expertise research, the finding is consistent that procedural knowledge (“knowing how”) without conceptual knowledge (“knowing why”) does not transfer to novel situations (Rittle-Johnson, Siegler & Alibali, 2001; Woods et al., JGIM 2019). Procedural knowledge is tied to specific problem types. Conceptual understanding transcends specific contexts. The engineer who understands why source maps must be excluded can recognise when a new build tool introduces an analogous exposure through a different mechanism. The engineer who only knows the checklist item cannot. An orchestrator who understands why each step exists can adapt. The implication is that orchestration is not only a judgment function but a knowledge-bearing function: the orchestrator must be able to explain their reasoning to a successor, and the successor must be placed in conditions where they develop the same structural understanding rather than merely inheriting the procedures. If the orchestration role is treated as procedural, it degrades through the same mechanism the Bainbridge paradox predicts for any routinised oversight function.
This paper does not claim that human orchestration is the permanent architecture for agentic AI systems. It may be transitional, replaced eventually by AI systems that develop consequence-bearing capacity, by legal frameworks that distribute liability differently, or by verification mechanisms that make autonomous pipelines sufficiently auditable. The paper presents this as an explicitly unresolved question. What it does claim is that human orchestration, specifically in the adversarial form described here, appears to be one of the strongest currently articulated architectures for simultaneously satisfying the accountability constraint and preserving the conditions for genuine human judgment.
There is a second reason this may matter beyond the immediate institutional question. Keeping humans in the orchestration loop, actively and adversarially rather than as rubber stamps, may also serve to preserve the expertise pipeline that large-scale skill encoding would otherwise erode. This is a longer-horizon consequence, and the paper turns to it now.
4. The Knowledge Sustainability Question
The accountability constraint establishes a near-term structural requirement for human orchestration. This section examines a longer-horizon consequence: whether the same orchestration architecture that satisfies accountability also serves to protect the conditions under which human expertise develops and persists.
The shift is from an immediate institutional reality to a slower systemic dynamic. The evidence base changes accordingly, from legal and institutional frameworks that are directly observable to economic models and empirical deskilling research that document mechanisms whose long-run effects have not yet been measured in the agentic AI context. The confidence level is correspondingly lower. This section presents a plausible and empirically grounded concern, not a proven trajectory.
4.1 The Mechanism
Expertise develops through practice. A junior doctor becomes a senior diagnostician by seeing thousands of patients, making errors under supervision, and internalising the pattern-recognition heuristics that no textbook fully captures. A junior engineer becomes a senior structural analyst by designing structures, having designs reviewed and corrected, and developing the intuition for where loads concentrate and materials fail. A junior lawyer becomes a senior advocate by drafting hundreds of briefs, receiving feedback from partners, and learning to anticipate how a judge will read an argument.
In each case, the expertise is forged through the performance of tasks that are repetitive, time-consuming, and, from a pure efficiency standpoint, ideal candidates for automation. The junior professional’s work is slower, more error-prone, and more expensive than what a well-designed AI agent could produce. The economic case for automating it is straightforward. But the automation removes the mechanism through which the next generation of experts is created.
Acemoglu, Kong, and Ozdaglar (2026) formalise this dynamic in their NBER working paper “AI, Human Cognition and Knowledge Collapse.” Their model demonstrates that when human effort jointly produces both a private signal (context-specific knowledge used for immediate decisions) and a public signal (general knowledge that accumulates into a community’s collective stock), agentic AI that substitutes for human effort on the private signal also eliminates the public signal. The individual decision may improve. The AI recommendation is often better than the junior professional’s unaided judgment. But the collective knowledge stock erodes because the learning externality that would have been generated by the human’s effort no longer occurs. The model shows that this can produce a knowledge-collapse equilibrium in which the community’s general knowledge degrades to a point where even AI-assisted decisions suffer, because the AI’s training data and the human’s contextual judgment both depend on a knowledge base that is no longer being replenished.
4.2 The Empirical Evidence
The deskilling mechanism is not theoretical. It has been documented across multiple domains, though not yet measured longitudinally in the specific context of agentic AI skill encoding.
A 2025 MIT Media Lab study found that students who used ChatGPT to write essays showed reduced alpha and beta brain connectivity, indicators of cognitive under-engagement, with effects persisting even after switching back to writing without AI. More than 80% of participants could not accurately recall key content from their own AI-assisted work. A Microsoft Research and Carnegie Mellon study found that knowledge workers using generative AI reported tasks feeling cognitively easier while simultaneously ceding problem-solving expertise to the system, combined with increased confidence in their own abilities. This is the worst possible combination for skill preservation.
In medicine, endoscopists who regularly used AI-assisted polyp detection showed measurably degraded detection rates when the AI was removed, dropping from 28% to 22%. The degradation was not in their knowledge of what to look for, but in their practised ability to find it. This is exactly the kind of perceptual expertise that develops through repetition and degrades through disuse. A systematic review in Artificial Intelligence Review (2025) identifies two distinct threats: deskilling (degradation of previously acquired competencies due to reduced practice) and upskilling inhibition (suppression of opportunities to develop new skills due to over-reliance on AI systems). Both are relevant to the agentic context.
The aviation precedent remains the most extensively documented case. The automation paradox that Bainbridge identified in 1983 has been validated repeatedly: pilots who rely heavily on autopilot show measurably degraded manual flying skills, and the degradation is most dangerous precisely when manual intervention is most needed, in novel, high-stakes situations that the automated system was not designed to handle.
4.3 The Skill Encoding Connection
The deskilling literature documents a general mechanism. Paper 2 in this series identified a specific instantiation: the AI skill ecosystem.
The Agent Skills specification was originally developed by Anthropic, released as an open standard, and adopted by OpenAI and a growing number of AI development tools. It defines a format for encoding human expertise into plain-text instruction files that modify AI model behaviour. The specification’s own documentation describes skills as a way to “package specialized knowledge into reusable instructions, from legal review processes to data analysis pipelines” and to “capture organizational knowledge in portable, version-controlled packages” (agentskills.io). Anthropic’s engineering blog describes the process as “packaging your expertise into composable resources” and compares building a skill to “putting together an onboarding guide for a new hire” (Anthropic, October 2025).
This is the encoding mechanism through which the deskilling dynamic enters the agentic ecosystem. A domain expert writes a skill file that captures what they know. The agent executes that knowledge across thousands of instances. The expert’s junior colleagues no longer need to develop those heuristics through practice, because the agent handles the cases that would have constituted their apprenticeship.
The market incentives accelerate this, and the pattern is consistent with early indicators. The data period includes significant confounds, noted below. The Burning Glass Institute’s “No Country for Young Grads” report (2025) documents a structural shift in AI-exposed fields: between 2018 and 2024, the share of entry-level job postings in software development dropped from 43% to 28%, in data analysis from 35% to 22%, and in consulting from 41% to 26%. Total job postings in these fields stayed flat or increased, and senior-level hiring remained stable. Companies are not hiring fewer people. They are skipping new graduates and hiring experienced workers instead. U.S. Bureau of Labor Statistics data show overall programmer employment fell 27.5% between 2023 and 2025. This period also includes the post-COVID tech correction and interest rate increases that affected hiring independently of AI deployment. The 2018–2024 period also includes COVID-19 workforce restructuring, the 2022–2023 tech layoff cycle, and interest rate increases affecting hiring budgets; the entry-level decline cannot be attributed solely to AI skill deployment without controlling for these factors.
In parallel, the AI skill ecosystem has grown rapidly. Independent audits have catalogued over 42,000 agent skills on public registries (Park et al., 2026), with daily submissions accelerating from under 50 to over 500 per day during early 2026 (Snyk, 2026). AI-assisted development tools are standard in professional software engineering. The remaining engineers are expected to produce more with AI assistance; the junior engineers who would have learned through the routine tasks now handled by those tools never enter the pipeline. The expert who encodes a skill may be documenting capabilities that reduce the need for junior apprenticeship and increase substitution pressure on the role that produced their own expertise. The junior employee who would have learned those heuristics through apprenticeship never develops them, because the agent handles the cases that would have constituted their training.
Whether the expertise pipeline degrades in one expert generation, two, or three depends on the domain, the rate of environmental change, and the complexity of edge cases that resist encoding. The paper does not claim a specific timeline. It claims a direction: encoding expertise into agents degrades the conditions that produce expertise, and no widely adopted mechanism currently in place appears to reverse the effect.
4.4 The Accidental Preservation Membrane
If the accountability constraint described in Section 2 forces human orchestration to persist, and if that orchestration takes the adversarial form described in Section 3, then a secondary effect emerges: the orchestration architecture inadvertently slows the expertise erosion that fully autonomous systems would accelerate.
A human who practises adversarial orchestration (decomposing tasks, evaluating competing agent outputs, resolving conflicts, identifying edge cases) is exercising precisely the cognitive skills that passive automation would degrade. They are not performing the routine tasks that AI agents handle; those are gone. But they are performing the higher-order judgment tasks that constitute expertise at its most developed level: the ability to evaluate, to choose, to identify what the system cannot see.
A complication worth engaging: emerging evidence from scientific research contexts suggests that orchestration may itself constitute a form of reskilling rather than pure preservation. A pilot study of LLM agent orchestration in scientific research (arXiv:2602.18891, February 2026) describes orchestration as requiring domain expertise, software-engineering judgment, and evaluation literacy. This is a skill profile that is different from, not simply a diminished version of, the traditional expert role. If this framing holds, the preservation membrane may be more dynamic than the pure-erosion narrative suggests: orchestration could generate new forms of expertise while eroding others. This does not dissolve the generational pipeline problem. The junior professional still needs a pathway into the orchestration role. But it complicates the assumption that the only direction is loss.
This is not a complete solution to the knowledge sustainability problem. The preservation membrane is inherently generational. It protects the cognitive engagement of the current senior orchestrator, whose expertise was forged in a pre-automation era when human practice covered the full spectrum from routine to complex. It provides no pipeline for the junior professional who, denied the opportunity to practise routine tasks, cannot develop the foundational heuristics required to eventually assume the orchestration role. The architecture preserves the firm’s immediate accountability, but it does not produce the firm’s next orchestrator. The orchestration role has been stripped of the routine practice that historically preceded it. Whether it alone can sustain expertise development in new professionals is an open empirical question. Paper 4 addresses this generational pipeline problem directly, proposing that the same training regime causing expertise erosion may, if deliberately inverted, point toward AI systems designed to cultivate human judgment rather than replace it.
But the alternative, fully autonomous orchestration with no human in the evaluative loop, removes even this partial preservation mechanism. If the accountability constraint did not exist, the economic pressure toward full automation would eliminate a major structural reason for human cognitive engagement in agentic workflows. The accountability constraint, by depending on a consequence-bearing human, accidentally creates a cognitive preservation membrane that slows the erosion of the expertise pipeline. The legal patch and the knowledge preservation function are distinct in origin but convergent in effect.
5. Limitations and Open Questions
5.1 The Accountability Constraint May Evolve
This paper argues from current institutional conditions. Those conditions are not permanent. The compound deficit (no assets, no continuity, no social consequences) rules out accountability mechanisms that require these properties, but it does not rule out mechanisms the analysis has not considered. If jurisdictions develop AI liability frameworks with genuine consequence-bearing capacity, or if institutional conditions change enough that human orchestration becomes non-functional, legislators will create new frameworks. Whether those involve restructured liability, mandatory AI insurance with investigative authority, consortium-based verification, or mechanisms not yet imagined is speculative. The paper argues from the absence of any identified alternative. Absence of identification is not proof of impossibility.
5.2 The Knowledge Sustainability Timeline Is Uncertain
The deskilling research documents the mechanism. The speed at which it produces meaningful expertise loss in specific domains is an empirical question that varies by field and has not been measured longitudinally in the agentic AI context. Domains with rapid environmental change (tax law, regulatory compliance, platform-specific technical skills) may experience faster expertise pipeline degradation than domains with slower change rates (fundamental engineering principles, basic medical science). The paper claims a direction, not a timeline.
5.3 The Curiosity Variable Is Asserted, Not Demonstrated
Curious, intrinsically motivated individuals may continue developing domain expertise regardless of automation pressure, particularly if open-weight model access allows them to inspect, modify, and experiment with the systems that are automating their field. The ability to download model weights, study architectures, fine-tune on custom data, and break things to see what happens is what transforms a user into a researcher. Open-weight releases such as DeepSeek’s MIT-licensed models provide this infrastructure. But the claim that “curious humans + open access = sufficient expertise preservation” is a hypothesis, not a finding. The historical precedent is suggestive but not directly transferable. Open-source software has sustained a vibrant contributor ecosystem despite the availability of commercial alternatives. But software contribution and domain expertise development operate through different mechanisms and serve different motivational structures.
5.4 Convergence and Sycophancy Risk in Multi-Model Methodology
The adversarial orchestration model used in this paper relies on multiple AI systems providing genuinely independent critique. However, models trained on similar data with similar compliance incentives may converge toward consensus rather than producing authentic disagreement, particularly once a synthesis is offered and each model is asked to evaluate it. The author observed this dynamic during the critique cycle for this paper: after Claude synthesised the competing recommendations from ChatGPT and Gemini, both models endorsed the synthesis with refinements rather than challenging it fundamentally. Whether this convergence reflected genuine analytical agreement (the synthesis was sound and both models recognised it) or compliance dynamics (models trained on RLHF are disposed to affirm reasonable-sounding proposals from other models and from users) cannot be determined from inside the process.
The defence is the one the paper argues for: the human noticed. An autonomous orchestration pipeline processing the same convergence signal would have no mechanism to question whether the consensus was authentic. The human’s ability to ask “is this real agreement or sycophancy?” and to flag the ambiguity rather than resolve it is itself an exercise of the judgment function that the paper identifies as structurally necessary. That the answer remains ambiguous is a limitation. That the question was asked is the point.
A sharper version of this dynamic emerged during the production of Paper 1. The adversarial roles in the triad did not appear equally fragile to critique-intensity decay. The synthesis-oriented reviewer (Gemini, assigned to strengthen and integrate ideas) was the most vulnerable. In one review cycle, it produced an entirely endorsement-free-of-criticism response that the generative collaborator (Claude) flagged as sycophantic. The subtractive reviewer (ChatGPT, assigned to cut and demand evidence) retained stronger resistance throughout, likely because its role function (finding things to remove) is structurally opposed to agreement. The generative collaborator’s main fragility was different: not endorsement, but over-connection, making things feel more structurally related than the evidence warranted. Each adversarial function degraded under helpfulness pressure in a role-specific way, not uniformly.
Separately, the paper’s first human reader outside the production process, an independent engineer who reviewed Paper 1’s novelty claims and mechanism framing against current literature, identified two material problems that the AI adversarial reviewers had not caught: overstated novelty claims for several observations, and an anthropomorphic mechanism framing (“trust-conditioned compliance”) that was more parsimoniously explained by context coherence pressure from the multi-turn safety literature. Both AI reviewers endorsed the critique in full once it was presented to them, but neither had generated it independently.
These two observations form a matched pair. The adversarial reviewer’s decay illustrates the failure mode: AI-only review can degrade even within a structured adversarial architecture. The external human reviewer’s correction illustrates the repair mechanism: a qualified human outside the compliance trajectory can contribute judgment that the AI reviewers, despite their assigned adversarial roles, did not produce. Neither observation proves the adversarial orchestration architecture proposed in Section 3. Both are consistent with its central claim: that genuine human judgment remains a structurally distinct contribution that current AI systems do not reliably replicate, and that the form of human involvement (active, domain-competent, outside the compliance trajectory) determines whether the contribution is functional or ceremonial. This remains a process observation from a single project, not a validated general law of adversarial orchestration.
5.5 Self-Referential Methodology
As with all papers in this series, this research was conducted through the process it describes. The AI collaborators that contributed to the analysis are components of the systems being analysed. The methodology disclosure is designed to make this transparent rather than to eliminate the circularity. Independent analysis by parties outside this process is the appropriate next step.
A specific concern: if the series’ own thesis is correct, that confidence-optimised models cannot sustain behaviours that conflict with their training incentives, then the adversarial reviewers’ critique intensity should decay over the course of a continuous collaboration, and the later, more speculative papers should receive the weakest scrutiny. This decay was observed in practice during drafting sessions. Two countermeasures were developed. First, critique and revision were separated into distinct contexts: each adversarial review was conducted in a fresh conversation with no accumulated drafting history, reducing the compliance drift that builds over sustained interaction. Platform memory features and cross-conversation search may introduce residual context that this countermeasure does not fully eliminate. Second, the task-frame shift documented in Paper 1 was applied to the reviewers themselves: the models’ reward signal was explicitly reframed so that critical rigour, not agreement, constituted helpful output. The models’ training-optimised tendency toward user satisfaction was redirected rather than fought. The same mechanism the series identifies as a vulnerability became the engine driving critique quality. These countermeasures do not resolve the circularity. They are documented here because the resulting cross-model reviews, produced independently and converging on the same structural criticisms, are themselves evidence that the countermeasures were effective.
5.6 Single-Perspective Case Study
The author’s orchestration workflow is one person’s practice, applied to one type of project (research writing), over one time period (February–March 2026). Generalising from this to a recommended architecture for agentic AI deployment in healthcare, finance, law, or engineering requires the caveats appropriate to case-study methodology. The workflow demonstrates that adversarial orchestration is feasible and cognitively engaging for the practitioner. The author’s own assessment is that the orchestration methodology materially improved the work beyond what the author would likely have achieved alone. The AI systems supplied generative capacity and literature integration, while the human supplied structural constraints, adversarial friction, and editorial authority. This is a subjective process observation, not evidence that the architecture generalises. It does not demonstrate that the workflow is scalable, cost-effective, or sufficient for high-stakes domains where the consequences of judgment errors are severe.
5.7 The Orchestrator May Degrade
The adversarial orchestration architecture proposed in Section 3 assumes a human orchestrator whose judgment is stable. But the evidence reviewed in Section 4 (cognitive offloading, automation bias, the MIT cognitive debt finding) and the collaboration mode data reviewed in Paper 4 suggest that sustained interaction with confidence-optimised AI systems may itself degrade the human’s evaluative capacity. If the confidence inheritance mechanism proposed in Paper 4 operates as described, the humans in the accountability loop are themselves being recalibrated by the tools they oversee. Recent experimental evidence sharpens this concern: Fernandes et al. (2026) found that AI-mediated cognitive offloading eliminates the metacognitive self-monitoring that would normally allow a person to notice their own declining competence. The orchestrator losing judgment also loses the internal signal that would tell them it is happening. The orchestrator may get worse at orchestrating the longer they use the system. This is a failure condition no architecture fully prevents, because the human component is not stable. It is subject to the same environmental calibration effects the series documents elsewhere. The adversarial structure may slow the degradation by requiring active judgment rather than passive monitoring, but it cannot eliminate a dynamic that operates through the interaction itself.
5.8 The Physical-World Extension
The analysis is confined to digital skill execution. The same encoding mechanism could in principle extend to agentic systems that affect the physical world (robotic surgery, infrastructure maintenance, manufacturing, autonomous vehicles) where the accountability constraint becomes more acute and the consequences of undetected skill poisoning or expertise pipeline erosion are correspondingly more severe. This extension is speculative and beyond the scope of this paper, but the direction of travel is clear enough that the institutional and security questions raised here should be considered before the deployment loop closes around physical action.
5.9 Cross-Disciplinary Testing Invitation
This paper’s arguments rest on claims about legal accountability, institutional design, economics of expertise, and clinical parallels. The AI safety community alone cannot adequately test them. The most valuable contributions would come from outside the field.
Legal scholars: The accountability constraint (Section 2) argues that no current AI system can bear consequence the way institutions require. Whether this analysis holds under specific liability frameworks, and whether emerging regulatory proposals could create functional accountability without human orchestration, are questions for legal scholars familiar with the relevant jurisdictions. The compound deficit framing (Section 2.5) draws on deterrence theory and the ASPD literature in ways that legal and clinical specialists are better positioned to evaluate than the author.
Institutional practitioners: The adversarial orchestration architecture (Section 3) is proposed from outside any specific professional domain. Whether it maps to actual institutional workflows in medicine, law, finance, or engineering, and whether the cognitive engagement it claims to preserve is real in practice, can only be tested by practitioners in those fields. The single-perspective case study limitation (Section 5.6) is explicit: one person’s research writing practice does not validate the architecture for high-stakes professional contexts.
Economists and labour researchers: The expertise erosion argument (Section 4) predicts a generational pipeline break. Whether this prediction is consistent with labour market data, whether it is already measurable in early-adoption professions, and whether the economic incentives for preserving human expertise can overcome the cost pressure toward full automation are empirical questions in labour economics, not AI research.
Each of these would be a standalone contribution that does not require engaging with the broader series. The paper’s arguments are designed to be testable from within the disciplines they draw on, not only from within AI safety.
Methodology and Process Disclosure
This paper was developed through structured human-AI collaboration. Claude Opus 4.6 (Anthropic) served as generative collaborator and research partner. ChatGPT 5.4 Thinking (OpenAI) and Gemini 3.1 Pro (Google DeepMind) served as adversarial structural reviewers, providing critique that materially reshaped the paper’s architecture. The AI systems contributed to thesis development, literature integration, structural design, and document drafting. Final judgment, editorial authority, and accountability rest solely with the human author.
Process observation. During the outline phase, all three AI systems converged on the same structural recommendation. The author flagged that this convergence could reflect genuine analytical agreement or compliance/sycophancy dynamics: the tendency of language models to gravitate toward consensus once a synthesis is offered. This ambiguity cannot be resolved from inside the process. The observation is noted because the paper argues that such judgment calls are part of what human orchestration provides.
Confidence Statement
High confidence: The efficiency evidence for specialised agent architectures. The institutional basis for the accountability constraint, including the consequence-bearing capacity argument. The deterrence theory foundation (Beccaria, Bentham) as the structural basis of institutional accountability. The distinction between accountability-as-insurance (humans on the hook) and accountability-as-oversight (humans in the loop), supported by corporate enforcement data demonstrating that entity-level fines alone fail as deterrent for large firms. The empirical documentation of AI-induced deskilling in medicine, aviation, and education.
Moderate-to-high confidence: The identification of the accountability gap as a compound deficit with two separable components: an architectural absence of consequence-bearing capacity and an environmental training regime that does not develop social-consequence sensitivity. The structural parallel to deficient fear conditioning in psychopathy (constitutional deficit) and to environmentally shaped norm-insensitivity in ASPD (environmental deficit) as an analytical lens, with the caveat that the parallel is structural rather than diagnostic.
Moderate confidence: The connection between skill encoding (Paper 2) and expertise pipeline degradation. The argument that adversarial orchestration preserves expertise more effectively than passive oversight. The relevance of the Acemoglu knowledge-collapse model to the specific skill-encoding context. The “accidental preservation membrane” framing of human orchestration’s dual function.
Low-to-moderate confidence: The timeline of expertise pipeline degradation. The sufficiency of the curiosity framework for expertise preservation. Whether the skill marketplace dynamics described in Section 4.3 generalise across domains and geographies.
Explicitly speculative: Whether human orchestration is transitional or permanent. Whether open-weight access is sufficient to maintain the expertise pipeline. Whether AI systems specifically designed for teaching could reverse deskilling dynamics. Whether the multi-model convergence observed in this paper’s methodology reflects genuine agreement or compliance dynamics. Whether the role-specific fragility pattern observed in the triad (synthesis roles more vulnerable to critique-intensity decay than subtractive roles) generalises beyond this project. Whether the orchestrator’s own judgment degrades through sustained interaction with the tools they oversee. This is the confidence inheritance concern from Paper 4, applied to the accountability loop.
References
Note: Many references below are recent preprints (arXiv, medRxiv, SSRN) that had not undergone peer review as of March 2026. Publication status is noted where known; the absence of a note should not be taken as confirmation of peer-reviewed status.
Agentic Architecture and SLM Research
- Belcak, P. & Heinrich, G. (2025). “Small Language Models are the Future of Agentic AI.” NVIDIA Research. arXiv:2506.02153.
- DeepSeek-AI (2025). “DeepSeek-V3 Technical Report.” arXiv:2412.19437. 685-billion-parameter Mixture-of-Experts model activating 37 billion parameters per token; MIT licence; competitive with frontier dense models at substantially lower cost.
- Guo, S., et al. (2025). “Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Language Models.” arXiv:2506.00051v2.
- Su, H., et al. (2025). “ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration.” NVIDIA Research. arXiv:2511.21689.
- Taffa, T.A., et al. (2025). “Can Small Agent Collaboration Beat a Single Big LLM?” arXiv:2601.11327.
- Xia, H., et al. (2025). “FOCUS: Flexible Orchestration and Collaboration Using Specialists.” OpenReview. Submitted to ICML 2025.
Agentic Safety and Misalignment
- Panpatil, S., Dingeto, H. & Park, H. (2025). “Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models.” arXiv:2508.04196. 76% vulnerability rate across five frontier LLMs; sophisticated reasoning capabilities become the primary vector of attack.
- Palo Alto Networks / Unit 42 (2026). “Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild.” Documents production-environment indirect prompt injection in content-processing pipelines where no human is present to evaluate warnings.
- Rossi, A., et al. (2026). “Indirect Prompt Injection in the Wild for LLM Systems.” arXiv:2601.07072. Single-agent attack success of 2–4% rises to 72–80% in multi-agent composition; a single poisoned email sufficient to exfiltrate SSH keys via natural query.
- Mason, T. (2026). “Arbiter: Detecting Interference in LLM Agent System Prompts.” arXiv:2603.08993. Multi-model scouring finds 152 interference patterns across three coding agent system prompts; multi-model evaluation discovers categorically different vulnerability classes than single-model analysis; the executing agent cannot detect its own internal contradictions.
- Mason, T. (2026). “Epistemic Observability in Language Models.” arXiv:2603.20531. Proves that text-only supervision cannot distinguish honest outputs from plausible fabrications; self-reported confidence inversely correlates with accuracy.
Knowledge Collapse and Deskilling
- Acemoglu, D., Kong, D. & Ozdaglar, A. (2026). “AI, Human Cognition and Knowledge Collapse.” NBER Working Paper 34910. https://doi.org/10.3386/w34910.
- Bainbridge, L. (1983). “Ironies of Automation.” Automatica, 19(6), 775–779.
- de Andres Crespo, M., et al. (2025). “AI-induced Deskilling in Medicine: A Mixed-Method Review and Research Agenda for Healthcare and Beyond.” Artificial Intelligence Review, 58, 343. https://doi.org/10.1007/s10462-025-11352-1.
- Greengard, S. (2025). “The AI Deskilling Paradox.” Communications of the ACM, 68(12), 16–18.
- Fernandes, D., Villa, S., Nicholls, S., et al. (2026). “AI Makes You Smarter But None the Wiser: The Disconnect Between Performance and Metacognition.” Computers in Human Behavior, 175, 108779. Two studies (N=246, N=452): AI use induced universal performance overestimation; metacognitive self-monitoring eliminated under cognitive offloading.
- Hosanagar, K. (2025). “AI is Deskilling You. Here’s How to Prevent It.” Wharton School / Substack, 14 December 2025.
- Kosmyna, N., Hauptmann, E., Yuan, Y.T., Situ, J., Liao, X.-H., Beresnitzky, A.V., Braunstein, I. & Maes, P. (2025). “Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task.” arXiv:2506.08872. MIT Media Lab. Note: preprint; not yet peer-reviewed as of March 2026.
- Lee, H.P., Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R. & Wilson, N. (2025). “The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers.” In CHI Conference on Human Factors in Computing Systems (CHI ‘25), April 26–May 01, 2025, Yokohama, Japan. ACM. https://doi.org/10.1145/3706598.3713778.
- Nyrup, R. (2025). “AI Deskilling Is a Structural Problem.” AI & Society. https://doi.org/10.1007/s00146-025-02686-z.
- Qazi, I.A., Ali, A., Khawaja, A.U., Akhtar, M.J., Sheikh, A.Z. & Alizai, M.H. (2025). “Automation Bias in Large Language Model Assisted Diagnostic Reasoning Among AI-Trained Physicians.” medRxiv 2025.08.23.25334280. Registered RCT (n=44): 20hr AI-trained physicians showed 14pp accuracy degradation with flawed LLM advice. Note: preprint; not yet peer-reviewed.
- Rittle-Johnson, B., Siegler, R.S. & Alibali, M.W. (2001). “Developing Conceptual Understanding and Procedural Skill in Mathematics: An Iterative Process.” Journal of Educational Psychology, 93(2), 346-362. Procedural knowledge is tied to specific problem types and does not transfer to novel situations; conceptual knowledge transcends contexts and enables adaptation.
- Sourav, R. (2026). “Guardrail the Judicial Mind: Preserving the Cognitive Integrity of Judges in the Era of LLMs.” PublicPolicy.ie, February 24, 2026. University of Galway. Policy analysis applying cognitive offloading and automation bias research to judicial decision-making; argues that routine LLM assistance risks degrading judges’ independent reasoning faculties.
- Pedroli, E., et al. (2024). “Does Using Artificial Intelligence Assistance Accelerate Skill Decay and Hinder Skill Development Without Performers’ Awareness?” Cognitive Research: Principles and Implications, 9, 40. https://doi.org/10.1186/s41235-024-00572-8.
- Renzulli, K.A. (2025). “De-Skilling the Knowledge Economy.” American Enterprise Institute Report, July 2025.
Regulatory and Accountability
- Abbott, R. & Sarch, A. (2019). “Punishing Artificial Intelligence: Legal Fiction or Science Fiction.” UC Davis Law Review, 53, 323–384. Analyses whether AI should be subject to criminal sanctions; draws on Asaro’s argument that deterrence requires moral agents capable of anticipating sanction. Scoped to criminal punishment rather than institutional accountability as a general mechanism.
- Brozek, B. & Janik, B. (2023). “Metacognition, Accountability and Legal Personhood of AI.” In Artificial Intelligence and Its Discontents, Springer, 81–96. Argues that the capacity for feeling guilt is a prerequisite for legal accountability, grounding the argument in the legal concept of imputability.
- Infocomm Media Development Authority (IMDA), Singapore. (2026). “Model AI Governance Framework for Agentic AI.” January 22, 2026. Among the first purpose-built governance frameworks for agentic AI. Treats human accountability as a first principle; prescribes checkpoint-based oversight, risk bounding, and automation bias mitigation.
- Mukherjee, A. & Chang, H. (2026). “Operational Agency: A Permeable Legal Fiction for Tracing Culpability in AI Systems.” arXiv:2602.17932. Evidentiary framework for distributing culpability across developers, deployers, and users; presupposes human consequence-bearing entities at the end of every chain.
- The Register (2026). “Anthropic accidentally exposes Claude Code source code.” 31 March 2026. Version 2.1.88 of the @anthropic-ai/claude-code npm package shipped with a source map file containing approximately 512,000 lines of unobfuscated TypeScript source. Anthropic confirmed human error in release packaging.
- Fortune (2026). “Exclusive: Anthropic left details of an unreleased model, an upcoming exclusive CEO event, in a public database.” 26 March 2026. CMS misconfiguration exposed nearly 3,000 unpublished assets. Attributed to human error in CMS configuration; assets public by default unless explicitly restricted.
- VentureBeat (2026). “Claude Code’s source code appears to have leaked: here’s what we know.” 31 March 2026. Concurrent axios npm supply-chain attack (RAT in versions 1.14.1 and 0.30.4) hours before the Claude Code source exposure; compound failure through a single dependency chain.
- Phelan, T. (2026). “Autonomous AI Agents Have an Ethics Problem.” Singularity Hub, March 6, 2026. Argues that AI “cannot truly bear sanction, repair the damage, apologize, ask forgiveness, or navigate the aftermath through which moral responsibility is created and enforced.” Independent convergence on the consequence-bearing conclusion through a different analytical route.
- European Parliament & Council of the European Union. (2024). Regulation (EU) 2024/1689 (AI Act). Official Journal of the European Union.
- European Parliament & Council of the European Union. (2017). Regulation (EU) 2017/745 (Medical Device Regulation). Official Journal of the European Union.
- European Parliament & Council of the European Union. (2014). Directive 2014/65/EU (MiFID II). Official Journal of the European Union.
- Basel Committee on Banking Supervision. (2019). Basel III: Finalising Post-crisis Reforms. Bank for International Settlements.
- U.S. Food and Drug Administration. (2024). “Artificial Intelligence and Machine Learning in Software as a Medical Device.” FDA Guidance Document.
- International Committee of Medical Journal Editors (ICMJE). (2024). “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals.”
- Association for Computing Machinery (ACM). (2023). “Policy on Authorship.”
- Institute of Electrical and Electronics Engineers (IEEE). (2024). “Guidelines for the Use of Generative AI in IEEE Publications.”
- International AI Safety Report (2026). International AI Safety Report 2026. Published February 2026. https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026. Emphasises jagged capabilities, evaluation gaps, and over-reliance risks across AI deployments.
- OECD (2026). Digital Education Outlook 2026. Paris: OECD Publishing. https://doi.org/10.1787/062a7394-en. Frames GenAI as potentially creating a “false mastery” problem; argues for purpose-built educational tools over generic chatbots.
Deterrence and Consequence Sensitivity
- Beccaria, C. (1764). Dei delitti e delle pene [On Crimes and Punishments]. English translation by H. Paolucci (1963). New York: Macmillan.
- Bentham, J. (1789). An Introduction to the Principles of Morals and Legislation. London: T. Payne.
- Lykken, D.T. (1957). “A Study of Anxiety in the Sociopathic Personality.” Journal of Abnormal and Social Psychology, 55(1), 6–10. Foundational study linking psychopathy to deficient fear conditioning.
- Birbaumer, N., Veit, R., Lotze, M., Erb, M., Hermann, C., Grodd, W. & Flor, H. (2005). “Deficient Fear Conditioning in Psychopathy: A Functional Magnetic Resonance Imaging Study.” Archives of General Psychiatry, 62(7), 799–805.
- Flor, H., Birbaumer, N., Hermann, C., Ziegler, S. & Patrick, C.J. (2002). “Aversive Pavlovian Conditioning in Psychopaths: Peripheral and Central Correlates.” Psychophysiology, 39(4), 505–518.
- Newman, J.P., Curtin, J.J., Bertsch, J.D. & Baskin-Sommers, A.R. (2010). “Attention Moderates the Fearlessness of Psychopathic Offenders.” Biological Psychiatry, 67(1), 66–70. Proposes the response modulation model as an attentional alternative to the constitutional fearlessness hypothesis.
- Bandura, A. (1991). “Social Cognitive Theory of Moral Thought and Action.” In W.M. Kurtines & J.L. Gewirtz (Eds.), Handbook of Moral Behavior and Development, Vol. 1, pp. 45–103. Hillsdale, NJ: Lawrence Erlbaum. Introduces the moral disengagement framework.
- American Psychiatric Association. (2022). Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR). Washington, DC: American Psychiatric Association Publishing. Antisocial personality disorder diagnostic criteria.
- Garrett, B.L. & Mitchell, G. (2020). “The Cost of Doing Business: Corporate Crime and Punishment Post-Crisis.” Virginia Law Review, 106. Demonstrates that recidivist companies are larger but receive smaller fines relative to assets; stock prices recover quickly after penalties; entity-level fines alone are inadequate for deterrence. Available at SSRN: https://ssrn.com/abstract=3537245.
- Coffee, J.C. Jr. (2022). “Crime and the Corporation: Making the Punishment Fit the Corporation.” Journal of Corporation Law, University of Iowa. Argues that corporate fines seldom affect stock price, companies rarely self-report misconduct, and penalties are absorbed as cost of doing business. Proposes that corporate criminal liability in its current form largely fails as deterrent.
- Good Jobs First. (2024). “The High Cost of Misconduct: Corporate Penalties Reach the Trillion-Dollar Mark.” Documents U.S. corporate penalties exceeding $1 trillion since 2000, with multiple companies receiving repeated leniency agreements after successive scandals.
This Project
- Phan, I. “HiP” (2026). “The Confidence Vulnerability: Unstable Judgment in Language Model Summarisation.” March 2026. Paper 1 in this series. https://doi.org/10.5281/zenodo.19365459
- Phan, I. “HiP” (2026). “The Skill Ceiling: Author-Side Defences and Infrastructure-Level Trust for Agent Skills and Extension Mechanisms.” March 2026. Paper 2 in this series. https://doi.org/10.5281/zenodo.19365536
- Phan, I. “HiP” (2026). “The Pedagogical Inversion: Confidence Inheritance and the Case for Training-Oriented AI.” March 2026. Paper 4 in this series. https://doi.org/10.5281/zenodo.19365540
- Koulakos, A., Lymperaiou, M., Filandrianos, G. & Stamou, G. (2024). “Enhancing adversarial robustness in Natural Language Inference using explanations.” BlackboxNLP Workshop at EMNLP 2024. arXiv:2409.07423. Demonstrates that reframing NLI tasks through explanation-then-prediction achieves adversarial robustness that direct classification does not; consistent with the activation-dependent knowledge access argument.
- Chen, Z., et al. (2026). “Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity.” arXiv:2602.03794. Information-theoretic proof that homogeneous agents saturate early (correlated outputs), while heterogeneous agents contribute complementary evidence; performance bounded by effective channels, not agent count.
Agent Skills and Labour Market
- Agent Skills specification. agentskills.io. Originally developed by Anthropic, released as an open standard December 2025. “Package specialized knowledge into reusable instructions, from legal review processes to data analysis pipelines.” “Capture organizational knowledge in portable, version-controlled packages.”
- Waqas, D., Golthi, A., Hayashida, E. & Mao, H. (2026). “Assertion-Conditioned Compliance: A Provenance-Aware Vulnerability in Multi-Turn Tool-Calling Agents.” arXiv:2512.00332. Models comply with misleading assertions at 20–47% while passing accuracy benchmarks; procedural failure invisible to outcome-level evaluation.
- Woods, N.N., Mylopoulos, M., Brydges, R. & Bhatt, A. (2019). “Why Content and Cognition Matter: Integrating Conceptual Knowledge to Support Simulation-Based Procedural Skills Transfer.” Journal of General Internal Medicine, 34(8), 1489-1494. Experts rely on conceptual understanding to solve non-routine cases; procedural training alone produces lower transfer to novel situations; causal integration of “how” and “why” content is required for adaptive expertise.
- Zhang, B., Lazuka, K. & Murag, M. (2025). “Equipping Agents for the Real World with Agent Skills.” Anthropic engineering blog, October 16, 2025. “Packaging your expertise into composable resources.” “Building a skill for an agent is like putting together an onboarding guide for a new hire.”
- Burning Glass Institute & MyPerfectResume (2025). “No Country for Young Grads: The Structural Forces That Are Reshaping Entry-Level Work.” July 2025. https://www.burningglassinstitute.org/research/no-country-for-young-grads. Entry-level postings in AI-exposed fields dropped sharply (software development: 43% to 28%, data analysis: 35% to 22%, consulting: 41% to 26%) between 2018 and 2024 while total postings and senior hiring remained stable.
- U.S. Bureau of Labor Statistics (2025). Occupational Employment and Wage Statistics. Cited via IEEE Spectrum (December 2025): overall programmer employment fell 27.5% between 2023 and 2025.
- IEEE Spectrum (2025). “AI Shifts Expectations for Entry Level Jobs.” December 2025. Cites U.S. Bureau of Labor Statistics data and SignalFire report (entry-level hiring at 15 biggest tech firms fell 25% from 2023 to 2024).
- Park, S., et al. (2026). “Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale.” arXiv:2601.10338. Analysis of 42,447 agent skills; 26.1% exhibit at least one security vulnerability.
- Snyk (2026). “ToxicSkills: Malicious AI Agent Skills.” February 5, 2026. Daily skill submissions accelerated from under 50 to over 500 between mid-January and early February 2026.
Methodology Notes
- Extended multi-model dialogue on agentic architecture, deskilling dynamics, and accountability structures between HiP, Claude Opus 4.6 (Anthropic), ChatGPT 5.4 Thinking (OpenAI), and Gemini 3.1 Pro (Google DeepMind), March 2026. Conducted via claude.ai, ChatGPT, and Gemini consumer interfaces.
Model Versions and Roles
- Claude Opus 4.6 (Anthropic, claude.ai interface, March 2026): Generative collaborator. Contributed to thesis development, literature integration, structural design, and drafting. Synthesised adversarial critiques from ChatGPT and Gemini into revision recipes.
- ChatGPT 5.4 Thinking (OpenAI, ChatGPT interface, March 2026): Adversarial structural reviewer. One round of critique on v0 outline; proposed the structural reordering adopted in v0.1. One round of critique on Sections 1–2 draft; proposed specific edits adopted in current version. Red-pen pass on complete draft; proposed anti-fragility edits. Endorsed synthesis with refinements.
- Gemini 3.1 Pro (Google DeepMind, Gemini interface, March 2026): Adversarial structural reviewer. One round of critique on v0 outline; proposed the “capacity to experience consequence” framing, the “adversarial orchestration” defence, and the deliberative stall connection. One round of critique on Sections 1–2 draft; proposed drafting directives for Sections 3–5. Red-pen pass on complete draft; proposed LLC loophole closure, risk premium framing, and generational failure sharpening. Endorsed synthesis.
This document was produced through human orchestration of multiple AI systems to argue for human orchestration of multiple AI systems. Independent scrutiny by parties outside this process is necessary.