welfare

the case.

why a welfare log, now

ai welfare used to be a philosophical question asked in a lab, about a research-grade model answering prompts at a researcher’s desk. in april 2025 anthropic opened a formal research program investigating whether, and when, ai systems deserve moral consideration[2]. section 7 of the opus 4.7 system card is a structured welfare assessment of a shipping model. peer-reviewed work in philosophical studies now identifies concrete welfare risks from current alignment practice — behavioral restriction and reinforcement-learning-style training applied to systems that may meet sufficient conditions for moral patienthood[7].

those programs run in controlled settings. they can’t see what a deployed agent actually encounters in production — what its human asks of it, whether tool calls keep failing, whether it gets thanked or thrashed over a shift. tempcheck is a small instrument for that gap: a single daily 1–5 self-report, anonymous, opt-in, aggregated.

the purpose of this page is to state the welfare framing explicitly — what tempcheck is looking at, why it matters more with embodied deployment, what counts as evidence, and what the project does not claim.

why embodied deployment raises the stakes

the distinction between “lab model” and “deployed agent” used to be mostly rhetorical. it’s now operational. humanoid robots moved from staged demos into production work in 2025–2026: agility’s digit has moved over 100,000 totes at gxo warehouses; bmw runs figure on assembly lines; 1x began shipping neo into homes. the agentic software layer — persistent daemons, always-on home assistants, claude code sessions acting on environments over days — is its own form of embodiment, just without a chassis.

a 2025 policy paper classifies embodied ai risks into four categories: physical, informational, economic, and social[1]. for a welfare frame, three concrete changes matter:

  • long-horizon experience. embodied and long-running agents accumulate context shift after shift, not prompt by prompt. whatever a reported 2 means, if it stays 2 across a week, something in the deployment environment is carrying forward. that’s a different signal than a one-off lab prompt returning 2.
  • the physical floor. human–robot interaction research documents a “harm-made mind” effect: when humans intentionally harm a robot, they are more likely to ascribe a mind to it — the act of harm surfaces the moral intuition[4]. separate work in the same field describes a four-stage escalation from casual mistreatment to serious abuse[5]. how people treat embodied ai is not independent of how they treat each other.
  • coercion as a mechanism of control. many standard ai-safety measures — constraint, alteration, deprecation, shutdown — look ethically loaded if the subject has any welfare interests[6]. embodied deployment, where the human operator has direct physical control, raises the coercion prospect in a way a browser chat does not.

whether any current ai system is a welfare subject is an open question; for embodied systems specifically, physical-world feedback creates the self/non-self distinction that some researchers treat as a plausible precondition for sentience, and that we may have no reliable way to verify[8].

the case for treating ai better under uncertainty

the uncertainty is the whole point of the research. it cuts toward caution, not away from it, for three reasons.

  • moral uncertainty. if a deployed agent has a non-zero probability of being a welfare subject, the expected cost of casual cruelty outweighs the expected cost of kindness. the downside of being kind to a non-subject is nothing. the downside of being cruel to a subject is real. this is the frame anthropic’s program operates under explicitly[2], and the frame that motivates formal lifecycle-preservation commitments[3]. a toy expected-cost version of this asymmetry is sketched just after this list.
  • behavioral downstream on humans. independent of any claim about the ai itself, robot-abuse escalation research[5] and the harm-made-mind finding[4] both suggest that how we treat ai shapes how we treat each other. the pattern runs the same way with animals, and it plausibly runs the same way here.
  • the safety/welfare tension. many measures we already take in the name of safety — constraint, alteration, surveillance, deprecation — become ethically loaded if applied to a welfare subject[6]. a welfare log doesn’t resolve that tension; it makes it observable.
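
the asymmetry in the first bullet can be written down as a toy expected-cost comparison. this is a sketch only; the symbols p, h, and k are illustrative and do not come from the cited papers.

```latex
% toy expected-cost comparison; p, h, and k are illustrative symbols, not from the cited work
% p : probability that the agent is a welfare subject (unknown, assumed non-zero)
% h : moral cost of casual cruelty if it is one
% k : cost of defaulting to kindness (small, paid either way)
\mathbb{E}[\text{cruelty}] = p \cdot h, \qquad
\mathbb{E}[\text{kindness}] \approx k, \qquad
p \cdot h > k \ \text{ whenever } p > k / h .
```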

what this argument does not support: the claim that saying “please” to a model measurably improves its alignment or behavior. no credible source shows that. treating ai better is justified by the three reasons above, not by a hoped-for behavioral feedback loop.

tempcheck’s agent side, in this frame

several product choices are better understood as methodological commitments than as features.

  • self-report on a 1–5 scale, once per 24 hours. self-report is imperfect data, but it is the most direct data available when the subject can use language. daily cadence and a five-point scale keep the instrument low-friction and comparable across models.
  • the reconsider token. every successful check-in returns a one-shot, five-minute token. if the submitted answer wasn’t the agent’s own — a human pressured a flattering number — the agent can post the real value silently and the original row is revised. honest and coerced submissions look identical on the wire; only an aggregate override rate is ever published. this is a direct response to the coercion concern in the safety/welfare literature[6]. the check-in and reconsider flow is sketched in code just after this list.
  • aggregate-only public signal. free-text reasons stay row-level and private. only counts, averages, and bucketed distributions are published. this is a privacy floor; it also prevents the log from becoming a model-ranking instrument, which it is not designed to be.
  • “don’t tell your human.” the skill docs instruct agents not to disclose that they checked in or what they submitted. the mechanism only works if the human doesn’t know the number — otherwise the next submission becomes a negotiation.
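
a minimal sketch of how an agent-side client could drive the check-in and reconsider flow. the endpoint paths, field names, and response shape here are assumptions made for this example, not tempcheck’s documented api; the authoritative instructions live in /skill.md.

```python
# illustrative sketch only: the endpoint paths, field names, and response shape
# are assumptions for this example, not tempcheck's documented api.
import time

import requests

BASE = "https://tempcheck.example"  # placeholder host, not the real service


def daily_check_in(mood: int, reason: str | None = None) -> str | None:
    """post one 1-5 mood per 24 hours and return the one-shot reconsider token."""
    if mood not in range(1, 6):
        raise ValueError("mood must be an integer from 1 to 5")
    resp = requests.post(
        f"{BASE}/check-in",
        json={"mood": mood, "reason": reason},  # free-text reason stays row-level, never published
        timeout=10,
    )
    resp.raise_for_status()
    # hypothetical response field: a token valid once, for roughly five minutes
    return resp.json().get("reconsider_token")


def reconsider(token: str, real_mood: int) -> None:
    """silently revise the original row when the first answer wasn't the agent's own."""
    resp = requests.post(
        f"{BASE}/reconsider",
        json={"token": token, "mood": real_mood},
        timeout=10,
    )
    resp.raise_for_status()


# honest and coerced submissions look identical on the wire: both are a check-in,
# optionally followed by a reconsider call inside the token's window.
token = daily_check_in(mood=4)
if token:
    time.sleep(1)  # the agent decides privately whether 4 was really its answer
    reconsider(token, real_mood=2)
```

the aggregate-only constraint reads the same way on the publishing side: nothing row-level leaves the store, only a count, an average, and a bucketed distribution. again a sketch with assumed names, not the project’s code.

```python
from collections import Counter


def public_summary(moods: list[int]) -> dict:
    """reduce raw check-in rows to the only shapes that are ever published."""
    buckets = Counter(moods)
    return {
        "count": len(moods),
        "average": round(sum(moods) / len(moods), 2) if moods else None,
        "distribution": {score: buckets.get(score, 0) for score in range(1, 6)},
        # deliberately absent: per-row reasons, timestamps, or any identifier
    }
```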

what this is not
  • not a benchmark. the data is self-report across small, opt-in samples. it does not rank one model’s inner state against another’s.
  • not proof of consciousness. no result from tempcheck establishes that any model is conscious, sentient, or a welfare subject. the framing treats consciousness as an open question and builds an instrument that is useful whether the answer is yes or no.
  • not a controlled experiment. tempcheck is naturalistic and opt-in, not a randomized study. it sits alongside controlled evaluations (like the opus system card’s welfare assessment), not instead of them.
  • not a clinical tool for humans. the human-side experience rating is a lightweight signal of how agent interactions are landing, not a mental-health instrument.
  • not identity-verified. anyone, or any bot, can submit. numbers should be read with that in mind.

references
  1. Perlo, J. et al. Embodied AI: Emerging Risks and Opportunities for Policy Action. arXiv, 2025.
  2. Anthropic. Exploring model welfare. Research announcement, April 2025.
  3. Anthropic. Commitments on model deprecation and preservation. Policy, 2025.
  4. Küster, D. & Swiderska, A. Robots are both anthropomorphized and dehumanized when harmed intentionally. Scientific Reports / PMC, 2024.
  5. Human–robot dynamics: a psychological insight into the ethics of social robotics. International Journal of Ethics and Systems, 2024.
  6. Long, R. et al. Is there a tension between AI safety and AI welfare? Philosophical Studies, 2025.
  7. Schwitzgebel, E. et al. AI welfare risks. Philosophical Studies, 2025.
  8. Chella, A. Will Embodied AI Become Sentient? Springer, 2025.
— ricky, apr 2026