Cognito password reset: ClientId mismatch diagnostic
The trap
Cognito’s ConfirmForgotPassword validates the verification code against the same ClientId that issued it. When the reset page submits with a different ClientId, Cognito returns ExpiredCodeException (not CodeMismatchException). The symptom (“expired”) points at code lifetime, but the cause is identity-routing.
| What the user sees | What Cognito actually means |
|---|---|
| “Code expired. Request a new one.” | The code is valid, but the wrong ClientId is submitting it. |
The error is identical whether the code was issued 1 hour ago or 5 seconds ago. The lifetime-vs-routing distinction is invisible at the API surface.
How the system routes today
The custom_message Lambda appends cid (event.callerContext.clientId) and pid (event.userPoolId) to every reset URL:
https://www.thephenom.app/reset-password?code=<C>&email=<E>&cid=<originating-client>&pid=<pool>
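A minimal sketch of how a custom-message trigger could build that URL, assuming a Node.js Lambda. The event fields (`triggerSource`, `callerContext.clientId`, `userPoolId`, `request.codeParameter`, `request.userAttributes.email`) are the ones Cognito passes to custom-message triggers; the function names, subject line, and message wording are hypothetical and not taken from the deployed Lambda.

```javascript
// Hypothetical sketch of a Cognito custom-message trigger; not the deployed Lambda.
const RESET_PAGE = "https://www.thephenom.app/reset-password";

function buildResetUrl(event) {
  // Cognito substitutes the real code for this placeholder after the trigger
  // returns, so the placeholder must stay unencoded in the message.
  const code = event.request.codeParameter;
  const email = encodeURIComponent(event.request.userAttributes.email);
  const cid = encodeURIComponent(event.callerContext.clientId);
  const pid = encodeURIComponent(event.userPoolId);
  return `${RESET_PAGE}?code=${code}&email=${email}&cid=${cid}&pid=${pid}`;
}

async function handler(event) {
  // Only rewrite the forgot-password email; leave other message types untouched.
  if (event.triggerSource === "CustomMessage_ForgotPassword") {
    const url = buildResetUrl(event);
    event.response.emailSubject = "Reset your password"; // illustrative wording
    event.response.emailMessage = `Reset your password here: ${url}`;
  }
  return event;
}
```

In a real deployment the `handler` function would be exported as the Lambda entry point. The point of the sketch is that `cid` comes from the caller context of the request that triggered the email, not from configuration.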
The reset-password.js page reads ?cid= and uses it as the ClientId for ConfirmForgotPassword. When ?cid= is absent, it falls back to the staging hasura client 6sjjnkaeagnqgkmbl1mr5rtfsr, which covers emails already in flight that predate this change.
Any client in any pool that initiates a reset can complete it on www.thephenom.app/reset-password.
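The page-side routing described above might look roughly like the following. This is a sketch, not the contents of reset-password.js: the function names are hypothetical, and it assumes the email address is used as the Cognito Username. The request field names (`ClientId`, `Username`, `ConfirmationCode`, `Password`) are those of the ConfirmForgotPassword API.

```javascript
// Fallback client named in this document, for emails sent before ?cid= existed.
const FALLBACK_CLIENT_ID = "6sjjnkaeagnqgkmbl1mr5rtfsr";

// Hypothetical helper: prefer ?cid= from the URL, fall back only when it is absent.
function resolveClientId(search) {
  const cid = new URLSearchParams(search).get("cid");
  return cid && cid.trim() !== "" ? cid : FALLBACK_CLIENT_ID;
}

// Hypothetical helper: build the ConfirmForgotPassword request body.
// Assumes the email address doubles as the Cognito Username.
function buildConfirmRequest(search, newPassword) {
  const params = new URLSearchParams(search);
  return {
    ClientId: resolveClientId(search),
    Username: params.get("email"),
    ConfirmationCode: params.get("code"),
    Password: newPassword,
  };
}
```

For example, `buildConfirmRequest(window.location.search, password)` in the browser would carry whichever client issued the code, and only URLs without `cid` would use the fallback.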
Per-pool coverage matrix
| Pool ID | Friendly name | Lambda function | Known app clients |
|---|---|---|---|
| us-east-1_AkG9mnbjA | phenom-dev-local | phenom-dev-cognito-custom-message | phenom-dev-hasura-client-local, phenom-dev-cf-access, phenom-dev-nest-ops |
| us-east-1_n8gO6SbP6 | phenom-staging | phenom-dev-cognito-custom-message (shared with dev-local) | phenom-dev-hasura-client (historic fallback in reset page), phenom-dev-chat-agent, phenom-dev-web-chat, phenom-dev-synapse-oidc |
| us-east-1_knEL7cqS3 | phenom-prod | phenom-prod-cognito-custom-message | phenom-prod-hasura-client, phenom-prod-chat-agent, phenom-prod-synapse-oidc |
To regenerate the current client_id list from AWS:
for POOL in us-east-1_AkG9mnbjA us-east-1_n8gO6SbP6 us-east-1_knEL7cqS3; do
  echo "=== $POOL ==="
  aws cognito-idp list-user-pool-clients --user-pool-id "$POOL" \
    --output text --query 'UserPoolClients[].[ClientId,ClientName]'
done
If the client_id list changes (new app client added or an old one rotated), update the matrix here AND the KNOWN_CLIENT_IDS array in Phenom-earth/www/tests/reset-password.test.js.
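To reduce copy-paste errors when updating the array, the CLI output could be converted into (cid, label) pairs mechanically. The helper below is a hypothetical sketch: it assumes the tab-separated `--output text` format produced by the loop above and assumes KNOWN_CLIENT_IDS is a list of `[cid, label]` pairs, which the test file may or may not use.

```javascript
// Hypothetical helper: turn the loop's text output into [cid, label] pairs.
function parseClientList(cliOutput) {
  return cliOutput
    .split("\n")
    .map((line) => line.trim())
    // Skip blank lines and the "=== pool ===" separators echoed by the loop.
    .filter((line) => line !== "" && !line.startsWith("==="))
    .map((line) => {
      const [cid, label] = line.split("\t");
      return [cid, label];
    });
}
```

The result can be compared against the array in the test file to spot a client that was added or rotated without the matrix being updated.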
The test guardrail
Phenom-earth/www/tests/reset-password.test.js asserts 14 things via jsdom: 4 base invariants and 10 per-client routing assertions (one for every Cognito app client across all three pools listed above).
Base invariants:
- The page reads ?cid= from the URL when present.
- ConfirmForgotPassword is called with the cid value, not a hardcoded ID.
- The fallback to the hardcoded staging hasura client engages when ?cid= is absent (back-compat for already-sent emails).
- Regression guard: when ?cid= is present, the page does NOT silently use a hardcoded ID, even if someone re-adds one elsewhere in the file.
Per-client routing (10 assertions, one per known client):
For each (cid, label) in KNOWN_CLIENT_IDS, the test verifies that when the URL contains ?cid=<cid>, ConfirmForgotPassword’s request body carries exactly ClientId: <cid>. This catches a regression class that the base invariants miss: a change that hardcodes a different client_id can still pass the “falls back to default” test while breaking routing for every other client.
The test runs on every PR via .github/workflows/reset-password-guardrail.yml. If routing regresses for any single client, CI fails before the change ships.
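The shape of the per-client check can be sketched without jsdom by injecting a fake fetch and recording what would have been sent. Everything here is illustrative: `submitReset` is a hypothetical stand-in for the page's submit path, the sample client IDs are made up, and the real test may be structured differently. The endpoint and `X-Amz-Target` header follow the Cognito JSON API convention for ConfirmForgotPassword.

```javascript
// Illustrative IDs only; the real list lives in KNOWN_CLIENT_IDS.
const SAMPLE_CLIENT_IDS = [
  ["client-id-aaa", "sample-client-a"],
  ["client-id-bbb", "sample-client-b"],
];

// Hypothetical stand-in for the page's submit path.
function submitReset(search, password, fetchImpl) {
  const params = new URLSearchParams(search);
  const body = {
    ClientId: params.get("cid"),
    Username: params.get("email"),
    ConfirmationCode: params.get("code"),
    Password: password,
  };
  return fetchImpl("https://cognito-idp.us-east-1.amazonaws.com/", {
    method: "POST",
    headers: {
      "Content-Type": "application/x-amz-json-1.1",
      "X-Amz-Target": "AWSCognitoIdentityProviderService.ConfirmForgotPassword",
    },
    body: JSON.stringify(body),
  });
}

// Submits once per sample client and returns the ClientId values actually sent.
function collectSentClientIds() {
  const sent = [];
  const fakeFetch = (url, options) => {
    sent.push(JSON.parse(options.body).ClientId);
    return Promise.resolve({ ok: true });
  };
  for (const [cid] of SAMPLE_CLIENT_IDS) {
    submitReset(`?code=123456&email=user%40example.com&cid=${cid}`, "pw", fakeFetch);
  }
  return sent;
}
```

A routing regression shows up as a mismatch between the IDs in the list and the IDs returned by `collectSentClientIds()`, regardless of whether the page still renders correctly.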
What this guardrail covers and does not cover:
- Catches static-routing regressions in the reset page itself.
- Catches accidental removal of ?cid= handling.
- Does NOT exercise the full live flow (Cognito API, real Lambda invocation, SES delivery). A separate scheduled E2E probe against a test user in phenom-dev-local is the right complement; see “Known follow-up” below.
Diagnostic playbook
Symptom: ExpiredCodeException immediately after a fresh code is issued.
Step 1: Inspect the URL the reset email links to. Does it carry ?cid=?
- If ?cid= is absent: the Lambda has not been updated for this pool yet, or the Lambda version rolled back. Check the deployed Lambda version.
- If ?cid= is present but has the wrong value: the Lambda is reading from the wrong context field.
Step 2: Inspect the validation handler. Is the ClientId hardcoded or read from ?cid=?
Do not assume the code's lifetime is the cause. Cognito will not surface the routing mismatch as a separate error code; it returns ExpiredCodeException regardless.
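Step 1 can be checked mechanically once the reset URL has been copied out of the email. The helper below is a hypothetical sketch, not part of the repository; `issuingClientId` is whichever client ID you passed when triggering the reset.

```javascript
// Hypothetical helper for Step 1: classify the cid carried by a reset URL.
function diagnoseResetUrl(resetUrl, issuingClientId) {
  const cid = new URL(resetUrl).searchParams.get("cid");
  if (!cid) return "cid-absent"; // Lambda not updated for this pool, or rolled back
  if (cid !== issuingClientId) return "cid-wrong"; // Lambda reading the wrong context field
  return "cid-ok"; // routing data is correct; move on to Step 2
}
```

A result of `"cid-ok"` means the email side is fine, so the next suspect is the handler that submits ConfirmForgotPassword.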
Confirming routing via CLI:
# 1. Trigger a reset for a test user and note the cid in the URL from the email.
aws cognito-idp forgot-password \
--client-id <ISSUING_CLIENT_ID> \
--username test-staging@thephenom.app \
--region us-east-1
# 2. Attempt ConfirmForgotPassword with the CORRECT cid (should succeed with a valid code):
aws cognito-idp confirm-forgot-password \
--client-id <ISSUING_CLIENT_ID> \
--username test-staging@thephenom.app \
--confirmation-code XXXXXX \
--password 'NewPassword!2026Aa#' \
--region us-east-1
# 3. Attempt ConfirmForgotPassword with a DIFFERENT client_id (reproduces the trap).
#    Use a freshly issued code: the code from step 2 is consumed once that call succeeds.
aws cognito-idp confirm-forgot-password \
--client-id <DIFFERENT_CLIENT_ID> \
--username test-staging@thephenom.app \
--confirmation-code XXXXXX \
--password 'NewPassword!2026Aa#' \
--region us-east-1
# → ExpiredCodeException
Quick reference
- Symptom: ExpiredCodeException immediately after a fresh code is issued
- Real cause: ConfirmForgotPassword submitted with a different ClientId than the one that issued the code
- Where to look first: the URL the reset email links to: does it carry ?cid=?
- Where to look second: the validation handler: is the ClientId hardcoded or read from the URL?
- What NOT to assume: the code lifetime
Design rule for auth-callback flows
Any email-based callback flow against Cognito (or any auth provider with per-client codes) must follow three rules:
- The token is bound to the client that issued it. Pass enough context in the callback URL (client_id, pool_id, tenant_id, whatever scopes the binding) so the callback page can route correctly. Hardcoding one ClientId is fine for a single-client flow; the moment you have two clients, hardcoding is a silent multi-tenant bug.
- ExpiredCodeException is a routing symptom, not only a lifetime symptom. When the same code returns ExpiredCodeException from validator-A and succeeds against validator-B, the bug is the ClientId mismatch.
- Test the routing, not the renderer. Page-render tests can be green while the page sends auth requests to the wrong endpoint. The guardrail that pays for itself asserts the network call’s body.ClientId matches the ?cid= from the URL.
Known follow-up
- E2E synthetic probe: a scheduled GitHub Action that runs ForgotPassword + ConfirmForgotPassword against a known test user in phenom-dev-local once per hour and alerts on any non-success path. Catches issues the unit-level guardrail cannot, such as Lambda misconfiguration, IAM role drift, or SES sender-reputation flips. Best home: phenom-infra/.github/workflows/.