2026-01-16//LOG
Building Payment Systems That Don't Lose Money
I have been building payment processing infrastructure at Yooga for a while now. POS systems for restaurants. Real money, real transactions, real consequences when things go wrong. This is everything I have learned about not losing money in production.
RULE ONE: IDEMPOTENCY IS NOT OPTIONAL
Every single payment operation needs an idempotency key. Every. Single. One. I do not care if you think your system will never retry a request. It will. The network will hiccup. The client will double-tap. The load balancer will time out and retry. If your payment endpoint is not idempotent, you WILL process duplicate charges.
We use a composite key: merchant ID plus a client-generated UUID plus the operation type. This gets stored before we even talk to the payment processor. If we see the same key twice, we return the cached result from the first attempt. No second charge. No drama.
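Here is a minimal in-memory Python sketch of that flow. It is not our production code. The `IdempotentPayments` class, the `ChargeResult` type, and the injected `call_processor` function are hypothetical names. A real implementation would persist the keys in a database table with a unique constraint rather than in a dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Composite key from the text: (merchant ID, client-generated UUID, operation type).
IdempotencyKey = Tuple[str, str, str]


@dataclass(frozen=True)
class ChargeResult:
    charge_id: str
    amount_cents: int
    status: str


class RequestInProgress(Exception):
    """A request with the same idempotency key has not finished yet."""


_PENDING = object()  # sentinel stored before the processor is contacted


class IdempotentPayments:
    def __init__(self, call_processor: Callable[[str, int], ChargeResult]) -> None:
        # Hypothetical processor client, injected so it can be faked in tests.
        self._call_processor = call_processor
        # Stand-in for a database table keyed by the composite key.
        self._records: Dict[IdempotencyKey, object] = {}

    def charge(self, merchant_id: str, request_id: str, amount_cents: int) -> ChargeResult:
        key: IdempotencyKey = (merchant_id, request_id, "charge")
        existing = self._records.get(key)
        if existing is _PENDING:
            # A duplicate arrived while the first attempt is still running.
            raise RequestInProgress(key)
        if isinstance(existing, ChargeResult):
            # Seen before: return the cached first result, no second charge.
            return existing
        # Record the key BEFORE talking to the processor.
        self._records[key] = _PENDING
        # If this call raises, the key stays pending; resolving it (for example
        # by asking the processor what happened) is left out of this sketch.
        result = self._call_processor(merchant_id, amount_cents)
        self._records[key] = result
        return result
```

Storing a pending marker first means a duplicate that arrives while the first call is still in flight is rejected instead of being sent to the processor a second time.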
RULE TWO: RACE CONDITIONS WILL FIND YOU
Here is a fun story. We had a race condition where two concurrent requests could both read the same account balance, both verify sufficient funds, and both deduct from it. The result? A merchant could spend more than they had. We found it when a test restaurant drove its account balance negative during a lunch rush.
The fix was pessimistic locking on the balance check. SELECT ... FOR UPDATE on the account row, verify funds, deduct, commit. Yes, it serializes concurrent payments for the same account. Yes, that is slightly slower. No, I do not care. Correctness beats performance in payment systems EVERY time.
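To keep it self-contained without a database, this Python sketch swaps the row lock for an in-memory per-account `threading.Lock`. The lock is held across read, check, and write, exactly like a `SELECT ... FOR UPDATE` held until commit. The `Ledger` class and its method names are hypothetical, and the SQL it mirrors is only shown in the docstring.

```python
import threading
from collections import defaultdict
from typing import Dict


class InsufficientFunds(Exception):
    pass


class Ledger:
    """In-memory stand-in for:
        BEGIN;
        SELECT balance FROM accounts WHERE id = %s FOR UPDATE;
        -- verify funds
        UPDATE accounts SET balance = balance - %s WHERE id = %s;
        COMMIT;
    The per-account lock plays the role of the row lock.
    """

    def __init__(self, balances: Dict[str, int]) -> None:
        self._balances = dict(balances)
        self._locks = defaultdict(threading.Lock)  # one lock per account
        self._locks_guard = threading.Lock()       # protects the lock table itself

    def _lock_for(self, account_id: str):
        with self._locks_guard:
            return self._locks[account_id]

    def debit(self, account_id: str, amount_cents: int) -> int:
        # Hold the account lock for the whole read-check-write sequence,
        # so no other request can read the balance in between.
        with self._lock_for(account_id):
            balance = self._balances[account_id]
            if balance < amount_cents:
                raise InsufficientFunds(account_id)
            self._balances[account_id] = balance - amount_cents
            return self._balances[account_id]

    def balance(self, account_id: str) -> int:
        with self._lock_for(account_id):
            return self._balances[account_id]
```

With ten concurrent debits of 100 against a balance of 500, exactly five succeed and the balance ends at zero, never below it.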
But that was the easy race condition. The hard one was with concurrent card terminal transactions. Two POS devices, same merchant, both hitting the payment processor at the same time. The processor returns success for both, but our webhook handler processes them out of order and the second transaction overwrites the status of the first. We lost visibility into completed payments.
Solution: event sourcing for transaction state changes. Every status update is an append-only event with a sequence number. We reconstruct current state from the event log. No overwrites. Full audit trail. If events arrive out of order, we resequence based on the processor timestamp.
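A minimal Python sketch of the append-only log. The event fields (`sequence`, `processor_ts`), the status strings, and the `TransactionEventLog` class are assumptions for illustration. In practice the log would be an insert-only database table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StatusEvent:
    transaction_id: str
    sequence: int        # assigned by us in arrival order
    processor_ts: float  # timestamp reported by the processor (assumed field)
    status: str


class TransactionEventLog:
    def __init__(self) -> None:
        self._events: List[StatusEvent] = []
        self._next_seq = 0

    def append(self, transaction_id: str, processor_ts: float, status: str) -> StatusEvent:
        event = StatusEvent(transaction_id, self._next_seq, processor_ts, status)
        self._next_seq += 1
        self._events.append(event)  # append-only: nothing is ever overwritten
        return event

    def history(self, transaction_id: str) -> List[StatusEvent]:
        events = [e for e in self._events if e.transaction_id == transaction_id]
        # Resequence by processor timestamp; arrival sequence breaks ties.
        return sorted(events, key=lambda e: (e.processor_ts, e.sequence))

    def current_status(self, transaction_id: str) -> Optional[str]:
        # Current state is derived from the log, never stored separately.
        events = self.history(transaction_id)
        return events[-1].status if events else None
```

If "captured" arrives before "authorized", the derived status is still "captured", and both events remain in the log as an audit trail.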
RULE THREE: THE R$47,000 INCIDENT
I need to talk about this because it is the most expensive bug I have ever shipped.
We had a webhook retry mechanism. When our processor sent a payment confirmation and we failed to acknowledge it with an HTTP 200, they would retry. Standard stuff. Except our handler was not checking whether the transaction had already been recorded. So every retry created a NEW transaction record in our system.
One busy Friday night, our webhook endpoint went down for about 90 seconds due to a deployment. The processor queued up retries. When we came back online, we got hammered with retry webhooks. Each one created a duplicate transaction. The total duplicated amount across all affected merchants: R$47,000.
We caught it within two hours because of balance reconciliation alerts. But those two hours were the longest of my career. We had to manually reverse every duplicate, contact every affected merchant, and explain what happened.
The fix was embarrassingly simple. Check the processor transaction ID against our records before creating a new entry. If it exists, acknowledge and skip. A five-line fix for a R$47,000 mistake.
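The check looks roughly like this in Python. This is a sketch, not the actual fix we shipped. The `WebhookHandler` class and the payload field names `transaction_id` and `amount_cents` are assumptions. In a database, a UNIQUE constraint on the processor transaction ID column would also stop two concurrent retries from both passing the check.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WebhookHandler:
    # Recorded transactions keyed by the processor's transaction ID.
    transactions: Dict[str, int] = field(default_factory=dict)

    def handle_payment_confirmed(self, payload: dict) -> int:
        processor_txn_id = payload["transaction_id"]  # field name assumed
        if processor_txn_id in self.transactions:
            # Already recorded: acknowledge so the processor stops retrying, and skip.
            return 200
        self.transactions[processor_txn_id] = payload["amount_cents"]
        return 200
```

Replaying the same payload several times, the duplicate-delivery test recommended at the end of this post, should leave exactly one record.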
RULE FOUR: RECONCILIATION IS YOUR SAFETY NET
Every night at 3 AM, we run a reconciliation job. It pulls every transaction from our system and every transaction from the payment processor for the past 24 hours. It compares them. Any mismatch triggers an alert.
This has caught bugs that nothing else would have found. Silent failures where we recorded a payment but the processor actually declined it. Edge cases where partial refunds got lost. Timezone bugs where a transaction showed up on different days in our system versus the processor.
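The comparison step can be sketched in Python as a full outer join on the processor transaction ID. The `Txn` record and the `reconcile` function are hypothetical. Fetching both sides for the same UTC window (an assumption, not necessarily what our job does) is one way to avoid the day-boundary mismatches mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Txn:
    txn_id: str        # processor transaction ID, used as the join key
    amount_cents: int
    status: str


def reconcile(ours: List[Txn], theirs: List[Txn]) -> List[str]:
    """Return a description of every mismatch between our records and the processor's."""
    ours_by_id = {t.txn_id: t for t in ours}
    theirs_by_id = {t.txn_id: t for t in theirs}
    problems: List[str] = []
    # Union of IDs from both sides, so missing records on either side are caught.
    for txn_id in sorted(ours_by_id.keys() | theirs_by_id.keys()):
        a = ours_by_id.get(txn_id)
        b = theirs_by_id.get(txn_id)
        if a is None:
            problems.append(f"{txn_id}: at processor, missing from our records")
        elif b is None:
            problems.append(f"{txn_id}: in our records, missing at processor")
        elif (a.amount_cents, a.status) != (b.amount_cents, b.status):
            problems.append(
                f"{txn_id}: ours={a.amount_cents}/{a.status} "
                f"processor={b.amount_cents}/{b.status}"
            )
    return problems
```

A payment we recorded as approved but the processor declined shows up as a status mismatch, which is exactly the silent-failure case described above.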
Reconciliation is not glamorous. Nobody puts it on their resume. But it is the single most important piece of infrastructure in a payment system. Build it before you process your first real transaction.
RULE FIVE: NEVER TRUST THE CLIENT
The POS terminal sends the amount to charge. Do not trust it. Recalculate on the server from the order items. We had an incident where a modified APK on a compromised terminal was sending lower amounts than the actual order total. The merchant was essentially giving discounts they did not intend to give.
Server-side amount calculation from the source of truth (the order in our database) is non-negotiable. The terminal amount is for display purposes only.
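A minimal Python sketch of that rule. `OrderItem`, `server_side_total`, and `amount_to_charge` are hypothetical names, and real orders would also carry discounts, taxes, and tips that this sketch ignores. Returning a mismatch flag for alerting is an assumption about how you might surface tampered terminals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OrderItem:
    unit_price_cents: int  # integer cents avoid floating-point rounding
    quantity: int


def server_side_total(items: List[OrderItem]) -> int:
    # Recompute from the order stored in our database, the source of truth.
    return sum(item.unit_price_cents * item.quantity for item in items)


def amount_to_charge(items: List[OrderItem], terminal_amount_cents: int) -> Tuple[int, bool]:
    """Return (amount to charge, whether the terminal disagreed).

    The terminal amount never affects the charge; it is only compared
    so a mismatch can raise an alert.
    """
    total = server_side_total(items)
    return total, terminal_amount_cents != total
```

A terminal reporting 5000 for an order whose items add up to 8200 is still charged 8200, and the mismatch flag can feed an alert.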
FINAL THOUGHTS
Payment systems are not hard in the algorithmic sense. The code is straightforward. What makes them hard is that every edge case costs real money. A race condition in a blog platform means a duplicate post. A race condition in a payment system means someone loses money. The stakes change how you think about every line of code.
Build idempotency first. Lock aggressively. Reconcile everything. Trust nothing from the client. And for the love of all that is holy, test your webhook handlers with duplicate deliveries before you go to production.