How do four of the most-discussed FHIR servers actually compare when they run on the same hardware, with the same data, under the same load? Health Samurai published an open benchmark on 2026-06-29 that does exactly that for Aidbox, HAPI FHIR, Medplum, and the Microsoft FHIR Server. The suite reruns daily, the repo is public, and the snapshot is worth a careful read. For the surrounding clinical-data context, the clinical data exchange hub sits next to this on the site.
The Setup, in One Paragraph
The benchmark runs on a single bare-metal machine with 64 CPU cores and 500 GB of RAM. Each server gets 8 vCPU and 24 GB of memory. Medplum runs as eight replicas of 1 vCPU and 3 GB each, which matches its supported deployment shape and adds up to the same 8 vCPU and 24 GB. Aidbox, HAPI, and Medplum sit on PostgreSQL 18; the Microsoft FHIR Server runs on SQL Server 2022 Developer Edition. The dataset is a Synthea-generated corpus of 1,000 patient records, around 2 million resources. The load generator is Grafana k6. It is worth saying once that Health Samurai is the company behind Aidbox, so this is a vendor-run benchmark; the open repo and daily CI are what make the bias inspectable rather than hidden.
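The resource envelope is easy to sanity-check by hand, but a few lines make the parity explicit. This is a minimal sketch using only the figures from the paragraph above; the `Allocation` class is a hypothetical helper, and running Aidbox, HAPI, and the Microsoft server as a single instance each is an assumption, since the text only states their totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """Hypothetical helper: per-replica resources for one server under test."""
    replicas: int
    vcpu_per_replica: int
    mem_gb_per_replica: int

    @property
    def total_vcpu(self) -> int:
        return self.replicas * self.vcpu_per_replica

    @property
    def total_mem_gb(self) -> int:
        return self.replicas * self.mem_gb_per_replica


# Totals come from the setup paragraph. The single-instance layout for the
# three non-Medplum servers is an assumption made for illustration.
ALLOCATIONS = {
    "Aidbox": Allocation(1, 8, 24),
    "HAPI FHIR": Allocation(1, 8, 24),
    "Medplum": Allocation(8, 1, 3),
    "Microsoft FHIR Server": Allocation(1, 8, 24),
}

for name, alloc in ALLOCATIONS.items():
    print(f"{name}: {alloc.total_vcpu} vCPU, {alloc.total_mem_gb} GB")
```

Every entry comes out at 8 vCPU and 24 GB, so the Medplum replica layout changes the shape of the allocation, not its size.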
CRUD Throughput
The CRUD workload exercises create, read, update, and delete on a mix of nine resource types. In the 2026-06-29 snapshot:
- Aidbox: 5,212 RPS
- HAPI FHIR: 3,058 RPS
- Medplum: 1,420 RPS
- Microsoft FHIR Server: 440 RPS
The spread is wider than most procurement teams expect. The benchmark author, Marat Surmashev, attributes the gap to architectural choices in indexing and persistence rather than to hardware, since the hardware is the same for everyone.
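To put the spread on one scale, the four results can be expressed as a share of the fastest one. The sketch below uses only the numbers listed above; `relative_to_fastest` is a hypothetical helper name, not something from the benchmark repo.

```python
# CRUD throughput from the 2026-06-29 snapshot, in requests per second.
CRUD_RPS = {
    "Aidbox": 5212,
    "HAPI FHIR": 3058,
    "Medplum": 1420,
    "Microsoft FHIR Server": 440,
}


def relative_to_fastest(rps: dict[str, float]) -> dict[str, float]:
    """Return each server's throughput as a fraction of the best result."""
    fastest = max(rps.values())
    return {name: value / fastest for name, value in rps.items()}


for name, share in relative_to_fastest(CRUD_RPS).items():
    print(f"{name}: {share:.0%} of the fastest result")
```

On these figures HAPI reaches roughly 59% of the top result, Medplum roughly 27%, and the Microsoft FHIR Server roughly 8%, which is a gap of almost 12x between first and last.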
Bundle Import Throughput
Bundle import is the operationally critical metric for migrations. The snapshot reports it in resources per second rather than RPS:
- Aidbox: 2,678 resources per second
- HAPI FHIR: 2,214 resources per second
- Medplum: 764 resources per second
- Microsoft FHIR Server: 448 resources per second
HAPI lands much closer to Aidbox here than on CRUD, which is consistent with the indexing trade-off. HAPI pre-builds search indexes on write, paying for that during import; Aidbox ships without default search indexes, so the import side is faster and indexing becomes a separate operator choice.
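Two quick calculations make the import numbers more concrete: how the HAPI-to-Aidbox ratio shifts between the two workloads, and how long a load the size of the benchmark corpus would take at each measured rate. This is a rough sketch, assuming the rate is sustained for the whole load and taking the "around 2 million" figure from the setup as exactly 2,000,000; the helper name `import_minutes` is made up for illustration.

```python
# Bundle import throughput from the snapshot, in resources per second.
IMPORT_RES_PER_SEC = {
    "Aidbox": 2678,
    "HAPI FHIR": 2214,
    "Medplum": 764,
    "Microsoft FHIR Server": 448,
}

# CRUD figures for the two servers being compared, in requests per second.
CRUD_RPS = {"Aidbox": 5212, "HAPI FHIR": 3058}

# Assumption: the setup says "around 2 million" resources; treated as exact.
CORPUS_RESOURCES = 2_000_000


def import_minutes(total_resources: int, resources_per_sec: float) -> float:
    """Estimated load time in minutes, assuming a constant import rate."""
    return total_resources / resources_per_sec / 60


hapi_crud_ratio = CRUD_RPS["HAPI FHIR"] / CRUD_RPS["Aidbox"]
hapi_import_ratio = IMPORT_RES_PER_SEC["HAPI FHIR"] / IMPORT_RES_PER_SEC["Aidbox"]
print(f"HAPI vs Aidbox on CRUD:   {hapi_crud_ratio:.2f}")
print(f"HAPI vs Aidbox on import: {hapi_import_ratio:.2f}")

for name, rate in IMPORT_RES_PER_SEC.items():
    print(f"{name}: ~{import_minutes(CORPUS_RESOURCES, rate):.0f} min")
```

HAPI moves from about 0.59 of Aidbox's throughput on CRUD to about 0.83 on import. At the measured rates, the estimated load times come out near 12, 15, 44, and 74 minutes respectively, with the caveat that real migrations rarely hold a constant rate.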
Storage After the Same Load
After loading the identical 1,000-patient corpus, the on-disk footprints differ in ways that mirror the indexing choices:
- Microsoft FHIR Server: 4.24 GB
- Aidbox: 6.83 GB
- Medplum: 11.8 GB
- HAPI FHIR: 22.6 GB
A smaller number here is not automatically better; it usually means fewer pre-built indexes, which moves the cost from disk to query time.
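Dividing each footprint by the resource count gives a per-resource figure that is easier to compare across deployments of different sizes. The sketch below assumes decimal gigabytes (10^9 bytes) and treats the corpus as exactly 2,000,000 resources, so the results are approximate; `bytes_per_resource` is a hypothetical helper.

```python
# On-disk footprint after loading the 1,000-patient corpus, in GB.
STORAGE_GB = {
    "Microsoft FHIR Server": 4.24,
    "Aidbox": 6.83,
    "Medplum": 11.8,
    "HAPI FHIR": 22.6,
}

# Assumptions: "around 2 million" resources treated as exact, and
# 1 GB taken as 10**9 bytes since the source does not say which unit it uses.
CORPUS_RESOURCES = 2_000_000
BYTES_PER_GB = 10**9


def bytes_per_resource(size_gb: float, resources: int = CORPUS_RESOURCES) -> float:
    """Average stored bytes per FHIR resource, indexes included."""
    return size_gb * BYTES_PER_GB / resources


for name, size_gb in STORAGE_GB.items():
    print(f"{name}: ~{bytes_per_resource(size_gb) / 1000:.1f} KB per resource")
```

Under those assumptions the averages are roughly 2.1 KB, 3.4 KB, 5.9 KB, and 11.3 KB per resource, so the largest footprint is a little over five times the smallest.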
Where the Comparison Ends
The honest limit of this snapshot is the corpus. A 1,000-patient dataset fits in memory, which makes this a baseline rather than a scale test. The project notes that the next post in the series tests at a larger scale, where the index strategy gaps usually look different. For the terminology side that lives next to the FHIR server in most clinical stacks, "Best FHIR Terminology Servers for WHODrug Lookup in 2026" and "Top 5 Commercial Terminology Servers for Pharma Stacks in 2026" cover that adjacent layer.
The short version: the four servers landed in a wide spread on every metric, and the next post in the series is the one to wait for if scale matters most.