Introduction | Medblocks Docs

Introduction

Medblocks turns health data from EHRs, wearables, claims, and clinical documents into verified answers, with full provenance back to every source.

How healthcare data analytics works today

Two hard phases stand between you and an answer: getting the data, then analyzing it. Both have traditionally taken months and millions of dollars.

Getting the data into your warehouse can take months and multiple bespoke integrations with EHRs and payer systems, each of which has to be requested, scoped, and signed off.

Once the data is in your warehouse, you still have to make sense of it.

Most analytics vendors provide an Enterprise Master Patient Index (EMPI) to merge duplicates and cover well-defined metrics like eCQM, HEDIS, and AHRQ. There’s still the work of mapping your data sources into the input format they expect.

The quality of every answer downstream depends on getting this transformation right, so months of debugging tend to happen at this interface alone, and this kind of ad-hoc work is already at the limit of what most healthcare organizations can afford to do in-house. What you get is still a black box: when the numbers look off, you spend days going back and forth with the vendor team to figure out why.

At Medblocks, we think there is a different way to do this.

Why now

Getting data out of healthcare systems has become drastically simpler in the last few years, thanks to the enforcement of information-blocking rules in the US. These new data sources bypass the expensive traditional paths.

Providers and payers have unprecedented access to EHR data through standardized APIs like FHIR. Traditional HIEs like CommonWell and Carequality are going FHIR-first through TEFCA and the CMS aligned networks initiative.

Patient-mediated FHIR access through ONC (g)(10), CMS-9115, and TEFCA IAS lets patients pull their own records in minutes and contribute previously unavailable data. Wearable device data is increasingly available through patient-authorized feeds.

These changes have made obtaining data roughly 10x cheaper, faster, and more direct than the traditional brokered routes most healthcare enterprises use today.

The delivery of healthcare is also getting more data-driven. Value-based providers, ACOs, and condition-specific clinics are taking on more risk than ever, and it’s becoming critical to “know the patient” deeply. A whole segment of new care delivery models, like Direct Primary Care, Chronic Disease Management, Longevity Care, and Employer-sponsored Wellness Programs, don’t just bill for a procedure or diagnosis code; they need to own outcomes and bear the financial risk if patient outcomes don’t meet expectations.

They need to understand the complete patient picture across claims, wearables, social determinants of health, and EHR records, and they need to be able to ask specific clinical questions of this longitudinal record and get reliable answers.

The traditional clinical quality measures that drove the previous decade of analytics adoption, like HEDIS, eCQMs, and AHRQ, have mostly been lagging indicators of decisions that should have been made much earlier. For example: catching a depressed patient’s sleep patterns drifting and intervening before things get worse is a different problem from learning at the end of the measurement year that they stopped taking their medications. The first needs fresh signals from many sources and real-time queries watching them; the second is a measure calculated at the end of the year, after the damage is done.

It has traditionally been too expensive for organizations to ask these kinds of questions of a patient’s data:

  1. Insufficient data pipelines to get the data points inside their systems.
  2. Not enough data processing capability within the organization to process unstructured data.
  3. Not enough data analytics capability within the organization to make sense of structured data.
  4. Analytics vendors are not interested in building data products for these questions because they can’t amortize the cost of developing such bespoke analytics across other customers.

We believe there is now a new option: AI coding agents that self-improve on your data and answer your questions across all of it. Long-running AI agents can now take a complex problem like building a C compiler or a web browser and run for hours or even days to complete it autonomously.

Together, these shifts make a new kind of data analytics platform possible: fast connections to standardized data sources, plus a self-learning agent that watches your incoming data and transforms it into answers that make a difference in the clinical context. That’s what we’re building.

How we’re different

Instead of bespoke ETLs and depending on vendors to export data to your warehouse, we focus on direct integrations using standardized, regulated APIs like FHIR to retrieve as much data as possible. Our [[Connectors]] let you connect with most leading EHRs, HIE networks, and payers through a standardized FHIR API. Patients can connect their own data sources, such as wearables and records from other health systems, through an SMS link that takes about five minutes to complete.
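To make this concrete, here is a minimal sketch of what a request against such a FHIR API looks like. The base URL is hypothetical, but `patient`, `code`, `_sort`, and `_count` are standard FHIR R4 search parameters, and `4548-4` is the LOINC code for HbA1c:

```python
from urllib.parse import urlencode

# Hypothetical FHIR server base URL; not a real Medblocks endpoint.
base = "https://fhir.example.com/r4"

# Fetch the most recent HbA1c observation for one patient, by LOINC code.
params = {
    "patient": "Patient/123",
    "code": "http://loinc.org|4548-4",  # LOINC: Hemoglobin A1c in blood
    "_sort": "-date",                   # newest first
    "_count": "1",                      # only the latest result
}
url = f"{base}/Observation?{urlencode(params)}"
print(url)
```

The same query shape works against any conformant FHIR server, which is what makes a single standardized connector reusable across EHRs, HIE networks, and payers.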

Once the data is available, you’ll want to answer specific questions: “What was this patient’s last HbA1c value?” or “Is this patient due for an appointment?”

Most AI analytics tools provide a chat window where you can ask a question. They then run a search over your data (which sometimes surfaces relevant records), generate a detailed answer, and maybe provide citations. You have no idea what data the model didn’t see. Sometimes the model answers without knowing the full context or value sets that show up in your dataset. Your dataset might use custom codes for HbA1c, while the model searches with the well-known LOINC codes and misses the real answer. The accuracy of this approach is poor: FHIR-AgentBench shows current AI agents achieving only around 50% accuracy on EHR data, even with frontier LLMs.
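The custom-code failure mode is easy to demonstrate. This is an illustrative sketch using an in-memory SQLite table; the schema, the `LOCAL-HBA1C` code, and the values are all made up for the example:

```python
import sqlite3

# Hypothetical flattened observation table with mixed coding systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (patient_id TEXT, code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("p1", "4548-4", 6.8),       # standard LOINC code for HbA1c
        ("p2", "LOCAL-HBA1C", 7.4),  # site-specific custom code, same test
    ],
)

# A search restricted to the well-known LOINC code silently misses p2.
loinc_only = conn.execute(
    "SELECT patient_id FROM observation WHERE code = '4548-4' ORDER BY patient_id"
).fetchall()

# Only after mapping the local code to the standard one is the answer complete.
all_hba1c = conn.execute(
    "SELECT patient_id FROM observation WHERE code IN ('4548-4', 'LOCAL-HBA1C')"
    " ORDER BY patient_id"
).fetchall()
print(loinc_only, all_hba1c)
```

A model that only knows the textbook LOINC codes returns the first result set and reports it confidently; nothing in the answer signals that half the cohort was invisible to the query.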

This approach also doesn’t scale. Every question you pose to the AI burns tokens, which gets prohibitively expensive at terabyte-scale analytics.

With Medblocks, you define what you want to know about the data as simple [[Tables]]. You define the columns you are interested in, like “Patient Name”, “Last HbA1c reading”, or “Missed appointment?”, with as much description of each question as you can provide.

Our AI agent does not give you an answer immediately. Instead, it runs a coding agent that develops a deep understanding of your data, running for hours or even days inside a secure, HIPAA-compliant environment. It then builds robust, deterministic SQL mappings from your data to your tables. On the same FHIR-AgentBench, our approach scores 98%: almost solved, except for a few formatting errors.

Our team rigorously reviews SQL mappings on your dataset before it goes live, and provenance from the answer to the data source is built-in.
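A minimal sketch of what such a deterministic mapping might look like, again against a hypothetical flattened observation table (the schema and column names are assumptions, not Medblocks’ actual output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation"
    " (patient_id TEXT, code TEXT, value REAL, effective_date TEXT)"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        ("p1", "4548-4", 7.1, "2023-04-01"),
        ("p1", "4548-4", 6.8, "2024-02-15"),  # most recent reading for p1
        ("p2", "4548-4", 7.4, "2024-01-10"),
    ],
)

# A reviewed, deterministic mapping for a "Last HbA1c reading" column:
# pick each patient's newest HbA1c observation. No LLM in the loop at
# query time; the same SQL runs repeatably as new data arrives.
rows = conn.execute("""
    SELECT patient_id, value AS last_hba1c
    FROM (
        SELECT patient_id, value,
               ROW_NUMBER() OVER (
                   PARTITION BY patient_id ORDER BY effective_date DESC
               ) AS rn
        FROM observation
        WHERE code = '4548-4'
    )
    WHERE rn = 1
    ORDER BY patient_id
""").fetchall()
print(rows)  # one row per patient, newest value only
```

Because the mapping is plain SQL, it can be reviewed line by line before going live, and each output cell traces back to the source rows it was computed from.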

Running queries on these tables, even on terabytes of data, costs only what your data warehouse charges for the SQL query, and your tables update in real time with new incoming data.

If the agent detects that the shape of the incoming data has changed significantly, it’s able to run the same improvement cycle again and self-improve the mappings over time.

The platform also handles unstructured data: PDFs, text notes, handwritten documents, and free-text fields inside FHIR and CCDA resources are processed and show up in the tables you define as well. If we detect that the same data point (e.g. “Latest HbA1c”) comes from different data sources that agree, we merge them into a single cell for you to review, with all the sources in one place.

Next steps

The Quickstart is the fastest way in. It walks you through your first connection and your first table, end to end, in about ten minutes.

If you’d rather understand the abstractions before building, the Concepts page covers them in depth.