How We Build Our Custom ERP AI Chatbots
Using ERP AI chatbots, empower your employees to boost your sales, enhance customer satisfaction, and make smarter decisions in real time.
A major problem in health care is the amount of time spent on paperwork. Many facilities still rely on paper records, and migrating them to electronic health records is not simple.
Digitized paper records are full of complexities like bad handwriting, handwritten notes and markings, a variety of page layouts from forms to tables, image quality and noise issues, irrelevant text like footers, and more. Overcoming such problems requires complex processing pipelines that combine the latest techniques in large language models, natural language processing, and computer vision.
In this article, we dive into the state-of-the-art (SOTA) medical record summarization system that we've deployed in production for a number of customers in both the medical and legal fields. Our pipeline is customizable at each step, giving customers full control over what the final summaries look like. We'll focus on processing pages individually, but most of the systems we build use this pipeline as one step in a larger process that combines a patient's entire medical history into a chronological summary.
We'll briefly explain some basics of medical records and their contents here with a focus on hospital inpatient health records.
A medical record is a record of a patient's single encounter with medical care in a medical setting, such as a doctor's office, an ER, or a therapy practice.
In contrast, a health record is a comprehensive collection of all aspects of a patient's health and medical history over an extended period of time and across multiple care providers. Health care is broader than medical care, covering mental health, nutrition plans, health insurance, and other such aspects.
A health record typically contains many medical records, lab reports, and medical images.
In this article, we use the terms "health record," "medical record," and "medical report" interchangeably.
A typical medical record consists of different kinds of information:





Other important aspects are the media and formats in which health records are stored:
The primary use of patient records in a healthcare setting is to inform clinical decisions and treatment plans made by health care professionals. Because the records contain a patient's entire, objective history, providers can deliver high-quality care without having to rely on patients to recall that information. Medical record summaries build on this by distilling lengthy charts into concise overviews of a patient's record. These summaries highlight the most important details, such as recent hospitalizations, active problems, and current medications, so clinicians can quickly focus on what matters during time-constrained visits.
Health records are also useful for administrative purposes like automatically generating patient invoices. Patients can also guard against billing fraud by having third parties evaluate their records. Actionable summaries make it much less time-consuming to review the details of a visit and any medications paid for.
Medical record departments send these summaries to insurance companies on request to justify a claim, since they are easier to send than the entire chart. On a deeper level, these summaries can be used to evaluate a person's medical history over time and help insurance professionals see correlations between the current claim and pre-existing conditions.
These medical summaries are used heavily in mass tort litigation and personal injury demand letters. In these cases, attorneys must extract clear narratives from patients' medical records to support injury claims whose causes are described in a police report, with the claims backed by physicians.
Demand letter generation involves creating a full chronological medical narrative: this pipeline is run on each document, and its summaries focus closely on key dates, events, and information from multiple parties. This is where we build the bulk of our medical summarization systems.
In the sections below, we'll walk through the custom medical record summarization pipeline we've built to help these users generate customized AI summaries with full control over their structure, format, and the information used.
The illustration below shows our medical report summarization pipeline based on GPT-4 and other deep-learning models.

The pipeline consists of these components:
Each of these components uses its own substeps and machine learning algorithms. Breaking the process into easy-to-understand segments lets us optimize specific pieces of the puzzle for each customer.
The digitized health record is processed page by page in the initial stages, and related pages are recombined later by the page classification and combinator modules. But first, the record is split into pages or chunks in a way that suits the document's file format. PDF documents are easy to split into pages. For image formats like TIFF, we use object localization models to identify and locate the page boundaries.
Text extraction is done using custom optical character recognition (OCR) approaches that learn both the text itself and its position on the page. Applying suitable preprocessing to the digitized pages of a health record first improves the accuracy of text extraction in the next stage.
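Before any page splitting happens, the pipeline has to know which kind of file it received. Here is a minimal sketch of that dispatch, assuming the format is identified from the file's leading "magic" bytes instead of trusting its extension. PDF files begin with `%PDF-`, and TIFF files begin with `II*\0` (little-endian) or `MM\0*` (big-endian). The strategy names `split_by_pdf_page` and `localize_page_boundaries` are hypothetical placeholders, not part of our pipeline's actual API.

```python
# Leading bytes that identify the formats discussed above.
# TIFF: byte-order mark ("II" or "MM") followed by the number 42 in that byte order.
_MAGIC = {
    b"%PDF-": "pdf",
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
}


def detect_format(header: bytes) -> str:
    """Return 'pdf', 'tiff', or 'unknown' from a file's first few bytes."""
    for magic, fmt in _MAGIC.items():
        if header.startswith(magic):
            return fmt
    return "unknown"


def choose_splitter(path: str) -> str:
    """Pick a (hypothetical) page-splitting strategy for a digitized record."""
    with open(path, "rb") as fh:
        fmt = detect_format(fh.read(8))
    if fmt == "pdf":
        return "split_by_pdf_page"         # PDF pages are explicit objects
    if fmt == "tiff":
        return "localize_page_boundaries"  # hand off to an object localization model
    raise ValueError(f"unsupported format for {path}")
```

Checking the magic bytes protects against records whose extension does not match their contents, which is common when scans pass through several systems.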
In this section, we cover some of the common preprocessing techniques used to improve text extraction.
The following image preprocessing operations are applied to the digitized health record images to improve character recognition for both printed and handwritten text:
This step is critical: if the key data points we need for the summaries are missed here, every downstream task is affected.
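Binarization (separating ink from background) is one widely used preprocessing operation of this kind. Here is a minimal sketch of Otsu's global thresholding in plain Python, assuming the page is already a grayscale image given as a list of rows of 0 to 255 integers. This illustrates the general technique and is not necessarily the exact operation our pipeline uses. In practice, a library routine such as OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag would be used.

```python
def otsu_threshold(gray: list[list[int]]) -> int:
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_b = 0      # number of pixels at or below the candidate threshold
    sum_b = 0    # sum of their intensities
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        m_b = sum_b / w_b               # mean of the dark class
        m_f = (sum_all - sum_b) / w_f   # mean of the light class
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: list[list[int]]) -> list[list[int]]:
    """Map ink pixels to 0 and background pixels to 255."""
    t = otsu_threshold(gray)
    return [[255 if v > t else 0 for v in row] for row in gray]
```

A global threshold like this works well on evenly lit scans. Pages with shadows or uneven lighting usually need an adaptive (local) threshold instead.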
Image preprocessing techniques to improve handwriting recognition are based on the paper, "Continuous Offline Handwriting Recognition Using Deep Learning Models". They include:


Text extraction identifies all the printed or handwritten characters on a digitized image, combines them into syntactic elements like words and punctuation marks, and returns them as text fragments like words, phrases, and sentences.
Character identification using just the image pixels can often be inaccurate if the image has text with poor handwriting (a widespread problem in the medical field), image defects, blurring, missing pixels, glare, and similar imaging artifacts.
State-of-the-art text extraction uses multi-modal language-vision models. They don't identify characters based on just the image pixels. Instead, they use a lot of additional information like:
All these additional criteria drastically improve the accuracy of the text extraction, including from the tough handwritten sections of a digitized health record.
Below, we explore some state-of-the-art text extraction approaches.

LayoutLMv3 is an OCR-based image-text model that's been trained for document layout tasks. Given a document image, it identifies layout elements like sections, field-value pairs, or tables and produces their bounding boxes and text as results.
The architecture is a pure transformer model with no convolutional elements. During training and fine-tuning, it must be supplied with the word embeddings, image patches, and position embeddings (from an off-the-shelf OCR package) of the training documents. Its multi-modal transformer model learns to associate patterns in the document text with their appropriate layout elements.
For fine-tuning, we supply a small dataset of annotated digitized health records. The pre-trained model adjusts its layer weights to activate for the layouts, layout elements, and text found in health records.
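Because LayoutLMv3 consumes words together with their positions, OCR output has to be converted into the box format the model expects. LayoutLM-family models use boxes normalized to a 0 to 1000 scale regardless of page size. Here is a minimal sketch, assuming an off-the-shelf OCR package returns pixel boxes as `(x0, y0, x1, y1)` in dicts with hypothetical `"text"` and `"box"` keys. As far as we know, the resulting `words` and `boxes` lists match what the Hugging Face LayoutLMv3 processor accepts when its built-in OCR is disabled.

```python
def normalize_box(box, width, height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range LayoutLM models use."""
    x0, y0, x1, y1 = box

    def scale(value, size):
        return max(0, min(1000, int(1000 * value / size)))  # clamp stray OCR coordinates

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


def prepare_ocr_words(ocr_words, width, height):
    """Split OCR results into parallel word and normalized-box lists."""
    words = [w["text"] for w in ocr_words]
    boxes = [normalize_box(w["box"], width, height) for w in ocr_words]
    return words, boxes
```

Normalizing this way makes the position embeddings independent of scan resolution, so a 300 DPI page and a 150 DPI page of the same form produce the same boxes.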

The document understanding transformer (Donut) is an alternative text extraction approach that is OCR-free. That means it does not use or generate information at the character level. Instead, it learns to directly generate text sequences from visual features without producing intermediate information like character labels and text bounding boxes.
Donut has a typical encoder-decoder transformer architecture:
For example, for the downstream task of document layout understanding, the decoder produces a structured output sequence like “<layout><section><heading>medications</heading><line><fragment>Aspirin</fragment> <fragment>10mg</fragment></line></section></layout>”.
Since Donut doesn't use OCR at all, this approach is faster and lighter, with far fewer parameters than OCR-based models. Its authors also report high accuracy.
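Downstream modules need that tag sequence as structured data. The example sequence above happens to be well-formed XML, so a minimal sketch can parse it with Python's standard `xml.etree.ElementTree`. This assumes the tag names from the example. Real Donut checkpoints define their own task-specific tokens, and generated sequences are not guaranteed to be well-formed, which is why the sketch returns an empty list on parse errors rather than crashing. Hugging Face's `DonutProcessor` also offers a `token2json` helper for its own token format.

```python
import xml.etree.ElementTree as ET


def parse_layout_sequence(seq: str) -> list[dict]:
    """Turn a Donut-style layout sequence into [{'heading': ..., 'lines': [[...], ...]}]."""
    try:
        root = ET.fromstring(seq)
    except ET.ParseError:
        return []  # malformed generation; caller can retry or fall back
    sections = []
    for section in root.iter("section"):
        heading = section.findtext("heading", default="").strip()
        # Each <line> becomes a list of its <fragment> texts, in order.
        lines = [
            [frag.text.strip() for frag in line.findall("fragment") if frag.text]
            for line in section.findall("line")
        ]
        sections.append({"heading": heading, "lines": lines})
    return sections
```

Keeping fragments grouped by line preserves pairings such as a drug name and its dose, which matters for the summaries later.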
The text extraction must not only understand the regular text but also special marks like check marks and circles. In the example below, a doctor has selected "Y" as their choice but the text extraction model has ignored it.

As of 2026, we use a custom LLM to handle these special marks in our summaries, in the following steps:
- Identify which special mark or circle is applied.
- Identify the other options that were not selected, to capture the full context of the document.
- Tie the mark to specific relevant information so the final summarizer knows how to use the recognized special mark (so it can say "yes, it was prescribed" or "no, it was not prescribed").
I still recommend choosing an LLM that can be fine-tuned so you can improve it over time.
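The three steps above produce a small, structured record per marked field. Here is a minimal sketch of that record and of how it might be rendered into a sentence the final summarizer can consume. The `MarkedField` class, its field names, and the output wording are hypothetical illustrations, not our production schema, and the LLM call that fills in the fields is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkedField:
    """A form question answered with a check mark or circle (hypothetical schema)."""
    question: str            # printed question, e.g. "Prescribed?"
    options: list[str]       # step 2: every printed option, selected or not
    selected: Optional[str]  # step 1: option the mark was applied to, if any
    related_to: str          # step 3: the information the mark qualifies


def describe_for_summarizer(field: MarkedField) -> str:
    """Render the mark and its context as a plain sentence for the summarizer."""
    if field.selected is None:
        return (f"{field.related_to}: '{field.question}' has no option marked "
                f"(options: {', '.join(field.options)}).")
    not_selected = [o for o in field.options if o != field.selected]
    return (f"{field.related_to}: '{field.question}' marked '{field.selected}'; "
            f"not selected: {', '.join(not_selected)}.")
```

Listing the unselected options explicitly lets the summarizer distinguish "No was circled" from "nothing was circled", which a bare extracted "Y" or "N" cannot convey.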
The extracted text fragments may not be in an ideal state for medical processing use cases. For example, the text in signatures, page numbers, seals, logos, or letterheads just acts as noisy text that doesn't add any relevant information to health report summaries but may affect the quality of the summaries or extracted information fields.
Text preprocessing uses deep learning models to ignore such noisy text. One approach we use adapts the ideas from the paper on language model pretraining over clinical notes to our text extraction model, teaching it to recognize the text patterns unique to health records.
Unlike the paper, our approach does not use long short-term memory models but instead adapts its sequential, hierarchical, and pretraining approaches to our multi-modal text extraction model. The approach involves fine-tuning the layout understanding model with additional embeddings for medical information like section labels and diagnosis codes.
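For comparison, here is the kind of simple rule-based filter that learned models replace. This is a minimal sketch of a swapped-in heuristic, not the deep learning approach described above. The regular expressions are hypothetical examples covering page numbers and empty signature lines. Its weakness is also instructive: a line containing only a number, such as a lab value, would be wrongly dropped, which is one reason we prefer models that understand context.

```python
import re

# Hypothetical noise patterns; each must match the whole (stripped) line.
NOISE_PATTERNS = [
    re.compile(r"^page\s+\d+(\s+of\s+\d+)?$", re.IGNORECASE),     # "Page 3 of 12"
    re.compile(r"^-?\s*\d{1,4}\s*-?$"),                            # bare numbers like "- 4 -"
    re.compile(r"^(signature|signed)\s*:?\s*_*$", re.IGNORECASE),  # empty signature lines
]


def drop_noise(lines: list[str]) -> list[str]:
    """Remove blank lines and lines matching any noise pattern."""
    kept = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if any(p.match(text) for p in NOISE_PATTERNS):
            continue
        kept.append(text)
    return kept
```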
Determining section labels for each page is an essential step for the accurate processing of digitized paper records and understanding complex patient information.
As we saw earlier, every section in a health record has a distinct structure and set of fields. Not all sections can be processed the same way. The processing heavily depends on the nature of the medical information in a section, its structure, and the goals of the health care professional doing the processing. For example:
The appropriate GPT prompts and models also differ by section and use case. So a page classification model labels every page with the appropriate section labels and additional goal-specific labels.
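Once a page carries a section label, choosing its prompt can be a simple lookup. Here is a minimal sketch that reuses the three summarization prompts quoted in this article's page examples. The label keys and the fallback prompt are hypothetical, and the actual model call is omitted.

```python
# Prompts quoted in this article's page examples; label keys are hypothetical.
SECTION_PROMPTS = {
    "medications": "Summarize the list of medications in this extract from a medications page of a health record.",
    "focus_notes": "Summarize the following extract from the focus notes of a health record:",
    "patient_details": "Summarize the details in this patient details form from a health record:",
}

# Hypothetical fallback for labels without a dedicated prompt.
DEFAULT_PROMPT = "Summarize the following extract from a health record:"


def build_page_prompt(label: str, page_text: str) -> str:
    """Combine the section-specific instruction with the page's extracted text."""
    instruction = SECTION_PROMPTS.get(label, DEFAULT_PROMPT)
    return f"{instruction}\n\n{page_text}"
```

Keeping prompts in a table keyed by label makes it easy to customize them per customer without touching the rest of the pipeline.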
Some examples of labels are:
The classification models that label an input health record are implemented in one of two ways explained next.
GPT-5 is already trained on medical corpora and can score highly on medical examinations. As such, it is inherently capable of classifying each page of a health record based on that page's text. For simple, obvious labels, straightforward prompt instructions are sufficient; we don't even have to provide few-shot examples.
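Zero-shot classification mostly comes down to a tightly constrained prompt and a strict parser for the reply. Here is a minimal sketch, assuming a hypothetical label set and leaving out the API call itself. The prompt wording is illustrative, not our production prompt. Returning `None` for unrecognized replies gives the pipeline a clear signal to fall back to another method.

```python
def build_classification_prompt(page_text: str, labels: list[str]) -> str:
    """Ask the model to pick exactly one label from a fixed list."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the following page from a health record. "
        "Answer with exactly one label from this list and nothing else:\n"
        f"{options}\n\nPage:\n{page_text}"
    )


def parse_label(reply: str, labels: list[str]):
    """Map the model's free-text reply onto a known label, or None if it doesn't match."""
    cleaned = reply.strip().strip(".").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None
```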
For some use cases, we need special labels that zero-shot classification is unable to classify accurately. To handle them, we maintain a reference set of manually labeled sections and examine how similar an input record's section is to each section in that set. The reference sections that score high on content similarity with the input section are selected and their labels (manually set) are chosen as the labels for the input section.
Implementation-wise, we determine content similarity using vector similarity metrics like cosine similarity. The reference sections as well as the input sections are converted to embedding vectors using either OpenAI embeddings or Sentence-BERT. The reference embeddings are stored in a vector database like Pinecone and queried for vector similarity with an input section. The database returns the most similar reference sections and their labels.
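The similarity lookup itself is simple once embeddings exist. Here is a minimal in-memory sketch of the same idea, using brute-force cosine similarity over a small list of `(embedding, label)` pairs in place of a vector database query. The toy vectors, the `k`, and the `min_score` defaults are assumptions for illustration. Real embeddings from OpenAI or Sentence-BERT have hundreds of dimensions, and a database like Pinecone performs this search at scale.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_labels(query, references, k=3, min_score=0.0):
    """Return distinct labels of the top-k most similar reference sections."""
    scored = sorted(
        ((cosine(query, emb), label) for emb, label in references),
        reverse=True,
    )
    labels = []
    for score, label in scored[:k]:
        if score >= min_score and label not in labels:
            labels.append(label)
    return labels
```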
In this stage, GPT-5 prompts are used to summarize the information on each page.
For some pages, this involves abstractive summarization of the clinical text on the page. GPT-5 rephrases that text to a shorter abstract without losing any critical details.
For other pages, GPT-5 is used for extractive summarization. Key information is extracted verbatim from a page's content.
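Because extractive summaries are supposed to be verbatim, they can be checked mechanically against the page text. Here is a minimal sketch of such a guard, assuming the model's output has already been split into a list of extracted items. It treats differences in whitespace and letter case as acceptable. The function names are hypothetical. Any item it flags was paraphrased or hallucinated and can be sent for review.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks don't cause false alarms."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_extractions(items: list[str], page_text: str) -> list[str]:
    """Return the extracted items that do not appear verbatim in the page text."""
    page = _normalize(page_text)
    return [item for item in items if _normalize(item) not in page]
```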
We show some page examples and their respective prompts in the sections below.
The medications page of a sample health record annotated by the text extraction model is shown below:

We ask GPT-5 to summarize the medications page with this prompt: "Summarize the list of medications in this extract from a medications page of a health record."

GPT-5 generates the following summary:

We can see that the dosages are missing from the summary. This is because the text extraction pipeline we used here did not keep all the information on a line together, even though the extraction model provides the pixel coordinates needed to do so. So this is really a drawback of the pipeline rather than of the extraction or summarization model, and it can be easily fixed.
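Here is one way that fix could look: a minimal sketch that regroups extracted words into lines using their pixel coordinates. It assumes each word comes with a `(x0, y0, x1, y1)` box under hypothetical `"text"` and `"box"` keys. Words are assigned to the same line when their vertical centers are close relative to word height. The `y_tolerance` default of 0.5 is an assumption that would need tuning for skewed or handwritten pages.

```python
def group_into_lines(words, y_tolerance=0.5):
    """Regroup OCR words into text lines using their pixel boxes.

    A word joins the current line when its vertical center is within
    y_tolerance * (taller of its height and the line's first word's height)
    of the center of that line's first word; otherwise it starts a new line.
    """
    lines = []  # each entry: {"center": float, "height": float, "words": [...]}
    for word in sorted(words, key=lambda w: (w["box"][1] + w["box"][3]) / 2):
        _, y0, _, y1 = word["box"]
        center, height = (y0 + y1) / 2, y1 - y0
        if lines and abs(center - lines[-1]["center"]) <= y_tolerance * max(height, lines[-1]["height"]):
            lines[-1]["words"].append(word)
        else:
            lines.append({"center": center, "height": height, "words": [word]})
    # Within each line, order words left to right before joining them.
    return [
        " ".join(w["text"] for w in sorted(line["words"], key=lambda w: w["box"][0]))
        for line in lines
    ]
```

With drug names and dosages rejoined on the same line, the summarizer sees "Aspirin 10mg" as one unit instead of two unrelated fragments.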
This focus notes page of a sample health record contains a lot of difficult-to-read handwritten text and has been annotated by the text extraction model:

Note that the text extraction model has misidentified words like "Pt." (for "Patient") as a meaningless "R t." This is because the model used here has not been fine-tuned on medical records.
We ask GPT-5 to summarize the focus notes page with this prompt: "Summarize the following extract from the focus notes of a health record:"

GPT-5 generates the following summary:

GPT-5 has done a great job of summarizing the focus notes here. Although the extracted text is not in the same order as the page layout, GPT-5 recombined and organized the information in a coherent and structured way by itself while ignoring the unnecessary details.
The crucial patient details form of a sample health record has been annotated by the text extraction model as follows:

Notice how it has accurately identified both printed and handwritten text.
We ask GPT-5 to summarize the patient details with this prompt: "Summarize the details in this patient details form from a health record:"

GPT-5 generates the following patient details summary:

Note that even in cases where the field name and field value are not together in the extracted text because of pipeline drawbacks, GPT-4 has intelligently correlated them:

Medical record number in the original record on the left. Its locations in the extracted text. GPT-4 has correctly correlated them again in the summary! (Original source: AHIMA)

The combinator module generates the final section-level summaries. For sections that span multiple pages, it combines their page summaries into a single coherent section summary. While doing so, it doesn't just squish multiple summaries together naively. Instead, it condenses their information by removing any duplicated details and generates a concise section summary that does not feel choppy.
The combinator is implemented as a custom fine-tuned transformer model with either GPT-4 or another language model like BERT as the base model. Fine-tuning enables us to generate high-quality final summaries. It also lets users tweak the size and quality of each summary because everyone has a different idea of what an ideal summary looks like for their specific use case.
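The deduplication part of the combinator's job can be illustrated without a model. Here is a minimal sketch of a swapped-in heuristic that merges page summaries while dropping sentences already seen, where two sentences count as the same if they match after lowercasing and removing punctuation. This is not the fine-tuned combinator described above, which also rephrases and condenses. It only shows the idea of removing duplicated details across pages.

```python
import re


def _key(sentence: str) -> str:
    """Comparison key: lowercase, punctuation removed, whitespace collapsed."""
    text = re.sub(r"[^a-z0-9\s]", "", sentence.lower())
    return " ".join(text.split())


def combine_page_summaries(page_summaries: list[str]) -> str:
    """Concatenate page summaries in order, skipping sentences already included."""
    seen = set()
    merged = []
    for summary in page_summaries:
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
            key = _key(sentence)
            if key and key not in seen:
                seen.add(key)
                merged.append(sentence)
    return " ".join(merged)
```

Exact-match deduplication misses paraphrased repeats ("Aspirin was started" versus "Started on aspirin"), which is precisely the gap a fine-tuned model closes.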
Using medical report summarization as a use case, this article showed a typical report processing pipeline that uses recent advances in large language models to streamline health care operations.
In addition to summarization, many other high-quality artificial intelligence (AI) and natural language processing solutions for health records are possible now, like question-answering chatbots and powerful search engines. Contact us to explore how you can streamline operations in your hospital, lab, or medical practice with modern AI technologies.