Width.ai

Our Customizable SOTA AI Medical Record Summarization Pipeline for Legal & Medical Automation

Matt Payne | Patrick Hennis
·
December 2, 2025
Medical record summarization with LLMs

A major problem in health care is the amount of time spent on paperwork. Many facilities still rely on paper records and migrating them to electronic health records is not that simple.

Digitized paper records are full of complexities like bad handwriting, handwritten notes and markings, a variety of page layouts from forms to tables, image quality and noise issues, irrelevant text like footers, and more. Overcoming such problems requires complex processing pipelines that combine the latest techniques in large language models, natural language processing, and computer vision.

In this article we dive into our SOTA medical record summarization system we've deployed in production for a number of customers in both the medical and legal fields. Our pipeline is customizable at each step to give customers full control over what the final summaries look like. We'll focus on processing pages individually, but most of the systems we build use this pipeline as a step in a larger process to combine all medical history into a chronological summary.

What Are Medical Records In Our Use Case?

We'll briefly explain some basics of medical records and their contents here with a focus on hospital inpatient health records.

Health Records vs. Medical Records

A medical record is a record of a patient's single encounter with medical care in a medical office (doctor, ER, therapy etc).

In contrast, a health record is a comprehensive collection of all aspects of a patient's health & medical history over an extended period of time and across multiple care providers. Health care is broader than medical care, covering mental health, nutrition plans, health insurance, and other such aspects.

A health record typically contains many medical records, lab reports, and medical images.

In this article, we use the terms "health record," "medical record," and "medical report" interchangeably.

The Information Inside Medical Records

A typical medical record consists of different kinds of information:

  • Sections: Each section focuses on a particular area of healthcare, like medications, pathology, surgery, and so on. A health record will have multiple sections.
The medications section of a health record (Source: AHIMA)
  • Records: These document specific aspects of a patient's medical care, like the administration of anesthesia or details of surgeries.
A medical record in a health record (Source: AHIMA)
  • Assessments: These are also medical records, containing critical information from a first consultation or an initial assessment by a medical care provider.
  • Reports: These are typically lab tests and medical imaging reports.
A lab report in a health record (Source: AHIMA)
  • Forms: A form consists of information laid out for patient input. Examples include biographical details and consent forms.
  • Flowsheets: Flowsheets track a patient's progress over time on aspects like vital signs, medications, and lab results.
A flowsheet in a health record (Source: AHIMA)
  • Clinical text or clinical notes: Many sections and flowsheets contain medical opinions as free-form, unstructured text filled out by doctors or other care providers.
Sample clinical notes in a health record (Source: AHIMA)

Electronic, Digitized, and Paper Health Record Formats

Other important aspects are the media and formats in which health records are stored:

  • Paper health records: They're stored on physical paper with printed and handwritten content.
  • Digitized health records: They're stored as digital scans of paper records in file formats like the portable document format (PDF) or image formats like the tagged image file format (TIFF). Though these formats have a structure, the information in them is not stored as structured data that can be queried easily.
  • Electronic health records (EHR) and electronic medical records (EMR): These are stored on digital media as structured information in easily queryable databases. They're created and accessed using special EHR/EMR software.

Use Cases of Summarizing Medical Records and Data Analysis

Clinical care and decision making

The primary use case of patient records in a healthcare setting is informing clinical decisions and treatment plans by health care professionals. The records enable providers to render high-quality patient care because they contain the entire, objective history of patient data, without having to rely on patients to recall that information. Medical record summaries build on this by distilling lengthy charts into concise overviews of a patient's record. These summaries highlight the most important details, such as recent hospitalizations, active problems, and current medications, so clinicians can quickly focus on what matters during time-constrained visits.

Administrative and billing workflows

Health records are also useful for administrative purposes like automated invoice generation for patients. Patients can also guard against billing fraud by running their records through third-party evaluations. Actionable summaries make it much less time-consuming to review the details of a visit and any medication paid for.

Insurance claims and third-party evaluations of human error

These medical record summaries are sent by medical record departments to insurance companies, when requested, to justify a claim; they're easier to send than the entire chart. On a deeper level, these summaries can be used to evaluate a person's medical history over time and help insurance professionals see correlations between the current claim and pre-existing conditions.

Legal Use: Mass Torts and Personal Injury Demand Letters

These medical summaries are used heavily in mass tort litigation and personal injury demand letters. In these cases, attorneys must extract clear narratives from patient medical records to support claims of injury tied to incidents documented in police reports, with the claims backed by physicians.

Demand letter generation involves creating a full chronological medical narrative; this pipeline is used for each document and relies on summaries tightly focused on key dates, events, and information from multiple parties. These are the use cases where we build the bulk of our medical summarization systems.

In the sections below, we'll walk through the custom medical record summarization pipeline we've built to support the above use cases, generating customized AI summaries with full control over the structure, format, and information used.

Width.ai Medical Record Summarization Pipeline

The illustration below shows our medical report summarization pipeline based on GPT-5 and other deep-learning models.

Width.ai medical record summarization pipeline

The pipeline consists of these components:

  • Medical report splitter
  • Document preprocessing for text extraction
  • Text extraction
  • Text preprocessing
  • Page classification
  • LLM summarization module
  • Output medical summaries combinator

Each of these components uses different substeps and machine learning algorithms to perform its task. This breaks the process into easy-to-understand segments that let us optimize specific pieces of the puzzle for each customer.

Report Splitting

The digitized health record is processed page by page in the initial stages. Related pages can be recombined later through the page classification and combinator modules. But first, the record is split into pages or chunks suited to the document's file format. PDF documents are easy to split into pages. For image formats like TIFF, we use object localization models to identify and locate page boundaries.

Document Preprocessing for Text Extraction

Text extraction is done using custom optical character recognition (OCR) approaches that learn to extract both the text and its positional information. Applying suitable preprocessing to the digitized pages of a health record improves the accuracy of text extraction in the next stage.

In this section, we cover some of the common preprocessing techniques used to improve text extraction.

Preprocessing for Character Recognition

The following image preprocessing operations are applied to the digitized health record images to improve character recognition for both printed and handwritten text:

  • Resolution upscaling: High resolutions improve the accuracy of character recognition. If the scans are not high resolution (at least 1024p), we use a custom image enhancement integration to scale up the resolution, based on Stable Diffusion methods like those in "High-Resolution Image Synthesis with Latent Diffusion Models" (Source)
  • Grayscale conversion: All images are converted from RGB colorspace to grayscale.
  • Noise removal: Noisy pixels are removed using hybrid contrast enhancement that combines local binarization with the preservation of the surrounding grayscale.

This step is incredibly important as missing key details in the data points we need to use in the summaries will affect all downstream tasks.
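The grayscale conversion and local binarization steps above can be illustrated with plain numpy. This is a minimal sketch of the operations, not our production implementation (which would typically use an optimized library like OpenCV); the window size and offset are illustrative defaults.

```python
# Minimal numpy sketch of two preprocessing steps: grayscale conversion
# and adaptive (local mean) binarization, which preserves text against
# uneven lighting better than a single global threshold.
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB image to grayscale with standard luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])


def adaptive_binarize(gray: np.ndarray, window: int = 15, offset: float = 10.0) -> np.ndarray:
    """Binarize each pixel against the mean of its local window."""
    h, w = gray.shape
    padded = np.pad(gray, window // 2, mode="edge")
    out = np.zeros_like(gray, dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + window, x:x + window].mean()
            # Pixels darker than their neighborhood (by `offset`) become ink.
            out[y, x] = 255 if gray[y, x] > local_mean - offset else 0
    return out
```

On a page with dark handwriting over a light, unevenly lit background, the local mean tracks the background brightness, so strokes stay black while the paper stays white.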

Preprocessing for Handwriting Recognition

Image preprocessing techniques to improve handwriting recognition are based on the paper, "Continuous Offline Handwriting Recognition Using Deep Learning Models". They include:

  • Slope and slant correction: Slope (the angle of the text from a horizontal baseline) and slant (the angle of some ascending and descending strokes from the vertical) are detected and corrected.
Slant and slope recognition in OCR (Source: Sueiras)
  • Height normalization: The ascender and descender heights throughout the handwritten sections are identified and scaled to a standard ratio.
Region recognition for height normalization (Source: Sueiras)

Text Extraction and Layout Understanding

Text extraction identifies all the printed or handwritten characters on a digitized image, combines them into syntactic elements like words and punctuation marks, and returns them as text fragments like words, phrases, and sentences.

Character identification using just the image pixels can often be inaccurate if the image has text with poor handwriting (a widespread problem in the medical field), image defects, blurring, missing pixels, glare, and similar imaging artifacts.

State-of-the-art text extraction uses multi-modal language-vision models. They don't identify characters based on just the image pixels. Instead, they use a lot of additional information like:

  • The surrounding characters
  • The probable parent word, based on a vocabulary of words in a powerful language model trained on massive volumes of text
  • The semantic consistency of the parent word in its surrounding context of words based on the same language model

All these additional criteria drastically improve the accuracy of the text extraction, including from the tough handwritten sections of a digitized health record.

Below, we explore some state-of-the-art text extraction approaches.

LayoutLMv3 Model

LayoutLMv3 architecture (Source: Huang et al.)

LayoutLMv3 is an OCR-based image-text model that's been trained for document layout tasks. Given a document image, it identifies layout elements like sections, field-value pairs, or tables and produces their bounding boxes and text as results.

The architecture is a pure transformer model with no convolutional elements. During training and fine-tuning, it must be supplied with the words and their bounding-box positions (from an off-the-shelf OCR package) along with image patches of the training documents. Its multi-modal transformer learns to associate patterns in the document text with their appropriate layout elements.

For fine-tuning, we supply a small dataset of annotated digitized health records. The pre-trained model adjusts its layer weights to activate for the layouts, layout elements, and text found in health records.
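A fine-tuning setup along these lines can be sketched with the Hugging Face transformers library. The label set below is illustrative, not our production taxonomy, and model loading is wrapped in a function so the heavy download only happens when you call it.

```python
# Hedged sketch of setting up LayoutLMv3 for token classification over
# health-record layout elements. Labels follow the common BIO-style
# convention; this set is an illustrative assumption.
LAYOUT_LABELS = ["O", "B-SECTION_HEADER", "B-FIELD_NAME", "B-FIELD_VALUE", "B-TABLE_CELL"]
label2id = {label: i for i, label in enumerate(LAYOUT_LABELS)}
id2label = {i: label for label, i in label2id.items()}


def load_layoutlmv3(checkpoint: str = "microsoft/layoutlmv3-base"):
    """Load the processor (with built-in OCR) and a token-classification head."""
    from transformers import AutoProcessor, LayoutLMv3ForTokenClassification

    processor = AutoProcessor.from_pretrained(checkpoint, apply_ocr=True)
    model = LayoutLMv3ForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(LAYOUT_LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return processor, model
```

From here, fine-tuning proceeds as standard token classification: the processor turns each annotated page image into tokens, boxes, and pixel values, and the model's classification head is trained against the per-token layout labels.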

Document Understanding Transformer Model

OCR-free document understanding transformer (Source: G. Kim et al.)

The document understanding transformer (Donut) is an alternative text extraction approach that is OCR-free. That means it does not use or generate information at the character level. Instead, it learns to directly generate text sequences from visual features without producing intermediate information like character labels and text bounding boxes.

Donut has a typical encoder-decoder transformer architecture:

  • The encoder is a language-image transformer block that recognizes the text in an image implicitly and generates embeddings for it.
  • The decoder is a transformer block that generates relevant structured text from the encoder’s embeddings.

For example, for the downstream task of document layout understanding, the decoder produces an output structured sequence like “<layout><section><heading>medications</heading><line><fragment>Aspirin</fragment> <fragment>10mg</fragment></line></section></layout>” as its output sequence.
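A structured sequence like that is easy to turn into queryable data. The small parser below follows the tag scheme of the example sequence above; it is a sketch for that specific format, not a general Donut post-processor.

```python
# Parse a Donut-style tagged output sequence into nested Python data.
import re


def parse_donut_sequence(seq: str) -> dict:
    """Extract section headings and per-line text fragments from a tagged sequence."""
    sections = []
    for section in re.findall(r"<section>(.*?)</section>", seq, re.DOTALL):
        heading = re.search(r"<heading>(.*?)</heading>", section)
        lines = [
            re.findall(r"<fragment>(.*?)</fragment>", line)
            for line in re.findall(r"<line>(.*?)</line>", section, re.DOTALL)
        ]
        sections.append({"heading": heading.group(1) if heading else None, "lines": lines})
    return {"sections": sections}
```

Running it on the example sequence yields a `medications` section whose first line contains the fragments `Aspirin` and `10mg`.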

Since Donut doesn't use OCR at all, this approach is faster and lighter with far fewer parameters than OCR-based models. It also reports high accuracy.

Text Extraction Challenges we tackle

The text extraction must not only understand the regular text but also special marks like check marks and circles. In the example below, a doctor has selected "Y" as their choice but the text extraction model has ignored it.

Special hand-drawn marks (Original source: AHIMA)

In 2026 we now use a custom LLM to handle these marks for our summaries, in a set of steps that looks like:
- Identify which special mark or circle is applied.
- Identify the other options that were not selected, to establish the full context of the document.
- Tie the mark to the specific relevant information so the final summarizer knows how to use our special mark recognition (so it can say "yes, it was prescribed" or "no, it was not prescribed").

I recommend still choosing an LLM that is fine-tunable so you can improve it over time.
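The three steps above can be expressed as a single structured prompt. This is a minimal sketch: the prompt wording is illustrative, and the function only builds the prompt; it assumes you pass the result to whatever LLM client you use.

```python
# Build a prompt implementing the three special-mark steps described above.
# The wording is an illustrative assumption, not our production prompt.
def build_special_mark_prompt(page_text: str, region_description: str) -> str:
    return (
        "You are analyzing a scanned medical form.\n"
        f"Form text:\n{page_text}\n\n"
        f"Region of interest: {region_description}\n\n"
        "1. Identify which option is marked (circled, checked, etc.).\n"
        "2. List the options that were NOT selected.\n"
        "3. State the clinical meaning of the selection in one sentence "
        "(e.g. 'the medication was prescribed' or 'it was not prescribed')."
    )
```

Keeping the three steps explicit in the prompt makes the model's output easy to feed to the final summarizer, and easy to collect as training data if you later fine-tune.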

Text Preprocessing for Medical Reports

The extracted text fragments may not be in an ideal state for medical processing use cases. For example, the text in signatures, page numbers, seals, logos, or letterheads just acts as noisy text that doesn't add any relevant information to health report summaries but may affect the quality of the summaries or extracted information fields.

Text preprocessing uses deep learning models to ignore such noisy text. One approach we use adapts language model pretraining over clinical notes to our text extraction model, teaching it to recognize the text patterns unique to health records.

Unlike that paper, our approach does not use long short-term memory models; instead, it adapts the paper's sequential, hierarchical, and pretraining techniques to our multi-modal text extraction model. This involves fine-tuning the layout understanding model with additional embeddings for medical information like section labels and diagnosis codes.
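Before any learned filtering, simple rule-based heuristics already remove a lot of the obvious noise. The patterns below are illustrative examples of the kinds of fragments (page numbers, fax headers, confidentiality footers) that add nothing to a summary.

```python
# Illustrative rule-based filters for noisy text fragments. Learned models
# handle the harder cases; these patterns are example assumptions.
import re

NOISE_PATTERNS = [
    re.compile(r"^page \d+( of \d+)?$", re.IGNORECASE),  # "Page 3 of 12"
    re.compile(r"^\d+$"),                                # bare page numbers
    re.compile(r"^fax(ed)?\b", re.IGNORECASE),           # fax headers
    re.compile(r"confidential", re.IGNORECASE),          # footers/stamps
]


def filter_noise(fragments: list[str]) -> list[str]:
    """Drop empty fragments and any fragment matching a noise pattern."""
    return [
        f for f in fragments
        if f.strip() and not any(p.search(f.strip()) for p in NOISE_PATTERNS)
    ]
```

The clinical content passes through untouched while boilerplate never reaches the summarization prompts, which both shortens the context and removes distractors.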

Page Classification

Determining section labels for each page is an essential step for the accurate processing of digitized paper records and understanding complex patient information.

As we saw earlier, every section in a health record has a distinct structure and set of fields. Not all sections can be processed the same way. The processing heavily depends on the nature of the medical information in a section, its structure, and the goals of the health care professional doing the processing. For example:

  • For some sections, the clinical text requires extractive summarization. For others, abstractive summarization may suffice if it doesn't introduce any medical risks.
  • Forms and assessments may require named entity recognition.
  • Clinical images may be processed to generate informative diagnoses as text.

The appropriate GPT prompts and models for each section and use case are also different. So, every page is labeled with appropriate section labels and additional goal-specific labels by a page classification model.

Some examples of labels are:

  • Section labels, like medications page or discharge summary
  • Form labels, like patient details and consent forms
  • Flowsheet labels, like nursing care or intravenous therapy flowsheets
  • International Classification of Diseases (ICD) codes

The classification models that label an input health record are implemented in one of two ways explained next.

1. Zero-Shot Classification With GPT-5

GPT-5 is already trained on medical corpora and is capable of scoring high in medical examinations. As such, it's inherently capable of classifying each page of a health record based on that page's text contents. For labels that are simple and obvious, straightforward prompt instructions are sufficient; we don't even have to provide any examples as few-shot guidance.
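A zero-shot classification call along these lines can be sketched with the standard OpenAI chat completions client. The label set and prompt wording are illustrative assumptions; `classify_page` is shown unexecuted since it requires an API key.

```python
# Sketch of zero-shot page classification. Labels and wording are
# illustrative; the client call follows the standard OpenAI chat API.
PAGE_LABELS = ["medications", "discharge summary", "consent form",
               "nursing flowsheet", "lab report"]


def build_classification_messages(page_text: str) -> list[dict]:
    return [
        {"role": "system", "content": (
            "Classify the following page of a health record into exactly one "
            "of these labels: " + ", ".join(PAGE_LABELS) + ". Reply with the label only."
        )},
        {"role": "user", "content": page_text},
    ]


def classify_page(client, page_text: str, model: str = "gpt-5") -> str:
    """Run the zero-shot classification; `client` is an openai.OpenAI instance."""
    response = client.chat.completions.create(
        model=model,
        messages=build_classification_messages(page_text),
    )
    return response.choices[0].message.content.strip()
```

Constraining the reply to "the label only" keeps the output trivially parseable, which matters when the label routes each page to a different downstream prompt.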

2. Classification Using Similarity Search

For some use cases, we need special labels that zero-shot classification is unable to classify accurately. To handle them, we maintain a reference set of manually labeled sections and examine how similar an input record's section is to each section in that set. The reference sections that score high on content similarity with the input section are selected and their labels (manually set) are chosen as the labels for the input section.

Implementation-wise, we determine content similarity using vector similarity metrics like cosine similarity. The reference sections as well as the input sections are converted to embedding vectors using either OpenAI embeddings or Sentence-BERT. The reference embeddings are stored in a vector database like Pinecone and queried for vector similarity with an input section. The database returns the most similar reference sections and their labels.

GPT-5 Summarization of Sections and Clinical Text

In this stage, GPT-5 prompts are used to summarize the information on each page.

For some pages, this involves abstractive summarization of the clinical text on the page. GPT-5 rephrases that text to a shorter abstract without losing any critical details.

For other pages, GPT-5 is used for extractive summarization. Key information is extracted verbatim from a page's content.

We show some page examples and their respective prompts in the sections below.

1. Medications Page

The medications page of a sample health record annotated by the text extraction model is shown below:

Text extraction from printed medications page (Original source: AHIMA)

Simple GPT-5 Summarization Prompt for Medications Page

We ask GPT-5 to summarize the medications page with this prompt: "Summarize the list of medications in this extract from a medications page of a health record."

Medications prompt (Source: ChatGPT)

GPT-5 Summary of Medications

GPT-5 generates the following summary:

Generated medications summary
Generated medications summary (Source: ChatGPT)

We can see that the dosages are missing from the summary. This is because the text extraction pipeline we used here did not keep all the information on a line together, even though the extraction model provides the pixel coordinates needed to do so. So this is really a drawback of the pipeline rather than of the extraction or summarization model, and it can be easily fixed.
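The fix is a line-grouping pass over the OCR output: cluster fragments by their vertical pixel position, then order each line left to right, so a drug name and its dosage stay together. A minimal sketch, assuming each fragment arrives as a `(text, x, y)` tuple and that a fixed vertical tolerance is acceptable:

```python
# Group OCR fragments into lines by y-coordinate so related values
# (e.g. a medication and its dosage) reach the summarizer together.
def group_fragments_into_lines(fragments, y_tolerance: int = 10) -> list[str]:
    """fragments: list of (text, x, y) tuples from the OCR step."""
    lines = []
    for text, x, y in sorted(fragments, key=lambda f: (f[2], f[1])):
        if lines and abs(y - lines[-1]["y"]) <= y_tolerance:
            lines[-1]["parts"].append((x, text))  # same visual line
        else:
            lines.append({"y": y, "parts": [(x, text)]})  # new line
    # Order each line's fragments left to right and join them.
    return [" ".join(t for _, t in sorted(line["parts"])) for line in lines]
```

A fixed tolerance works for clean scans; skewed pages would first go through the slope correction described earlier, or use a tolerance derived from the median text height.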

2. Focus Notes Page

This focus notes page of a sample health record contains a lot of difficult-to-read handwritten text and has been annotated by the text extraction model:

Text extraction from focus notes page (Original source: AHIMA)

Note that the text extraction model has misidentified words like "Pt." (for "Patient") as a meaningless "R t." This is because the model used here has not been fine-tuned on medical records.

GPT-5 Summarization Prompt for Focus Notes

We ask GPT-5 to summarize the focus notes page with this prompt: "Summarize the following extract from the focus notes of a health record:"

Focus notes prompt (Source: ChatGPT)

GPT-5 Summary of Focus Notes

GPT-5 generates the following summary:

Focus notes summary (Source: ChatGPT)

GPT-5 has done a great job of summarizing the focus notes here. Although the extracted text is not in the same order as the page layout, GPT-5 recombined and organized the information in a coherent and structured way by itself while ignoring the unnecessary details.

3. Patient Details Form

The crucial patient details form of a sample health record has been annotated by the text extraction model as follows:

Text extraction from patient details form (Original source: AHIMA)

Notice how it has accurately identified both printed and handwritten text.

GPT-5 Summarization Prompt for Patient Details

We ask GPT-5 to summarize the patient details with this prompt: "Summarize the details in this patient details form from a health record:"

Patient details prompt (Source: ChatGPT)

GPT-5 Summary for Patient Details

GPT-5 generates the following patient details summary:

Patient details summary (Source: ChatGPT)

Note that even in cases where the field name and field value are not together in the extracted text because of pipeline drawbacks, GPT-5 has intelligently correlated them:

Medical record number in the original record on the left. Its locations in the extracted text. GPT-5 has correctly correlated them again in the summary! (Original source: AHIMA)

LLM-Based Summary Combinator Model

LLM-based summary combinator model

The combinator module generates the final section-level summaries. For sections that span multiple pages, it combines their page summaries into a single coherent section summary. While doing so, it doesn't just squish multiple summaries together naively. Instead, it condenses their information by removing any duplicated details and generates a concise section summary that does not feel choppy.

The combinator is implemented as a custom fine-tuned transformer model with either GPT-4 or another language model like BERT as the base model. Fine-tuning enables us to generate high-quality final summaries. It also lets users tweak the size and quality of each summary because everyone has a different idea of what an ideal summary looks like for their specific use case.
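The deduplication step the combinator performs before the fine-tuned model writes the final section summary can be illustrated in a few lines. This is a deliberately naive sketch using exact-match sentence keys; the real module resolves near-duplicates and paraphrases with the language model itself.

```python
# Naive sketch of the combinator's deduplication pass: collapse sentences
# repeated verbatim across page summaries before the final rewrite.
def dedup_page_summaries(page_summaries: list[str]) -> str:
    seen, kept = set(), []
    for summary in page_summaries:
        for sentence in summary.split(". "):
            key = sentence.strip().strip(".").lower()  # normalize for comparison
            if key and key not in seen:
                seen.add(key)
                kept.append(sentence.strip().rstrip("."))
    return ". ".join(kept) + "."
```

Feeding the deduplicated text (rather than the raw concatenation) to the fine-tuned model keeps the final section summary concise and avoids the model restating the same fact from several pages.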

Medical Report Summarization for Legal & Medical Use Cases

Using medical report summarization as a use case, this article showed a typical report processing pipeline that uses incredible advances in large language models to streamline health care operations.

In addition to summarization, many other high-quality artificial intelligence (AI) and natural language processing solutions for health records are possible now, like question-answering chatbots and powerful search engines. Contact us to explore how you can streamline operations in your hospital, lab, or medical practice with modern AI technologies.

References

  • Jorge Sueiras (2021). "Continuous Offline Handwriting Recognition using Deep Learning Models." arXiv:2112.13328 [cs.CV]. https://arxiv.org/abs/2112.13328
  • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei (2022). "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking." arXiv:2204.08387 [cs.CL]. https://arxiv.org/abs/2204.08387
  • Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park (2021). “OCR-free Document Understanding Transformer.” arXiv:2111.15664 [cs.LG]. https://arxiv.org/abs/2111.15664
  • Jonas Kemp, Alvin Rajkomar, Andrew M. Dai (2019). "Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes." arXiv:1909.03039 [cs.LG]. https://arxiv.org/abs/1909.03039