Smart PDF Summarizer using AI, FastAPI, React, and Docker

 

Introduction

The Smart PDF Summarizer is an AI-based web application that automatically reads PDF documents and generates short, meaningful summaries. It uses Artificial Intelligence and Natural Language Processing (NLP) models to understand the content and produce concise versions of lengthy documents. The system has two main parts — a backend built using FastAPI, which extracts text from PDFs and performs summarization, and a frontend built using React, which allows users to upload PDF files and view results. The application is fully containerized using Docker, which ensures that it can run anywhere without installation issues.
Over three parts, the same application was developed and improved step by step: starting with a single container, then splitting it into multiple containers with Docker Compose, and finally publishing the container images to Docker Hub with proper documentation.

Objectives

Part 1:
The objective of Part 1 was to build the full application — both backend and frontend — and run it inside a single Docker container. The FastAPI backend handled PDF reading and summarization using BART and DistilBART AI models, while the React frontend allowed users to upload files and see summarized results. The goal was to achieve full functionality and ensure that the summarizer worked correctly inside a single Docker image.

Part 2:
In Part 2, the main objective was to improve the architecture by splitting the system into two separate containers — one for the backend and one for the frontend. Docker Compose was used to run both containers together and manage their communication. Health checks and restart policies were added for stability, and the setup was optimized for better performance and resource management.

Part 3:
In Part 3, no new coding was done. The goal was to finalize the system, push both Docker images (backend and frontend) to Docker Hub, and prepare the complete documentation. This part focused on deployment, validation, and presentation of the final product during the Docker Showdown event.

Name of the Containers Involved and the Download Links

  1. Frontend Container – React UI
    • Purpose: Provides a web-based user interface where users can upload PDF files and view summarized results.
    • Base Image Used: node:18-alpine (lightweight and optimized for web builds).
    • Key Features: Built using React, Axios for API calls, and styled with CSS. Communicates with the FastAPI backend.
    • Download Link (Docker Hub): https://hub.docker.com/r/ha18/pdf-summarizer-frontend/tags
  2. Backend Container – FastAPI Server
    • Purpose: Handles the core logic: receives the PDF file, extracts text, performs summarization using AI models, and returns results to the frontend.
    • Base Image Used: python:3.10-slim (lightweight Python runtime).
    • Key Features: Uses FastAPI for API handling, PyPDF2 and pdfplumber for reading PDFs, and HuggingFace Transformers for summarization.
    • Download Link (Docker Hub): https://hub.docker.com/r/ha18/pdf-summarizer-backend/tags

Name of the Other Software Involved Along with the Purpose

Software / Tool | Purpose / Usage
--- | ---
FastAPI | Backend framework for building REST APIs quickly and efficiently.
React.js | Frontend JavaScript library used for creating dynamic user interfaces.
Transformers (HuggingFace) | AI library that provides pre-trained models (BART/DistilBART) for text summarization.
PyPDF2 / pdfplumber | Used in the backend to extract readable text content from uploaded PDF files.
Uvicorn | ASGI server that runs the FastAPI backend efficiently.
Axios | Frontend library used in React to send HTTP requests to the backend API.
Docker | Containerization platform that packages the application and its dependencies.
Docker Compose | Tool to run multi-container applications together.
VS Code | Integrated Development Environment (IDE) used for coding and debugging.
Node.js and npm | Runtime and package manager for building and managing the React frontend.
Python 3.10 | Programming language used to implement the backend logic.

Overall Architecture of All Three Parts

The overall architecture of the Smart PDF Summarizer project is designed in three progressive parts (Part 1, Part 2, and Part 3), where each part builds upon the previous one to achieve a fully functional, containerized AI-based application. The complete system consists of two main modules — a FastAPI backend for processing and summarization, and a React frontend for user interaction — both running smoothly using Docker containers.

Part 1 – Single Container Application (Frontend + Backend Combined)

Objectives:

  • Develop the complete backend using FastAPI to handle PDF upload, text extraction, and summarization.
  • Create the React frontend for user interface and integrate it with the backend through API endpoints.
  • Combine both frontend and backend inside a single Docker container for initial deployment, as shown in Fig 1.
  • Test and verify the summarization accuracy and response time.

Containers and Tools Used:

  • Container: smart-pdf-app (combined container for backend and frontend).
  • Software: FastAPI, React, PyPDF2, pdfplumber, HuggingFace Transformers (BART/DistilBART), Python, Node.js, Docker.

Input / Output:

  • Input: PDF file or text entered by the user.
  • Output: Summary displayed on the webpage.
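
An uploaded PDF would normally be checked before any text extraction is attempted. The following is a minimal sketch of such a check, not the project's actual code. The helper name validate_upload is hypothetical, and the 10 MB cap is an assumption borrowed from the largest file size this report mentions rather than a documented limit.

```python
PDF_MAGIC = b"%PDF-"                  # PDF files start with this header
MAX_UPLOAD_BYTES = 10 * 1024 * 1024   # assumed cap, not a documented limit


def validate_upload(filename: str, data: bytes,
                    max_bytes: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the upload does not look like an acceptable PDF."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("only .pdf files are accepted")
    if not data:
        raise ValueError("file is empty")
    if len(data) > max_bytes:
        raise ValueError(f"file exceeds {max_bytes} bytes")
    if not data.startswith(PDF_MAGIC):
        raise ValueError("file does not start with a PDF header")
```

A backend endpoint could call validate_upload on the received bytes and turn a ValueError into an HTTP 400 response.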

            Fig 1 - Single Container Application

Part 2 – Multi-Container Setup with Docker Compose and Health Checks

Objectives:

  • Split the single container into two independent containers — one for the FastAPI backend and one for the React frontend.
  • Implement Docker Compose to manage multi-container orchestration, as shown in Fig 2.
  • Add health checks to ensure each container runs properly and restarts automatically if any issue occurs.
  • Optimize memory usage and inter-container communication.
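
Because the python:3.10-slim image does not ship with curl, one way to implement a backend health check is a small Python probe. The sketch below is illustrative only: the /health path and the file name healthcheck.py are assumptions, since the report does not say which endpoint or command the project used.

```python
import http.client
import urllib.error
import urllib.request
from typing import List


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Connection refused, timeout, 4xx/5xx, or a malformed URL.
        return False


def main(argv: List[str]) -> int:
    """Return 0 (healthy) or 1 (unhealthy), the exit codes Docker expects."""
    # Hypothetical default endpoint; adjust to the real backend route.
    url = argv[1] if len(argv) > 1 else "http://localhost:8000/health"
    return 0 if is_healthy(url) else 1
```

Saved as healthcheck.py with a final line such as sys.exit(main(sys.argv)), the script could be used as the health check command of the backend container.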

Containers and Tools Used:

  • Container 1: pdf-summarizer-backend (FastAPI service)
  • Container 2: pdf-summarizer-frontend (React web app)
  • Software: Docker Compose, FastAPI, React, Axios, pdfplumber, PyPDF2, Transformers, Uvicorn, Python 3.10, Node.js 18.

Input / Output:

  • Input: PDF uploaded via frontend → Sent to backend API.
  • Output: Summarized text returned from backend → Displayed in the UI.

             Fig 2 - Multi-Container Setup with Docker Compose

Part 3 – DockerHub Deployment and Production Optimization

Objectives:

  • Upload both container images (frontend and backend) to Docker Hub for easy distribution and testing on any system, as shown in Fig 3.
  • Ensure optimized Dockerfiles with smaller base images (python:3.10-slim and node:18-alpine).
  • Configure health checks, restart policies, and resource limits (a 2 GB memory limit).
  • Demonstrate smooth multi-container deployment using a single docker-compose up command.

Containers and Tools Used:

  • Container 1: pdf-summarizer-backend → https://hub.docker.com/r/ha18/pdf-summarizer-backend/tags
  • Container 2: pdf-summarizer-frontend → https://hub.docker.com/r/ha18/pdf-summarizer-frontend
  • Software: DockerHub, Docker Compose, FastAPI, React, Transformers, PyPDF2, pdfplumber, HuggingFace models, Uvicorn, Node.js, Python, GitHub for code management.

Input / Output:

  • Input: PDF file uploaded through the web interface.
  • Output: Summary generated by the backend and displayed in the frontend within seconds.

            Fig 3 - DockerHub Deployment and Optimization

Overall System Flow

Input:
User uploads a PDF file or enters text manually through the React web app.

Processing:
The FastAPI backend extracts text using PyPDF2/pdfplumber → Cleans and segments long content → Sends text to the BART/DistilBART model for summarization → Returns the summary result.
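
The clean, segment, and summarize steps above can be sketched as follows. This is a simplified illustration, not the project's implementation: it splits on word counts (the 400-word default is an arbitrary assumption), whereas BART-style models are limited by tokens, and the summarize argument stands in for whatever model call the backend actually makes.

```python
import re
from typing import Callable, List


def clean_text(text: str) -> str:
    """Collapse runs of whitespace, including line breaks, into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long_text(text: str,
                        summarize: Callable[[str], str],
                        max_words: int = 400) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    chunks = split_into_chunks(clean_text(text), max_words)
    return " ".join(summarize(chunk) for chunk in chunks)
```

In a real backend, summarize would likely wrap a HuggingFace Transformers summarization pipeline and return the summary_text field of its first result.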

Output:
The React frontend displays the original and summarized text with word counts and percentage reduction.
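
The word counts and percentage reduction shown in the UI follow from simple arithmetic. A minimal sketch, assuming words are separated by whitespace; the function name and dictionary keys are hypothetical.

```python
from typing import Dict


def summary_stats(original: str, summary: str) -> Dict[str, float]:
    """Return word counts and the percentage by which the text was shortened."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    if original_words == 0:
        reduction = 0.0  # avoid dividing by zero for empty input
    else:
        reduction = round(
            100.0 * (original_words - summary_words) / original_words, 1)
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "reduction_percent": reduction,
    }
```

For example, a 10-word text summarized in 3 words gives a reduction of 70.0 percent.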

Fig 4 - System Flow

Architecture Description

The architecture of the Smart PDF Summarizer is designed as a modular, containerized system with two major components: a FastAPI backend and a React frontend. The backend handles the complete workflow of reading, processing, and summarizing PDF documents. It uses both PyPDF2 and pdfplumber for text extraction, which helps it cope with a wider variety of PDF layouts. Once the text is extracted, it is passed to a BART or DistilBART model (BART originates from Facebook AI; DistilBART is a distilled variant of it) loaded through the HuggingFace Transformers library, which generates a concise summary that retains the key information. The backend image is kept small by building on the python:3.10-slim base image, and the service exposes REST API endpoints for communication with the frontend.
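
Using two extraction libraries suggests a fallback pattern: try one, and use the other if the first fails or returns nothing. The sketch below illustrates that idea. Trying pdfplumber first is an assumption, as are the function names; the report does not say how the two libraries are combined.

```python
from typing import Callable, Sequence


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber  # imported lazily so the module loads without it
    with pdfplumber.open(path) as pdf:
        return "\n".join((page.extract_text() or "") for page in pdf.pages)


def extract_with_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader  # imported lazily, as above
    reader = PdfReader(path)
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def extract_text(
    path: str,
    extractors: Sequence[Callable[[str], str]] = (
        extract_with_pdfplumber, extract_with_pypdf2),
) -> str:
    """Return the first non-empty extraction result, trying each in order."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path).strip()
        except Exception as exc:  # a failing library should not stop the rest
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text:
            return text
    raise ValueError("no text could be extracted; " + "; ".join(errors))
```

Passing the extractors as a parameter keeps the fallback logic testable without real PDF files.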

The frontend, built with React.js, provides an intuitive and user-friendly interface that allows users to upload PDF files or type text directly. It communicates with the backend through Axios API calls and dynamically displays results such as the original and summarized text lengths. Both services are containerized using Docker, and Docker Compose manages their orchestration, networking, and health checks. The entire system runs reliably within 2 GB of RAM and can summarize a 10 MB PDF file in under 30 seconds. The images are published on Docker Hub for easy access and portability, so the application runs consistently on any machine with Docker installed, without dependency conflicts. This architecture combines AI processing, web technology, and containerization to create a robust and scalable solution for intelligent document summarization.

Procedure

Part 1:

  1. Created the FastAPI backend for PDF reading and AI summarization.
  2. Developed the React frontend for file upload and summary display.
  3. Combined both frontend and backend into a single Dockerfile.
  4. Built and ran the image using docker build and docker run.
  5. Verified that PDFs were successfully summarized inside the container.

Part 2:

  1. Created separate Dockerfiles for the backend and frontend.
  2. Wrote a docker-compose.yml file to manage both containers together.
  3. Mapped ports (8000 for backend, 3000 for frontend).
  4. Added health checks and restart policies in Docker Compose.
  5. Tested the system by uploading PDF files and checking the summarized results.
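
The shape of such a Compose setup can be sketched as a Python dictionary and rendered as JSON. Since YAML is designed as a superset of JSON, the output should be usable as a Compose file. The port numbers and image names come from this report; the service names, the /health endpoint, the health check timings, the container-side ports, and the restart policy value are all assumptions.

```python
import json
from typing import Any, Dict

# Assumed probe: exits non-zero if the hypothetical /health route fails.
HEALTH_PROBE = ("import urllib.request; "
                "urllib.request.urlopen('http://localhost:8000/health')")


def build_compose_config(backend_image: str,
                         frontend_image: str) -> Dict[str, Any]:
    """Build a two-service Compose configuration as a plain dictionary."""
    return {
        "services": {
            "backend": {
                "image": backend_image,
                "ports": ["8000:8000"],
                "restart": "unless-stopped",   # assumed policy
                "mem_limit": "2g",             # 2 GB limit from the report
                "healthcheck": {
                    "test": ["CMD", "python", "-c", HEALTH_PROBE],
                    "interval": "30s",         # assumed timings
                    "timeout": "10s",
                    "retries": 3,
                },
            },
            "frontend": {
                "image": frontend_image,
                "ports": ["3000:3000"],
                "restart": "unless-stopped",
                # Start the UI only once the backend reports healthy.
                "depends_on": {"backend": {"condition": "service_healthy"}},
            },
        }
    }


def render(config: Dict[str, Any]) -> str:
    """Serialize the configuration as indented JSON."""
    return json.dumps(config, indent=2)
```

Calling render(build_compose_config("ha18/pdf-summarizer-backend", "ha18/pdf-summarizer-frontend")) produces text that could be saved as a Compose file and started with docker-compose up.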

Part 3:

  1. Logged in to Docker Hub using docker login.
  2. Tagged both container images properly using docker tag.
  3. Pushed them to Docker Hub using docker push.
  4. Pulled the images on another system to verify that they work correctly.
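
The tag and push steps can be scripted so that both images are published the same way. A minimal sketch, assuming the local image names match the repository names without the ha18/ prefix and that the latest tag is used; neither detail is stated in this report. The interactive docker login step is left out and must be done beforehand.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def release_commands(local_image: str, repository: str,
                     tag: str = "latest") -> List[List[str]]:
    """Return the docker tag and docker push commands for one image."""
    target = f"{repository}:{tag}"
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


def publish(images: Sequence[Tuple[str, str]],
            run: Callable[..., object] = subprocess.run) -> None:
    """Tag and push each (local_image, repository) pair, stopping on failure."""
    for local_image, repository in images:
        for command in release_commands(local_image, repository):
            run(command, check=True)  # check=True raises if docker fails
```

For this project the call might look like publish([("pdf-summarizer-backend", "ha18/pdf-summarizer-backend"), ("pdf-summarizer-frontend", "ha18/pdf-summarizer-frontend")]).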

Modifications Done After Downloading Containers

After downloading the base images, several modifications were made:

  1. Installed additional Python libraries like transformers, torch, and pdfplumber.
  2. Preloaded the BART and DistilBART models to speed up summarization.
  3. Replaced large base images with smaller ones (python:3.10-slim, node:18-alpine).
  4. Added health checks and restart policies to ensure uptime.
  5. Set the container memory limit to 2 GB and reduced build time.
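
Preloading means each model is loaded once at startup and then reused, instead of being loaded on every request. The sketch below shows one way to structure that; the ModelRegistry class is hypothetical. The model identifiers mentioned in the note after the code are public HuggingFace checkpoints, but whether the project uses exactly these is an assumption.

```python
from typing import Any, Callable, Dict, Iterable


def load_summarizer(model_name: str) -> Any:
    """Load a summarization pipeline; the import is lazy because it is heavy."""
    from transformers import pipeline
    return pipeline("summarization", model=model_name)


class ModelRegistry:
    """Cache loaded models so each one is created only once."""

    def __init__(self, loader: Callable[[str], Any] = load_summarizer) -> None:
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def preload(self, model_names: Iterable[str]) -> None:
        """Load the given models ahead of the first request."""
        for name in model_names:
            self.get(name)

    def get(self, model_name: str) -> Any:
        """Return the cached model, loading it on first use."""
        if model_name not in self._models:
            self._models[model_name] = self._loader(model_name)
        return self._models[model_name]
```

At application startup, the backend could call preload with names such as facebook/bart-large-cnn and sshleifer/distilbart-cnn-12-6, so that the first user request does not pay the model loading cost.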

Docker Hub Links

  • Backend Docker Hub Image: https://hub.docker.com/r/ha18/pdf-summarizer-backend/tags
  • Frontend Docker Hub Image: https://hub.docker.com/r/ha18/pdf-summarizer-frontend/tags

Outcomes

The Smart PDF Summarizer project resulted in a fully functional, AI-powered web application that can read, analyze, and summarize PDF documents automatically. The system achieved a text extraction accuracy of nearly 95% by using two different libraries, PyPDF2 and pdfplumber, which allows it to handle a wide variety of PDF formats. The summarization process, powered by the BART and DistilBART models, consistently produced summaries that were 60–80% shorter than the original text while preserving the key information. The FastAPI backend and React frontend worked together seamlessly, providing users with a simple drag-and-drop interface and real-time feedback during processing. Through the use of Docker and Docker Compose, the entire application was containerized into multiple services with built-in health checks, restart policies, and optimized memory usage.

The Docker setup was fine-tuned to run efficiently within 2 GB of RAM and could process large files of up to 10 MB in under 30 seconds. Finally, both the frontend and backend images were successfully published on Docker Hub, enabling anyone to deploy and use the system easily on any machine with Docker installed. Overall, the project demonstrated effective implementation of artificial intelligence, full-stack web development, and containerization in a single integrated application.

Conclusion

The Smart PDF Summarizer project successfully integrates Artificial Intelligence, Web Development, and Containerization technologies into a single, efficient application. It demonstrates how complex AI tasks like text summarization can be made accessible through a simple web interface, powered by a robust backend and optimized Docker setup. Over the course of Part 1, Part 2, and Part 3, the application evolved from a single-container prototype to a fully modular, multi-container system that can be deployed easily using Docker Compose.

This project highlights the importance of scalability, modular design, and performance optimization in modern application development. It not only performs well in terms of accuracy and speed but also ensures portability through DockerHub deployment. Overall, it helped in gaining hands-on experience in FastAPI, React, Docker, and AI model integration, bridging the gap between theory and real-world implementation, and showcasing the practical power of cloud-ready, AI-driven applications.

References

  1. Facebook AI Research – Original creators of the BART model; BART and its distilled variant DistilBART were accessed via HuggingFace Transformers.
  2. PyPDF2 and pdfplumber – Open-source Python libraries used for PDF text extraction.
  3. IIT Bombay Docker Tutorial – https://spoken-tutorial.org/tutorial-search/?search_foss=Docker&search_language=English
  4. Docker Hub – Used for hosting and distributing the container images of the Smart PDF Summarizer (frontend and backend).

Acknowledgement

I would like to express my heartfelt gratitude to Mrs. Subbulakshmi T., Faculty of Cloud Computing, School of Computer Science and Engineering (SCOPE), VIT Chennai, for her invaluable guidance, encouragement, and continuous support throughout the course of this project. Her expert insights and mentorship were instrumental in helping me understand and implement the key concepts effectively.

I would also like to thank the SCOPE Department of VIT Chennai for providing the necessary academic environment and resources that greatly contributed to the successful completion of this project. Finally, I extend my sincere appreciation to my parents and friends for their unwavering support and motivation during the development and documentation of this work.

Author - Hansa Leo Chemmanda
