Smart PDF Summarizer using AI, FastAPI, React, and Docker
Introduction
The Smart PDF Summarizer is an AI-based web application that automatically reads
PDF documents and generates short, meaningful summaries. It uses Artificial
Intelligence and Natural Language Processing (NLP) models to understand the
content and produce concise versions of lengthy documents. The system has two
main parts: a backend built using FastAPI, which extracts text from PDFs and
performs summarization, and a frontend built using React, which allows users to
upload PDF files and view results. The application is fully containerized using
Docker, so it runs on any machine with Docker installed without manual
dependency setup.
Over three parts, the application was developed and improved step by step: it
started as a single container, was then split into multiple containers managed
with Docker Compose, and was finally published to Docker Hub with proper
documentation.
Objectives
Part 1:
The objective of Part 1 was to build the full application — both backend and
frontend — and run it inside a single Docker container. The FastAPI backend
handled PDF reading and summarization using BART and DistilBART AI models,
while the React frontend allowed users to upload files and see summarized
results. The goal was to achieve full functionality and ensure that the
summarizer worked correctly inside a single Docker image.
Part 2:
In Part 2, the main objective was to improve the architecture by splitting the
system into two separate containers — one for the backend and one for the
frontend. Docker Compose was used to run both containers together and manage
their communication. Health checks and restart policies were added for
stability, and the setup was optimized for better performance and resource
management.
Part 3:
In Part 3, no new coding was done. The goal was to finalize the system, push
both Docker images (backend and frontend) to Docker Hub, and prepare the
complete documentation. This part focused on deployment, validation, and
presentation of the final product during the Docker Showdown event.
Name of the Containers Involved and the Download Links
- Frontend Container – React UI
  - Purpose: Provides a web-based user interface where users can upload PDF files and view summarized results.
  - Base Image Used: node:18-alpine (lightweight and optimized for web builds).
  - Key Features: Built using React, Axios for API calls, and styled with CSS. Communicates with the FastAPI backend.
  - Download Link (Docker Hub): https://hub.docker.com/r/ha18/pdf-summarizer-frontend/tags
- Backend Container – FastAPI Server
  - Purpose: Handles the core logic: receives the PDF file, extracts text, performs summarization using AI models, and returns results to the frontend.
  - Base Image Used: python:3.10-slim (lightweight Python runtime).
  - Key Features: Uses FastAPI for API handling, PyPDF2 and pdfplumber for reading PDFs, and HuggingFace Transformers for summarization.
  - Download Link (Docker Hub): https://hub.docker.com/r/ha18/pdf-summarizer-backend/tags
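The extraction step described above can be sketched in Python. This is a minimal sketch, not the project's code: it assumes pdfplumber is tried first and PyPDF2 is used as a fallback when pdfplumber fails or returns too little text, and the function names and the 20-character `min_chars` threshold are hypothetical. The library imports sit inside the helpers, so the fallback logic runs even where neither package is installed.

```python
from typing import Callable, Iterable


def extract_with_pdfplumber(path: str) -> str:
    """Extract text page by page with pdfplumber (imported only when used)."""
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        # extract_text() can be None or empty for pages without a text layer (e.g. scans)
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pypdf2(path: str) -> str:
    """Extract text page by page with PyPDF2's PdfReader (imported only when used)."""
    from PyPDF2 import PdfReader
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Hypothetical order: pdfplumber first, PyPDF2 as the fallback.
DEFAULT_EXTRACTORS = (extract_with_pdfplumber, extract_with_pypdf2)


def extract_text(
    path: str,
    extractors: Iterable[Callable[[str], str]] = DEFAULT_EXTRACTORS,
    min_chars: int = 20,
) -> str:
    """Return the first extractor result with at least `min_chars` characters.

    Extractors that raise are skipped. If none reaches the threshold, the
    longest (possibly empty) result is returned instead.
    """
    best = ""
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:  # missing library, encrypted or malformed PDF, missing file
            continue
        if len(text.strip()) >= min_chars:
            return text
        if len(text.strip()) > len(best.strip()):
            best = text
    return best
```

Trying the libraries in order is one plausible reading of how two extractors improve coverage; merging or comparing both outputs would be an equally valid design.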
Name of the Other Software Involved Along with the Purpose
| Software / Tool | Purpose / Usage |
|---|---|
| FastAPI | Backend framework for building REST APIs quickly and efficiently. |
| React.js | Frontend JavaScript library used for creating dynamic user interfaces. |
| Transformers (HuggingFace) | AI library that provides pre-trained models (BART/DistilBART) for text summarization. |
| PyPDF2 / pdfplumber | Used in the backend to extract readable text content from uploaded PDF files. |
| Uvicorn | ASGI server that runs the FastAPI backend efficiently. |
| Axios | Frontend library used in React to send HTTP requests to the backend API. |
| Docker | Containerization platform that packages the application and its dependencies. |
| Docker Compose | Tool to define and run multi-container applications together. |
| VS Code | Integrated Development Environment (IDE) used for coding and debugging. |
| Node.js and npm | Runtime and package manager for building and managing the React frontend. |
| Python 3.10 | Programming language used to implement the backend logic. |
Overall Architecture of All Three Parts
The overall architecture of the Smart PDF Summarizer project is designed in
three progressive parts (Part 1, Part 2, and Part 3), where each part builds
upon the previous one to achieve a fully functional, containerized AI-based
application. The complete system consists of two main modules, a FastAPI
backend for processing and summarization and a React frontend for user
interaction, both running in Docker containers.
Part 1 – Single Container Application (Frontend + Backend Combined)
Objectives:
- Develop the complete backend using FastAPI to handle PDF upload,
text extraction, and summarization.
- Create the React frontend for user interface and integrate it with
the backend through API endpoints.
- Combine both frontend and backend inside a single Docker container
for initial deployment, as shown in Fig 1.
- Test and verify the summarization accuracy and response time.
Containers and Tools Used:
- Container: smart-pdf-app
(combined container for backend and frontend).
- Software: FastAPI, React, PyPDF2, pdfplumber, HuggingFace
Transformers (BART/DistilBART), Python, Node.js, Docker.
Input / Output:
- Input: PDF file or text entered by the
user.
- Output: summary displayed on
the webpage.
Fig 1 - Single Container Application
Part 2 – Multi-Container Setup with Docker Compose and Health Checks
Objectives:
- Split the single container into two independent containers — one
for the FastAPI backend and one for the React frontend.
- Use Docker Compose to orchestrate the two containers, as shown in Fig 2.
- Add health checks to monitor whether each container is working, and
restart policies so that a container restarts automatically if it stops.
- Optimize memory usage and inter-container communication.
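The python:3.10-slim base image does not include curl by default, so one common way to implement the backend health check is a small Python script run by the Compose `healthcheck`. This is a sketch under assumptions: port 8000 comes from the compose setup described in the Procedure, but the `/health` route, the 3-second timeout, and the file name healthcheck.py are hypothetical.

```python
import urllib.request

# Port 8000 matches the backend mapping; the /health route is hypothetical
# and must match a route the FastAPI app actually exposes.
HEALTH_URL = "http://localhost:8000/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts
        # are all OSError subclasses.
        return False


def main() -> int:
    """Exit code for Docker: 0 means healthy, 1 means unhealthy."""
    return 0 if is_healthy() else 1
```

Saved as healthcheck.py in the backend image, it could be referenced from docker-compose.yml with `test: ["CMD", "python", "-c", "import sys, healthcheck; sys.exit(healthcheck.main())"]`. In plain Docker Compose an unhealthy status is reported but does not by itself restart the container; restarts come from the restart policy when the process exits.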
Containers and Tools Used:
- Container 1: pdf-summarizer-backend
(FastAPI service)
- Container 2: pdf-summarizer-frontend
(React web app)
- Software: Docker Compose, FastAPI, React, Axios, pdfplumber,
PyPDF2, Transformers, Uvicorn, Python 3.10, Node.js 18.
Input / Output:
- Input: PDF uploaded via frontend → Sent to backend API.
- Output: Summarized text returned from backend → Displayed in the
UI.
Fig 2 - Multi-Container Setup with Docker Compose
Part 3 – Docker Hub Deployment and Production Optimization
Objectives:
- Upload both images (frontend and backend) to Docker Hub for easy
distribution and testing on any system, as shown in Fig 3.
- Ensure optimized Dockerfiles with smaller base images (python:3.10-slim
and node:18-alpine).
- Configure health checks, restart policies, and resource limits (a 2 GB
memory limit).
- Demonstrate smooth multi-container deployment using a single
docker-compose up command.
Containers and Tools Used:
- Container 1: pdf-summarizer-backend → https://hub.docker.com/r/ha18/pdf-summarizer-backend/tags
- Container 2: pdf-summarizer-frontend → https://hub.docker.com/r/ha18/pdf-summarizer-frontend
- Software: Docker Hub, Docker Compose, FastAPI, React, Transformers,
PyPDF2, pdfplumber, HuggingFace models, Uvicorn, Node.js, Python, GitHub
for code management.
Input / Output:
- Input: PDF file uploaded through the web interface.
- Output: summary processed by backend and displayed in
the frontend within seconds.
Fig 3 - DockerHub Deployment and Optimization
Overall System Flow
Input:
User uploads a PDF file or enters text manually through the React web app.
Processing:
The FastAPI backend extracts text using PyPDF2/pdfplumber → Cleans and segments
long content → Sends text to the BART/DistilBART model for summarization →
Returns the summary result.
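The clean, segment, and summarize steps can be sketched as follows. BART-family models accept at most 1,024 input tokens, so a long document is split into chunks that are summarized separately and then joined. Assumptions, not taken from the original project: chunks are sized by word count as a rough proxy for tokens (700 words by default, with `truncation=True` as a safety net), the summarizer is a HuggingFace `pipeline("summarization")` loaded with the public DistilBART checkpoint `sshleifer/distilbart-cnn-12-6`, and the generation lengths are illustrative.

```python
import re
from functools import lru_cache
from typing import Callable, List


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (newlines, tabs, repeated spaces) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, max_words: int = 700) -> List[str]:
    """Group sentences into chunks of at most `max_words` words.

    Word count is only a rough proxy for model tokens; a sentence longer
    than `max_words` is hard-split on word boundaries.
    """
    chunks: List[str] = []
    current: List[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = sentence.split()
        while len(words) > max_words:  # oversized sentence
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


@lru_cache(maxsize=1)
def _load_summarizer():
    # Loaded once per process; the checkpoint name is an assumption.
    from transformers import pipeline
    return pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")


def summarize_chunk(chunk: str) -> str:
    result = _load_summarizer()(
        chunk, max_length=150, min_length=30, do_sample=False, truncation=True
    )
    return result[0]["summary_text"]


def summarize_long(
    text: str, summarize: Callable[[str], str] = summarize_chunk, max_words: int = 700
) -> str:
    """Clean, chunk, summarize each chunk, and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_text(clean_text(text), max_words))
```

Passing `summarize` as a parameter keeps the chunking logic testable without downloading a model; swapping in `facebook/bart-large-cnn` only changes the model name in `_load_summarizer`.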
Output:
The React frontend displays the original and summarized text with word counts
and percentage reduction.
Fig 4 - System Flow
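The word counts and percentage reduction shown in the UI come down to simple arithmetic. A minimal sketch, assuming words are counted by splitting on whitespace and reduction = (1 - summary words / original words) × 100, rounded to one decimal; `summary_stats` is a hypothetical name, and the real app may compute these values in the React frontend instead.

```python
def summary_stats(original: str, summary: str) -> dict:
    """Word counts for both texts and the percentage by which the summary is shorter."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    if original_words == 0:
        reduction = 0.0  # empty input: avoid dividing by zero
    else:
        reduction = round((1 - summary_words / original_words) * 100, 1)
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "reduction_percent": reduction,
    }
```

For example, a 1,000-word document summarized to 250 words gives a 75.0% reduction, inside the 60–80% range reported in the Outcomes.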
Architecture Description
The backend, built with FastAPI and served by Uvicorn, extracts text from uploaded PDFs with PyPDF2/pdfplumber and summarizes it with the BART/DistilBART models. The frontend, built with React.js, provides an intuitive, user-friendly interface that allows users to upload PDF files or type text directly. It communicates with the backend through Axios API calls and displays results such as the original and summarized text lengths. Both services are containerized using Docker, and Docker Compose manages their orchestration, networking, and health checks. The entire system runs within 2 GB of RAM and can summarize 10 MB PDF files in under 30 seconds. The images are published on Docker Hub, so the application runs consistently on any machine with Docker, without dependency conflicts. This architecture combines AI processing, web technology, and containerization into a portable, modular solution for document summarization.
Procedure
Part 1:
- Created the FastAPI backend for PDF reading and AI summarization.
- Developed the React frontend for file upload and summary display.
- Combined both frontend and backend into a single Dockerfile.
- Built and ran the image using docker build and docker run.
- Verified that PDFs were successfully summarized inside the
container.
Part 2:
- Created separate Dockerfiles for the backend and frontend.
- Wrote a docker-compose.yml file to manage both containers together.
- Mapped ports (8000 for backend, 3000 for frontend).
- Added health checks and restart policies in Docker Compose.
- Tested the system by uploading PDF files and checking the
summarized results.
Part 3:
- Logged in to Docker Hub using docker login.
- Tagged both container images properly using docker tag.
- Pushed them to Docker Hub using docker push.
- Pulled the images on another system to verify that they work
correctly.
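The tag-and-push steps can be scripted so both images are published the same way. A minimal sketch, assuming `docker login` has already succeeded, the local images are named pdf-summarizer-backend and pdf-summarizer-frontend, and the tag is `latest`; the helper names are hypothetical. It only wraps the standard `docker tag SOURCE TARGET` and `docker push NAME:TAG` commands and, by default, prints them instead of running them.

```python
import subprocess
from typing import List, Sequence

DOCKER_HUB_USER = "ha18"  # account used in the Docker Hub links in this post
# Assumed local image names; use whatever names were given with `docker build -t ...`.
LOCAL_IMAGES = ("pdf-summarizer-backend", "pdf-summarizer-frontend")


def publish_commands(image: str, user: str = DOCKER_HUB_USER, tag: str = "latest") -> List[List[str]]:
    """docker CLI commands that tag a local image as user/image:tag and push it."""
    remote = f"{user}/{image}:{tag}"
    return [
        ["docker", "tag", image, remote],
        ["docker", "push", remote],
    ]


def publish_all(images: Sequence[str] = LOCAL_IMAGES, tag: str = "latest", dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for image in images:
        for cmd in publish_commands(image, tag=tag):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

On the second system, `docker pull ha18/pdf-summarizer-backend:latest` (and the same for the frontend) retrieves the published images, assuming the `latest` tag was pushed.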
Modifications Done After Downloading Containers
After downloading the base images, several
modifications were made:
- Installed additional Python libraries like transformers, torch, and pdfplumber.
- Preloaded the BART and DistilBART models to speed up summarization.
- Replaced large base images with smaller ones (python:3.10-slim, node:18-alpine).
- Added health checks and restart policies to ensure uptime.
- Set a 2 GB container memory limit and reduced the build time.
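Preloading a model typically means downloading its weights while the image is built, so a fresh container does not fetch them on its first request. A minimal sketch, assuming the public checkpoints `facebook/bart-large-cnn` and `sshleifer/distilbart-cnn-12-6` (not confirmed as the exact models used here) and a hypothetical file preload_models.py run from the backend Dockerfile, for example with `RUN python -c "import preload_models; preload_models.preload()"`. Each `from_pretrained` call stores the files in the HuggingFace cache inside the image layer.

```python
from typing import Callable, Iterable, List

# Assumed public checkpoints for BART and DistilBART; not confirmed as the project's exact models.
MODELS = ("facebook/bart-large-cnn", "sshleifer/distilbart-cnn-12-6")


def download_model(name: str) -> None:
    """Fetch tokenizer and weights into the HuggingFace cache (transformers imported lazily)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    AutoTokenizer.from_pretrained(name)
    AutoModelForSeq2SeqLM.from_pretrained(name)


def preload(
    models: Iterable[str] = MODELS, download: Callable[[str], None] = download_model
) -> List[str]:
    """Download each model once, e.g. during `docker build`, and return the names handled."""
    done: List[str] = []
    for name in models:
        print(f"Preloading {name}")
        download(name)
        done.append(name)
    return done
```

The `download` parameter exists only so the loop can be exercised without network access; the trade-off of preloading is a larger image in exchange for faster container startup.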
Docker Hub Links
- Backend Docker Hub Image: https://hub.docker.com/r/ha18/pdf-summarizer-backend/tags
- Frontend Docker Hub Image: https://hub.docker.com/r/ha18/pdf-summarizer-frontend/tags
Outcomes
The Smart PDF Summarizer project resulted in a fully functional, AI-powered web application that reads, analyzes, and summarizes PDF documents automatically. The system achieved a text extraction accuracy of nearly 95% by using two different libraries (PyPDF2 and pdfplumber), which lets it handle a wide variety of PDF formats. The summarization process, powered by Facebook’s BART and DistilBART models, consistently produced summaries that were 60–80% shorter than the original text while retaining the key information. The FastAPI backend and React frontend worked together seamlessly, providing users with a simple drag-and-drop interface and real-time feedback during processing. Using Docker and Docker Compose, the application was split into two containerized services (backend and frontend) with built-in health checks, restart policies, and optimized memory usage.
Conclusion
The Smart PDF Summarizer project successfully
integrates Artificial Intelligence, Web Development, and Containerization
technologies into a single, efficient application. It demonstrates how complex
AI tasks like text summarization can be made accessible through a simple web
interface, powered by a robust backend and optimized Docker setup. Over the
course of Part 1, Part 2, and Part 3, the application evolved from a
single-container prototype to a fully modular, multi-container system that can
be deployed easily using Docker Compose.
This project
highlights the importance of scalability, modular design, and performance
optimization in modern application development. It not only performs well in
terms of accuracy and speed but also ensures portability through Docker Hub
deployment. Overall, the project provided hands-on experience with FastAPI, React, Docker, and AI model integration,
bridging the gap between theory and real-world implementation, and showcasing
the practical power of cloud-ready, AI-driven applications.
Acknowledgement
I would like to express my heartfelt gratitude to
Mrs.
Subbulakshmi T., Faculty of Cloud Computing, School of Computer Science and
Engineering (SCOPE), VIT Chennai, for her invaluable guidance,
encouragement, and continuous support throughout the course of this project.
Her expert insights and mentorship were instrumental in helping me understand
and implement the key concepts effectively.
I would also like
to thank the SCOPE Department of VIT Chennai for providing the
necessary academic environment and resources that greatly contributed to the
successful completion of this project. Finally, I extend my sincere
appreciation to my parents and friends for their
unwavering support and motivation during the development and documentation of
this work.
Author - Hansa Leo Chemmanda