LaTeX OCR Model
Reduced handwritten-equation transcription time from roughly 20 minutes to under 30 seconds per equation, at an 8.3% Character Error Rate.
The Problem
Researchers and students manually transcribing handwritten mathematical equations into LaTeX spent 15–30 minutes per page, introducing errors and slowing publication workflows.
The Solution
Built a CNN + BiLSTM + CTC deep learning pipeline. The CNN extracts spatial features from equation images; bidirectional LSTM layers model sequential token dependencies; CTC loss handles variable-length output without requiring explicit input-output alignment. A custom StringLookup tokenizer maps LaTeX tokens to integer indices, and beam search decoding produces the final token sequence.
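The key property CTC provides is that the per-timestep frame predictions collapse into a shorter output sequence, so the model never needs an explicit image-to-token alignment. A minimal sketch of the collapse rule (pure Python with invented token IDs, not the project's actual decoder):

```python
# Minimal sketch of CTC's collapse rule (hypothetical token IDs, not the
# project's actual decoder): merge consecutive repeats, then drop blanks.
BLANK = 0  # CTC reserves one vocabulary index for the blank symbol

def ctc_collapse(frame_ids):
    """Collapse a per-timestep ID sequence into an output token sequence."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev:          # 1. merge consecutive repeated predictions
            if t != BLANK:     # 2. drop the blank symbol
                out.append(t)
        prev = t
    return out

# 11 timesteps of frame-level predictions collapse to 4 output tokens.
print(ctc_collapse([0, 5, 5, 0, 3, 3, 3, 0, 7, 7, 4]))  # → [5, 3, 7, 4]
```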
The Impact
Achieved a Character Error Rate (CER) of 8.3% and Word Error Rate (WER) of 12.1% on benchmark datasets. Reduced transcription time from 20 minutes to under 30 seconds per equation.
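For reference, Character Error Rate is Levenshtein edit distance normalized by the reference length, and Word Error Rate is the same computation over token lists. A minimal sketch (pure Python, not the project's evaluation harness):

```python
# Minimal CER sketch: Levenshtein edit distance / reference length.
# WER is the identical computation applied to lists of tokens.
def edit_distance(ref, hyp):
    """Classic single-row dynamic-programming Levenshtein distance."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if symbols match)
            )
    return dp[-1]

def cer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

print(cer(r"\frac{a}{b}", r"\frac{a}{d}"))  # one substitution over 11 chars
```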
Tech Details
- CNN backbone for spatial feature extraction from equation image patches
- Bidirectional LSTM layers for sequential token dependency modeling
- CTC (Connectionist Temporal Classification) Loss — eliminates need for explicit input-output alignment
- Custom StringLookup tokenizer mapping LaTeX tokens to integer indices
- Beam search decoding (beam width = 5) for higher-quality sequence generation than greedy decoding
- Data augmentation: rotation, noise injection, contrast variation for training robustness
- Evaluated on IM2LATEX-100K benchmark: CER 8.3%, WER 12.1%
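The decoding step above can be sketched as a beam over per-timestep log-probabilities. The token table and scores below are invented for illustration, and a production CTC beam search would additionally merge blank/repeat-collapsed prefixes, which this simplified version omits:

```python
import math

# Hypothetical token table standing in for the StringLookup vocabulary.
ID_TO_TOKEN = {0: "<blank>", 1: "\\frac", 2: "{", 3: "}", 4: "x", 5: "y"}

def beam_search(log_probs, beam_width=5):
    """Simplified beam search over a [timesteps x vocab] log-prob grid.
    Keeps the beam_width highest-scoring ID sequences at each step; a
    full CTC beam search also merges blank/repeat-collapsed prefixes."""
    beams = [((), 0.0)]  # (ID sequence, cumulative log-probability)
    for step in log_probs:
        candidates = [
            (seq + (tok_id,), score + lp)
            for seq, score in beams
            for tok_id, lp in enumerate(step)
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Toy 3-timestep probability grid, converted to log space.
toy = [[math.log(p) for p in row] for row in [
    [0.1, 0.6, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.5, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.5, 0.1],
]]
seq, score = beam_search(toy, beam_width=5)
print([ID_TO_TOKEN[i] for i in seq])  # → ['\\frac', '{', 'x']
```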