LaTeX OCR Model
Reduced handwritten-equation transcription time from roughly 20 minutes to under 30 seconds per equation, at an 8.3% Character Error Rate.
The Problem
Researchers and students manually transcribing handwritten mathematical equations into LaTeX spent 15–30 minutes per page, introducing errors and slowing publication workflows.
The Solution
Built a CNN + BiLSTM + CTC deep learning pipeline. The CNN extracts spatial features from equation images; bidirectional LSTM layers model sequential token dependencies; CTC loss handles variable-length output without requiring explicit input-output alignment. A custom StringLookup tokenizer maps LaTeX tokens to integer indices, and beam search decoding produces the final token sequence.
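The key property CTC provides is that the per-timestep frame predictions collapse into a shorter output sequence, so the model never needs an explicit image-to-token alignment. A minimal sketch of the collapse rule (pure Python with invented token IDs, not the project's actual decoder):

```python
# Minimal sketch of CTC's collapse rule (hypothetical token IDs, not the
# project's actual decoder): merge consecutive repeats, then drop blanks.
BLANK = 0  # CTC reserves one vocabulary index for the blank symbol

def ctc_collapse(frame_ids):
    """Collapse a per-timestep ID sequence into an output token sequence."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev:          # 1. merge consecutive repeated predictions
            if t != BLANK:     # 2. drop the blank symbol
                out.append(t)
        prev = t
    return out

# 11 timesteps of frame-level predictions collapse to 4 output tokens.
print(ctc_collapse([0, 5, 5, 0, 3, 3, 3, 0, 7, 7, 4]))  # → [5, 3, 7, 4]
```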
The Impact
Achieved a Character Error Rate (CER) of 8.3% and Word Error Rate (WER) of 12.1% on benchmark datasets. Reduced transcription time from 20 minutes to under 30 seconds per equation.
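For reference, Character Error Rate is Levenshtein edit distance normalized by the reference length, and Word Error Rate is the same computation over token lists. A minimal sketch (pure Python, not the project's evaluation harness):

```python
# Minimal CER sketch: Levenshtein edit distance / reference length.
# WER is the identical computation applied to lists of tokens.
def edit_distance(ref, hyp):
    """Classic single-row dynamic-programming Levenshtein distance."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if symbols match)
            )
    return dp[-1]

def cer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

print(cer(r"\frac{a}{b}", r"\frac{a}{d}"))  # one substitution over 11 chars
```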
Tech Details
- CNN backbone for spatial feature extraction from equation image patches
- Bidirectional LSTM layers for sequential token dependency modeling
- CTC (Connectionist Temporal Classification) Loss — eliminates need for explicit input-output alignment
- Custom StringLookup tokenizer mapping LaTeX tokens to integer indices
- Beam search decoding (beam width = 5) for higher-quality sequence generation than greedy decoding
- Data augmentation: rotation, noise injection, contrast variation for training robustness
- Evaluated on IM2LATEX-100K benchmark: CER 8.3%, WER 12.1%
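The decoding step above can be sketched as a beam over per-timestep log-probabilities. The token table and scores below are invented for illustration, and a production CTC beam search would additionally merge blank/repeat-collapsed prefixes, which this simplified version omits:

```python
import math

# Hypothetical token table standing in for the StringLookup vocabulary.
ID_TO_TOKEN = {0: "<blank>", 1: "\\frac", 2: "{", 3: "}", 4: "x", 5: "y"}

def beam_search(log_probs, beam_width=5):
    """Simplified beam search over a [timesteps x vocab] log-prob grid.
    Keeps the beam_width highest-scoring ID sequences at each step; a
    full CTC beam search also merges blank/repeat-collapsed prefixes."""
    beams = [((), 0.0)]  # (ID sequence, cumulative log-probability)
    for step in log_probs:
        candidates = [
            (seq + (tok_id,), score + lp)
            for seq, score in beams
            for tok_id, lp in enumerate(step)
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Toy 3-timestep probability grid, converted to log space.
toy = [[math.log(p) for p in row] for row in [
    [0.1, 0.6, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.5, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.5, 0.1],
]]
seq, score = beam_search(toy, beam_width=5)
print([ID_TO_TOKEN[i] for i in seq])  # → ['\\frac', '{', 'x']
```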