New OCR model; better OCR heuristics
- New OCR model that is better all-around, but particularly at math
- Improved OCR heuristics, which now prioritize accuracy over speed
- Drop `format_lines`, since `force_ocr` is generally a bit more accurate and less error-prone
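
For callers that still pass `format_lines`, a minimal migration sketch might look like the following. It assumes options are held in a plain Python dict and that enabling `force_ocr` is the intended replacement whenever `format_lines` was on. The helper name `migrate_config` and the sample `output_format` entry are hypothetical and are not part of marker.

```python
def migrate_config(config: dict) -> dict:
    """Return a copy of `config` with the dropped `format_lines` key removed.

    Assumption: if `format_lines` was enabled, the caller wants `force_ocr`
    turned on instead. This helper is illustrative, not part of marker.
    """
    migrated = dict(config)  # shallow copy, so the caller's dict is untouched
    had_format_lines = bool(migrated.pop("format_lines", False))
    if had_format_lines:
        migrated["force_ocr"] = True
    return migrated


# Hypothetical pre-1.8.3 options
old_config = {"output_format": "markdown", "format_lines": True}
new_config = migrate_config(old_config)
```

Returning a copy rather than mutating in place keeps the original options available for comparison or logging.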
What's Changed
- feat: allow option to keep tables split across pages by @zanussbaum in https://github.com/datalab-to/marker/pull/813
- New OCR Model by @tarun-menta in https://github.com/datalab-to/marker/pull/820
- Bump surya version by @VikParuchuri in https://github.com/datalab-to/marker/pull/821
Full Changelog: https://github.com/datalab-to/marker/compare/v1.8.2...v1.8.3