v5.0.0
What's Changed
Major New Features
- Significantly smaller file sizes
  - 54% smaller file sizes for English, 73% smaller for Chinese (see #806 for details)
  - This results in a ~50% decrease in runtime for first-time users (who do not yet have the data downloaded/cached)
- Significantly lower memory usage
  - Worker memory utilization in the web benchmark is reduced from 311 MB to 164 MB (47% reduction)
  - The lower memory footprint makes it feasible to use more workers, significantly improving performance for projects that utilize schedulers for parallel processing
- Compatible with iOS 17 (using default settings)
  - iOS 17 broke compatibility with Tesseract.js v4; upgrading to v5 should resolve the issue
    - See discussion section below for details
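Since lower memory usage makes additional workers practical, a scheduler setup might look like the sketch below. `recognizeAll` and `workerCount` are hypothetical names chosen for illustration, and the `tesseract` argument is assumed to be the imported tesseract.js module (passed in so the function can be exercised with a stub). The right worker count depends on the device and is not specified by these notes.

```javascript
// Sketch: spread recognition jobs across several workers using a scheduler.
// Usage (assumed): const tesseract = require("tesseract.js");
//                  const texts = await recognizeAll(tesseract, ["a.png", "b.png"], 4);
async function recognizeAll(tesseract, images, workerCount = 4) {
  const scheduler = tesseract.createScheduler();
  try {
    // Create the workers concurrently and register each one with the scheduler.
    // In v5 a worker is ready to use as soon as createWorker resolves.
    await Promise.all(
      Array.from({ length: workerCount }, async () => {
        scheduler.addWorker(await tesseract.createWorker("eng"));
      })
    );
    // The scheduler hands each queued job to the next idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob("recognize", image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}
```

Results come back in the same order as `images`, because `Promise.all` preserves input order regardless of which worker finishes first.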
Breaking Changes Impacting Many Users
- `createWorker` arguments changed
  - Setting non-default language and OEM now happens in `createWorker`
    - E.g. `createWorker("chi_sim", 1)`
- `worker.initialize` and `worker.loadLanguage` functions now do nothing and can be deleted from code
  - Loading the language and initialization now occurs in `createWorker`
  - Workers can be re-initialized with different settings using `worker.reinitialize`
In other words, code should be modified from this:
const worker = await Tesseract.createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const ret = await worker.recognize(file);
To this:
const worker = await Tesseract.createWorker("eng");
const ret = await worker.recognize(file);
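For code that previously switched languages by calling `worker.loadLanguage` and `worker.initialize` a second time, `worker.reinitialize` is the replacement. The sketch below assumes `reinitialize` takes the same leading arguments as `createWorker` (language, then OEM). `recognizeInTwoLanguages` is a hypothetical helper, and `tesseract` is assumed to be the imported tesseract.js module.

```javascript
// Sketch: reuse one worker for two languages instead of creating a second worker.
async function recognizeInTwoLanguages(tesseract, englishImage, chineseImage) {
  const worker = await tesseract.createWorker("eng");
  try {
    const english = await worker.recognize(englishImage);
    // Assumption: reinitialize(langs, oem) mirrors createWorker's first two arguments.
    await worker.reinitialize("chi_sim", 1);
    const chinese = await worker.recognize(chineseImage);
    return [english.data.text, chinese.data.text];
  } finally {
    // Release the worker's memory even if recognition throws.
    await worker.terminate();
  }
}
```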
Breaking Changes Impacting Fewer Users
- Users who manually set `corePath` will need to update the contents of their `corePath` directory
  - `corePath` should point to a directory that contains all 4 of the files below from Tesseract.js-core v5:
    - `tesseract-core.wasm.js`
    - `tesseract-core-simd.wasm.js`
    - `tesseract-core-lstm.wasm.js`
    - `tesseract-core-simd-lstm.wasm.js`
  - Tesseract.js will automatically select the correct version to use
- `worker.detect` function disabled by default
  - Orientation + script detection is a function of the Legacy model only, which is no longer included by default
  - To enable, set arguments `legacyCore: true` and `legacyLang: true` in `createWorker` options
    - E.g. `Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true});`
- Language of progress logs standardized
  - This should only impact users who parse status logs (e.g. to update a loading bar)
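Code that drives a loading bar from progress logs may be more robust if it relies on the numeric `progress` field and treats `status` as display text only. A minimal sketch follows, assuming the logger receives objects with `status` (string) and `progress` (0 to 1) fields. `formatProgress` is a hypothetical helper, and the status string in the usage comment is only an illustration.

```javascript
// Sketch: turn a progress message into a label without matching on exact status wording.
function formatProgress(message) {
  // Treat a missing progress value as 0 so the label never shows "NaN%".
  const percent = Math.round((message.progress ?? 0) * 100);
  return `${message.status}: ${percent}%`;
}

// Usage (assumed): pass a logger in the createWorker options, e.g.
//   const worker = await Tesseract.createWorker("eng", 1, {
//     logger: (m) => console.log(formatProgress(m)),
//   });
// A message such as { status: "recognizing text", progress: 0.5 }
// would be rendered as "recognizing text: 50%".
```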
Non-Breaking Changes
- Language data loaded from `jsdelivr` by default (rather than GitHub pages)
  - This should result in improved performance and uptime
- Separate "development" build (that produced `tesseract.dev.js` and `worker.dev.js`) removed
- Documentation and examples were modified to prevent new users from using `Tesseract.recognize` and `Tesseract.detect`
  - Users who already use these functions are encouraged to modify their code to use `worker.recognize` and `worker.detect` instead
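One way to move a one-off `Tesseract.recognize` call to the worker API is to wrap worker creation, recognition, and cleanup in a single helper. This is a sketch, not the library's own implementation: `recognizeOnce` is a hypothetical name, and `tesseract` is assumed to be the imported tesseract.js module.

```javascript
// Sketch: a one-shot helper built on worker.recognize with explicit cleanup.
async function recognizeOnce(tesseract, image, lang = "eng") {
  const worker = await tesseract.createWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always terminate so the worker does not keep holding memory.
    await worker.terminate();
  }
}
```

When more than one image is processed, creating the worker once and reusing it avoids repeating the startup cost on every call.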
Considering upgrading from v2 to v5? See #771 for a full guide for updating.
Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.1.3...v5.0.0