0.18.31
What's Changed
- Feat: patch pdfminer and use rendermode to detect invisible text by @badGarnet in https://github.com/Unstructured-IO/unstructured/pull/4158
- fix: add EN DASH to UNICODE_BULLETS for clean_bullets by @MkDev11 in https://github.com/Unstructured-IO/unstructured/pull/4186
- fix: fix version number by @badGarnet in https://github.com/Unstructured-IO/unstructured/pull/4189
- enhancement: render pdfs with pdfium by @qued in https://github.com/Unstructured-IO/unstructured/pull/4185
- feat: consider rotated text as low fidelity by @badGarnet in https://github.com/Unstructured-IO/unstructured/pull/4190
- fix: address jaraco CVE by @qued in https://github.com/Unstructured-IO/unstructured/pull/4198
- fix: change default for languages parameter from ["auto"] to None by @eureka928 in https://github.com/Unstructured-IO/unstructured/pull/4194
- ⚡️ Speed up function _get_optimal_value_for_bbox by 2,883% by @aseembits93 in https://github.com/Unstructured-IO/unstructured/pull/4181
- ⚡️ Speed up method _DocxPartitioner._style_based_element_type by 593% by @aseembits93 in https://github.com/Unstructured-IO/unstructured/pull/4179
- Luke/update dockerfile by @luke-kucing in https://github.com/Unstructured-IO/unstructured/pull/4192
- fix: reduce default dpi to 350 by @qued in https://github.com/Unstructured-IO/unstructured/pull/4199
- fix(deps): switch from pip-compile to uv pip compile by @lawrence-u10d in https://github.com/Unstructured-IO/unstructured/pull/4202
- fix: remove sandbox=True from pypandoc to fix ODT conversion by @MkDev11 in https://github.com/Unstructured-IO/unstructured/pull/4193
- Token-Based Chunking Support by @eureka928 in https://github.com/Unstructured-IO/unstructured/pull/4203
- fix: filter coordinates kwargs to prevent TypeError in hi_res PDF processing by @MkDev11 in https://github.com/Unstructured-IO/unstructured/pull/4206
- fix(deps): Update docker.elastic.co/elasticsearch/elasticsearch Docker tag to v8.19.10 by @utic-renovate[bot] in https://github.com/Unstructured-IO/unstructured/pull/4133
- fix(deps): Update opensearchproject/opensearch Docker tag to v2.19.4 by @utic-renovate[bot] in https://github.com/Unstructured-IO/unstructured/pull/4134
- fix(deps): Update semitechnologies/weaviate Docker tag to v1.35.3 by @utic-renovate[bot] in https://github.com/Unstructured-IO/unstructured/pull/4135
- fix: Preserve Line Breaks in Code Blocks During Chunking by @eureka928 in https://github.com/Unstructured-IO/unstructured/pull/4196
- chore: dep bump to resolve open CVEs by @luke-kucing in https://github.com/Unstructured-IO/unstructured/pull/4205
New Contributors
- @MkDev11 made their first contribution in https://github.com/Unstructured-IO/unstructured/pull/4186
- @eureka928 made their first contribution in https://github.com/Unstructured-IO/unstructured/pull/4194
Full Changelog: https://github.com/Unstructured-IO/unstructured/compare/0.18.28...0.18.31