v4.5.8 - Package updates and minor bug fixes
- Update the German UD POS tagger to UD 2.14 data
- Add Austrian German month names to the German tokenizer: https://github.com/stanfordnlp/CoreNLP/pull/1454 Thank you, @j3ernhard
- Improve the constituency-to-dependency converter to eliminate many validation errors. This includes running the PTB Corrector as an earlier step when operating specifically on PTB data: https://github.com/stanfordnlp/CoreNLP/pull/1445
- Add an SSurgeon feature to split one word into multiple words: https://github.com/stanfordnlp/CoreNLP/commit/13ede5a2656993e170c16f4b20d47b7cba8ccbd4
- Unravel recursion in SemanticGraph: https://github.com/stanfordnlp/CoreNLP/commit/05804a35dfe8a0d013754ba19a8830b5232aa496 This fixes one server crash observed in https://github.com/stanfordnlp/CoreNLP/issues/1461
- Package updates: protobuf -> 3.25.5, javax -> 1.1.6 https://github.com/stanfordnlp/CoreNLP/issues/1465 Unfortunately, updating Lucene to fix all dependency security issues will require dropping Java 8 support
- Fix the server's caching of tokenizer annotators so that the cache also takes segmenter properties into account. Previously, the server could ignore a request for a different segmentation model. https://github.com/stanfordnlp/CoreNLP/commit/6f6eb935855ec699b80100584cf2f1ffd43c2325
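
To illustrate the idea behind the caching fix, here is a minimal sketch of a cache whose key covers every property that affects tokenization, including segmenter settings. This is not CoreNLP's actual server code: the class `AnnotatorCacheSketch`, its methods, the prefix list, and the model names are all hypothetical, and a plain `Object` stands in for an expensive annotator.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch only; names and structure are assumptions, not CoreNLP internals.
class AnnotatorCacheSketch {
    // Property prefixes assumed to influence tokenizer behavior.
    // Leaving "segment." out of this list would reproduce the kind of bug described above.
    static final List<String> RELEVANT_PREFIXES = Arrays.asList("tokenize.", "segment.");

    static final Map<String, Object> CACHE = new HashMap<>();
    static int buildCount = 0;

    // Build a deterministic key from all relevant properties, sorted by name.
    static String cacheKey(Properties props) {
        TreeMap<String, String> relevant = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            for (String prefix : RELEVANT_PREFIXES) {
                if (name.startsWith(prefix)) {
                    relevant.put(name, props.getProperty(name));
                }
            }
        }
        return relevant.toString();
    }

    // Return a cached stand-in annotator, building a new one only for an unseen key.
    static Object getTokenizer(Properties props) {
        return CACHE.computeIfAbsent(cacheKey(props), key -> {
            buildCount++;
            return new Object();
        });
    }

    public static void main(String[] args) {
        Properties first = new Properties();
        first.setProperty("tokenize.language", "zh");
        first.setProperty("segment.model", "model-a"); // placeholder model name

        Properties second = new Properties();
        second.setProperty("tokenize.language", "zh");
        second.setProperty("segment.model", "model-b"); // placeholder model name

        // Different segmenter settings should yield different cached annotators.
        System.out.println(getTokenizer(first) != getTokenizer(second));
        // Repeating a request should reuse the cached annotator.
        System.out.println(getTokenizer(first) == getTokenizer(first));
    }
}
```

Sorting the relevant properties by name keeps the key stable regardless of the order in which a request supplies them.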