6.2.3
📢 Spark NLP 6.2.3: Further Improvements for NerDL
Spark NLP 6.2.3 introduces targeted improvements to the training performance and stability of NerDLApproach, along with a bug fix for CamemBertForTokenClassification.
NerDLApproach now uses new internal data-loading behavior that improves training speed and helps prevent out-of-memory errors.
🔥 Highlights
Enhanced NerDLApproach training performance through threaded data loading and optimized partitioning.
New Features & Enhancements
NerDLApproach Training Optimizations
Significant performance improvements for training NerDLApproach:
Threaded Data Loading: When the memory optimizer is enabled (setEnableMemoryOptimizer(true)), data can now be pre-fetched through a threaded data loader. Prefetching is disabled by default and can be enabled and tuned with:
.setPrefetchBatches(int)
Tuning this parameter (for example, to 20 batches) can reduce training time by about 10%.
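To illustrate the general idea behind prefetching batches on a background thread, here is a minimal, conceptual sketch in plain Python. It is not Spark NLP's implementation: the `prefetch` function, the `buffer_size` argument, and the example batches are hypothetical names chosen for this illustration. The sketch only shows how a bounded buffer lets batch loading overlap with the work that consumes the batches.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the batch stream


def prefetch(batches: Iterable[T], buffer_size: int) -> Iterator[T]:
    """Yield items from `batches`, loading up to `buffer_size` ahead in a thread."""
    if buffer_size <= 0:
        # Prefetching disabled: each batch is loaded only when requested.
        yield from batches
        return

    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for batch in batches:
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            buffer.put(_DONE)  # always signal the end so the consumer stops

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


# Hypothetical stand-in for a stream of training batches.
example_batches = [[i, i + 1] for i in range(0, 10, 2)]
loaded = list(prefetch(example_batches, buffer_size=2))
```

A larger buffer lets the loader run further ahead of the consumer at the cost of holding more batches in memory, which is presumably why the number of prefetched batches is exposed as a tunable parameter.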
Optimized Partitioning Strategy: NerDLApproach now applies optimized DataFrame partitioning by default when the memory optimizer is enabled (setEnableMemoryOptimizer(true)), improving parallelization efficiency during training and helping prevent out-of-memory errors.
To tune the partitioning of the input DataFrames manually, disable this behavior with:
.setOptimizePartitioning(false)
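The options above could be combined roughly as follows. This is a sketch under assumptions: it assumes the Python API exposes the same setter names as shown in these notes (setEnableMemoryOptimizer, setPrefetchBatches, setOptimizePartitioning) and that each setter returns the annotator for chaining. The helper name `configure_memory_optimized_training` is hypothetical and only groups the calls in one place.

```python
def configure_memory_optimized_training(
    approach,
    prefetch_batches=20,
    optimize_partitioning=True,
):
    """Apply the training options described above to a NerDLApproach-like object.

    Assumption: the setters below exist under these names in the Python API
    and return the object itself, so they can be chained.
    """
    return (
        approach
        .setEnableMemoryOptimizer(True)                   # required for both options
        .setPrefetchBatches(prefetch_batches)             # e.g. 20, as suggested above
        .setOptimizePartitioning(optimize_partitioning)   # False for manual partitioning
    )


# Usage sketch (requires spark-nlp; column names are placeholders):
# from sparknlp.annotator import NerDLApproach
# ner = configure_memory_optimized_training(
#     NerDLApproach()
#     .setInputCols(["sentence", "token", "embeddings"])
#     .setLabelColumn("label")
#     .setOutputCol("ner")
# )
```

Passing `optimize_partitioning=False` corresponds to the manual-tuning case, where the input DataFrame is repartitioned by the caller before training.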
Bug Fixes
- CamemBertForTokenClassification: Fixed an issue with expected input types during inference.
❤️ Community Support
- Slack - real-time discussion with the Spark NLP community and team
- GitHub - issue tracking, feature requests, and contributions
- Discussions - community ideas and showcases
- Medium - latest Spark NLP articles and tutorials
- YouTube - educational videos and demos
💻 Installation
Python
pip install spark-nlp==6.2.3
Spark Packages
CPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.2.3
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.2.3
Apple Silicon
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.2.3
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.2.3
Maven
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>6.2.3</version>
</dependency>
FAT JARs
- CPU: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-6.2.3.jar
- GPU: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-6.2.3.jar
- Apple Silicon: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-6.2.3.jar
- AArch64: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-6.2.3.jar
What's Changed
- https://github.com/JohnSnowLabs/spark-nlp/pull/14701 by @ahmedlone127
- https://github.com/JohnSnowLabs/spark-nlp/pull/14699 by @DevinTDHa
Full Changelog: https://github.com/JohnSnowLabs/spark-nlp/compare/6.2.2...6.2.3