# OpenAI Completion and Embeddings, Visual Document Classification, Bart and XLM-RoBERTa Zero-Shot Classification, and more in John Snow Labs NLU 5.3.0

We are very excited to announce that NLU 5.3.0 has been released! It adds support for OpenAI's Completion and Embeddings endpoints, along with visual document classification and zero-shot classification with Bart and XLM-RoBERTa.

---

## OpenAI Completion

[Tutorial Notebook](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sequence2sequence/OpenAI_completion.ipynb)

**OpenAICompletion** combines the power of OpenAI's completion models with the robust NLP processing capabilities of Spark NLP. This integration lets you use OpenAI's models while taking advantage of Spark's inherent scalability. The annotator calls OpenAI's Completion endpoint directly on your datasets, making completion generation a regular step in Spark NLP pipelines.

Powered by [OpenAICompletion](https://sparknlp.org/docs/en/transformers#openaicompletion)

* Reference: [OpenAI API Doc](https://platform.openai.com/docs/api-reference/completions/create)
* Reference: [OpenAICompletion Doc](https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/openai/openai_completion/index.html#sparknlp.annotator.openai.openai_completion.OpenAICompletion)

| nlu.load() reference | Spark NLP Model reference |
| -------------------- | ------------------------------------------------------------------------------ |
| openai.completion | [OpenAICompletion](https://sparknlp.org/docs/en/transformers#openaicompletion) |

----

## OpenAI Embeddings

[Tutorial Notebook](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sentence_embeddings/NLU_OpenAI_embeddings.ipynb)

**OpenAIEmbeddings** combines the power of OpenAI's embedding models with the robust NLP processing capabilities of Spark NLP.
This integration lets you use OpenAI's models while taking advantage of Spark's inherent scalability. The annotator calls OpenAI's Embeddings endpoint directly on your datasets, making embedding generation a regular step in Spark NLP pipelines.

Powered by [OpenAIEmbeddings](https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/openai/openai_embeddings/index.html)

| nlu.load() reference | Spark NLP Model reference |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| openai.embeddings | [OpenAIEmbeddings](https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/openai/openai_embeddings/index.html) |

----

## Visual Document Classifier

[Tutorial Notebook](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/ocr/ocr_visual_document_classifier.ipynb)

The **VisualDocumentClassifier** is a deep learning (DL) model for document classification that uses both text and layout data.
The currently available pre-trained model was trained on the Tobacco3482 dataset, which contains 3482 images belonging to 10 classes (Resume, News, Note, Advertisement, Scientific, Report, Form, Letter, Email, and Memo).

Powered by [VisualDocumentClassifier](https://nlp.johnsnowlabs.com/docs/en/ocr_visual_document_understanding)

| Language | nlu.load() reference | Spark NLP Model reference |
| -------- | ------------------------- | -------------------------------------- |
| xx | en.classify_image.tabacco | visual_document_classifier_tobacco3482 |

---

## Bart for Zero-Shot Classification

[Tutorial Notebook](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/classifiers/Bart_Zero_Shot_Classifiers.ipynb)

**BartForZeroShotClassification** uses a `ModelForSequenceClassification` trained on NLI (natural language inference) tasks. It is the equivalent of BartForSequenceClassification, except that the number of candidate classes is not hardcoded: the classes are chosen at runtime. This is usually slower, but much more flexible.
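
Letting the classes be chosen at runtime is typically done by treating the input text as an NLI premise and turning each candidate label into a hypothesis. The plain-Python sketch below shows only that pairing step, and it makes the cost visible: one NLI pass per label. It is an illustration under stated assumptions, not the Spark NLP or NLU API; the function `build_nli_pairs` and the template `"This example is {}."` are hypothetical choices.

```python
from typing import List, Tuple

# Hypothetical hypothesis template; real models or annotators may use a different one.
DEFAULT_TEMPLATE = "This example is {}."


def build_nli_pairs(
    text: str, candidate_labels: List[str], template: str = DEFAULT_TEMPLATE
) -> List[Tuple[str, str]]:
    """Pair the input text (premise) with one hypothesis per runtime label."""
    if not candidate_labels:
        raise ValueError("at least one candidate label is required")
    return [(text, template.format(label)) for label in candidate_labels]


if __name__ == "__main__":
    labels = ["sport", "politics", "science"]  # chosen at runtime, not hardcoded
    pairs = build_nli_pairs("The team won the championship last night.", labels)
    for premise, hypothesis in pairs:
        print(f"{premise!r} -> {hypothesis!r}")
    # An NLI model must score every pair, so cost grows with len(labels).
    print(len(pairs), "NLI pairs to score")
```

Adding a label at runtime only adds one more pair to score, which is where the flexibility comes from; the extra forward passes are why this is usually slower than a fixed-class sequence classifier.
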
We used TFBartForSequenceClassification to train this model and the BartForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale.

Powered by [BartForZeroShotClassification](https://sparknlp.org/docs/en/transformers#bartforzeroshotclassification)

| Language | nlu.load() reference | Spark NLP Model reference |
| -------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| English | en.bart.zero_shot_classifier | [bart_large_zero_shot_classifier_mnli](https://sparknlp.org/2023/08/07/bart_large_zero_shot_classifier_mnli_en.html) |

----

## XLM-RoBERTa for Zero-Shot Classification

[Tutorial Notebook](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/classifiers/XlmRoberta_Zero_Shot_Classifier.ipynb)

**XlmRoBertaForZeroShotClassification** uses a `ModelForSequenceClassification` trained on NLI (natural language inference) tasks. It is the equivalent of XlmRoBertaForSequenceClassification, except that the number of candidate classes is not hardcoded: the classes are chosen at runtime. This is usually slower, but much more flexible. We used TFXLMRobertaForSequenceClassification to train this model and the XlmRoBertaForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!
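
After a zero-shot classifier such as XlmRoBertaForZeroShotClassification has scored one (text, hypothesis) pair per candidate label, the NLI logits still have to be turned into label scores. The sketch below follows the rule used by the Hugging Face `zero-shot-classification` pipeline; we have not confirmed that the Spark NLP annotator uses exactly this rule. The `[contradiction, neutral, entailment]` logit order, the function names, and the logit values are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence

# Assumed logit order per (text, hypothesis) pair: [contradiction, neutral, entailment].
CONTRADICTION, ENTAILMENT = 0, 2


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(
    labels: Sequence[str],
    nli_logits: Sequence[Sequence[float]],
    multi_label: bool = False,
) -> Dict[str, float]:
    """Convert per-label NLI logits (one row per label) into label scores."""
    if len(labels) != len(nli_logits):
        raise ValueError("need one row of NLI logits per label")
    if multi_label:
        # Each label on its own: softmax over [contradiction, entailment], keep entailment.
        return {
            label: softmax([row[CONTRADICTION], row[ENTAILMENT]])[1]
            for label, row in zip(labels, nli_logits)
        }
    # Single label: softmax of the entailment logits across labels, so scores sum to 1.
    probs = softmax([row[ENTAILMENT] for row in nli_logits])
    return dict(zip(labels, probs))


if __name__ == "__main__":
    labels = ["sport", "politics", "science"]
    # Hypothetical NLI logits, one row per candidate label.
    logits = [[-2.0, 0.1, 3.0], [2.5, 0.3, -1.5], [1.0, 0.5, 0.2]]
    single = zero_shot_scores(labels, logits)
    print(max(single, key=single.get))  # prints: sport
    print(zero_shot_scores(labels, logits, multi_label=True))
```

In single-label mode the labels compete, so the scores sum to 1; in multi-label mode each label is judged independently, so several labels can score high at once.
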

Powered by [XlmRoBertaForZeroShotClassification](https://sparknlp.org/2023/07/20/xlm_roberta_large_zero_shot_classifier_xnli_anli_xx.html)

| Language | nlu.load() reference | Spark NLP Model reference |
| -------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| xx | xx.xlm_roberta.zero_shot_classifier | [xlm_roberta_large_zero_shot_classifier_xnli_anli](https://sparknlp.org/2023/07/20/xlm_roberta_large_zero_shot_classifier_xnli_anli_xx.html) |

----

## Bugfixes

- Fix a bug when loading ALBERT for Question Answering models
- Fix a bug when predicting on image files in Databricks

## :book: Additional NLU resources

* [140+ NLU Tutorials](https://nlp.johnsnowlabs.com/docs/en/jsl/notebooks)
* [Streamlit visualizations docs](https://nlp.johnsnowlabs.com/docs/en/jsl/streamlit_viz_examples)
* The complete list of all 20000+ models & pipelines in 300+ languages is available on [Models Hub](https://nlp.johnsnowlabs.com/models)
* [Spark NLP publications](https://medium.com/spark-nlp)
* [NLU documentation](https://nlp.johnsnowlabs.com/docs/en/jsl/install)
* [Discussions](https://github.com/JohnSnowLabs/spark-nlp/discussions): engage with other community members, share ideas, and show off how you use Spark NLP and NLU!

## Installation

```shell
# PyPI
pip install nlu pyspark
```