nlu

1 line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

JohnSnowLabs/nlu · Python · NOASSERTION

bert-embedding, dependency-parsing, entity-resolution, language-detection, lemmatizer, named-entity-recognition, +14

July 13, 2024

PDF Deidentification, MPNet Classifier and Pipeline Tracer in NLU 5.4.0

We are excited to announce the release of NLU 5.4.0! It adds support for deidentifying PDFs using a combination of OCR and Medical NLP models. Additionally, you can now use MPNet for sequence classification, and Pipeline Tracer is now supported.

Visual PDF Deidentification

Tutorial Notebook

Introducing our healthcare deidentification model for PDFs, deployable with a single line of code. It combines components such as ner_deid_subentity_augmented, ContextualParser, RegexMatcher, and TextMatcher with a de-identification stage. It masks sensitive entities such as names, locations, and medical records, supporting compliance and data security for medical texts. Using OCR, it also redacts the detected information in the document before saving the processed file to the specified location.

# Download the sample PDFs
! wget https://github.com/JohnSnowLabs/nlu/raw/release/540/tests/datasets/ocr/deid/deid2.pdf
! wget https://github.com/JohnSnowLabs/nlu/raw/release/540/tests/datasets/ocr/deid/download.pdf

# `model` is assumed to be a PDF deidentification pipeline loaded beforehand with nlp.load(...)

# Provide the input and output paths (one output path per input file)
input_path = ['download.pdf', 'deid2.pdf']
output_path = ['download_deidentified.pdf', 'deid2_deidentified.pdf']

# Predict and save the deidentified PDFs
dfs = model.predict(input_path, output_path=output_path)
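
To make the masking idea described above more concrete, here is a minimal, standalone sketch of rule-based masking on plain text. It is not how the NLU pipeline is implemented: the real pipeline combines trained NER models, matchers, and OCR, whereas this sketch uses only Python's `re` module. The pattern set, the `mask_entities` helper, and the sample note are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical patterns for illustration only; the actual pipeline relies on
# trained NER models and curated matchers, not on these regexes.
PATTERNS = {
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}


def mask_entities(text: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


# Made-up sample text, not taken from any real record.
note = "Seen on 03/14/2024, MRN: 448812, callback 555-867-5309."
print(mask_entities(note))
# prints: Seen on <DATE>, <MRN>, callback <PHONE>.
```

Replacing each match with its label, rather than deleting it, keeps the masked text readable, which is the same reason deidentification output typically shows entity placeholders instead of blanks.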

import pandas as pd
from johnsnowlabs import nlp

pipe = nlp.load("en.explain_doc.clinical_oncology.pipeline")

# List the assertion statuses, entity labels, and relation types the pipeline can produce
pipe.getPossibleAssertions()
# ['Past', 'Family', 'Absent', 'Hypothetical', 'Possible', 'Present']

pipe.getPossibleEntities()
# ['Cycle_Number','Direction','Histological_Type', .... ]

pipe.getPossibleRelations()
# ['is_size_of', 'is_date_of', 'is_location_of', 'is_finding_of']

# Build the column mapping used to parse the pipeline output
column_maps = pipe.createParserDictionary()
column_maps.update({"document_identifier": "clinical_deidentification"})

# `data` is the input text to analyze; define it before calling predict
res = pipe.predict(data, parser_output=True, parser_config=column_maps)
pd.json_normalize(res['result'][0]["entities"])

pip install johnsnowlabs

MPNetForSequenceClassification

Pipeline Tracer

📖 Additional NLU resources

Installation
