Instructor: Bassem Ben Hamed, Co-founder and Data Scientist at DataCamp, Professor at the University of Sfax
Learning Objectives
By the end of this workshop, participants will be able to:
- Understand the fundamental concepts of Natural Language Processing (NLP).
- Discover preprocessing and text data representation techniques.
- Explore basic and advanced models (CNN, LSTM) in NLP.
- Automatically extract named entities (Named Entity Recognition, NER) from business texts.
- Generate automatic summaries from large documents.
Workshop Modalities
- Duration: 7h
- Number of participants: 20 to 30 people
- Location: Tunis, ALECSO
- Accessibility: In-person
- Language: French
Program
Introduction to NLP
- Definition, importance, and business use cases.
- Overview of practical applications: sentiment analysis, automatic summarization, etc.
Preprocessing and Text Data Representation
- Preparing text data: cleaning, tokenization, stop word removal, lemmatization.
- Representing text data with Bag of Words, TF-IDF.
- Introduction to embeddings: Word2Vec, GloVe.
Practical Work: Transform a text corpus using different approaches (Bag of Words, TF-IDF, Word2Vec, GloVe) and compare the resulting vectors.
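As a rough illustration of the first two representations, the sketch below tokenizes a two-document corpus, removes stop words, and builds Bag of Words and TF-IDF vectors using only the Python standard library. The stop word list, the sample corpus, and the function names are assumptions made for this example. The TF-IDF weighting shown (term frequency times the unsmoothed log of inverse document frequency) is one common variant, not necessarily the one used in the workshop.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "is", "and", "of", "was"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Weight each count by tf * log(N / df), with no smoothing."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for vec in counts:
        total = sum(vec) or 1  # avoid dividing by zero for empty documents
        weighted.append([(c / total) * w for c, w in zip(vec, idf)])
    return vocab, weighted

corpus = [
    "The delivery was fast and the product is great",
    "The product is broken and the delivery was slow",
]
vocab, bow = bag_of_words(corpus)
_, tfidf = tf_idf(corpus)
```

In this toy corpus, terms shared by both documents ("delivery", "product") receive a TF-IDF weight of zero, while terms unique to one document keep a positive weight. This is the kind of difference the vector comparison is meant to surface.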
Textual Analysis and Classification Models
- Classify customer reviews or emails.
- Introduction to advanced models: CNN and LSTM for NLP.
- How CNNs work to capture patterns in text.
- How sequence models such as LSTMs capture contextual dependencies.
Practical Work: Build a CNN/LSTM model and apply it to sentiment analysis or customer review classification.
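To give an intuition for how a convolutional filter captures a local pattern in text, here is a minimal sketch in plain Python: one filter of width 2 slides over a sequence of token embeddings, a ReLU is applied, and max pooling keeps the strongest response. The two-dimensional embeddings, the filter weights, and the function name are hypothetical values chosen by hand for illustration. In a real model both the embeddings and the filters would be learned, typically with a deep learning framework.

```python
def conv1d_max_pool(embeddings, kernel):
    """Slide `kernel` over the token sequence and max-pool the responses.

    embeddings: list of token vectors, one per token.
    kernel: list of weight vectors, one per position in the window.
    Returns (pooled value, feature map).
    """
    window = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - window + 1):
        score = 0.0
        for offset in range(window):
            token_vec = embeddings[start + offset]
            weights = kernel[offset]
            score += sum(x * w for x, w in zip(token_vec, weights))
        feature_map.append(max(score, 0.0))  # ReLU activation
    return max(feature_map), feature_map

# Hypothetical hand-made embeddings: [negation, positivity].
EMBEDDINGS = {
    "service": [0.0, 0.0],
    "not": [1.0, 0.0],
    "good": [0.0, 1.0],
}

sentence = ["service", "not", "good"]
vectors = [EMBEDDINGS[word] for word in sentence]

# Filter that responds to "negation word followed by positive word".
negated_positive = [[1.0, 0.0], [0.0, 1.0]]

pooled, feature_map = conv1d_max_pool(vectors, negated_positive)
```

The filter responds weakly to the window ("service", "not") and strongly to ("not", "good"), and max pooling keeps the strong response wherever the pattern occurs in the sentence.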
Advanced NLP Applications for Business
- Named Entity Recognition (NER): identify important entities such as amounts, names, or dates in business documents.
- Automatic text summarization: generate summaries of reports or strategic decisions.
- Introduction to Transformer-based models (BERT, GPT) for advanced NLP tasks.
Practical Work: Extract named entities from a business text. Produce a concise summary of a business document.
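As a simple baseline for both tasks, the sketch below uses only the Python standard library: regular expressions pull amounts and ISO-style dates out of a text, and a word-frequency score picks the most representative sentence as an extractive summary. This is a rule-based illustration under stated assumptions, not the Transformer-based approach introduced above. The patterns, the currency codes, the sample text, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Illustrative patterns (assumptions): amounts with a currency symbol or
# code, and dates written as YYYY-MM-DD.
AMOUNT = re.compile(
    r"(?:\$|€)\s?\d[\d,]*(?:\.\d+)?|\d[\d,]*(?:\.\d+)?\s?(?:USD|EUR|TND)"
)
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_entities(text):
    """Return (label, value) pairs for amounts, then dates, found in the text."""
    entities = [("AMOUNT", m.group()) for m in AMOUNT.finditer(text)]
    entities += [("DATE", m.group()) for m in DATE.finditer(text)]
    return entities

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / (len(words) or 1)

    best = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in best]

text = (
    "The board approved a budget of 25,000 TND on 2024-03-15. "
    "The budget covers staff training. "
    "Lunch was served."
)
entities = extract_entities(text)
summary = summarize(text, max_sentences=1)
```

A baseline like this misses person and organization names and does not filter stop words when scoring sentences. These are the gaps that trained NER models and Transformer-based summarizers are meant to close.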
Teaching Resources
- Laptops with Internet access
- A projector, a whiteboard, or an interactive screen
- Digital materials for participants (slides, documentation, source code)