Instructor: Bassem Ben Hamed, Co-founder and Data Scientist at DataCamp, Professor at the University of Sfax

Learning Objectives

By the end of this workshop, participants will be able to:

  • Understand the fundamental concepts of Natural Language Processing (NLP).
  • Discover preprocessing and text data representation techniques.
  • Explore basic and advanced models (CNN, LSTM) in NLP.
  • Automatically extract named entities (Named Entity Recognition, NER) from business texts.
  • Generate automatic summaries from large documents.

Workshop Modalities

  • Duration: 7h
  • Number of participants: 20 to 30 people
  • Location: ALECSO, Tunis
  • Format: In-person
  • Language: French

Program

Introduction to NLP

  • Definition, importance, and business use cases.
  • Overview of practical applications: sentiment analysis, automatic summarization, etc.

Preprocessing and Text Data Representation

  • Preparing text data: cleaning, tokenization, stop word removal, lemmatization.
  • Representing text data with Bag of Words, TF-IDF.
  • Introduction to embeddings: Word2Vec, GloVe.
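The preprocessing and representation steps listed above can be sketched in plain Python. This is a minimal illustration, not the workshop's own material: the stop-word list, the function names, and the two sample reviews are all assumptions, lemmatization is left out, and the TF-IDF weighting shown (term frequency times log(N / document frequency)) is only one of several common variants.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "and", "was", "very"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return the sorted vocabulary and one raw-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Weight each term frequency by log(N / document frequency)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[i] > 0) for i in range(len(vocab))]
    weighted = []
    for vec in counts:
        total = sum(vec) or 1  # avoid division by zero for empty documents
        weighted.append(
            [(c / total) * math.log(n_docs / df[i]) for i, c in enumerate(vec)]
        )
    return vocab, weighted

# Hypothetical two-review corpus.
corpus = [
    "The delivery was fast and the product is great",
    "The product is broken and the delivery was slow",
]
vocab, bow = bag_of_words(corpus)
_, tfidf = tf_idf(corpus)
print(vocab)
print(bow)
print(tfidf)
```

Terms that occur in every document (here "delivery" and "product") receive a TF-IDF weight of zero, which is the property that distinguishes TF-IDF from raw counts.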

Practical Work: Transform a text corpus using different approaches (Bag of Words, TF-IDF, Word2Vec, GloVe) and compare the resulting vectors.
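One common way to compare document or word vectors, whatever representation produced them, is cosine similarity. The sketch below is an assumption about how the comparison could be done, not a description of the workshop's exercise; the vocabulary and the three count vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical count vectors over the vocabulary ["broken", "delivery", "fast", "product"].
review_a = [0, 1, 1, 1]
review_b = [1, 1, 0, 1]
review_c = [0, 2, 2, 2]  # same direction as review_a, twice the counts

print(cosine_similarity(review_a, review_b))
print(cosine_similarity(review_a, review_c))
```

Because cosine similarity ignores vector length, review_a and review_c score as essentially identical even though their raw counts differ, which makes the measure convenient for comparing texts of different sizes.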

Textual Analysis and Classification Models

  • Classify customer reviews or emails.
  • Introduction to advanced models: CNN and LSTM for NLP.
  • How CNNs work to capture patterns in text.
  • How sequence modeling works to capture contextual dependencies.
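To make the idea of a CNN capturing local patterns in text concrete, here is a minimal pure-Python sketch of a single convolutional filter sliding over token embeddings, followed by ReLU and max-over-time pooling. The embeddings and filter weights are hand-set, hypothetical values; in a trained model both would be learned, and a framework such as Keras or PyTorch would normally be used instead.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over consecutive token windows, apply ReLU, then max-pool over time.

    embeddings: list of token vectors (sequence_length x embedding_dim)
    kernel: list of weight vectors (window_size x embedding_dim)
    Returns (pooled_feature, per_window_activations).
    """
    window = len(kernel)
    if len(embeddings) < window:
        raise ValueError("sequence is shorter than the filter window")
    activations = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            token_vec = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(token_vec, weights))
        activations.append(max(0.0, total))  # ReLU
    return max(activations), activations

# Hypothetical 2-dimensional embeddings for four consecutive tokens.
sentence = [[1.0, 0.0], [0.0, 0.5], [0.0, 1.0], [0.2, 0.2]]
# Hand-set filter of width 2: responds when a token strong in dimension 0
# is followed by a token strong in dimension 1.
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

feature, per_window = conv1d_max_pool(sentence, bigram_filter)
print(per_window)
print(feature)
```

Each window activation measures how well two adjacent tokens match the filter, and max pooling keeps only the strongest match, so the resulting feature does not depend on where in the sentence the pattern occurs.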

Practical Work: Build a CNN/LSTM model and implement sentiment analysis or customer review classification.

Advanced NLP Applications for Business

  • Named Entity Recognition (NER): identify important entities such as amounts, names, or dates in business documents.
  • Automatic text summarization: generate summaries of reports or strategic decisions.
  • Introduction to Transformer-based models (BERT, GPT) for advanced NLP tasks.
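As a simple point of comparison for entity extraction, the sketch below pulls dates and monetary amounts out of a business text with regular expressions. It is a rule-based baseline under stated assumptions, not a trained NER model: the two patterns, the currency codes, the function name, and the sample invoice are all hypothetical, and person or company names are deliberately not covered because they generally require a statistical model.

```python
import re

# Illustrative patterns only (assumptions): ISO-style dates and
# amounts followed by a currency code.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "AMOUNT": re.compile(r"\b\d+(?:[.,]\d+)*\s?(?:TND|EUR|USD)\b"),
}

def extract_entities(text):
    """Return (label, matched_text, start_offset) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])

# Hypothetical business text.
invoice = "Invoice issued on 2024-03-15 for 12,500.00 TND, due by 2024-04-15."
for entity in extract_entities(invoice):
    print(entity)
```

Rules like these are precise on formats they anticipate and miss everything else, which is the main motivation for moving to learned models.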

Practical Work: Extract named entities from a business text and produce a concise summary of a business document.
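For the summarization part, a frequency-based extractive summarizer is one possible baseline. The sketch below scores each sentence by the average document frequency of its non-stop-word tokens and keeps the top ones; it selects existing sentences rather than generating new text, so it differs from what Transformer-based models do. The stop-word list, function name, and sample report are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "by", "for", "our", "we"}

def content_words(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=1):
    """Keep the sentences whose content words are most frequent in the document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank by score, then restore document order so the summary reads naturally.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

# Hypothetical three-sentence report.
report = (
    "Revenue grew in the third quarter. "
    "Revenue growth was driven by export sales. "
    "The office was repainted."
)
print(summarize(report))
print(summarize(report, max_sentences=2))
```

Averaging the scores rather than summing them keeps long sentences from winning purely on length.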

Teaching Resources

  • Laptops with Internet access
  • A projector, a whiteboard, or an interactive screen
  • Digital materials for participants (slides, documentation, source code)