Natural Language Processing (NLP) is a major research field of AI, and to most developers it sounds like a miracle. I have recently taken an interest in this field since the viral news around the GPT-3 model. I decided to learn to use it as a tool before it somehow replaces the developer job in the future, as many illustrious figures have predicted. But the more I study it, the more I realize how little I know. There is too much background knowledge required before understanding each word of the GPT-3 paper. Below is a quick summary of the work behind the scenes that will hopefully be useful to developers like me who want to make a leap to catch up with AI progress.
List of keywords
It is an inevitably long and exhausting journey to build a fairly basic understanding of the terms below:
- Convolutional Neural Network, Recurrent Neural Network, Activation Function, Loss Function, Backpropagation, Feed Forward.
- Word Embedding, Contextual Word Embedding, Positional Encoding.
- Long Short-Term Memory (LSTM).
- Attention Mechanism.
- Encoder – Decoder Architecture.
- Language Model.
- Transformer Architecture.
- Pre-trained Model, Masked Language Modeling, Next Sentence Prediction.
- Zero-shot Learning, One-shot Learning, Few-shot Learning.
- Knowledge Graph.
- BERT, GPT, BART, T5.
What existed before BERT and GPT?
There was already a lot of research and existing work in the NLP field. Working in NLP means solving the common tasks below:
- Part-of-Speech Tagging.
- Named Entity Recognition.
- Sentiment Classification.
- Question & Answering.
- Text Generation.
- Machine Translation.
- Similarity Matching.
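To make one of these tasks concrete, here is a toy sketch of Sentiment Classification. The word list and its scores are invented purely for illustration; real systems learn such weights from labeled data instead of using a hand-made lexicon:

```python
# Toy sentiment classifier: score a sentence by summing
# hand-assigned word polarities (illustrative lexicon only).
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}

def sentiment(sentence: str) -> str:
    score = sum(LEXICON.get(w, 0) for w in sentence.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great movie"))  # positive
print(sentiment("what a terrible plot"))     # negative
```

This lexicon approach breaks down quickly (it cannot handle "not good", sarcasm, or unseen words), which is exactly why the field moved toward learned models.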
SpaCy and NLTK are the two most famous libraries in the NLP field; they provide tools, frameworks, and models that solve a few of the tasks above, but not all of them. Each task usually had its own model, and there was no reusing or transferring between models, until the Transformer architecture was published. Given the amazing performance and capabilities of the Transformer architecture, researchers began to think about using it to perform the NLP tasks above, to have one single model that can do it all. The results are the BERT and GPT models, which both use the Transformer. In fact, BERT powers the Google search engine, and GPT-3 is the model powering the ChatGPT application. Many more applications making use of these models can be found around the Internet.
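Much of the Transformer's performance comes from the Attention Mechanism in the keyword list above. A minimal pure-Python sketch of scaled dot-product attention (toy 2-dimensional vectors, no learned weight matrices, which a real Transformer would have) looks like this:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q.K^T / sqrt(d)) . V."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# One query attending over two key/value pairs; the query is
# closer to the first key, so the output leans toward the first value.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

The key idea: each output position mixes information from every input position, weighted by how relevant each one is to the query.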
Some Core Challenges in NLP
No matter what method is applied, the challenges that form the NLP field remain the same:
- Computers do not understand words; they understand numbers. Find a method to convert each word in a sentence into a vector (a group of numbers) such that, given two words with similar meanings, their two vectors are close in distance to represent the similarity.
- Given a sentence with many words and variable length, find a vector that can represent the sentence.
- Given a passage with many sentences and variable length, find a vector that can represent the whole passage.
- From the vector of a word, sentence, or passage, find a method to convert it back into words/sentences/a passage. This task in turn becomes Machine Translation or Text Summarization.
- From the vector of a word, sentence, or passage, find a method to classify it into senses/intents. This task in turn becomes Sentiment Classification.
- From the vector of a word, sentence, or passage, find a method to calculate its similarity to other vectors. This task in turn becomes Question & Answering, Text Generation, or Text Suggestion.
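The first few challenges can be sketched with toy vectors: cosine similarity measures how close two word vectors are, and averaging word vectors is the simplest (and crudest) way to get one vector for a whole sentence. The tiny 3-dimensional "embeddings" below are invented for illustration; real embeddings have hundreds of dimensions and are learned from data:

```python
import math

def cosine(a, b):
    # Cosine similarity: near 1.0 = similar direction, near 0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-d "embeddings" (illustrative only).
VEC = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def sentence_vector(words):
    # Average the word vectors: the simplest sentence representation.
    vecs = [VEC[w] for w in words if w in VEC]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

print(cosine(VEC["king"], VEC["queen"]))  # high: similar meanings
print(cosine(VEC["king"], VEC["apple"]))  # low: unrelated
print(sentence_vector(["king", "queen"]))
```

Averaging throws away word order ("dog bites man" equals "man bites dog"), which is why the keyword list above includes Positional Encoding and Contextual Word Embedding as separate topics.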
It would take too long to dive into each keyword here, so please hit the Follow button to receive upcoming posts from my learning journey.
Thanks for reading!