Artificial Intelligence (AI) is not just a futuristic concept; it is already shaping the world around us. However, diving into the AI realm can feel like stepping into a world of jargon, buzzwords, and acronyms, which may be daunting to a beginner who just wants to find out what all the hype is about or follow some heated Twitter/X threads (and believe me, there are many). This guide will unravel the mysteries around a select few AI buzzwords. Please keep in mind that the following is not exhaustive, is intended for a beginner audience, and does not dive deep into technical background. Those who are interested in reading more can follow the included links.
Artificial Intelligence (AI)
Let's start with the basics. AI refers to machines or computer systems that mimic intelligent human behavior: a computer that can learn and make decisions without being explicitly programmed for every situation is AI at work. It's the brains behind smart assistants like Siri, Alexa, and ChatGPT.
Machine Learning (ML)
Machine learning is a subset of AI that enables systems to learn from data and improve their performance over time. It is like teaching a computer to recognize patterns from data and make predictions without being explicitly programmed for each task.
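To make the "learning from data instead of explicit programming" idea concrete, here is a toy sketch: fitting a straight line to a handful of made-up data points so the program can predict values it never saw. The data (study hours vs. exam scores) and all numbers are invented purely for illustration; real machine learning uses the same idea with far more data and far more flexible models.

```python
# Toy "machine learning": instead of hard-coding the rule,
# the program estimates it from example data.

# Made-up training data: hours studied -> exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

# Fit a line score = a * hours + b by ordinary least squares
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / \
    sum((x - mean_x) ** 2 for x in hours)
b = mean_y - a * mean_x

def predict(h):
    """Predict a score for an unseen number of study hours."""
    return a * h + b

print(predict(6))  # the learned rule generalizes beyond the training data
```

Nobody told the program that more study hours mean higher scores; it recovered that pattern from the examples. That, in miniature, is what "learning from data" means.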
Deep Learning (DL)
Now, let's go deeper. Deep learning is a specialized field of machine learning that involves neural networks inspired by the human brain's structure (well, kind of). These huge networks learn to perform tasks by analyzing vast amounts of data. Deep learning powers image and speech recognition in your phone and is also being increasingly used in science, healthcare, remote sensing, and so on. Nowadays, whenever AI is mentioned, it almost always includes methods from deep learning.
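The "network" part can be sketched in a few lines: a neural network is just layers of weighted sums, each followed by a simple nonlinearity. The weights below are hand-picked toy values; in real deep learning, networks with millions or billions of weights learn them automatically from data.

```python
# A minimal two-layer neural network forward pass (toy weights).

def relu(x):
    """A common nonlinearity: pass positives through, zero out negatives."""
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum + nonlinearity per neuron."""
    return [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# 2 inputs -> 3 hidden neurons -> 1 output
hidden = layer([0.5, -1.0],
               [[0.2, 0.8], [-0.5, 0.3], [0.9, -0.1]],
               [0.1, 0.0, -0.2])
output = layer(hidden, [[1.0, -2.0, 0.5]], [0.0])
print(output)
```

Stacking many such layers (hence "deep") is what lets these models pick up intricate patterns in images, speech, and text.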
GPT
When ChatGPT was first released by OpenAI in 2022, the term GPT started trending outside of the AI realm. GPT, short for Generative Pre-trained Transformer, is a type of deep learning model that excels at understanding and generating human-like text. GPT models have shocked the internet with their ability to generate coherent and contextually relevant text, making them the powerhouse behind chatbots, content creation tools, and much more. Some important works in this area include GPT-1, GPT-2, and GPT-3.
Transformer
GPT, among other models, owes its power to the transformer architecture (the T in GPT). This innovative architecture allows the model to capture relationships between words in a sentence, understanding context and generating coherent responses. Beyond the GPT models, transformers are also used in many other important models including BERT and T5 (you can probably guess what the T's stand for). Since the architecture is extremely general and powerful, it has been adapted to realms beyond language, including images, audio, videos, and even genomic data. Transformers have revolutionized deep learning and have been the celebrity of AI ever since the architecture was first introduced in this research paper in 2017. See this for a high level overview and this for an implementation walkthrough.
Multimodal AI
As AI models continue to scale, multimodal AI (AI that can process and/or generate data of multiple types, or modalities) has become increasingly popular. In a world where information comes from various sources, multimodal AI tries to make sense of it all by processing text, images, audio, and other modalities simultaneously. This enables models that can convert data from one modality to another, such as image captioning, image generation (the field behind the controversial area of AI art), or even music generation, as well as models that can understand and produce both images and text (such as the latest version of ChatGPT and DALL·E 3).