AI going full circle: Why this is a good thing
It is easy to believe that AI is something new, given the sudden rise of ChatGPT and Generative AI. But it's not. You are probably not aware of it, but you have been using AI for a long time, and in many ways. Think of your digital camera from the nineties using AI to take better pictures, or Samsung's one-button washing machine of the 2000s deciding for itself what to do with your dirty laundry. Or take Digital Equipment Corporation (DEC), the computer company that saved an estimated $44M in the early 80s with R1, its AI-based computer configuration system. Many more examples can be found.
We have gone full circle on AI.
Let me explain why, by walking you through some of the key moments in the development of AI and showing how the “rise of GenAI” is a logical consequence of everything that came before it. And not least, let me show you why it is a good thing that AI is looping back to its origins.
1940s-1950s: Formative Concepts
Many of the foundational concepts that still shape AI today were developed in the mid-20th century. The artificial neuron of McCulloch and Pitts (1943), the forerunner of the Perceptron, forms the basis of many connectionist algorithms, and Shannon's information theory (1948) describes how entropy can be used to quantify the value of information.
Vannevar Bush’s vision of the “memex,” an imaginary device that could store and link information, foreshadowed machine memory and knowledge representation. In 1951, the first neural network machine was built (SNARC, by Marvin Minsky and Dean Edmonds), some 70 years before generative AI would make extensive use of neural networks.
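To make the idea of entropy as a measure of information a bit more tangible, here is a minimal sketch in Python. The function name shannon_entropy and the toy inputs are my own illustration, not taken from any particular library: the more evenly spread the symbols are, the more bits each observation is worth.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Return the Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # Each symbol with probability p contributes p * log2(1 / p) bits.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

fair_coin = shannon_entropy("HT")    # two equally likely outcomes
constant = shannon_entropy("AAAA")   # one certain outcome, nothing to learn
four_way = shannon_entropy("ABCD")   # four equally likely outcomes
```

A source that always emits the same symbol carries no information, while each doubling of the number of equally likely outcomes adds one bit.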
1960s: The Birth of Symbolic AI
The 1960s brought Symbolic AI, where researchers emulated human reasoning through the manipulation of symbols. As computer systems at that time were very restricted in CPU and memory, only “condensed” information, preferably of the type already processed by humans, was encoded in the computer through symbolic representation and logic. This enabled logical reasoning with “crisp” knowledge and facts. Early AI systems like the General Problem Solver (GPS) showcased the potential of this approach by solving problems in specific domains. However, these systems struggled to handle ambiguity and lacked the ability to learn from data.
1970s: The Cognitive Revolution
The 1970s came with a Cognitive Revolution in AI, shifting the focus from pure symbol manipulation to human-like thinking processes. Researchers explored areas such as knowledge representation, natural language understanding, and expert systems. Planning systems like STRIPS (“STanford Research Institute Problem Solver”) manipulated symbolic knowledge in order to plan real-world tasks for the robot Shakey.
The public's reaction to ELIZA (1964), a computer program capable of simulating conversation, shocked the world as well as its developer, Joseph Weizenbaum. ELIZA sparked heated discussions about the boundaries between human and machine communication, as people reacted to it as if the program could understand them. Not unlike the discussions triggered by ChatGPT lately.
Shakey, 1966–1972 (Stanford Research Institute), used STRIPS, which enabled task decomposition and execution.
1980s: The Knowledge Engineering Era
The 1980s, just when I started my studies in Artificial Intelligence at the University of Amsterdam, brought us Knowledge Engineering and Knowledge Management, directed towards encoding human expertise into computer systems. Expert Systems became popular, as they aimed to capture human decision-making processes in AI systems. These hard-coded systems had difficulty coping with the infinite complexity of the real world (still an issue), but they also showed how sophisticated decision making could emerge from the application of a relatively small number of rules. Classical Expert Systems had clear advantages as well: once the rule base was finally consolidated, execution was very efficient, and most expert systems offered great explanation capability, something that is often problematic in today's AI systems.
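How a handful of rules can produce a decision and explain it at the same time can be sketched in a few lines of Python. This is a minimal forward-chaining loop under my own assumptions: the rule names, the configuration-style facts and the function forward_chain are hypothetical and do not reproduce any historical system.

```python
# Hypothetical rule base: each rule is (name, premises, conclusion).
RULES = [
    ("R1", {"order_has_cpu", "order_lacks_memory"}, "add_memory_board"),
    ("R2", {"add_memory_board", "cabinet_full"}, "add_expansion_cabinet"),
]

def forward_chain(facts, rules):
    """Fire rules until nothing new can be derived.

    Returns the full set of known facts plus a trace that records
    which rule produced which conclusion, i.e. the explanation.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            # A rule fires when all its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace.append(f"{name}: {sorted(premises)} -> {conclusion}")
                changed = True
    return known, trace

facts, explanation = forward_chain(
    {"order_has_cpu", "order_lacks_memory", "cabinet_full"}, RULES
)
```

Reading the trace from top to bottom answers the question “why did the system decide this?”, which is exactly the explanation capability mentioned above.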
1990s: The Exploration of Reinforcement Learning
The 90s brought the insight that knowledge engineering was a tedious process involving many human experts with different views on similar topics. Acquiring such knowledge and then aligning it in order to build sound systems often takes a long time, and the result is difficult and expensive to maintain.
So the focus shifted to using “raw” data instead, and letting machine learning algorithms do the heavy lifting. These algorithms could assist in finding patterns and regularities and in recognizing objects and language. Supervised and unsupervised machine learning algorithms of various kinds appeared for a variety of tasks. Multi-layer Neural Networks formed the basis for many of the successes two decades later. Convolutional Neural Networks (CNNs) transformed image analysis, while Recurrent Neural Networks (RNNs) tackled sequential data. During those years, the exploration of Reinforcement Learning (RL) gained traction. Much of the research focused on playing games, as these provided a perfect environment to explore how rewarding certain behaviors could lead to the emergence of strategies and tactics that were more effective than anything hard-coded could be. Notable highlights included agents that learned to play backgammon (G. Tesauro, 1992) and chess (J. Baxter, S. Thrun, 1995). These successes laid the groundwork for the reinforcement learning techniques later used to fine-tune ChatGPT.
2000s: The Era of Deep Learning
In the 2000s, extensive amounts of data in the form of images, videos and text were brought to the internet by a worldwide surge of smartphones, digital cameras and many other newly developed devices.
This decade also saw the automation of feature extraction, an important contributor to the success of Neural Networks. Until then, the raw data would be pre-processed to extract pre-defined features that a human thought might be useful, before those features were passed to the machine learning algorithm to learn from. Deep learning merged these two phases, figuring out at the same time which features carried the most information (remember Shannon's information theory?) and how to learn from them.
The ability to do this at scale was partly due to algorithmic advances, but the majority of the progress came from hardware. Driven by demands from the gaming and entertainment industries, graphics processing units (GPUs) became more powerful and more parallelized than CPUs. At the same time it became clear that machine learning algorithms benefited enormously from the efficiency of graphics cards at matrix and vector calculations. For the first time, the highly parallelized structure of a neural network had highly parallelized hardware to run on, and the leap in performance was flabbergasting.
2010s: Advances in Generative AI
During the 2010s, significant advances in Generative AI were made. The introduction of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) revolutionized the creative potential of AI. By learning the patterns present in data at massive scale, these systems were able to create compact and efficient representations of complex knowledge, and by using multiple systems competing against each other, they were able to make those representations more robust to the noise and complexity of the real world. However, while GANs and VAEs are good at generating realistic data samples and compressing information, these algorithms struggled with interpreting and processing sequences of data, such as text or time series.
This limitation became a crucial driver for the development of attention mechanisms, which allow models to weigh the importance of different parts of a sequence when making predictions. This greatly improved their ability to handle sequential data, which in turn laid the foundation for self-attention mechanisms and resulted in the groundbreaking Transformer models behind GPTx, Llama and the like. As a result we got radically better AI applications, particularly in natural language understanding and generation, as these models processed sequences with a new level of efficiency and effectiveness.
By paying attention to the context in which a phrase was used, rather than just its grammatical and syntactic usage, these models were able to effectively disambiguate the large number of possible interpretations of a passage of text. Even more impressive, they could keep track of a discourse throughout longer dialogues and even across languages. But something was still lacking to ensure general uptake and understanding of the possibilities of these powerful new algorithms and models. Using them was still largely a technical exercise, and applying them successfully still required a lot of tweaking.
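The core of “weighing the importance of different parts of a sequence” can be sketched as scaled dot-product attention. The Python below is a deliberately small, dependency-free illustration under my own assumptions: the function names and toy vectors are made up, and real Transformer models add learned projections, multiple heads and much larger dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence of three positions; the query resembles positions 0 and 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]
outputs, weights = attention(queries, keys, values)
```

In this toy run the query puts most of its weight on the first and third positions and nearly ignores the second, which is the “paying attention to context” behaviour in miniature.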
2020s: Convergence and Integration
In the current decade, AI’s various strands of development are converging and integrating. We have seen AI that can effectively interpret visual, audio and sensory input, and that can generate sounds, images, video, text and data with specific patterns. All this is based on the analysis of vast amounts of data, using extreme amounts of computing power and energy. We find the generation of music, images and video fascinating enough, but we get really excited when AI creates texts. Presumably this is because we have not yet seen any other organism on the planet communicate with language, and the current transformer-based algorithms seem to be able to do so. And indeed, they are capable of writing texts in many formats and styles, from letters to articles and from poems to computer code, and answering emails is not an issue either. In addition, they can translate between languages in unequaled ways and combine different modalities.
Even though that is revolutionary enough in itself, research aims for more. The algorithms perform well, but they are not flawless. Many of the flaws point towards a lack of ability to abstract from learned patterns, induce knowledge and reuse it in other contexts. Correctly recognizing the context at hand is often problematic, resulting in catastrophic errors. Where human intelligence excels in (logical) reasoning, there is no evidence that generative AI has the abstraction capability this requires.
And this brings us back to the beginning: we have already built systems that are effective in representing abstracted knowledge and efficient in working with logic and reasoning. They show explanation capabilities, as well as a high degree of correctness. And not least, they can recognize constraint violations and logical fallacies of various kinds.
Lately we have seen an increased interest in Knowledge Graphs, a format for knowledge representation that has been developed over decades. Such graphs can be augmented with formal semantics, logical interpreters and constraint reasoners. They are tedious to build, efficient to execute and, if built well, very good at being correct.
Combining this capability with the recent technical advances brought to us by generative AI should be the next frontier. The resemblance to Daniel Kahneman's famous book “Thinking, Fast and Slow” is hard to miss for those who have read it (otherwise you might add it to your reading list!).
So are we back to square one? Not at all. Rather, AI is going full circle!
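What “recognizing constraint violations” means for a knowledge graph can be shown with a toy example. The sketch below is my own simplification: the triples, the single domain/range constraint and the helper names are hypothetical, and real knowledge graph stacks use dedicated schema and constraint languages rather than hand-written Python.

```python
# A tiny knowledge graph as (subject, predicate, object) triples.
TRIPLES = {
    ("ELIZA", "type", "Program"),
    ("Weizenbaum", "type", "Person"),
    ("Weizenbaum", "developed", "ELIZA"),
    ("ELIZA", "developed", "Weizenbaum"),  # deliberately wrong statement
}

# Hypothetical constraint: predicate -> (required subject type, required object type).
CONSTRAINTS = {"developed": ("Person", "Program")}

def types_of(entity, triples):
    """Collect every type asserted for an entity."""
    return {o for s, p, o in triples if s == entity and p == "type"}

def violations(triples, constraints):
    """Return the triples whose subject or object has the wrong type."""
    found = []
    for s, p, o in sorted(triples):
        if p in constraints:
            subject_type, object_type = constraints[p]
            if (subject_type not in types_of(s, triples)
                    or object_type not in types_of(o, triples)):
                found.append((s, p, o))
    return found

bad_triples = violations(TRIPLES, CONSTRAINTS)
```

The check flags the statement that a program developed a person, while leaving the correct statement alone. A generated sentence that maps onto such a triple could be validated in the same way before it is trusted.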