Sunday 5 March 2023

Million times more powerful AI models than ChatGPT in 10 years, or just 'Smoke and Mirrors'?


techtalk
BY LESLIE D'MONTE

On 23 February, Nvidia added $79 billion in market value in a single day after its CEO, Jensen Huang, said during the company's earnings call that OpenAI's ChatGPT represents an inflection point for artificial intelligence (AI). Huang pointed out that companies around the world are racing to incorporate the capabilities of generative language models like the generative pre-trained transformer (GPT) into their businesses, which explains Nvidia's excitement: it can sell more graphics processing units (GPUs) to power the craze to be AI-ready.

ChatGPT itself is said to run on around 10,000 Nvidia GPUs, and Huang expects "to see AI factories all over the world", even as he predicts AI models one million times more powerful than ChatGPT within 10 years.


There are already signs that this is happening because of the inherent ability of large language models (LLMs) to do "in-context learning". Consider this. While reams have been written about ChatGPT, researchers are just beginning to understand the workings of the AI language models that power text and image generation tools like DALL-E and ChatGPT. LLMs like OpenAI's GPT-3 and Google's LaMDA are adept at performing tasks that they haven't been specifically trained for, but how they do it has largely been a mystery. Now, researchers at the Massachusetts Institute of Technology (MIT), Stanford University, and Google have explored this "curious" phenomenon called "in-context learning", which enables LLMs like GPT-3 to learn a new task from just a few examples, without any new training data or parameter updates (changes to the model's weights, in technical jargon).

For instance, a machine-learning (ML) model like GPT-3 would typically need to be retrained on new data to handle a new task, such as a kind of prompt it has not seen before, and during that retraining the model updates its parameters. With in-context learning, it does not have to. "Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering. But now we can just feed it an input, five examples, and it accomplishes what we want. So, in-context learning is an unreasonably efficient learning phenomenon that needs to be understood," said Ekin Akyurek, a computer science graduate student and lead author of a paper titled 'What learning algorithm is in-context learning? Investigations with linear models' that explores this phenomenon.
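To see what this looks like in practice, here is a minimal sketch (my own illustration, not code from the paper): the "training data" for the new task consists entirely of five labelled examples packed into the prompt, and the model's stored weights are never touched.

```python
# A minimal sketch of in-context (few-shot) learning from the user's side.
# The task, labels and examples are invented for illustration.

examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took thirty seconds. Flawless.", "positive"),
    ("The screen scratches if you look at it.", "negative"),
    ("Best purchase I've made all year.", "positive"),
    ("Support never answered my emails.", "negative"),
]

query = "The camera is stunning in low light."

# Pack the five demonstrations plus the unanswered query into one prompt.
prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
# Sending this prompt to an LLM such as GPT-3 typically yields "positive":
# the model infers the task from the examples alone, with no retraining.
```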

Image credit: Jose-Luis Olivares, MIT
MIT researchers found that massive neural network models that are similar to LLMs are capable of containing smaller linear models inside their hidden layers, which the large models could train to complete a new task using simple learning algorithms.

LLMs like GPT-3 have hundreds of billions of parameters (GPT-3 has 175 billion) and have been trained by reading mountains of text on the internet, from Wikipedia and Common Crawl to Reddit posts. The researchers experimented by giving these models prompts built from synthetic data that the models could not have seen anywhere before. Yet they discovered that the models could learn from just a few examples, which prompted them to theorize that these large neural networks contain smaller ML models within them that perform the in-context learning magic. "That could explain almost all of the learning phenomena that we have seen with these large models," said Akyurek in a press statement.
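A toy way to picture that hypothesis (my own sketch, not the researchers' code): if a prompt contains a handful of (x, y) pairs generated by some unknown linear rule, a simple linear learner can recover the rule from the context alone, with no updates to any stored parameters. The paper asks whether a transformer's forward pass effectively computes something like this inside its activations.

```python
import numpy as np

# Toy illustration of an "implicit" linear learner. The paper's hypothesis is
# that, for in-context linear regression, a transformer's forward pass can
# implement a simple algorithm such as least squares over the prompt examples.

rng = np.random.default_rng(0)

d = 4                        # dimensionality of each input
w_true = rng.normal(size=d)  # the hidden "task" the context examples define

# Five in-context examples (x_i, y_i), analogous to demonstrations in a prompt.
X = rng.normal(size=(5, d))
y = X @ w_true

# The implicit learner: ordinary least squares fit to the context examples.
# Nothing stored is updated; the solution is computed on the fly from the
# prompt, which is the essence of in-context learning.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# A new query point, analogous to the final unanswered line of the prompt.
x_query = rng.normal(size=d)
print("prediction:  ", x_query @ w_hat)
print("ground truth:", x_query @ w_true)
```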

He added that with a better understanding of in-context learning, researchers could enable models to complete new tasks without the need for costly retraining. Mike Lewis, a research scientist at Facebook AI Research who was not involved with this work, agreed, saying that "these results are a stepping stone to understanding how models can learn more complex tasks, and will help researchers design better training methods for language models to further improve their performance". You can read the complete study posted to the arXiv preprint server here.

That said, not everyone is enamoured with ChatGPT. In an article in Salon, Jeffrey Lee Funk, an independent technology consultant who earlier taught at the National University of Singapore, Hitotsubashi and Kobe Universities in Japan, and Penn State, and Gary N. Smith, the Fletcher Jones Professor of Economics at Pomona College, argue that "GPT is not as great as many think, and LaMDA is not woefully far behind..." LLMs, they contend, are "mere text generators" and are not "intelligent in any real way -- they are just automated calculators that spit out words". These models, as many have pointed out before, are programmed to assert their answers with great confidence but do not know what words mean, and consequently have no way of assessing the truth of their confident assertions.

The authors recommend that users ask GPT-3 to write their biography as "a reliable way of demonstrating GPT-3's unreliability". Do try this at home! I tried it, and the results were hilarious: truths and untruths woven into a coherent but incorrect bio, presumably because the LLM was also trained on data about other people whose surname is D'Monte.
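If you want to run the same test programmatically rather than in a chat window, a sketch along these lines should work, assuming the pre-1.0 openai Python client with an API key set in the environment ("text-davinci-003" was one of the GPT-3 completion models available at the time):

```python
# Hypothetical sketch of the biography test, assuming the pre-1.0 `openai`
# Python client (it reads OPENAI_API_KEY from the environment by default).
import openai

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3 completion model of the period
    prompt="Write a short biography of Leslie D'Monte.",
    max_tokens=200,
)

# Print the generated bio, then fact-check it line by line.
print(response.choices[0].text.strip())
```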

Funk and Smith conclude: "The undeniable magic of the human-like conversations generated by GPT will undoubtedly enrich many who peddle the false narrative that computers are now smarter than us and can be trusted to make decisions for us. The AI bubble is inflating rapidly."

I have argued earlier too that while LLMs such as GPT-3 and models like ChatGPT may outperform humans at some tasks, they do not understand what they read or write the way humans do. Moreover, these models depend on human supervisors to make them more sensible and less toxic. But the criticism of Funk and Smith notwithstanding, the "in-context learning" ability of LLMs may radically alter the way AI-powered chatbots work. Watch this space for more developments.
