As mobile technology advances rapidly, one exciting prospect on the horizon is running large language models (LLMs) with 400 billion parameters directly on our smartphones and tablets. Meta has just released its Llama 3.1 family, whose largest version weighs in at roughly 400 billion parameters and is contending with GPT-4o for the top spot.
This would allow us to have incredibly powerful natural language processing, generation, and understanding capabilities at our fingertips without relying on cloud servers or internet connectivity.
The applications could be game-changing, from real-time speech translation to intelligent virtual assistants to creative writing aids. However, squeezing such massive AI models onto the limited hardware of mobile devices presents some formidable technical challenges that will require significant innovations to overcome. In this post, I'll explore the current state of ultra-large language models, the hurdles standing in the way of getting them onto phones, and the technological breakthroughs needed to make it happen.
Over the past few years, language models have reached new heights in scale and capability. State-of-the-art models like GPT-4o, PaLM, and Megatron-Turing NLG have surpassed 100 billion parameters, displaying impressive language understanding and generation abilities. However, these behemoths require enormous computational resources to train and run. For example, GPT-3 was trained on a corpus filtered from 45 terabytes of raw text using thousands of NVIDIA data-centre GPUs. Running the full model for inference requires hundreds of gigabytes of memory and customized hardware accelerators to achieve reasonable speeds and costs.
On the open source side, projects like EleutherAI's GPT-Neo and GPT-J have aimed to replicate the capabilities of GPT-3 using publicly available data and code. The largest of these, GPT-J-6B, has 6 billion parameters. While much more accessible than its larger cousins, it still has a sizeable footprint of around 22 GB. Simply storing the model weights would overwhelm the 4-6 GB of RAM found in most smartphones, let alone the additional scratchpad memory needed to make predictions.
So, in summary, the current crop of ultra-large language models requires data centres full of specialized AI accelerator chips to train and run them cost- and energy-efficiently. A 400 billion parameter model like the one we're dreaming of would have well over double the parameters of GPT-3's 175 billion, with memory and compute demands to match. Running that continuously on a battery-powered handheld device is unimaginable with today's technology.
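The memory figures above fall straight out of parameter count times bytes per weight. A back-of-the-envelope sketch (the 400B model here is the hypothetical one this post is about, not a released checkpoint):

```python
def weights_gb(n_params: float, bytes_per_weight: float) -> float:
    """GiB needed just to store the weights, ignoring activations and KV cache."""
    return n_params * bytes_per_weight / 1024**3

# GPT-J-6B stored as 32-bit floats: roughly the 22 GB cited above
print(f"GPT-J-6B fp32: {weights_gb(6e9, 4):.1f} GiB")

# A hypothetical 400B model at progressively lower precision
for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    print(f"400B {label}: {weights_gb(400e9, bits / 8):.0f} GiB")
```

Even quantized down to 4 bits per weight, the hypothetical 400B model needs on the order of 186 GiB just for its weights, still more than an order of magnitude beyond a phone's RAM.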
Several compounding challenges make the prospect of running 400 billion parameter LLMs on smartphones exceedingly difficult:
To run 400 billion parameter models on a smartphone, we'll need revolutionary breakthroughs on multiple fronts:
So when can we expect 400 billion parameter LLMs to run locally on smartphones? This is a difficult question to answer precisely, as it depends on the pace of progress in multiple fields. However, here is my rough timeline:
Of course, these are just my educated guesses based on the current state of the field and reasonable projections. The actual path of this technology will likely surprise us in both promising and challenging ways. These developments will almost certainly be unevenly distributed, with the most advanced capabilities initially limited to select high-end flagship devices before eventually trickling down to the mass market.
The prospect of running 400 billion parameter language models locally on smartphones is one of the most exciting and transformative developments for mobile technology. The ability to carry around human-level language understanding and generation capabilities in our pockets, untethered from the cloud, would be a monumental milestone in computing and AI.
However, bridging the massive gap between the data centre-scale resources required by today's LLMs and the stringent constraints of mobile hardware is a herculean challenge. Significant breakthroughs will be needed on both the software and hardware fronts - compressing models by 100x+ without losing quality, crafting vastly more efficient architectures, inventing new chips to efficiently process ultra-sparse models, perhaps shifting to wholly novel substrates like analog or optical computing.
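One of the compression levers mentioned above, quantization, can be sketched in a few lines. This is a toy illustration of symmetric per-tensor int8 quantization (using NumPy), not how production quantizers for LLMs actually operate; it just shows the core trade of precision for a 4x smaller footprint versus 32-bit floats:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map floats onto int8 via one scale."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"storage: {w.nbytes} -> {q.nbytes} bytes")       # 4x smaller
print(f"max abs error: {np.abs(w - w_hat).max():.4f}")  # bounded by scale / 2
```

Real systems go much further, with per-channel scales, 4-bit and mixed-precision schemes, and calibration against activation statistics, which is why compressing 100x+ without quality loss remains an open research problem rather than a few lines of arithmetic.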
None of these will be easy. But the immense potential of putting LLMs into the hands of billions - for education, health, productivity, accessibility, entertainment, and more - makes it a challenge well worth undertaking. With focused research and investment, backed by the relentless advancement of hardware, I believe we will not only achieve 400B models on smartphones but perhaps even 1T models or beyond. And that will truly change the game for mobile and AI.