In what can only be described as one of the most exhilarating weeks in AI history, groundbreaking developments have emerged from every corner of the field. From Stanford’s Alpaca 7B pushing the boundaries of conversational AI to the highly anticipated GPT-4 release and the cutting-edge advancements in MidJourney’s V5 and PyTorch 2.0, the AI landscape is being reshaped at a breathtaking pace. Join us as we delve into these riveting announcements and unveil how they are transforming the world of artificial intelligence as we know it.

Stanford Alpaca 7B

Stanford University made waves on March 13th by releasing its “own” version of a large language model (LLM), Alpaca 7B. Based on Meta’s LLaMA 7B model, Alpaca has been fine-tuned on 52,000 instruction-following demonstrations generated with OpenAI’s text-davinci-003. Stanford has released the training code, data, and a simple demo (unfortunately suspended as of March 18th).

While Meta’s LLaMA series serves primarily as an “auto-complete” model, Stanford’s Alpaca 7B is designed to follow instructions and hold a conversation. According to Stanford, the model behaves qualitatively similarly to OpenAI’s text-davinci-003, a GPT-3.5 model, while being surprisingly small and inexpensive to reproduce. In fact, there are now applications that allow anyone to run Alpaca 7B locally on a standard modern laptop, at a speed comparable to ChatGPT.

Additionally, Alpaca 7B does not have the same safeguards as ChatGPT, meaning that it can output opinionated content and potentially harmful text. Stanford’s Alpaca has made the power of modern LLMs accessible to enthusiasts, and its released recipe allows anyone to fine-tune other versions of LLaMA. Overall, Alpaca 7B provides an affordable and capable alternative to better-known large language models and makes them accessible to a much broader audience.

GPT-4

OpenAI’s newest large language model, GPT-4, needs little introduction: it was the most anticipated release of the past few weeks. Released on March 14th, GPT-4 sets a new bar for what’s possible in the field. In fact, it’s not just an LLM. It is a multi-modal model capable of processing image and text inputs and relating the two in ways that were previously out of reach.

OpenAI reports that GPT-4 significantly outperforms GPT-3.5 across various metrics, including creative writing, logical reasoning, mathematical analysis, and multilingual abilities. It even passed a simulated US bar exam with a score around the top 10% of test takers, and earned competitive scores on some of the most demanding academic exams.

Image source: AP Calculus AB FRQ Question 4

Along with the announcement of GPT-4, another surprising piece of news came from Microsoft: Bing Chat was running on GPT-4 all along. In other words, GPT-4 has technically been in public use for more than five weeks. Both Duolingo and Khan Academy are already integrating GPT-4 into their applications to help students and educators.

OpenAI says GPT-4 is “82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.”

Furthermore, GPT-4 can accept over 25,000 words of context (compared to the roughly 3,000-word limit of GPT-3.5), allowing for extensive document searches and efficient summarization. GPT-4’s capabilities are impressive, from producing the code for a hand-sketched website to creating content that passed the “completely human” check of GPTZero and confused OpenAI’s own LLM text classifier.
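To get a feel for what the larger context means in practice, here is a minimal sketch that counts how many pieces a long document would have to be split into under each limit. It uses the word figures quoted above as rough budgets; real limits are measured in tokens, and the helper name `split_into_chunks` and the 60,000-word document are made up for illustration.

```python
# Rough word budgets quoted in this article; actual model limits are in tokens.
GPT4_WORD_LIMIT = 25_000
GPT35_WORD_LIMIT = 3_000


def split_into_chunks(text: str, word_limit: int) -> list[str]:
    """Split text into consecutive chunks of at most `word_limit` words."""
    if word_limit <= 0:
        raise ValueError("word_limit must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + word_limit])
        for start in range(0, len(words), word_limit)
    ]


# Hypothetical 60,000-word document (placeholder content).
document = "word " * 60_000

gpt4_chunks = split_into_chunks(document, GPT4_WORD_LIMIT)
gpt35_chunks = split_into_chunks(document, GPT35_WORD_LIMIT)

print(len(gpt4_chunks), len(gpt35_chunks))  # prints "3 20"
```

Under these assumptions, the same document needs 3 requests instead of 20, which is why summarizing long documents becomes much simpler.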

Unfortunately, citing competitive and safety reasons, OpenAI stated in its GPT-4 technical report that it “contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar”.

GPT-4 is available to ChatGPT Plus subscribers with a usage cap, while developers can join the API waitlist. API usage is priced at $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens. Another caveat of GPT-4 is its response time: its text generation is significantly slower than the current free version of ChatGPT.
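As a quick illustration of what those API rates mean, the following sketch estimates the cost of a single request. The two prices are the ones quoted above; the function name `estimate_cost` and the example token counts are hypothetical.

```python
# Prices quoted in this article, in US dollars per 1,000 tokens.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    prompt_cost = prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
    completion_cost = completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


# Hypothetical request: 1,500 prompt tokens and 500 completion tokens.
cost = estimate_cost(prompt_tokens=1500, completion_tokens=500)
print(f"${cost:.3f}")  # prints "$0.075"
```

Note that completion tokens cost twice as much as prompt tokens, so long generated answers dominate the bill faster than long prompts do.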

MidJourney V5

On March 15th, MidJourney dropped some exciting news: V5 is now available for alpha testing on their official Discord server. Here’s what you need to know:

Firstly, the new model supports higher image resolutions and a wider range of aspect ratios than ever before. There is no more waiting for a manual upscale, since images can now be scaled up instantaneously.

Plus, images are more detailed and accurate, with better human and facial features.

But the improvements don’t stop there. The V5 model is now more responsive to prompts and generates a broader range of styles, making it easier to create the exact aesthetic you’re looking for. However, shorter prompts or one-word phrases may not produce the same results as V4.

Image source: Nick St. Pierre (@nickfloats) on Twitter

MidJourney has also improved its natural language processing, meaning that it’s now easier to input sentences rather than having to list specific requirements.

The cherry on top? MidJourney is aiming for a “friendly” default style for V5, which is excellent news for anyone who wants to generate realistic images without searching for specific prompts. But with all these advancements, there’s a downside. Some people worry that the increased quality and expressiveness of generated images will diminish the value of artists, photographers, and their work. This has been a major concern ever since diffusion models became popular, and it is likely to grow as the models keep improving.

PyTorch 2.0

The PyTorch 2.0 release has been eagerly awaited by the deep learning community since its announcement during the PyTorch conference in December 2022. The March 15th update brings a wealth of new features and improvements, including stable support for Accelerated Transformers, which leverages a custom kernel architecture for scaled dot product attention to deliver high-performance training and inference support.
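For readers unfamiliar with the operation those kernels accelerate, here is a plain-Python reference sketch of scaled dot product attention, softmax(QKᵀ/√d)V. It is not PyTorch's implementation, only an illustration of the math on nested lists; the function names are chosen for this example. In PyTorch 2.0 the optimized version is exposed as `torch.nn.functional.scaled_dot_product_attention`.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a single row of scores."""
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d)) V for matrices given as lists of rows."""
    d = len(q[0])  # dimension of each query/key vector
    scale = 1.0 / math.sqrt(d)
    output = []
    for query in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in k]
        weights = softmax(scores)
        # Weighted average of the value rows.
        output.append([
            sum(w * value[col] for w, value in zip(weights, v))
            for col in range(len(v[0]))
        ])
    return output


# Identical keys give uniform weights, so the output is the mean of the values.
result = scaled_dot_product_attention(
    q=[[1.0, 2.0]],
    k=[[0.0, 0.0], [0.0, 0.0]],
    v=[[1.0, 3.0], [3.0, 5.0]],
)
print(result)  # prints "[[2.0, 4.0]]"
```

A fused kernel produces the same result without materializing the full matrix of attention weights, which is where the speed and memory savings come from.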

One of the standout features in PyTorch 2.0 is the beta version of torch.compile. This API can significantly speed up models with just a single line of code, typically by wrapping an existing model as torch.compile(model). In testing, the PyTorch team found that over 90% of open-source models were compatible. Other notable beta features include Dispatchable Collectives, which simplifies running code on both GPU and CPU machines, and functorch, which enables advanced autodiff use cases.

Additionally, updates to TorchAudio, TorchVision, and TorchText domain libraries offer new functionality and enhancements. With these new features, improvements, and much more, PyTorch 2.0 is poised to revolutionize the training and deployment of state-of-the-art deep learning models.

Other Releases & Announcements

Beyond LLMs and diffusion models, the week brought several other announcements. Google shared updates on its medical LLM research on March 14th, including progress on its expert-level medical LLM, Med-PaLM 2. The model consistently performed at an “expert” doctor level on medical exam questions, scoring 85%, an improvement of 18 percentage points over its previous version.

Google-backed Anthropic released its competitor to ChatGPT, Claude, on March 14th. Quora has offered Claude to its users through Poe, while Anthropic provided a standard and a lightweight version of Claude to the public. Users have remarked that Claude is more “conversational than ChatGPT” and “more interactive and creative in its storytelling.”

Image Source: Anthropic

Google has also taken an enormous step forward in its consumer products by integrating generative AI into Google Workspace, which offers assistance with creative writing, data analysis, and image creation. But Microsoft is not one to be left behind: it unveiled Microsoft 365 Copilot two days later. Users can give this new AI assistant natural language prompts to generate status updates, insights, and data visualizations. It is intended to be integrated with Microsoft 365 products, including Word, Excel, PowerPoint, Outlook, and Teams.

As we reflect on this whirlwind week in AI history, it is clear that these groundbreaking developments have set the stage for a new era of artificial intelligence. The possibilities for innovation are endless, from the democratization of LLMs to advancements in generative AI and deep learning. As the field evolves at an unprecedented pace, we eagerly anticipate what lies ahead and how these transformative technologies will shape our future.
