A week ago, OpenAI announced GPT-4, the latest iteration of their large language models (LLMs) and the successor to GPT-3.5.
While a lot of improvements have been made over GPT-3.5, there are, in my opinion, three aspects in particular that stand out:
No. 1 - It’s surprisingly good at bar exams but - also surprisingly - bad at English literature and language exams
I don’t know much about the Uniform Bar Exam or law in general, but from what I’ve gathered, these exams test a candidate’s reading comprehension and precision in expressing ideas (plus tons and tons of facts about law, rulings, precedents and related information). Considering that GPT-3.5 scored in the bottom 10% on the exam, it is remarkable that GPT-4 scored in the top 10% of test-takers, which means it would’ve passed the exam with flying colours (and, in a way, it did).
Which makes it all the more surprising that it scored pretty poorly on exams for English literature and language. It’s a large language model that was trained on corpora consisting predominantly of English texts - chances are it has “read” more versions of “the Classics” (together with thousands of interpretations) than your average professor of English.
I’ve no clue as to why that is - maybe it has to do with how the AP English literature and language exams are set up. No idea, I’m not familiar with those tests.
No. 2 - Prompts with texts and images
Ok, I have to admit that this is wild: prompts that combine text and images allow a large language model to incorporate visual information into its understanding of the world, not just textual data. If audio prompts were added to the mix, this could lead to lots of very intriguing applications in the field of robotics!
No. 3 - Larger prompts
The most striking announcement, in my opinion, came at the end of OpenAI’s post: in the API section, OpenAI casually mentions that the maximum number of tokens per prompt has grown from 4,096 for GPT-3.5 to 8,192 for GPT-4 - double the size! And they also teased limited access to even larger prompts of up to 32,768 tokens per prompt.
That would be a game changer! Why? Because larger prompts mean larger context, and when it comes to language, context is king, hands down (a lesson that modern linguistics had to learn the hard way - just look at the development of the field of pragmatics, for example).
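To make those limits a bit more tangible, here’s a minimal sketch of how you could check whether a document even fits into the new context windows before sending it off. It assumes the open-source tiktoken tokenizer (its cl100k_base encoding is the one used by the GPT-3.5/GPT-4 family); the file name and the little limits table are purely illustrative.

```python
# Minimal sketch: count the tokens in a document and check it against the
# context windows announced for GPT-4. Assumes the open-source `tiktoken`
# package; the file name below is hypothetical.
import tiktoken

# Context windows as announced: 8,192 tokens for gpt-4, 32,768 for gpt-4-32k.
CONTEXT_LIMITS = {"gpt-4": 8192, "gpt-4-32k": 32768}

def count_tokens(text: str) -> int:
    """Count tokens using the cl100k_base encoding (used by the GPT-4 family)."""
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def fits_in_context(text: str, model: str = "gpt-4") -> bool:
    """Return True if the text fits into the given model's context window."""
    return count_tokens(text) <= CONTEXT_LIMITS[model]

if __name__ == "__main__":
    with open("big_business_report.txt") as f:  # hypothetical report
        report = f.read()
    print(f"Report length: {count_tokens(report)} tokens")
    print("Fits into gpt-4 (8k):     ", fits_in_context(report, "gpt-4"))
    print("Fits into gpt-4-32k (32k):", fits_in_context(report, "gpt-4-32k"))
```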
Let’s think about it for a moment: 32,768 tokens is roughly 25,000 words (a token is a bit shorter than an English word on average) - about the size of a small novel - and you can now hand all of that to your LLM in a single prompt. No more chopping up prompts and crossing your fingers that the model retains enough information for the “actual” prompt later on. That big business report? No problem, just feed it into GPT-4! Information, summaries and even conclusions on a complex legal case? GPT-4 has you covered (even more so since it’s now also a top lawyer, remember?).
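As a concrete illustration, here’s a hedged sketch of what “just feed it into GPT-4” could look like with the openai Python package’s ChatCompletion interface (as documented around the time of the announcement); the file name, system message and instructions are hypothetical, and you’d need your own API key.

```python
# Sketch: summarise a long report in a single prompt using the 32k GPT-4 variant.
# Assumes the `openai` Python package (pre-1.0 ChatCompletion interface) and an
# API key set via the OPENAI_API_KEY environment variable; the file is hypothetical.
import openai

with open("big_business_report.txt") as f:  # hypothetical report
    report = f.read()

response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # the teased 32,768-token variant
    messages=[
        {"role": "system", "content": "You summarise business reports concisely."},
        {"role": "user", "content": f"Summarise the key findings of this report:\n\n{report}"},
    ],
)

print(response["choices"][0]["message"]["content"])
```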
–
All in all, it looks like it’s going to be an interesting year for NLP :)
And with that I’ll wrap it up!