I tried ChatGPT's new o1-preview model, but you shouldn't switch just yet

Calvin Wankhede / Android Authority

With competition from Google’s Gemini and Anthropic’s Claude AI models heating up, OpenAI has found itself in the midst of an identity crisis. Once the undisputed leader in large language models (LLMs), it’s now scrambling to maintain its position at the top. New models like GPT-4o and GPT-4o mini have stemmed the exodus to competing AI chatbots, but OpenAI is under constant pressure to keep innovating. The company has done just that with o1-preview, a new AI model series that excels at complex reasoning and emulating human thought. How good is it? I put it to the test to find out.

What’s the new o1-preview ChatGPT model all about?

OpenAI’s o1-preview and o1-mini are the latest models available within ChatGPT, designed for complex reasoning tasks and problem-solving. As their names suggest, these models are not generational successors to GPT-4 or any of OpenAI’s previous language models. In fact, GPT-4o will not only continue to exist but also remain the default model for all chats.

Unlike prior models that responded to your prompts as quickly as possible, the o1 series has been designed to spend more time thinking through problems, similar to a human’s thought process. This naturally improves accuracy in prompts related to math and coding, but it is also useful for real-world questions and scenarios, as I’ll showcase in my testing below.

We first heard about the o1 model series in July, when Reuters interviewed researchers familiar with a secretive internal project codenamed Strawberry. The goal of the project was to develop an AI capable of performing “deep research,” in line with the company’s mission to achieve artificial general intelligence (AGI). The latter refers to an AI system that’s intelligent enough to outthink humans across multiple subjects. The Strawberry project was rumored to arrive ahead of GPT-5, which is still being developed.

o1 is OpenAI’s latest model family that can break down problems and reason like a human.

The new o1 series is still a long way off from achieving true AGI. OpenAI CEO Sam Altman admitted that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.” However, it’s a big leap forward from the earliest ChatGPT release that many believed would never succeed at solving math problems or logical exercises.

While o1-preview is the newest flagship model, it’s also accompanied by a much leaner and faster o1-mini. OpenAI found that the series excels at coding, so it also released a second model that can accurately generate and debug code. Aimed mostly at developers, o1-mini is 80% cheaper than o1-preview.

o1-preview vs GPT-4o tested: Is it really better?

If you’re skeptical that o1-preview is leagues ahead of prior models, there’s good news: the chatbot does pause to think, sometimes upwards of a minute, before responding. It breaks down complex problems into chunks, which helps it correct errors.

However, there’s also bad news: the o1 series is not better across the board. Specifically, it cannot search the internet for new information like the older GPT-4o model, nor can it perform advanced data analysis. You also cannot upload files and images, meaning you’ll have to frontload each prompt with as much information and context as possible. OpenAI even admits that many ChatGPT users will want to stick with GPT-4o for the time being.

Setting aside these caveats, though, how does it perform? To find out, I posed a handful of complex questions to both of OpenAI’s best models. Here’s how o1-preview fared vs GPT-4o.

Prompt 1: How many legs do I have?

Starting with an easy one, I asked ChatGPT how many legs I’d have if I had 4 cows, 3 dogs, and 2 cats. The answer is obviously two, which GPT-4o put forth but only after saying I’d have 36 animal legs. By contrast, I watched the o1-preview model “think” for five seconds before correctly (and confidently) saying I’d have two legs. It also recognized that the question was a riddle.

I also posed the same question to OpenAI’s smaller GPT-4o mini model and it failed miserably. It simply said I’d have 38 legs, adding mine to the animals’ count.
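
For reference, the three answers above come down to simple arithmetic. This is a minimal sketch of that tally, not a reconstruction of how either model reasoned; the `herd` dictionary and variable names are illustrative, and it assumes four legs per animal and two for the person asking.

```python
# Herd from the riddle: animal name -> (count, legs per animal).
# Four legs per animal is an assumption about ordinary cows, dogs, and cats.
herd = {"cows": (4, 4), "dogs": (3, 4), "cats": (2, 4)}

animal_legs = sum(count * legs for count, legs in herd.values())
my_legs = 2  # assumes the asker is a typical two-legged human

print(animal_legs)            # 36: the literal animal tally GPT-4o mentioned first
print(animal_legs + my_legs)  # 38: GPT-4o mini's answer, animals plus the asker
print(my_legs)                # 2: what the riddle actually asks
```

The riddle only asks about the asker’s own legs, so the animal tally is a distraction rather than part of the answer.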

Prompt 2: Investment return calculation, while accounting for currency depreciation

Since simple prompts only require a few seconds of thinking, I decided to take things up a notch. In this prompt, I asked ChatGPT to find the better investment between two assets with differing returns and risks. The chatbot took 11 seconds to think before it responded this time. Once again, it delivered the correct answer while explaining each step.

Interestingly, GPT-4o also arrived at the same conclusion but it didn’t compute the figures on its own. Instead, it generated the Python code necessary to perform the calculations and executed it via ChatGPT’s advanced data analysis feature. So while the output is the same, the complexity is higher. Coding as a workaround also has the potential to fail quite spectacularly, as I would soon find out.

Prompt 3: Which is better, buying a house or renting?

If you hang around financially savvy folks, you’ll know that renting vs buying a house is a highly divisive topic that involves a lot of variables, both financial and otherwise. Luckily, we can ask ChatGPT to do the math for us: the o1-preview model put 37 seconds’ worth of thought into this question and broke it down into 12 different steps.

I provided several figures, including my down payment amount, interest rate, expected return on investment if I rented instead, and more. This made the question much more complicated: ChatGPT had to first compute the cost of an $800,000 home with a $200,000 down payment. The remaining amount would be financed with a 20-year mortgage at 3.5% interest. If I rented instead, I’d be able to invest the entire $200,000 in an index fund and save any additional income after paying off the rent too.
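
To give a sense of the first step in that calculation, here is a minimal sketch of the mortgage payment using the standard fixed-rate amortization formula. It assumes monthly compounding and a fully amortizing loan, neither of which is spelled out in my prompt, and the function name `monthly_payment` is only illustrative. It is not the code either model produced.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan (monthly compounding assumed)."""
    r = annual_rate / 12  # periodic (monthly) interest rate
    n = years * 12        # total number of monthly payments
    if r == 0:
        return principal / n  # interest-free edge case
    return principal * r / (1 - (1 + r) ** -n)


home_price = 800_000
down_payment = 200_000
loan = home_price - down_payment  # amount financed

payment = monthly_payment(loan, 0.035, 20)
total_interest = payment * 20 * 12 - loan

print(f"Amount financed: ${loan:,.0f}")
print(f"Monthly payment: ${payment:,.2f}")
print(f"Total interest over 20 years: ${total_interest:,.2f}")
```

Under these assumptions the payment works out to roughly $3,480 a month. A full rent-vs-buy comparison would still need the rent, investment return, and other inputs from the prompt, which are left out here.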

The o1-preview model responded with a 1,000-word breakdown of the problem, concluding that my net worth would be higher by approximately $716,620 after 20 years if I rented instead of buying a home.

OpenAI’s prior GPT-4o model can’t keep up with o1-preview in advanced reasoning tasks.

Feeding the same prompt to GPT-4o yielded a much more disappointing result. The model tried to generate and run Python code to solve this problem, but failed twice before succeeding on the third try. Even then, it responded incorrectly and suggested I’d save money by buying a home instead. It only admitted fault after I pointed out a discrepancy in its calculations.


Since many more variables can be involved, I also asked o1-preview to consider factors like property appreciation, maintenance costs, and taxes if I bought a home, as well as a potential 3% annual increase in rent. This time, it took 142 seconds to think before responding with a plausible conclusion, which I think is very impressive.
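
To illustrate why a 3% annual rent increase matters over two decades, here is a small sketch that totals rent with and without the increase. The starting rent of $3,000 a month is purely hypothetical, since it is not one of the figures quoted in this article, and the `total_rent` helper is an illustrative name rather than anything ChatGPT generated.

```python
def total_rent(monthly_rent: float, annual_increase: float, years: int) -> float:
    """Sum the rent paid over `years`, raising the rent once at the start of each new year."""
    total = 0.0
    for year in range(years):
        total += 12 * monthly_rent * (1 + annual_increase) ** year
    return total


STARTING_RENT = 3_000  # hypothetical figure; the article does not state the rent used

flat = total_rent(STARTING_RENT, 0.00, 20)
rising = total_rent(STARTING_RENT, 0.03, 20)

print(f"Rent over 20 years, no increases: ${flat:,.0f}")
print(f"Rent over 20 years, 3% yearly increases: ${rising:,.0f}")
print(f"Extra paid because of the increases: ${rising - flat:,.0f}")
```

Because the increase compounds, the gap between the two totals widens every year, which is one reason the extended prompt is harder to reason about than the original.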

How to use ChatGPT’s o1-preview and o1-mini models

As you may have guessed, the o1 model series requires copious amounts of computational power. And given that ChatGPT itself has been rumored to be unprofitable since its launch in 2022, it’s not surprising that OpenAI has locked o1-preview behind a paywall. In other words, you will need a ChatGPT Plus subscription to select the latest model from the dropdown menu pictured above.

In fact, the model is so expensive that OpenAI has also placed a hard cap of 50 messages per week on top of the $20 per month paywall. Once you exhaust this quota, your only option is to wait or pay for a second ChatGPT Plus account. OpenAI has imposed such rate limits in the past, especially around the time GPT-4 was first released, but this instance is the most aggressive one yet.

Luckily, the vast majority of ChatGPT prompts don’t benefit from o1’s thinking capabilities. And if you are a programmer, the o1-mini model within ChatGPT is also rolling out to the free plan in a limited capacity.

No, o1-preview is not free: you need to pay for a ChatGPT Plus subscription to use it. However, the o1-mini model is available on the free tier in a limited capacity.

All in all, ChatGPT’s new o1-preview model is very impressive and worth a look if you have math and programming questions. It may not be the best choice for most tasks, but it’s the closest we have to emulating human reasoning and thought. However, the vast majority of users won’t benefit from o1-preview’s improved logical reasoning skills or math capabilities, so I cannot recommend switching to it full time. The weekly message limit and lack of web browsing support also mean I’ll continue using GPT-4o going forward. And if you only use ChatGPT a few times every day, you can easily get by with a free account.

Perplexity’s Pro Search feature also implemented multi-step reasoning a few months ago and it too delivered impressive results in my testing. If you want a peek at chain-of-thought AI reasoning without paying for it, I’d recommend trying it out since you get five Perplexity Pro searches every few hours on the free tier. I haven’t tested it against OpenAI’s o1-preview head-to-head yet, but it’s clear that competition in the AI space has forced ChatGPT to evolve and I can’t wait to see where it’s headed next.
