Can Your AI Draw a Pelican on a Bike? This Test Says a Lot

Amazon Nova’s depiction of Pelican on a Bicycle [Photo courtesy of Simon Willison’s Weblog]

A new benchmark for evaluating artificial intelligence (AI) models has emerged: the Pelican on a Bicycle drawing test, proposed by engineer Simon Willison. On Monday, GigaGen reported Willison’s latest analysis, presented at the AI Engineer World’s Fair in San Francisco.

The first results discussed came from Amazon’s Nova family of AI models, which launched last November.

Willison tasked Amazon’s three text generation models, Nova Micro, Nova Lite, and Nova Pro, with drawing a pelican on a bicycle. The results were disappointing: the images were nearly indecipherable.

Meta’s AI models also fell short of expectations.

Meta’s earlier Llama 3.1 405B model could roughly depict both a bicycle and a pelican, but the newer Llama 3.3 70B failed to represent either accurately. Although Llama 3.3 runs more cost-effectively at 70 billion parameters, it lagged significantly behind its predecessor on this test.

OpenAI’s GPT-4.1 series, including GPT-4.1 Mini and GPT-4.1 Nano, drew unstable bicycles, and the overall results were unsatisfactory.

Anthropic’s Claude 3.7 Sonnet’s depiction of Pelican on a Bicycle [Photo courtesy of Simon Willison’s Weblog]

DeepSeek, however, showed remarkable improvement. Willison praised DeepSeek-R1 for its enhanced pelican depiction and easily recognizable bicycle imagery.

The standout performer was Anthropic’s Claude 3.7 Sonnet, which perfectly illustrated the Pelican on a Bicycle. Its rendition surpassed all others in accuracy for both the pelican and bicycle.

Lastly, Gemini 2.5 Pro Preview-05-06 impressed with its flawless pelican depiction. This model had previously scored 1499.95 for visual completeness and functionality at WebDev Arena, ranking first overall. That score beat Claude 3.7 Sonnet’s by about 17% and marked a significant improvement over earlier Gemini versions.
