

Photo by author
# Introduction
It seems like almost every week, a new model claims to be state of the art, beating the current leading AI models across the board.
I have free access to the latest AI models at my full-time job within weeks of release. I usually don’t pay too much attention to the hype and just use whatever model is automatically selected by the system.
However, I know developers and friends who want to build software with AI that can be shipped into production. Since these initiatives are self-funded, their challenge lies in finding the best model to work with. They want to balance cost with reliability.
Because of this, after the release of GPT-5.2, I decided to run a practical test to see whether the model was worth the hype and whether it really was better than the competition.
Specifically, I chose to test a flagship model from each provider: Claude Opus 4.5 (Anthropic's most capable model), GPT-5.2 Pro (OpenAI's latest reasoning model), and DeepSeek V3.2 (one of the latest open-source alternatives).
To put these models to the test, I asked each of them to build a working Tetris game from a single prompt.
These were the metrics I used to evaluate the success of each model:
| Metric | Description |
|---|---|
| First-attempt success | With just one prompt, did the model deliver working code? Multiple debugging iterations add cost over time, which is why this metric was chosen. |
| Feature completion | Were all the features mentioned in the prompt implemented, or was something missed? |
| Playability | Beyond the technical implementation, was the game actually smooth to play? Or were there issues that created friction in the user experience? |
| Cost effectiveness | How much did it cost to get working, production-ready code? |
# The Prompt
Here is the input I entered into each AI model:
Create a fully functional Tetris game as a single HTML file that I can open directly in my browser.
Requirements:
Game Mechanics:
– All 7 Tetris piece types
– Smooth piece rotation with wall-kick collision detection
– Pieces fall automatically, with the speed gradually increasing as the player’s score increases
– Line clearing with a visual animation
– “Next piece” preview box
– Game over when the pieces reach the top
Controls:
– Arrow keys: left/right to move, down for a fast drop, up to rotate
– Touch controls for mobile: swipe left/right to move, swipe down to drop, tap to rotate
– Spacebar to pause/unpause
– Enter key to restart after the game is over
Visual Design:
– Gradient colors for each piece type
– Smooth animations when pieces lock and lines are cleared
– Clear UI with rounded corners
– Update the score in real time
– Level indicator
– Game over screen with final score and restart button
Gameplay experience and polish:
– Smooth 60 fps gameplay
– Particle effects when lines are cleared (optional but impressive)
– Score multiplier based on the number of lines cleared simultaneously
– Grid background
– Responsive design
Make it visually polished and satisfying to play. Code should be clean and organized.
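To give a sense of what the scoring and speed requirements translate to, here is a minimal sketch of a line-clear multiplier and level-based drop speed (my own illustration, using classic Tetris scoring values as an assumption, not output from any of the models):

```javascript
// Classic Tetris-style scoring: clearing more lines at once is worth
// disproportionately more points (these exact values are an assumption).
const LINE_SCORES = { 1: 40, 2: 100, 3: 300, 4: 1200 };

function scoreForClear(linesCleared, level) {
  // Score scales with the current level.
  return (LINE_SCORES[linesCleared] || 0) * (level + 1);
}

function dropInterval(level) {
  // Pieces fall faster as the level increases; never faster than every 100 ms.
  return Math.max(100, 800 - level * 70);
}

console.log(scoreForClear(2, 3)); // double line clear on level 3 -> 400 points
console.log(dropInterval(3));     // 590 ms between automatic drops
```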
# Results
// 1. Claude Opus 4.5
The Opus 4.5 model did exactly what I asked for.
The UI was clean and the instructions were clearly displayed on the screen. All controls were responsive and the game was fun to play.
The gameplay was so smooth that I ended up playing for quite a while and delayed testing the other models.
Opus 4.5 also took less than two minutes to give me a working game on the first try, which impressed me.


Tetris game built by Claude Opus 4.5
// 2. GPT-5.2 Pro
GPT-5.2 Pro is OpenAI’s latest model with extended reasoning. For context, GPT-5.2 comes in three tiers: Instant, Thinking, and Pro. At the time of writing, GPT-5.2 Pro is their most intelligent model, providing extended thinking and reasoning capabilities.
It is also roughly 4x more expensive than Opus 4.5.
There was a lot of hype surrounding this model, leading me to go in with high expectations.
Unfortunately, I was disappointed with the game this model produced.
On the first attempt, GPT-5.2 Pro produced a Tetris game with a layout bug: the bottom rows of the board were cut off below the viewport, and I couldn’t see where the pieces were landing.
This made the game unplayable, as shown in the screenshot below:


Tetris game built by GPT-5.2 Pro (first attempt)
I was particularly surprised by this problem because the model took about 6 minutes to generate this code.
I decided to try again with this follow-up prompt to resolve the viewport issue:
The game works, but there is a bug. The bottom rows of the Tetris board are cut off at the bottom of the screen. I can’t see the pieces when they land, and the canvas extends beyond the visible viewport.
Please fix it:
1. Making sure the entire game board fits in the viewport
2. Adding proper centering so that the entire board is visible
The game should fit on the screen with all rows visible.
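For context, the kind of fix I was asking for usually comes down to sizing the canvas from the available window dimensions instead of a fixed pixel size. A minimal sketch of the general approach (my own illustration, not the code GPT-5.2 Pro produced):

```javascript
// Fit a 10x20 Tetris board inside the visible viewport and keep it centered.
const COLS = 10, ROWS = 20;
const canvas = document.getElementById('board'); // assumes <canvas id="board">

function resizeBoard() {
  // Pick a cell size that fits both the window height and width.
  const cell = Math.floor(Math.min(
    (window.innerHeight * 0.9) / ROWS,
    (window.innerWidth * 0.9) / COLS
  ));
  canvas.width = cell * COLS;
  canvas.height = cell * ROWS;
  canvas.style.display = 'block';
  canvas.style.margin = '0 auto'; // horizontal centering
}

window.addEventListener('resize', resizeBoard);
resizeBoard();
```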
After the follow-up prompt, the GPT-5.2 Pro model produced a functional game, as seen in the screenshot below:


Tetris game built by GPT-5.2 Pro (second attempt)
However, the gameplay was not as smooth as the game produced by Opus 4.5.
When I pressed the “down” arrow to drop a piece, the next piece would sometimes fall instantly at such a high speed that I didn’t have time to think about how to position it.
The game was only manageable when I let each piece fall on its own, which wasn’t the best experience.
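Behavior like this often happens when the soft-drop key state carries over from one piece to the next. A common safeguard is to reset the drop state whenever a new piece spawns; a minimal sketch of the idea (my own illustration, not GPT-5.2 Pro’s code):

```javascript
// Track whether the down arrow is held, and clear the flag when a new piece
// spawns so it does not inherit the fast drop speed of the previous one.
let softDrop = false;

document.addEventListener('keydown', (e) => {
  if (e.key === 'ArrowDown') softDrop = true;
});
document.addEventListener('keyup', (e) => {
  if (e.key === 'ArrowDown') softDrop = false;
});

function spawnPiece() {
  softDrop = false; // reset drop speed for the new piece
  // ... create and position the next piece here ...
}

function currentDropInterval(baseInterval) {
  // Soft drop runs at a fraction of the normal interval.
  return softDrop ? baseInterval / 10 : baseInterval;
}
```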
(Note: I also tried the GPT-5.2 standard model, which produced similarly buggy code on the first try.)
// 3. DeepSeek V3.2
DeepSeek’s first attempt at building this game had two problems:
- Pieces started disappearing when they hit the bottom of the board.
- The “down” arrow, which should quickly drop the current piece, ended up scrolling the entire web page instead of moving the piece (a common fix is sketched below the screenshot).


Tetris game built by DeepSeek V3.2
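The page-scrolling issue in particular has a well-known fix: calling preventDefault() on arrow-key events so the browser doesn’t treat them as scroll input. A minimal sketch (my own illustration, not DeepSeek’s output; handleGameInput is a hypothetical handler):

```javascript
// Stop the browser from scrolling the page when arrow keys or the spacebar
// are used as game controls.
const GAME_KEYS = ['ArrowUp', 'ArrowDown', 'ArrowLeft', 'ArrowRight', ' '];

document.addEventListener('keydown', (e) => {
  if (GAME_KEYS.includes(e.key)) {
    e.preventDefault();     // keep the key press from scrolling the page
    handleGameInput(e.key); // hypothetical game input handler
  }
});
```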
I prompted the model again to fix these problems, and the gameplay controls then worked correctly.
However, some pieces still disappeared before they could be placed, which made the game completely unplayable even after the second iteration.
I’m sure this could be fixed with 2–3 more prompts, and given DeepSeek’s low cost, you could afford 10+ debugging rounds and still spend less than a single successful Opus 4.5 attempt.
# Summary: GPT-5.2 vs Opus 4.5 vs DeepSeek V3.2
// Cost breakdown
Here’s a cost comparison between the three models:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| DeepSeek V3.2 | $0.27 | $1.10 |
| GPT-5.2 | $1.75 | $14.00 |
| Claude Opus 4.5 | $5.00 | $25.00 |
| GPT-5.2 Pro | $21.00 | $84.00 |
DeepSeek V3.2 is the cheapest alternative, and you can also download the model weights for free and run it on your own infrastructure.
GPT-5.2 is about 7x more expensive than DeepSeek V3.2 on input tokens (and even more on output), followed by Opus 4.5 and GPT-5.2 Pro.
For this particular task (building a Tetris game), each run consumed about 1,000 input tokens and 3,500 output tokens.
For each additional debugging iteration, I estimated roughly 1,500 additional output tokens. Here is the approximate total cost per model:
| Model | Total cost | Result |
|---|---|---|
| DeepSeek V3.2 | ~$0.005 | Not playable |
| GPT-5.2 | ~$0.07 | Playable, but poor user experience |
| Claude Opus 4.5 | ~$0.09 | Playable, with a good user experience |
| GPT-5.2 Pro | ~$0.41 | Playable, but poor user experience |
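The totals above follow from simple arithmetic on the token prices, assuming roughly 1,000 input tokens and 3,500 output tokens for the first attempt, plus about 1,500 extra output tokens per debugging round; exact figures will vary with the actual token counts:

```javascript
// Approximate cost of one task: tokens * price, with prices per 1M tokens and
// each extra debugging round counted as additional output tokens.
function runCost(inputPrice, outputPrice, retries = 0) {
  const inputTokens = 1000;
  const outputTokens = 3500 + retries * 1500;
  return (inputTokens * inputPrice + outputTokens * outputPrice) / 1e6;
}

console.log(runCost(5.0, 25.0));     // Opus 4.5, first try -> ~$0.09
console.log(runCost(1.75, 14.0, 1)); // GPT-5.2, one retry  -> ~$0.07
// The other rows in the table follow from the same formula.
```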
# Takeaways
Based on my experience building this game, I’ll stick with the Opus 4.5 model for day-to-day coding tasks.
Although GPT-5.2 is cheaper than Opus 4.5, I personally wouldn’t use it for coding, as the extra iterations required to reach the same result likely end up costing about the same.
DeepSeek V3.2, however, is much more affordable than the other models on this list.
If you’re a developer on a budget and have time to spare on debugging, you’ll still end up saving money even if it takes you more than 10 tries to get working code.
I was surprised by GPT-5.2 Pro’s failure to produce a working game on the first try, especially since it spent about 6 minutes thinking before coming up with flawed code. After all, this is OpenAI’s flagship model, and Tetris should be a relatively simple task.
However, GPT-5.2 Pro’s strengths lie in mathematical and scientific research, and it is specifically designed for problems that don’t rely on pattern recognition from training data. Perhaps this model is over-engineered for simple everyday coding tasks and should instead be used when building something complex that requires novel architecture.
Practical takeaways from this experience:
- Opus 4.5 excels at everyday coding tasks.
- DeepSeek V3.2 is a budget alternative that provides decent output, although it requires some debugging effort to reach the desired result.
- GPT-5.2 (Standard) did not perform as well as Opus 4.5, while GPT-5.2 Pro is probably better suited to complex reasoning than to quick coding tasks like this.
Feel free to replicate this test with the prompt I shared above, and happy coding!
Natasa Seluraj is a self-taught data scientist with a passion for writing. She writes about all things data science, covering a wide range of data topics. You can connect with her on LinkedIn or check out her YouTube channel.