I can’t remember where I read this, but someone once wrote that there are four phases of a mathematician accepting new concepts and ideas:
- “That’s wrong.”
- “Well, it might not be wrong, but the old way is better.”
- “Well, there are cases where the new way is better.”
- “I’ve always done it that (new) way.”
Regarding vibe coding, I’ve run through phases 1 and 2. To be completely honest, I took pride in writing good code and reviewing it — especially in reviews, there’s always something new to learn, both for the reviewer and the reviewee.
Vibe coding seemed to take that part of learning away from me — or so I thought. As it turned out, you can also use vibe coding as a tool to get rid of boilerplate and to learn about the domain in a more systematic way.
Vibe Coding Experiment: A Cert Prep Tool
I’ve always wanted to build something more complex in Haskell. When I started studying for cloud certifications, I built a small TUI app in Python to sample practice questions from a question pool and simulate a certification exam (yes, some people might call this procrastination). It didn’t have all the bells and whistles, but it worked well enough. When I found out about the brick library for building TUIs in Haskell, I thought it would be a good learning opportunity to rebuild the tool in Haskell.
So I started by creating the project foundations with a simple cabal init,
wrote the necessary types (questions, answers, configuration), some utility
functions, and a few tests. I thought this setup would serve as a good
guardrail for Claude Code — if you throw the AI agent into a project with
good conventions, it’s more likely to follow them.
Plus, Haskell’s rigorous and expressive type system makes it
harder to make mistakes, because conditions, restrictions, and rules
are already encoded at the type level.
Contrast this with Python, where type hints are entirely optional
(even though they are a firm part of modern Python code and of libraries
such as pydantic).
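As a small illustration of what "encoded at the type level" can mean in practice (a hypothetical example, not code from the project), a newtype with a smart constructor makes invalid values unrepresentable, so the rest of the program never has to re-check them:

```haskell
-- Hypothetical example: a score percentage that can only be constructed
-- in range. Invalid percentages simply cannot exist downstream.
newtype Percentage = Percentage Int
  deriving (Show, Eq)

-- Smart constructor: the only sanctioned way to obtain a Percentage.
mkPercentage :: Int -> Maybe Percentage
mkPercentage n
  | n >= 0 && n <= 100 = Just (Percentage n)
  | otherwise          = Nothing
```

In Python, the equivalent guarantee would rely on every caller remembering to validate, or on a runtime validator such as pydantic.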
Liftoff with Claude Code
Then I started prompting Claude Code to write the TUI app. In less than an evening, I had feature parity with my old Python app — and more: I also had stratified question sampling, so I could simulate the distribution of question types in the real exam.
Over the first few iterations, my skepticism (phases 1 and 2, remember?) faded almost completely. The code was clean, test cases were solid, and I got to learn about lenses, which proved to be an even bigger rabbit hole than the whole Monad machinery.
I quickly had a configuration registry to gather and choose between different question pools. Refactorings and reviews were equally fast.
Claude Code vs. Codex: Trophy System
Because Codex had a free tier offer until March 2026, I decided to pit Claude Code against Codex for the shiniest feature so far: a trophy system for the preparation, with a small animation popping up when you earn one, similar to what happens in video games.
Here’s the prompt I gave to both:
```text
You are a well-versed Haskell programmer who breathes monads. You love the
type system, you love abstraction and self-documenting code.

Your task: Introduce a trophy system to the application. Preferably, it
should be added as an extra phase between answering / reviewing that checks
whether the condition for a trophy has been satisfied.

Come up with a good type for a trophy and create a few instances. Trophies
instances should be attached to configs (i.e., I can win a trophy "5 correct
answers in a row" for each config that I have). Come up with funny names
similar to trophies on PS.

In the app, each trophy should have a nice unique pixel icon. If possible,
when achieving a trophy, this pixel icon should be shown after the review
phase, with a big trophy animation and the text "Trophy achieved: <trophy name>".
```
Claude Code’s implementation
Claude Code’s implementation took around 16 minutes (around 10 minutes for planning, 6 minutes for coding) with Opus 4.6.
It added the data model and the trophy conditions to src/Types.hs, which looked as follows (excerpt):
```haskell
data TrophyId
  = FirstBlood
  | HatTrick
  | OnFire
  | SpeedDemon
  | FlawlessVictory
  | ScholarSupreme
  | Marathoner
  deriving (Show, Eq, Ord, Enum, Bounded, Generic)

instance FromJSON TrophyId
instance ToJSON TrophyId

type EarnedTrophies = Set TrophyId

data TrophyDef = TrophyDef
  { trophyDefId :: TrophyId
  , trophyName :: Text
  , trophyDesc :: Text
  , trophyIcon :: [Text]
  }
  deriving (Show, Eq)

-- | Check for trophies earned after submitting an answer.
checkAfterSubmit ::
  Bool -> Int -> Int -> EarnedTrophies -> [TrophyDef]
checkAfterSubmit wasCorrect newStreak questionSeconds alreadyEarned =
  concatMap
    (filter (not . (`Set.member` alreadyEarned) . trophyDefId))
    [ [firstBloodDef | wasCorrect]
    , [hatTrickDef | wasCorrect, newStreak >= 3]
    , [onFireDef | wasCorrect, newStreak >= 5]
    , [speedDemonDef | wasCorrect, questionSeconds < 5]
    ]

-- | Check for trophies earned at the end of an exam.
checkAtFinish :: Int -> Int -> EarnedTrophies -> [TrophyDef]
checkAtFinish finalScore totalQs alreadyEarned =
  concatMap
    (filter (not . (`Set.member` alreadyEarned) . trophyDefId))
    [ [flawlessVictoryDef | finalScore == totalQs, totalQs > 0]
    , [scholarSupremeDef | totalQs >= 10, percentage >= 90]
    , [marathonerDef | totalQs >= 20]
    ]
  where
    percentage :: Int
    percentage
      | totalQs == 0 = 0
      | otherwise = (finalScore * 100) `div` totalQs
```
Deciphering the trophy conditions took me a while. At first I thought
it was using guard clauses, but then I realized it’s using list
comprehensions where the condition doesn’t depend on the list elements.
For instance, [firstBloodDef | wasCorrect] is equivalent to
if wasCorrect then [firstBloodDef] else []. Undeniably clever,
but also fairly confusing — not to mention the extra work involved
in unwrapping the list of lists with concatMap.
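For readers less familiar with this idiom, here is a minimal, self-contained sketch of the equivalence (the identifiers are illustrative, not from the project):

```haskell
-- A guard in a list comprehension with no generator acts as
-- conditional inclusion: the result is a singleton or the empty list.
include :: Bool -> a -> [a]
include cond x = [x | cond]

-- The explicit spelling of the same thing.
includeIf :: Bool -> a -> [a]
includeIf cond x = if cond then [x] else []
```

Both functions behave identically; the comprehension form is just terser and, arguably, more cryptic.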
Codex’s implementation
Codex was blazingly fast, clocking in at around 4 minutes with no dedicated planning phase.
It decided to put the trophy conditions into app/TUI/Event.hs
```haskell
isUnlockedBy :: ExamCore -> Bool -> Int -> Trophy -> Bool
isUnlockedBy core answerIsCorrect responseSeconds trophy =
  case trophyCondition trophy of
    CorrectStreakAtLeast n -> core ^. correctStreak >= n
    TotalCorrectAtLeast n -> core ^. score >= n
    FastCorrectAtMost n -> answerIsCorrect && responseSeconds <= n
```
while keeping the data model in src/Types.hs:
```haskell
data TrophyCondition
  = CorrectStreakAtLeast Int
  | TotalCorrectAtLeast Int
  | FastCorrectAtMost Int
  deriving (Show, Eq, Generic)

data TrophyIcon
  = PixelRocket
  | PixelFire
  | PixelCrown
  | PixelBolt
  deriving (Show, Eq, Generic)

data Trophy = Trophy
  { trophyId :: Text
  , trophyName :: Text
  , trophyCondition :: TrophyCondition
  , trophyIcon :: TrophyIcon
  }
  deriving (Show, Eq, Generic)
```
While there’s an argument for putting code right where it’s called, I would have preferred keeping trophy conditions close to the data model. Otherwise, adding a new trophy means touching at least two files instead of one. That said, Codex’s approach is arguably easier to understand and makes good use of the lens machinery — at the cost of tightly coupling trophy conditions to the state model of the event logic.
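One way to reduce that coupling (a sketch under assumed names, not the project's actual code) is to keep an evaluator next to the condition type that depends only on plain values; the event code then merely extracts those values from its state:

```haskell
-- Restating the condition type from above for a self-contained sketch.
data TrophyCondition
  = CorrectStreakAtLeast Int
  | TotalCorrectAtLeast Int
  | FastCorrectAtMost Int
  deriving (Show, Eq)

-- Evaluator living next to the data model. It knows nothing about
-- ExamCore or lenses; callers pass in the plain numbers it needs.
satisfies :: Int -> Int -> Bool -> Int -> TrophyCondition -> Bool
satisfies streak totalCorrect answerIsCorrect seconds cond =
  case cond of
    CorrectStreakAtLeast n -> streak >= n
    TotalCorrectAtLeast n  -> totalCorrect >= n
    FastCorrectAtMost n    -> answerIsCorrect && seconds <= n
```

With this split, adding a trophy touches only the module that owns the data model, while the event handler keeps a single call site.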
My verdict
I slightly preferred Claude Code’s approach of keeping the trophy conditions close to the data model. However, I strongly disliked the tricky list comprehensions — which is why I came up with my own variant that integrates trophy conditions directly into the data model.
I used DataKinds and TypeFamilies to divide trophies into ones
checked after each answer and ones checked at the end of the exam.
```haskell
data TrophyState = TrophyState
  { currentStreak :: Int
  , lastQuestionSeconds :: Int
  }
  deriving (Show, Eq)

data FinalStatistics = FinalStatistics
  { nCorrectQuestions :: Int
  , nQuestions :: Int
  }

data TrophyCheckTime = WhenReviewing | WhenFinishing

type family CondInput (t :: TrophyCheckTime) where
  CondInput 'WhenReviewing = TrophyState
  CondInput 'WhenFinishing = FinalStatistics

data TrophyData (t :: TrophyCheckTime) = TrophyData
  { trophyDef :: TrophyDef
  , trophyCond :: CondInput t -> Bool
  }
```
Then I defined conditions for each trophy separately, tied to the data model. Here’s the “First Blood” trophy as an example (earned after answering the first question correctly):
```haskell
-- Trophy definitions
currentStreakGte :: Int -> TrophyState -> Bool
currentStreakGte n ts = currentStreak ts >= n

firstBlood :: TrophyData 'WhenReviewing
firstBlood = TrophyData{trophyDef = firstBloodDef, trophyCond = currentStreakGte 1}
```
The check functions then become simple maps of trophyCond over the list of
all trophies.
```haskell
-- | Check for trophies earned after submitting an answer.
checkAfterSubmit :: Bool -> TrophyState -> [TrophyDef]
checkAfterSubmit wasCorrect ts =
  map trophyDef $
    filter
      ((&&) wasCorrect . flip trophyCond ts)
      [ firstBlood
      , hatTrick
      , onFire
      , speedDemon
      ]

-- | Check for trophies earned at the end of an exam.
checkAtFinish :: FinalStatistics -> [TrophyDef]
checkAtFinish fs =
  map trophyDef $
    filter
      (`trophyCond` fs)
      [flawlessVictory, scholarSupreme, marathoner]
```
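In miniature, the shape of these check functions is just filter-then-map. Here is a self-contained toy version (with illustrative names, not the project's types) that shows the pattern in isolation:

```haskell
-- A toy trophy: a name plus a predicate on the current streak.
data MiniTrophy = MiniTrophy
  { miniName :: String
  , miniCond :: Int -> Bool
  }

-- Collect the names of all trophies whose condition holds.
miniCheck :: Int -> [MiniTrophy] -> [String]
miniCheck streak = map miniName . filter (\t -> miniCond t streak)

miniTrophies :: [MiniTrophy]
miniTrophies =
  [ MiniTrophy "First Blood" (>= 1)
  , MiniTrophy "Hat Trick"   (>= 3)
  ]
```

Adding a trophy means adding one element to the list; the check function never changes.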
Conclusion
My main takeaway is that getting up and running with Claude Code, even in a language you’re not entirely familiar with, is incredibly fast. More than that, I felt like I could learn relevant Haskell concepts (like lenses) much more quickly than I would have on my own; some of them I might never have touched at all.
The trophy feature also showed me that extending existing software with vibe coding, while certainly producing functional results, will at some point lead to technical debt. The trophy conditions I showed above might seem like a small nuisance, but note that I haven’t shown you the event code for the trophy system: instead of introducing a separate phase for evaluating trophies, Claude Code integrated them directly into the answering and final phase of the exam. I haven’t gotten around to refactoring that code yet, but it’s definitely on my list.
So where does that leave me on the four-phase scale? I think I’m solidly in phase 3. There are clearly cases where vibe coding is better — spinning up a project, plowing through boilerplate, exploring unfamiliar libraries. But I still want to understand and reshape the code that comes out, and I don’t see that changing anytime soon. Phase 4 will have to wait.
If you want to take a look at the full code, I’ve published it on GitHub.