I used AI. It worked. I hated it.

25 minute read Published: 2026-03-30

I'm as anti-genAI as it gets. And yet, this past month, I have used generative coding to complete a project. It works. I hated making it.

The Apologia

Let's get this out of the way: my feelings about using generative models at all are...fraught. And if you are ready to call me a monster or a hypocrite right now, I understand. I'm navigating some tensions about this and I fully own that I may have made the wrong choices here.

My actual day job, the one that puts food on the table for my family, has metamorphosed into an "AI security expert" role, in which I am not only responsible for testing AI-enabled applications, but I am also expected to be an expert in their operation. I hope, perhaps naively, that standing between these applications and deployment, I can do whatever is possible to make them safer—and to say "no" as loudly as I can to ideas that are too dangerous for production. I can't do any of this without using and knowing these tools intimately.

I nevertheless recognize the societal and environmental harms posed by these tools. I want them to unexist. I even recognize the cognitive hazards to which I expose myself in their use (more on that later). I do not want to use them. And yet, I must understand them. If that damns me in your eyes, so be it.

That suffices, I hope, for the self-flagellation component of our proceedings.

The Project

I've been migrating The Taggart Institute off two commercial platforms—Teachable and Discord—at the same time. Doing so has required bending Discourse, our new platform of choice, into something like a Learning Management System. Discourse is forum software, and it forums about as well as anything I've ever used. It's actually kind of fun to administer! And owing to its flexibility and extensibility, Discourse mostly does what we need of a learning platform—with a few tweaks and remaining quirks. But one feature remained stubbornly difficult to reproduce: course completion certificates. Although our policy is to honor self-assessment, our community of learners has demonstrated their desire for these documents. Our primary source of new learners is LinkedIn, and LinkedIn people are always thrilled to announce their accomplishments. Do I think LinkedIn is the digital River Styx, where damned souls clamber over each other and claw at the boat passing overhead in the dim hope of salvation from those who have escaped the shambling horde? I do. But if we're all in hell together, we might as well try to lift each other up.

Teachable, which now costs an arm and a leg, keeps AI-ifying, and has a shocking security issue that I can't yet disclose (stay tuned!), has a very well-implemented certificate generator. That is one comfort I sacrifice in this move. No existing solution I could find met my needs exactly. But therein was an opportunity to build my own certificate generator—perhaps even an open source general solution that anyone who needs publicly verifiable certificates could use.

The challenge: I was already swamped with actual work, the other migration efforts, and a brand new project taking a lot of free time. Most importantly though, my free time is minimal because I'm trying to be a good dad to this amazing toddler. I can't code all day and night like I used to. Such is life.

So on the one hand, I have to understand genAI coding tools for work. On the other, here's this missing feature I need implemented to complete the TTI migration. I decided to test development using Claude Code for this project.

If it works, I'll have my certificate solution, I thought. If it doesn't, at least I'll know more about the technology and its implications.

Well, spoiler alert: it works. It's even, near as I can tell, reasonably secure. But good lord, building this way was miserable, even if it was faster than coding it all myself.

The Design

My idea was fairly simple: a webhook interceptor that received course completion details (student name, email, and course name), and generated a PDF certificate with a unique/verifiable ID. The certificate would be emailed to the user, and the cert itself would contain a QR code linking to a verification page on the app. This could be as simplistic as a Python Flask app that calls out to some shell scripts, or as complicated as...what it turned out to be. But I have to admit, in the planning process, the model's suggestions for creature comforts were rather appealing, so I decided to roll with them. If I decided I hated it, there was nothing stopping me from blowing all the code away and rebuilding a lighter solution.
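That flow can be sketched in a few Rust types. To be clear, the names and fields below are illustrative, not the actual CertGen code:

```rust
// Illustrative sketch of the certificate flow described above.
// Struct names, fields, and the URL scheme are hypothetical.

/// Payload the webhook interceptor receives on course completion.
#[allow(dead_code)]
struct CompletionEvent {
    student_name: String,
    email: String,
    course_name: String,
}

/// A generated certificate with a unique, publicly verifiable ID.
struct Certificate {
    id: String,
    student_name: String,
    course_name: String,
}

/// Build the verification URL the certificate's QR code points at.
fn verification_url(base: &str, cert_id: &str) -> String {
    format!("{}/verify/{}", base.trim_end_matches('/'), cert_id)
}

/// Turn a completion event into a certificate record.
fn issue(event: CompletionEvent, cert_id: String) -> Certificate {
    Certificate {
        id: cert_id,
        student_name: event.student_name,
        course_name: event.course_name,
    }
}

fn main() {
    let event = CompletionEvent {
        student_name: "Ada Lovelace".into(),
        email: "ada@example.com".into(),
        course_name: "Intro to Rust".into(),
    };
    let cert = issue(event, "abc123".into());
    println!("{} earned {}", cert.student_name, cert.course_name);
    println!("{}", verification_url("https://certs.example.com/", &cert.id));
}
```

The PDF generation and email delivery hang off that same record; the verification page just looks the ID up and displays the stored details.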

The Process

Before setting off on this misadventure, I tried reading as much as I could about "best practices" in using AI coding tools. I'm purposefully avoiding the term "vibe coding," because that's not really what I was pursuing here. In fact, the single most common kernel of wisdom I received was to proceed in such a way that maximized your chances of keeping the model on-task and within expected parameters. That's not just about writing prompts, but building in such a way that the model's output at any given time has deterministic measures of success or failure, and that it keeps track of its own progress with external context.

Planning

Claude Code has a "Plan Mode" which is critical to how Anthropic themselves believe the tool is best used to build new projects. Essentially, the model's instructions shift from beginning with writing code to writing a plan for itself, which becomes a semi-permanent part of its context to guide changes. That's how I began every new feature in the project. I also prompted the model to output the plan to an external Markdown file for future reference (context disappears over time). I also maintained a TASKS.md file that I used to track features already implemented and yet to come. In fact, almost all of my hands-on-keyboard time in this project was spent in Markdown. I love me some Markdown, but it didn't feel great writing for a model as an audience instead of human beings.

Technologies

To nobody's surprise, I chose to build the application in Rust, but not just because I'm a Rust dork. I know the language well, so it would be easier for me to see mistakes, anti-patterns, and code smells. Moreover, it was my hope that Rust's built-in safety features (type safety, compile-time checks, robust testing) would aid in keeping the model on track. The frontend was built with Svelte, in part because I'd always wanted to do a Svelte project, and in part because I knew the mostly HTML syntax would be easier to debug than React or Next.js.

The PDF generation is probably my favorite part. I use Typst templates, accessed via the Typst API, to generate the PDFs. This makes it simple to change templates later. Also, I'm kind of becoming a Typst nerd?
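I won't reproduce the actual generation code here, but one common way to wire Typst into a Rust service is to shell out to the `typst compile` CLI, passing template values via `--input` (available to the template through `sys.inputs`). The template path and input keys below are hypothetical, and the sketch only builds the invocation rather than spawning it:

```rust
use std::process::Command;

/// Build (but don't spawn) a `typst compile` invocation that fills a
/// certificate template. Template path and input keys are hypothetical.
fn typst_command(template: &str, out_pdf: &str, student: &str, course: &str) -> Command {
    let mut cmd = Command::new("typst");
    cmd.arg("compile")
        .arg("--input")
        .arg(format!("student={student}"))
        .arg("--input")
        .arg(format!("course={course}"))
        .arg(template)
        .arg(out_pdf);
    cmd
}

fn main() {
    let cmd = typst_command("cert.typ", "cert.pdf", "Ada Lovelace", "Intro to Rust");
    // Inspect the arguments we would pass; a real service would call
    // cmd.status() or cmd.output() and check the exit code.
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("typst {}", args.join(" "));
}
```

Swapping templates then really is just swapping `.typ` files; nothing in the service layer needs to change.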

For the model/coding agent, I used Claude Code with Sonnet 4.6. I've experimented a lot with Ollama and open-weight models, but I wanted the same experience as developers who are swearing up and down about this revolution in their work.

Methodology

To maximize determinism, each step of the build used test-driven development (TDD). Using the Markdown planning file as a starting point, the model generated tests for functions that would define the features, then implemented each in turn. After each coding round, cargo check and cargo test were run to confirm compilation and test passing. I reviewed every line of code the model generated. For initial drafts, very little had to change. Now to be fair, this is not a particularly complex app. It's a basic CRUD app with some specialized requirements. Still, getting it all right, including auth and data handling, really mattered.
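In practice the loop looked roughly like this: a small, pure function with its test written first, so `cargo test` gives a deterministic pass/fail signal. This is a hypothetical example of the kind of function that loop produced, not actual CertGen code:

```rust
/// Normalize a course name into a filename-friendly slug.
/// (Hypothetical example of a small, testable function from the
/// TDD loop; not actual CertGen code.)
fn slugify(name: &str) -> String {
    name.trim()
        .to_lowercase()
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect::<String>()
        .split('-')
        .filter(|s| !s.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test was written first; `cargo test` fails until the
    // implementation above makes it pass.
    #[test]
    fn slug_is_lowercase_and_hyphenated() {
        assert_eq!(slugify("  Practical Rust! "), "practical-rust");
    }
}

fn main() {
    println!("{}", slugify("Practical Rust!"));
}
```

The point isn't the function; it's that every feature bottomed out in checks like this that the model could not argue with.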

After the initial drafting phase, I went through the entire app and made a list of tasks for improvement/change in the codebase. This TODO.md became the new starting point for model context in plan creation.

Unexpectedly, as items were addressed in the document, the model updated the file with checkmarks and details of implementations. This was not an instruction I gave the model, but it was behavior I liked, since it created a trail of accountability.

After all the features I wanted were functional, context was cleared entirely and new instructions were provided to the model. Instead of acting as a software developer, I instructed the model to perform as a security auditor and secure code expert, finding vulnerabilities in the code and recommending remediations. The findings would be written to a FINDINGS.md file, in keeping with the "Plan, Document, Execute, Log" pattern established in earlier rounds.

Results

Here's the part you actually wanted. How did it all go? How did I feel about it?

Mad at Me? Read This

This is where we need to be grown-ups and entertain some seemingly contradictory ideas at the same time, okay? I'm going to be talking about what worked and what didn't, and how it all felt. What I am not doing is endorsing this technology or hand-waving away the significant legal and ethical issues with its use. My position on its danger to society has not changed, but my understanding of its capacity for software development has. I don't think fast code creation is worth the world, any more than wax fruit.

Okay, on to the results.

Functionality

Well, the thing works. The code is in production today, serving certificates for TTI. The only changes I made to the codebase were for elegance. The core logic was solid from the jump, owing I believe as much to Rust's safeties in development as to the model's capabilities.

You can review the code here. I intentionally put the link down here so interested readers were more likely to find it.

The application I ended up with is far more robust and feature-filled than what I would have built on my own. I have to acknowledge that. Audit logging, GDPR data deletion, cryptographic verification of uploads, optional HMAC for incoming webhooks—I probably would not have bothered with these for my little certificate generation utility. But their inclusion results in a more widely applicable tool, and one that I feel more comfortable using in this migration away from managed services. See, in sacrificing Teachable and Discord, I also take on the responsibility for legal compliance that they had previously handled. It didn't suck having those considerations come up from the model's planning. Because of the models' stochastic nature, it is possible for their output to inspire ideas you would not have arrived at yourself. That doesn't make them "good;" it's just a thing that can occur when using them. Again, don't mistake this for an argument in favor of their usage.

Development Process

How was it developing this code via the Claude Code loop? Miserable*.

I hated writing software this way. Forget the output for a moment; the process was excruciating. Most of my time was spent reading proposed code changes and pressing the 1 key to accept the changes, which I almost always did. I was basically Homer's drinking bird.

drinking bird from the Simpsons

It was so tempting to press 2: "Yes, and accept all changes for this session." Why wouldn't you? If you're accepting them all manually, what's the harm?

What's the harm? harm harm harm harm

Yeah, that's how you get got in this process. Once you stop scrutinizing the model's output, the probability something goes off the rails approaches 1. "Human in the loop" is necessary, but the current process itself makes the loop stultifying, and encourages the human to take themselves out of the loop. That process is straight up dangerous. The temptation to let it rip is always there, and I didn't even have a boss pressuring me to ship code.

Although I read each proposed change, knowing the codebase deeply was much more challenging. When I write a new application myself, I'm building an elaborate house of cards in my head, a gossamer structure of interlinked ideas and goals. It's a story I'm telling myself in code—and ultimately, a story I share with users.

In this case, I was the audience rather than the author. I had to back my way into understanding the code, carefully reading and understanding the structure after it had been built. This is much more common for developers who work on large teams or with codebases they didn't build themselves. I have not had as much experience with that kind of development, so this all felt a little awkward.

Awkward, but not impossible. I do know the code very well by way of carefully reading the code, the relevant libraries' documentation, and the proposed changes during the code's creation. But that safety comes down to human discipline. It is entirely possible (probable?) to take the easy road and trust the model to do the right thing.

That way lies madness. The lack of real, systematic safeties in this process isn't so much a rake to step on as a rake to de-leaf a minefield.

Major companies are already triaging the results of reckless code deployment using these tools.

Speaking of things going wrong...

Hallucinations

Did the model hallucinate? Yes, albeit rarely and with self-correction. A handful of times it made up methods on a struct from one library or another. However, Rust's error messages from the LSP server and compilation checks coerced the model to recheck its work, leading to correct implementation. I did not intervene in this process. It took about five minutes per issue.

One time, during a security fix, the model's code introduced a non-obvious DoS vector. Well, obvious from the perspective of how the code would be deployed, but not from the code itself. That's exactly why reading each change was so important. Once the issue was pointed out, the model produced code that both addressed the security issue and avoided the DoS.

Security

I am quite glad I performed the security audit round (which was a part of the plan from the outset). The model found some whopper vulns—most of which, but not all, I had observed emerging in the codebase. The scariest were a path traversal vulnerability in the template management system, and a potential Typst template injection issue that, while not able to grant code execution, could potentially have resulted in a DoS.

I will also own up to missing a timing side-channel in the initial implementation of the Argon2 hashing function. Incorrect passwords failed quickly, whereas correct passwords took longer. The fix is constant-time checks. This is something the model discovered in the code that I would not have, given my unfamiliarity with Argon2. Now, I probably wouldn't have chosen it in the first place as my hashing algorithm, but I think this speaks more to my status as a Baby Cryptographer than any specific algorithm. All this to say: the code is more secure because of the audit pass. That's a hard one to process, but it means that, independent of the ethical concerns regarding this technology, there is value to be found in the application of these tools in security contexts.
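To illustrate the class of bug (not the actual CertGen fix, which went through the password-hashing library's own verifier): an early-exit byte comparison returns faster the earlier the inputs diverge, leaking information through timing. A constant-time comparison always walks every byte:

```rust
/// Early-exit comparison: returns as soon as bytes differ, so the
/// time taken leaks how many leading bytes matched.
fn leaky_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    for (x, y) in a.iter().zip(b) {
        if x != y {
            return false;
        }
    }
    true
}

/// Constant-time comparison: accumulates differences with XOR/OR and
/// only checks the result at the end, so timing doesn't depend on the
/// contents. (Illustrative; in real code, prefer a vetted crate such
/// as `subtle`, or your password-hashing library's verify function.)
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    println!("{}", leaky_eq(b"s3cret", b"s3cret"));
    println!("{}", ct_eq(b"s3cret", b"s3creX"));
}
```

Both functions give the same answers; only their timing behavior differs, which is exactly why this class of bug survives functional tests and needs a dedicated audit pass to catch.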

The CVEs they're finding are real. They can perform some kinds of static analysis well—and with some agentic pipelines, dynamic analysis is possible as well. They're not doing anything novel, but the speed and thoroughness possible can improve an application's security. The trick is deciding what to give to the model, what to lock to deterministic automation, and what to keep in the hands of human experts.

Now that I know this, is it reasonable or responsible to omit such a step in a development pipeline? I'm not so sure.

Once again, because I know this is going to be taken otherwise, I'm not excusing the negative externalities of the tech. The harms massively outweigh the benefits. I'm simply saying that right here in this domain of software development, something is working.

Lessons

I learned a lot from this process, and I'm deeply uncomfortable with much of it. But ignoring that discomfort helps no one. Let's lean into it.

It works (with lots of caveats)

The folks waving their arms and yelling about recent models' capabilities have a point: the thing works. This project finished in three weeks. Compare that to Ringspace, a similarly sized project that took me about six months of nights and early mornings—the hours left over after my day job and being Dad to an amazing, but demanding, toddler. I simply could not have built this project as well or as quickly without help. And as other developers have noted, this is the help that's showing up.

I'm not entirely onboard with Mike Masnick's optimistic view of this technology's democratizing power. I don't think it's as easy to separate the tech from its provenance or corporate control. But CertGen, my certificate application, exists now. It didn't and couldn't without the help of a tool like Claude Code. Open source in particular needs to reckon with this, because the current situation of demanding developers starve and bleed themselves dry without support isn't tenable. We need to grapple with this. I'm not yet sure how it all breaks down, and anyone who says they do is lying, foolish, or fanatical.

The "works" in "it works" is scoped strictly to coding tasks. I have no evidence, and seemingly no one else does, that the same kind of success is available outside the world of highly structured language with deterministic outputs. More plainly: I have no reason to expect this technology can succeed at the same level in law, medicine, or any other highly human, highly subjective occupation.

The arguments against generative models would be much easier if their failure rate remained high across all disciplines. In this specific domain, it would seem large language models have found a successful niche. That's why OpenAI is pivoting to enterprise and coding tools. That's why coding assistants have experienced such wide adoption—well, that and management diktat.

The help that showed up

I turned to generative models not only as an experiment, but out of desperation. I had a need for code that did not exist. Nobody was going to help me build it, nor should I expect help for a project such as this. In the past, I would have cobbled something quick-and-dirty together, probably at the expense of my mental and physical health to get it done. This time, I had another option. In this limited scope, the model was beneficial to all involved: myself, TTI's community, and my family. This does not negate the dangerous externalities of the technology at large, but I can only adopt so much of that responsibility as an individual.

I am privileged in that I can make these choices. Not everyone is so privileged. I am hesitant to condemn those who choose to accept the help that shows up, and in so doing fulfill their obligations of care to those in their lives. Insofar as these models empower people to build something good, and do so without overmuch suffering on their part, we must reckon with the value proposition there. It is a benefit, but it does not eliminate the attendant harms.

Narrow is the safe path

Programming agents might work, but there are a lot of ways they can go wrong. The amount of guardrails necessary to keep the model in check doesn't scale. The more parts of a process rely on generative output, the higher the potential for error and catastrophic failure. Any process that involves these models must strive to maximize determinism and minimize model variance.

For what it's worth, this is why I think Rust is a perfect fit for this kind of development (if not a straight-up requirement). The safeties inherent in a well-considered Rust development process can help keep a generative model on track. Other languages have similar safeties, but none, in my opinion, as rigorous.

But even then, the ways mistakes can start to compound on complex projects make this a dangerous proposition. This all worked with my small project, but a bigger one, with more dependencies or a more complex project structure, would likely flummox the models.

Brain drain

There's a fundamental problem with these tools beyond the capacity of any deployment strategy to solve: the tool requires expertise to validate, but its use diminishes expertise and stunts its growth. How does one become an expert? There are no shortcuts; there is only continuous hard work and dedication. I was once told of writing that great writers learn how to break the rules in new and ingenious ways by first learning the rules.

But how is a new developer meant to learn the rules if their day-to-day work is nothing but the babysitting of models? How will they gain the hard-won experience that allows a human in the loop to be a useful safeguard?

As I felt myself bored to tears in this process, I realized that if this is what becomes of software development, not only will it be a terrible occupation, it will be one that eats its young.

I have no solution for this. The tool, as long as it exists, will represent a quick and cheap answer to shortsighted organizations. No policy or procedure will prevent over-reliance on it. Its mere existence is temptation enough.

The itch

I let this thing into my brain, and now it is always there. For any new potential project, there is a voice in my head telling me how much easier it would be to let the model do it. How much faster it would be to simply describe the objective in a prompt and let go.

I do not want to let go, but I recognize the power of this pull. Feeling this for myself has only reinforced my belief that these models constitute an addictive substance. They alter cognition in ways deleterious to human prosperity. In other words, for as much output as they provide, they take something important from us.

The reckoning

Or maybe that's just me. I've been writing code for a good chunk of my life now. I find deep joy in the struggle of creation. I want to keep doing it, even if it's slower. Even if it's worse. I want to keep writing code. But I suspect not everyone feels that way about it. Are they wrong? Or can different people find different value in the same task? And what does society owe to those who enjoy an older way of doing things?

If I could disinvent this technology, I would. My experiences, while enlightening as to models' capabilities, have not altered my belief that they cause more harm than good. And yet, I have no plan on how to destroy generative AI. I don't think this is a technology we can put back in the box. It may not take the same form a year from now; it may not be as ubiquitous or as celebrated, but it will remain.

And in the realm of software development, its presence fundamentally changes the nature of the trade. We must learn how to exist in a world where some will choose to use these tools, whether responsibly or not. Is it possible to distinguish one from the other? Is it possible to renounce all code not written by human hands? And if it were, is that reasonable?

The original sin

We come now to the inconvenient truth of this technology: that it is built, like so much "progress," on theft. The training corpora of these models include code with licenses not meant to be used in this way. Even if one could guarantee that copyleft code were not included in output, the entire system of weights and tokens is inexorably linked to copyright infringement. There is no escaping this. To call it theft is accurate in my opinion, but then I'm a bigger believer in copyright than many in my circles. What is the appropriate response, and by whom? How do we respond to the thefts of others whose consequences are visited upon us? I write this on the stolen, unceded land of the Chumash and Tongva peoples. I do what I can to remember that, acknowledge that, and teach others what I know of those cultures. I have no idea how to mitigate the harms of the wholesale theft of intellectual property that gave birth to large language models.

I also don't know what to do about the destructive extraction mining that sourced the minerals making up my computer. These human harms are almost surely greater than the theft of writing, yet I am happy to ignore them. I mention this not to wave away the wrongs, but to recognize that all my technology is bloody. I don't know how to remove myself from the entire system in such a way that my hands are clean. I don't know that anyone can do so in the interconnected age.

Maybe the heist of intellectual property is the line you won't cross. That is a fair line! I am unconvinced it must be everyone's. I'm unconvinced the mere usage of these tools as an individual makes one a monster.

The real monster

As much as I fear the fallout of this technology, I fear the fallout of ideological purity even more. Time and again, people fall victim to the transformation of a stance on an issue into a holy cause, a flag to rally behind, a group from which to exclude The Other. Purity is a dangerous idea—historically, more dangerous than technology's capacity to change labor. Indeed, purity is a weapon used to divide labor against each other (see: race vs. class in the United States, 1865 - present).

The real monster is not the homunculus, but the one who gave it life. The fight cannot be among laborers who are all threatened by this technology. The fight must be between the workers who wish to work, create, live, and prosper, and the elites who only seek to enrich themselves by means of this technology. If generative models are monstrous—and I remain convinced they are—their masters are the enemy of those who wish to end their march.

I am tired of tearing each other down over the tools we use. I am tired of running from one corner of technology to the next, and the next, and the next, in service of proving how much I care about the ethics of my technology. I want to spend my time and energy building the future I want to see with and for the people I care about. I don't particularly like this tool, and I truly believe in its societal danger. I still think it's addictive; I still think the ways it goes wrong are far more likely than the ways it goes right. I found a specific scenario in which it did go right.

In this exploration, I have come to understand why one would use it to accomplish a specific goal, and how it can be successful in doing so. I don't think condemning people for its use is that helpful to anyone, or to whatever cause you're fighting for so fervently that condemning someone seems worthwhile.

These models, like the people who use them, are not one thing. They look different in different lights, held at different angles. Dangerous, awesome, seductive, effective, beneficial, corrupted—all this and more. And we of course are all sinners. The only way I have found to survive this world is together, reaching out with grace and understanding to those who do not see the world as we do, who do not act as we do. It is much harder than anger and condemnation. That's part of why I'm sure it's the right thing to do. It also feels a lot better to reach out a hand than strike with closed fist.

Will I use this tool for my code again? I don't really want to. If I do, it will be clearly marked and following the safety principles I have learned and will keep learning. But I will not dismiss or castigate anyone for this choice alone.