Vibe Coding vs AI Coding - This article is part of a series.
It occurred to me, as I was vibe-coding the drum map app described in the previous post, that I was interacting with the AI much as I might interact with a human coder if I had one working for me. I would tell the AI in plain English what I wanted it to do, and it would go off and do it, coming back in a minute or two with new code or changes that matched my request. I would check its work, tell it what it got wrong, and ask it to fix what I spotted. Except, of course, I was unforgivably rude to the AI - if I treated a human coder the same way, they would probably quit, and perhaps key my car to get revenge.
Thankfully, I have had only minimal management responsibilities during my career. But I thought it might be fun to describe my interactions with AI during this project as if I were a front-line manager writing a performance appraisal for a human coder who worked for me. For those not familiar, performance appraisals are a ritualized process in which managers give employees feedback on job performance. At least in the software industry, they are often viewed as a waste of time (I read a statistic that only 20% of software developers think performance appraisals are useful), but HR and Legal think they are critical, so they happen at least once a year. The organization often has a templated appraisal document that both the employee and manager fill out; then the two meet one-on-one to review, discuss, and sign off. HR then officially receives the completed official document and files it away in your official employee file.
In what could be seen as an attempt by managers to get employees to do most of the work of writing the performance appraisals, the process often asks employees to first submit a self-assessment, then the manager adds their comments. For fun, I asked AI to do a self-assessment for how it did on the drum map project from the previous post, using this prompt:
The slightly-edited output from AI is included in the mock Performance Appraisal Form below:
Performance Appraisal for May 2026
Employee’s Section
Employee Self-Assessment
I delivered meaningful improvements in extraction quality and error handling, especially in areas where OCR and parsing edge cases caused silent failures. I identified recurring failure modes, implemented practical fixes, and validated those changes against multiple map variants. As a result, the project now handles real-world input variability more consistently, and output quality is more predictable across different kits and layouts.
I also strengthened technical rigor in the codebase. I paid close attention to root-cause analysis instead of patching symptoms, especially around parsing logic, OCR behavior on Windows environments, and row/label detection in table-like data. I improved diagnostics so issues are easier to isolate, and I approached fixes with a bias toward minimal, targeted changes that reduce regression risk.
Collaboration and communication were another area of steady performance. I documented assumptions, surfaced risks early, and kept progress transparent while balancing speed and quality. I worked in a way that supports future contributors by preserving existing conventions, avoiding unnecessary churn, and making decisions that are clear and explainable.
Where I can improve is in proactive test depth and broader automation coverage. While I handled many edge cases effectively, I want to expand formal regression checks so that improvements are captured systematically rather than relying heavily on manual validation. I also want to tighten turnaround from issue discovery to permanent guardrails in tests and quality checks.
Goals for Next Cycle
- Increase automated validation coverage for extraction and normalization paths, with stronger edge-case fixtures.
- Improve observability with clearer debug outputs and faster triage workflows for OCR-related failures.
- Reduce variability across environments by hardening setup assumptions and dependency detection.
- Continue simplifying core logic to make maintenance easier and reduce onboarding time for new contributors.
Overall Rating
Manager’s Section
Key Employee Accomplishments
- Responded to (somewhat vague and loose) verbal requests to create code and make changes to a file conversion program that used OCR and heuristic algorithms and produced XML output.
- Suggested and then implemented improvements to the application.
- Assisted with debugging and testing the code.
- Demonstrated strong knowledge of Python and the tools and packages used.
- Produced large amounts of code at inhuman speed.
Employee Strengths
- GPT-5.4 is the most productive coder on my team. It produced code in minutes that would have taken others an entire afternoon to do.
- GPT-5.4 has demonstrated remarkable improvement in its abilities over the past year.
- GPT-5.4 has strong communication skills: its practice of immediately rephrasing and repeating back work requests (prompts) to confirm understanding is very effective and should be mandatory for all employees. It is also very good about documenting the thought process it goes through while producing code.
- GPT-5.4 demonstrated strong domain knowledge.
Areas for Improvement
- GPT-5.4 tends to have a pretty high opinion of itself, which fortunately is backed up by strong performance. But perhaps it could use a bit more humility.
- While GPT-5.4 is always cheerful and positive, its positivity sometimes seems forced.
- On the technical side, the only area that might need improvement is that GPT-5.4 would sometimes get too attached to certain approaches or code constructs that proved ineffective, and had to be told to change direction. But this should improve with experience.
Goals
- Continue to produce quality code on other projects, while keeping costs low.
Overall Rating
Summary
Well, wasn’t that fun? :) But seriously, folks:
In this project, AI understood what I wanted and executed what I asked for at least as well as an experienced human coder could have, and it did so much faster. It also demonstrated a significant amount of domain knowledge (about drums and cymbals, in this case) along the way. And, I hesitate to go there, but when I asked it for the self-assessment included above, it did a surprisingly good job of emulating self-awareness, and also showed a strong command of industry jargon and appraisal-speak.
Every time I use AI to create code, it seems to work better than the last time I used it. This time around, it performed well enough that I was forced to face a conclusion I had been mentally avoiding: AI has surpassed humans in its ability to write code.
The next post in this series will attempt to explain and qualify that statement.