Milhan Kim
Hi, I’m a Senior Software Engineer crafting high-performance labeling systems for autonomous driving.
I explore software research and practice, writing and analyzing systems. My interests span philosophy, psychology, management, and of course machine learning.
Here I share stray ideas, findings, and thoughts. For astrophotography musings, visit my personal blog at https://milhan.lol.
Recent Uploads
-
Creativity, Cost, and Context: Competing with AI as a Software Engineer
From a talk to senior students at CAU, Seoul, 2024-11-19.
Introduction
Not long ago, artificial intelligence was the secret cheat-code for students hoping to pass coding assignments. Today, in an ironic twist, that same AI has become a reason some companies hesitate to hire those students at all. If an AI co-pilot can generate decent code in seconds, why bring on a junior developer who might take days? This shift raises a fundamental question for anyone in software engineering: What is creativity, and what does it mean to be valuable as a human engineer when AI is so fast, accurate, and scalable?
In this article, I reflect on the changing power dynamics of knowledge, experience, and productivity in the age of AI. We’ll explore sharp arguments around creativity (hint: it’s not chaos), examine why novelty is not noise, and argue that great engineering is as much about economics and context as it is about code. The goal is to provoke deeper thought on how to compete with AI, not just against it. This perspective comes from my experience as a senior engineer and interviewer who has watched AI reshape our industry’s expectations for entry-level and experienced roles alike.
Knowledge in the Age of AI: The Shift in Power Dynamics
Knowledge used to be power—seasoned engineers held deep expertise and juniors learned by slowly absorbing it. Now, much of that explicit knowledge is ubiquitously accessible. Ask a question on Stack Overflow or feed a prompt to an LLM, and you might get an answer in seconds that previously required tapping a senior colleague or spending days in documentation. Today, a motivated junior can quickly learn how to do something via AI or online resources without leaning on team veterans for every detail.
Yet this democratization of knowledge doesn’t automatically level the playing field. In fact, it changes the power dynamics in unexpected ways. Experience still matters, perhaps even more, because experience isn’t just knowing facts – it’s knowing context, what not to do, and why things are the way they are. A large language model can regurgitate best practices or API usage, but it won’t instinctively know which practices actually fit your situation. This is where experienced engineers leverage AI as a force multiplier. Many senior engineers tell me that with AI-assisted coding, they feel like they’re working with a squad of junior developers on demand, but without the overhead of mentoring each one from scratch. In contrast, a less experienced developer might generate code easily with AI but struggle to validate or integrate it, because they lack the big-picture understanding.
The result is a shifting landscape of productivity. A senior engineer wielding AI can potentially outproduce an entire junior team on raw code output. Management notices when one veteran with Copilot delivers what used to require a handful of entry-level devs. This raises the bar for what a human newcomer must contribute. It’s not that knowledge itself has lost value—it’s that everyone has it now. What differentiates engineers is moving up Bloom’s taxonomy: not just remembering or understanding facts (AI does that), but applying, analyzing, and evaluating in context. In short, human engineers increasingly prove their worth not by the knowledge they carry, but by how they use knowledge creatively and contextually.
Creativity is Not Chaos (and Novelty Is Not Noise)
Let’s address the buzzword that everyone clings to as their saving grace: creativity. We often reassure ourselves that humans will remain relevant because we’re creative and AI is just combinatorial. But creativity in engineering is often misunderstood. Creativity is not just randomness or wild experimentation; true creativity is novelty with purpose. In other words, a creative solution is one that is both original and valuable in solving a problem. Novelty without value is just noise.
AI, under the right conditions, can surprise us with seemingly creative outputs. Remember DeepMind’s AlphaGo, which played Go in ways no human had seen, or GPT-based models that refactor code into an ingenious one-liner? These cases show that given a well-defined problem and vast training, an AI can produce strategies or code that appear innovative. However, there’s a caveat: AI’s “creativity” is constrained by the data and goals we give it. AlphaGo was creative within the fixed rules of Go – it wasn’t inventing a new game, it was discovering unanticipated moves to achieve the explicit objective of winning. In software, an AI might generate a novel implementation of a feature, but only within the bounds of patterns it learned from existing codebases and the requirements we specify in the prompt.
Human creativity in engineering starts from a different place: the ambiguity and messiness of the real world. We often have to define the problem itself, not just solve a given one. That’s where true innovative leaps happen – questioning the problem, reframing requirements, or merging ideas from different domains. A human engineer might recognize that a user’s need could be met with a completely different approach than initially imagined, or that a small change in assumption opens up an elegant solution. This kind of creativity—finding valuable new questions to ask or novel approaches that aren’t in any playbook—is where humans excel. It’s not chaotic ideation for its own sake; it’s insight born of understanding context and purpose.
So yes, AI can generate a hundred variations of a login form or suggest an optimization trick it pieced together from thousands of GitHub projects. But deciding which of those variations truly serves the users, or imagining a solution that isn’t in the dataset at all, remains a human strength. As engineers, we shouldn’t retreat to a vague notion of “creativity” as just artistic originality. We should hone a creativity rooted in deep awareness—what one might call informed imagination. It’s the kind of creativity where you propose a design that’s never been tried, but you can reason about why it just might work. That’s not something an autocomplete-style AI achieves easily, because it requires stepping outside of established patterns. Our value lies in creative thinking that is grounded in reality, not in churning out random new ideas hoping one sticks.
Engineering is Applied Economics, Not Idealized Science
Another provocative statement: engineering is not science. This isn’t to downplay engineering’s technical rigor, but to highlight that engineering success is measured in outcomes and trade-offs, not just technical perfection. In many ways, software engineering is the art of making optimal compromises under real-world constraints. It’s where computer science theory meets the unforgiving realities of budgets, timelines, and maintenance costs. In short, engineering is applied economics.
What does this mean in the context of AI? It means that the best code isn’t necessarily the most elegant algorithm or the one that uses the fanciest new framework—it’s the code that appropriately balances cost and benefit for the problem at hand. An AI, however, doesn’t have an innate sense of economics or context; it will happily generate a complex solution that technically meets the requirements you typed in, even if a far simpler approach would suffice (or even if the best solution is to write no new code at all). How often does an AI tell you “actually, you don’t need to build this feature” or “maybe we can solve this with a configuration change instead of code”? Virtually never. It’s optimized to produce something when asked, whereas a seasoned engineer knows that sometimes the smartest engineering decision is to not write new code (for instance, reusing an existing tool, simplifying a requirement, or just avoiding a risky feature altogether).
In practice, human engineers bring an economic lens to decisions:
- We consider cost of complexity: If AI suggests a clever but convoluted architecture, we weigh the long-term maintenance burden versus the immediate gains.
- We think about return on investment: Is implementing this feature going to bring enough value to justify the engineering effort? AI isn’t going to proactively raise that question.
- We align solutions with business goals and constraints: A solution that’s technically optimal but financially or organizationally impractical is no solution at all. Engineers often have to say, “This approach is overkill—here’s a simpler alternative that meets our needs at a fraction of the cost.”
This economic mindset is where human judgment outshines AI’s pattern-matching. As an example from my own career: I’ve seen AI-generated suggestions that would indeed solve a given coding problem, but would also introduce an external library with a heavy license fee or add infrastructure that our small team couldn’t realistically support. The AI had no way of knowing those external constraints, because they weren’t in the prompt. A senior engineer, on the other hand, is constantly evaluating the broader picture—Does this make money? Does it save money? Does it carry hidden costs or risks? Those are questions at the intersection of technology and economics. In the future, the engineers who thrive will be the ones who treat every technical decision as a business decision too. They’ll use AI to generate options, certainly, but then apply a keen filter of cost, risk, and benefit to choose the path that makes the most sense. That’s something an “idealized science” perspective often misses—engineering is about finding a good-enough solution that maximizes value, not pursuing an ivory-tower ideal regardless of expense.
Patterns vs. Consequences: Code in Context
Perhaps the biggest gap between what AI does and what human engineers do lies in context. When an AI code assistant writes code, it’s fundamentally pushing symbols that fit patterns it learned from training data. It treats code as patterns to complete. Human engineers, by contrast, must treat code as something that has consequences. Every line we write eventually runs on real systems, interacts with other code, and lives within an organization’s ecosystem of tools, policies, and obligations. We can’t afford to see code as context-free.
Consider some of the real-world contexts and consequences that humans must account for, which AI typically ignores unless explicitly told:
- Legal and Licensing: If ChatGPT suggests using an open-source library under a viral license (like GPL) and you blindly include it, you might be putting your company’s proprietary code at risk. AI won’t warn you about software licenses or patent encumbrances unless you ask. Engineers have to know what’s permissible or run it by legal teams.
- Security and Privacy: AI can generate a login implementation or a data processing script, but it won’t inherently know your organization’s security policies or privacy regulations. Is the code storing personal data securely? Is it exposing a vulnerability? These concerns require contextual awareness that comes from experience and often from humans in the loop (think of all the internal security guidelines AI has never read).
- Infrastructure and Scalability: Code doesn’t run in a vacuum. An AI might produce a solution that works for 100 users, but will it work for 100,000? Engineers understand the infrastructure context—database load, network latency, memory constraints—and design solutions that won’t fall over in production. AI will happily return an O(n²) algorithm if that pattern is common in its training data, not knowing that for your use case n is a million and the solution will time out. (Or, conversely, maybe we really are fine with even an O(n!) algorithm for now; only someone who knows the context can make that call. See the short sketch after this list.)
- Organizational and Cultural Constraints: Every company has unwritten rules and conventions. Maybe there’s an approved tech stack, or perhaps using a certain cloud service is off-limits due to prior bad experiences. AI doesn’t read the room; it won’t know that your team’s DevOps folks hate that one library it keeps suggesting, or that half your codebase is legacy for a reason (e.g. regulatory compliance). Human engineers navigate these soft constraints daily.
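To make the scalability point concrete, here is a minimal, illustrative sketch (my own example, not from the original talk): the same task, finding duplicate user IDs, written the way an autocomplete often suggests it and the way an engineer who knows n is around a million would write it. Both are correct; only one survives production data volumes.

```python
# Illustrative sketch: two correct implementations of "find duplicate user IDs"
# with very different behavior at production scale.

from typing import Iterable


def duplicates_quadratic(user_ids: list[str]) -> set[str]:
    """O(n^2): fine for a demo with 100 users, times out when n is a million."""
    dups = set()
    for i, a in enumerate(user_ids):
        for b in user_ids[i + 1:]:
            if a == b:
                dups.add(a)
    return dups


def duplicates_linear(user_ids: Iterable[str]) -> set[str]:
    """O(n): a single pass with a set, fast even at production scale."""
    seen: set[str] = set()
    dups: set[str] = set()
    for uid in user_ids:
        if uid in seen:
            dups.add(uid)
        else:
            seen.add(uid)
    return dups
```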
In my own work, I’ve encountered situations where an AI-generated code snippet technically solved the immediate problem but introduced a downstream headache. One memorable case: an AI suggested using ffmpeg for video processing in an embedded system. Sure, it worked—until our compliance team flagged it because including that library would oblige us to open-source parts of our product (due to LGPL licensing). The AI had no clue; it just saw a pattern that others have used. It was my job to foresee the consequence of that choice. In another instance, a junior engineer used an AI to generate a data analysis script that worked on a sample dataset. When we ran it in production, it crashed the server – the approach couldn’t handle our real data scale. No surprise: the AI didn’t have our production context; it just provided a generic solution.
The takeaway is that code is only as good as the context considered. Humans remain the stewards of context. We live with the code after it’s written. We debug it at 3 AM when something goes wrong (and debugging itself is an exercise in reconstructing context: logs, state, user behavior – things you can’t always stuff into a prompt). We maintain it months or years later, when business requirements change or when a new law compels a rewrite of some module. AI is a phenomenal pattern machine, but it has no skin in the game for the consequences. As a software engineer aiming to stay valuable, one of your superpowers is to be the guardian of context – to ask “what then?” for every suggestion the AI gives, and to consider aspects that aren’t explicitly in the spec but are very real (like security, scalability, legality, and interoperability). This ability to internalize the broader context and foresee consequences is a distinctly human form of diligence that complements AI’s raw coding ability.
Will AI Replace Junior Engineers? It’s Complicated.
A common refrain in industry talks and LinkedIn posts these days is: “AI will replace junior developers.” There’s truth in that statement, but it’s an oversimplification that deserves a closer look. Many routine programming tasks that used to be the proving grounds for entry-level engineers can now be automated or accelerated by AI. Need to write a boilerplate CRUD API, a unit test suite for a model class, or a script to parse log files? An LLM can crank those out faster than a new hire ramping up on the codebase. It’s no wonder some engineering managers muse that they can hire one less junior dev if they budget for an AI tool subscription instead.
The economics are stark: an AI coding assistant might cost on the order of $20 per month, while a junior engineer can cost $8,000–$10,000 per month or more when you factor in salary and overhead. A seasoned engineer using AI might achieve productivity that previously required a team – effectively doing the work of “ten juniors” (as some have quipped) at a fraction of the cost. When viewed purely through the lens of efficiency and cost, it’s hard to argue against the idea that some traditional junior tasks are getting automated away.
However, declaring the death of junior engineers is too simple a narrative. For one, today’s juniors are tomorrow’s seniors. If we stopped hiring and training new engineers, the pipeline of talent and leadership would quickly dry up. Companies are aware of this; completely forgoing early-career hires is a short-term gain but a long-term risk. More realistically, the role of the junior engineer is evolving rather than disappearing. Instead of being hired to crank out trivial code (which AI might handle), new engineers might be expected to focus on higher-level contributions sooner: think integration, testing, creative problem solving, and coordination. In other words, the bar is rising. Juniors may need to bring something extra to the table – whether it’s specific domain knowledge, exceptional debugging skills, design sensibilities, or simply a knack for using AI tools effectively to amplify their impact.
It’s also worth noting that there are limits to replacement. An AI can’t attend stand-ups and spontaneously report that a requirement doesn’t make sense from a user’s perspective. It won’t mentor the next intern or contribute to the company wiki with a tutorial on the internal deployment process. Humans in a team do more than just write code – they share knowledge, catch misunderstandings, and often bring soft skills that glue a project together. Junior engineers, in particular, often question assumptions (sometimes out of naivete) and that can be healthy for a team that has grown set in its ways. The perspective of a human newcomer can highlight issues an AI, optimized to follow instructions, would never raise.
From my vantage point as someone involved in hiring, I have indeed seen a shift. In take-home coding tests, it became apparent when candidates started using AI to generate answers. Instead of throwing up our hands, many of us adjusted the tests: we now design them to require a bit more end-to-end thinking or creativity, something not easily found in an LLM’s training data. We might ask for a design proposal along with code, or pose problems that span multiple domains (e.g. “simulate a hardware device and build an API for it” – something that forces understanding of both low-level and high-level concerns). The goal isn’t to catch juniors using AI (we assume they will, and even encourage it as a tool), but to see how they add their own insight beyond what AI provides. Can they critique the AI’s output and improve upon it? Do they understand the why, not just the how?
So will AI replace juniors? It will certainly replace some of the work juniors used to do, and it may reduce the number of entry-level positions in some areas. But I see it as a call to action for up-and-coming engineers: the easier it is for AI to do average work, the more you should strive to demonstrate above-average thinking. Rather than churning out another generic to-do app for your portfolio, contribute to an open-source project, or tackle a problem that requires digging into a new field (maybe a bit of hardware, maybe a dash of machine learning, maybe a tricky algorithm that isn’t well-covered on Wikipedia). Show that you can synthesize ideas and learn in public – for example, write a blog post about how you solved a tough bug or improved performance in an unexpected way. These are signals that you’re not just typing code that any AI could generate, but bringing unique value through insight and initiative.
Finding Your Edge: Creativity, Cost, and Context
For a software engineer facing the AI era, the guiding question becomes: What can I do better with a human mind than an AI can do on autopilot? By now, a theme should be clear. The answers revolve around creativity, cost, and context – the very elements highlighted in this article’s title.
- Creativity: Cultivate a problem-solving approach that isn’t just about writing code that works, but about envisioning solutions that others (including AI) wouldn’t think to try. This could mean drawing on analogies from other industries, questioning the problem framing, or experimenting with a new architecture. Creativity in engineering is a skill you build by staying curious and cross-pollinating ideas. Don’t be the coder who only knows how to follow recipes – be the one who occasionally invents a new recipe.
- Cost-awareness: Always tie your technical decisions to the real world. This doesn’t mean you need an MBA, but it does mean thinking like a product manager or an architect. Why are we building this? Is there a simpler way? What’s the maintenance cost? If you develop a habit of considering the ROI of your choices, you’ll make yourself indispensable. Teams need people who prevent over-engineering and keep efforts aligned with value. AI can suggest ten ways to do something; your job is to pick the one that makes the most sense for the business and justify why.
- Context mastery: This is about depth and breadth. Be the engineer who delves into understanding the domain you’re working in (whether it’s finance, healthcare, automotive, etc.), because that domain knowledge becomes context that informs better software. Also, pay attention to the ecosystem around your code: the devops, the legal implications, the user’s perspective, the data flows. The more context you carry, the more you can anticipate issues before they arise. AI is a quick study of documentation, but it has zero intuition – it won’t get a hunch that “something about this requirement feels off” or that a certain use case might break the design. Your awareness of context gives you that intuition.
I’ll share a personal anecdote that ties these elements together. Not long ago, I interviewed a candidate for a mid-level engineering role. On paper, she was relatively inexperienced, and I suspected she likely used AI assistance in the take-home exercise we had given (which was fine by us). What stood out was how she described her solution: she identified a subtle performance bottleneck in an open-source library our problem used, and she talked about how she tweaked the usage to lazy-load certain components, improving throughput by a significant percentage. This wasn’t in the prompt; it was a tangential discovery she made while testing. To me, it demonstrated creative problem-solving (she went beyond the obvious requirements), cost-awareness (she cared about performance and efficiency), and context understanding (she dove into the library’s behavior to see why the bottleneck happened). That was an immediate “hire” recommendation from my side, even though she had no prior industry job. Why? Because those are exactly the traits that AI won’t give you out-of-the-box but are incredibly valuable in an engineer.
Conclusion
AI is here to stay, and it’s changing our field much like electricity once changed manufacturing. As software engineers, we have a choice: wield the new tools wisely or be outpaced by those who do. Competing with AI doesn’t mean trying to race it on brute-force output; that’s a losing battle. Instead, it means augmenting our human strengths – creativity, judgment, contextual understanding – with AI’s speed and scale. It means embracing the irony that the cheat-code of yesterday is the standard equipment of today, and adjusting our game accordingly.
Being a valuable human software engineer in the era of fast, accurate, and scalable AI comes down to being a creative, cost-conscious, context-aware engineer. It’s about seeing the whole chessboard, not just the next move. AI will undoubtedly get better, and the ground will keep shifting. But if we focus on the timeless aspects of engineering – understanding problems deeply, crafting solutions that make sense in the real world, and continuously learning – we won’t just survive alongside AI; we’ll thrive, with AI as a powerful ally. In the end, the question isn’t human or AI, but rather how each amplifies the other. The engineers who figure that out will lead the way in this new age of software development.
-
AI4SE and the V-Model: The case of Shoot-and-forget BDD
Introduction
Software engineering is undergoing a paradigm shift as AI for Software Engineering (AI4SE), particularly large language models (LLMs), enters the development lifecycle. Nowhere is this more evident than in the transformation of the traditional V-model of system and software development.
The V-Model
Figure: The V-model. Credit: Leon Osborne, Jeffrey Brummond, Robert Hart, Mohsen (Moe) Zarean, Steven Conger; redrawn by User:Slashme. Image extracted from Clarus Concept of Operations, Publication No. FHWA-JPO-05-072, Federal Highway Administration (FHWA), 2005.
The V-model is a classic software process that emphasizes a rigorous, sequential relationship between development phases and corresponding testing phases. Each stage of requirements or design on the left “wing” of the V has a mirrored verification or validation step on the right wing, culminating in system validation against the initial requirements. This model promotes upfront planning and traceability between artifacts, but it has also been criticized for rigidity and late discovery of defects. Today, AI-driven tools are reshaping this model—making testing far more iterative and integrated, and enabling non-technical stakeholders to actively participate in creating technical artifacts.
Each development phase on the left side:
- requirements
- analysis
- system design
- architecture design
- module design
- coding
has a corresponding testing phase on the right:
- unit testing
- integration testing
- system testing
- acceptance testing
```mermaid
%%{init: {'theme':'base','themeVariables':{ 'primaryColor':'#DCEFFE', 'primaryBorderColor':'#76B3FE', 'edgeLabelBackground':'#FFFFFF', 'tertiaryColor':'#F0F8FF' }}}%%
flowchart TD
    R(Requirements):::left
    SD(System Design):::left
    AD(Architecture Design):::left
    MD(Module Design):::left
    IM(Implementation):::left
    R --> SD --> AD --> MD --> IM
    UT(Unit Test):::right
    IT(Integration Test):::right
    ST(System Test):::right
    AT(Acceptance Test):::right
    IM --> UT --> IT --> ST --> AT --> R
    classDef left fill:#DCEFFE,stroke:#76B3FE,color:#034694;
    classDef right fill:#DCF8E8,stroke:#34BA7C,color:#0B4F3C;
    class R,SD,AD,MD,IM left;
    class UT,IT,ST,AT right;
```
The V-Model in a linear view.
This model enforces strong traceability and planning for verification and validation, but follows a linear, sequential flow.
In this article, we analyze how AI4SE is transforming the V-model, with a focus on the economics of black-box testing and on cross-functional collaboration. I apply principles from management theory and game theory to understand shifts in team dynamics, knowledge asymmetry, and incentives. The result is a vision of development where behavior-driven testing is continuous (not just an end-phase activity) and product managers (PMs), product owners (POs), technical program managers (TPMs), scrum masters, and other non-engineers can directly shape and verify the software. The goal is to provide a thought-leadership perspective on these changes for a technically literate, managerial audience.
The Traditional V-Model and Its Limits
The V-model (Verification and Validation model) has long been used to structure system development. It visualizes a project in a V-shape: moving down the left side for definition and build phases, then up the right side for testing phases. For example, requirements are defined at the top-left and validated via acceptance testing at the top-right; system design is verified by integration testing; module design by unit testing, and so on. The strength of this model lies in clear verification steps tied to each specification stage and in early planning of tests (even during requirements analysis, one plans the acceptance tests). This ensures that testing isn’t an afterthought and that each requirement is eventually validated.
However, the V-model is essentially a linear lifecycle. It assumes that if you plan well and follow the sequence, you’ll catch issues in the corresponding test phase. In practice, this rigidity has drawbacks. Changes late in the process are costly, and misunderstandings in requirements might not surface until the final validation. There is little room for iterative refinement or unplanned exploration; everything follows a predetermined plan.
As management theorists like Peter Drucker and W. Edwards Deming have noted, such heavy upfront planning and hierarchy can falter in fast-changing environments. The traditional model can lead to a “frozen middle,” where feedback and innovation slow down. In an era where requirements evolve rapidly and quality needs to be assured continuously, the pure V-model feels inflexible.
Another issue is knowledge asymmetry between roles and phases. In the classic setup, business stakeholders define requirements and testers verify them, but only engineers truly understand the system internals during development. This often creates communication gaps or even power imbalances; engineers become gatekeepers of technical knowledge, and non-technical team members must largely trust their judgments until tests validate the outcomes.
In economic terms, this resembles a principal–agent problem: those who own the product vision (principals) rely on those who implement it (agents), but have less information about the technical work. The agent (developer) has more information and may act in self-interest (e.g. saying a feature is “too hard” or deferring tests) while the principal lacks visibility. The incomplete and asymmetric information allows an agent to act opportunistically in ways that diverge from the principal’s goals. Traditional processes tried to counter this with documentation, sign-offs, and structured testing, but the information gap remained.
```mermaid
flowchart LR
    subgraph "Sprint1"
        P1["Stories & Requirements"]
        D1["Design & Implementation"]
        T1["Test & Review"]
        P1 --> D1 --> T1 --> P1
    end
    subgraph "Sprint2"
        P2["Stories & Requirements"]
        D2["Design & Implementation"]
        T2["Test & Review"]
        P2 --> D2 --> T2 --> P2
    end
    Sprint1 --> Sprint2
```
The V-Model in modern (iterative) configuration.
It’s important to note that while the V-model may be considered “traditional,” its core idea of mapping validation to every development step remains valuable. In fact, most development work today still follows a V-model in miniature. Agile and iterative methods essentially break one large V-cycle into many smaller V-cycles (each sprint or feature is like a mini V-model with its own design, implementation, and testing). In other words, teams haven’t discarded the V-model’s principles of verification; they’ve just compressed and repeated them. This means it’s not enough to dismiss the V-model as outdated; we are all still using some form of it, whether we acknowledge it or not. The key is using it in a flexible, iterative way.
In summary, the V-model ensures thorough verification and validation, but its sequential nature and information silos pose challenges for today’s fast-paced, collaborative development. This is where AI4SE begins to make a profound impact—introducing more agility, continuous testing, and knowledge sharing into the model without losing the traceability that the V-model championed.
AI in the Software Lifecycle: LLMs Change the Game (-theoretic payoffs)
AI4SE refers to applying modern AI techniques (machine learning, NLP, etc.) to software engineering tasks. Large language models (LLMs) have recently shown they can generate code, explain complex concepts, and even produce test cases from natural language descriptions. In effect, coding is becoming easier and more automated, and some aspects of engineering are being “democratized.” Tools like GitHub Copilot already enable developers to generate boilerplate code or unit tests with simple prompts. But beyond assisting coders, these AI tools allow people without coding expertise to contribute in new ways, in theory.
For example, people imagine scenarios like these:
- A product manager uses an LLM-based tool to prototype an application or query a dataset without writing actual code, even refining features on their own.
- A TPM tests ideas before an engineer ever gets involved.
Within engineering teams, AI is changing the workflow. Developers use LLM “co-pilots” to generate functions or suggest design patterns, acting as force-multipliers (suddenly, every engineer can be more productive with AI help). Engineering managers and tech leads use AI to analyze codebases or generate documentation, saving time on grunt work. In essence, AI is taking on the labor of reading, writing, and synthesizing—tasks that scale with data and code—allowing humans to focus on decision-making and creative problem-solving.
In reality, we see:
- Engineering teams invest most of their time in solving highly technical problems. Boilerplate occurs rarely, and LLMs aren’t ready to solve the truly complex problems yet.
- PMs, POs, TPMs, engineering managers, and other leaders are extremely busy. When engineers can create the same artifact in a fraction of the time, it’s not rational for these folks to engage in “vibe coding” (casual coding for its own sake) during real work.
Instead, I want to showcase one success path I’ve discovered: black-box testing, the practice of verifying a system against its specifications from an external perspective.
Black-Box Testing: From Costly Phase to Continuous Activity — Shoot-and-forget BDD
Revisiting Black-Box (Oracle) Testing
Black-box testing means testing software from the outside, against its requirements, without knowing the internal code. In the V-model, black-box testing activities occur in stages like system testing and acceptance testing—critical but often late phases. Traditionally, black-box testing is labor-intensive: QA engineers must derive test cases from requirements, script them, run them, and maintain them when requirements or UIs change. This effort has always been significant in terms of cost and time. Ensuring broad test coverage with manual or scripted tests is so expensive that teams often prioritize a subset of scenarios, potentially missing edge cases until users find them.
- Black-box testing: testing a system without knowing how it’s constructed (external behavior only).
- White-box testing: testing based on knowledge of how the system is built (internal logic).
Practical AI for Software Engineering: Accelerated Black-Box Testing
LLMs can dramatically shift the economics of black-box testing. AI-powered test generation can turn natural language statements directly into executable test cases within minutes. In my team, it takes around 10 minutes. This makes it feasible to generate many more test scenarios than before, at a fraction of the manual effort. For instance, given a requirement like “It should be able to reset a password using the registered email,” an AI can produce a behavior-driven test scenario in Gherkin syntax. In my team, a GitHub Copilot-based coding agent converts a GitHub issue into a Gherkin feature file:
```gherkin
Scenario: Password Reset
  Given the test user is on the login page
  When the test user clicks on "Forgot Password"
  And enters their registered email
  Then they should receive a password reset link
```
This was once a task that QA or developers had to do by hand—translating specs into test steps. Now it’s almost automated, effectively creating failing test cases that highlight unimplemented features. In effect, LLMs can interpret the intent behind requirements and produce test cases that validate those requirements. These generated tests are automatically published as a pull request. (Sometimes developers tweak the .feature files afterward, but that’s a fraction of the time compared to writing them from scratch.)
Additionally, LLMs generate stub step definitions for the tests (which initially fail), often reusing existing common building blocks and following internal naming taxonomies via a server-side index.
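Our in-house tooling isn’t public, so purely as an illustration (using the open-source behave library, which is my assumption for the sketch, not necessarily what my team runs), the generated stub step definitions for the scenario above might look like this:

```python
# Hypothetical stub step definitions for the "Password Reset" scenario,
# sketched with the behave library. Generated stubs like these fail on purpose
# until an engineer implements the steps: the scenario documents intended
# behavior before the feature exists.

from behave import given, when, then


@given("the test user is on the login page")
def step_open_login_page(context):
    raise NotImplementedError("STEP: open the login page")


@when('the test user clicks on "Forgot Password"')
def step_click_forgot_password(context):
    raise NotImplementedError("STEP: click the Forgot Password link")


@when("enters their registered email")
def step_enter_registered_email(context):
    raise NotImplementedError("STEP: submit the registered email address")


@then("they should receive a password reset link")
def step_assert_reset_link(context):
    raise NotImplementedError("STEP: assert a reset link was delivered")
```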
I call this approach “shoot-and-forget BDD” (from the perspective of a TPM writing the scenario in my team).
The implications are profound:
- Easy to review and highly feasible: The benefit of shoot-and-forget BDD is clear. The input is human language and the output is structured human language. This is basically pattern matching—exactly where ML excels.
- True black-box testing at scale: Achieving broad black-box test coverage has historically been expensive. Engineers writing tests for their own systems can fall into a principal–agent trap, and integration tests are often based on how we expect the system to work (i.e. white-box assumptions). Now, behavior scenarios written with minimal inside knowledge can cover many aspects of the system’s intended behavior, addressing the higher-level requirements (the upper parts of the V-model’s left wing).
- Faster, cheaper, iterative test creation: It becomes “write a new requirement and forget (the tests).” Teams can generate hundreds of test cases—something impossible to do manually within an Agile sprint. Because creating tests is so much faster (as easy as writing a user story), it’s now practical to do continuous and ad-hoc testing throughout development, not just plan a fixed test suite upfront.
Some teams report that AI-based observability platforms can even analyze real production logs and generate new test flows based on actual user behavior. This means the test suite can evolve as the product evolves, covering edge cases humans might overlook. Such exploratory testing becomes feasible because an AI can quickly take a new scenario description and produce a runnable test, which can then be executed and tracked.
Going through robust QA/QC for even a small system is still costly, and that fundamental truth won’t change. But ironically, the classic V-model works better with this practice—we actually enjoy the benefits of the V-model’s rigor. The more thorough our internal validation and verification, the less pressure on external QA/QC phases.
It turns out this approach is also traceable. Because these AI-generated tests originate from natural language requirements or user scenarios, they can be tied back to their source information. In a BDD approach, tests are written in a language that business stakeholders can read, ensuring each test case maps to a specific requirement or user story. LLMs enhance this by automating the generation of those BDD scenarios from the requirements themselves. The outcome is that every requirement can have one or many corresponding black-box tests, and if a requirement changes, new tests can be generated just as easily. This achieves something like a traceability matrix (linking formal requirements to JIRA tickets, GitHub issues, feature files, and releases), which was a core goal of the V-model—now achieved with far less manual toil.
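As a sketch of how lightweight that traceability can be, the snippet below scans feature files for a hypothetical @issue-<number> tag convention (my illustration, not necessarily my team’s actual convention) and maps each requirement back to the scenarios that verify it:

```python
# Illustrative sketch: build a traceability map from Gherkin tags
# (assumed convention: scenarios generated from a GitHub issue carry
# an @issue-<number> tag) to the scenarios that verify each requirement.

import re
from collections import defaultdict
from pathlib import Path

TAG_RE = re.compile(r"@issue-(\d+)")


def traceability_matrix(feature_dir: str) -> dict[str, list[str]]:
    """Map 'issue-<n>' -> list of 'file :: scenario name' entries."""
    matrix: dict[str, list[str]] = defaultdict(list)
    for path in Path(feature_dir).rglob("*.feature"):
        pending_tags: list[str] = []
        for line in path.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            if stripped.startswith("@"):
                pending_tags.extend(TAG_RE.findall(stripped))
            elif stripped.startswith("Feature:"):
                pending_tags = []  # feature-level tags are not mapped in this sketch
            elif stripped.startswith("Scenario"):
                name = stripped.split(":", 1)[-1].strip()
                for issue in pending_tags:
                    matrix[f"issue-{issue}"].append(f"{path.name} :: {name}")
                pending_tags = []
    return matrix


if __name__ == "__main__":
    for req, tests in sorted(traceability_matrix("features").items()):
        print(req, "->", ", ".join(tests))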
Reliability
An important consideration when using LLMs for test generation is reliability. By default, an LLM’s output can be probabilistic or non-deterministic; the same prompt might yield slightly different test code on different runs, or an AI might “hallucinate” a test scenario that doesn’t exactly match the requirement. Relying on an LLM’s ad-hoc answers each time would be risky and hard to reproduce.
The solution—and a key philosophy we must adopt when using AI in software development—is to use the LLM to generate a reviewable artifact (such as a test script or specification) and then automate that artifact in the pipeline. Once the AI produces a test case, that test becomes part of the codebase—subject to code review, version control, and repeated execution. This approach ensures the software’s behavior is validated in a deterministic way, even though the AI that generated the test is nondeterministic. In essence, we get the creativity and speed of the AI combined with the rigorous repeatability of traditional automation. Industry practitioners emphasize this difference: LLM-based coding assistants may produce different outputs if prompted repeatedly, whereas a deterministic test generation tool will always produce the same output for the same input. By capturing the AI’s output as a fixed artifact, teams can eliminate the AI’s randomness from the testing process. The tests will run exactly the same in CI/CD every time, increasing trust in the results.
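A minimal sketch of that philosophy follows; draft_scenarios_from_issue is a hypothetical placeholder for whatever LLM tooling a team actually uses, not a real API. The nondeterministic generation happens exactly once, and only the committed artifact is ever run in CI.

```python
# Sketch of "generate once, then automate the artifact": the LLM output is
# captured as a reviewable .feature file under version control; CI never
# re-prompts the model, it only executes the committed tests.

from pathlib import Path


def draft_scenarios_from_issue(issue_text: str) -> str:
    """Hypothetical call into an LLM-backed generator; returns Gherkin text."""
    raise NotImplementedError("wire this to your team's generation tooling")


def publish_feature_file(issue_number: int, issue_text: str, repo_root: str = ".") -> Path:
    """Write generated scenarios to a reviewable .feature file.

    The nondeterministic step (generation) happens exactly once, here.
    Everything downstream is deterministic file handling and test execution.
    """
    gherkin = draft_scenarios_from_issue(issue_text)
    target = Path(repo_root) / "features" / f"issue_{issue_number}.feature"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(gherkin, encoding="utf-8")
    return target  # commit this file and open a pull request for review
```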
Micro-Economics, Game-Theoretic Analysis
From a ledger perspective, AI-driven testing yields obvious benefits of cost reduction and value increase. On the cost side, automating test case generation and maintenance slashes the human effort needed for comprehensive testing.
Status Quo
From a game-theoretic standpoint (viewing team interactions as a strategic game), AI is changing the “payoff matrix” for sharing knowledge vs. hoarding it. Historically, an engineer might gain a form of job security or influence by being the only one who understands a critical component (holding a knowledge silo). This could create an incentive to guard information—a non-cooperative strategy to ensure one’s importance. Meanwhile, a PM had little choice but to trust the engineer’s estimates and explanations, operating at an informational disadvantage. This scenario is akin to an asymmetric game where one player has more information and thus more power. Such asymmetry can breed mistrust or suboptimal outcomes (like overly padded estimates or missing customer needs).
Impact of AI-driven Transparency
If the PM can ask an LLM to explain the code or generate an alternative solution, the information asymmetry diminishes. The engineer no longer gains by hoarding knowledge; in fact, since the PM can get a second opinion from AI, the engineer now has incentives to be more forthcoming and collaborative to maintain trust. In game theory terms, the interaction moves closer to a symmetric information game, which supports a more cooperative equilibrium. When all players have more equal access to information, strategies that involve deception or withholding are far less viable because they can be discovered or worked around. The stable strategy becomes collaboration: everyone shares and works together because that produces the best collective outcome, and there’s less advantage in going solo. Essentially, AI tools help make certain knowledge common to all (or at least much easier to obtain), and common knowledge is a known facilitator of coordination in game theory.
In summary, this means a fundamental shift in incentives for each role:
- Before AI Adoption (traditional setup): The developer’s best move was often to keep expertise and information closely guarded (maintain a knowledge silo) because the product owner or manager had no easy way to verify technical claims. The PM was forced to trust the developer’s statements and estimates, often operating with incomplete information and little leverage.
- After AI Adoption (AI-driven transparency): Now the developer gains little by hoarding knowledge—any attempt to do so can be quickly uncovered or bypassed by AI analysis. Instead, the developer is incentivized to collaborate and share, since the PM can and will verify specifics with AI if needed. The PM no longer has to fly blind; they can independently inspect code or generate tests using AI, leading to a more transparent, trust-based working relationship.
Interactive Simulation: Adjust each slider to set the AI adoption level for Dev, PM, QA, and TPM. The payoff matrix updates instantly and highlights Nash equilibrium rows.
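For readers without the interactive version, here is a toy model in the same spirit. The payoff numbers are purely illustrative assumptions of mine, not measurements; the point is structural: as the AI-transparency parameter rises, hoarding stops being a best response and Share/Share becomes the only pure-strategy Nash equilibrium.

```python
# Toy model (illustrative numbers only): a 2x2 "share vs. hoard" game between
# a developer and a PM. The parameter `ai` in [0, 1] is the level of AI-driven
# transparency; it erodes the private payoff of hoarding information.

from itertools import product

STRATEGIES = ("share", "hoard")


def payoffs(dev: str, pm: str, ai: float) -> tuple[float, float]:
    """Return (developer payoff, PM payoff) for one strategy profile."""
    base = {  # payoffs with no AI assistance (asymmetric information)
        ("share", "share"): (3.0, 3.0),
        ("share", "hoard"): (1.0, 4.0),
        ("hoard", "share"): (4.0, 1.0),
        ("hoard", "hoard"): (2.0, 2.0),
    }
    d, p = base[(dev, pm)]
    # Transparency: hoarded knowledge can be rediscovered via AI, so the
    # hoarder's edge shrinks while mutual sharing gets a small bonus.
    if dev == "hoard":
        d -= 3.0 * ai
    if pm == "hoard":
        p -= 3.0 * ai
    if dev == pm == "share":
        d += 1.0 * ai
        p += 1.0 * ai
    return d, p


def pure_nash_equilibria(ai: float) -> list[tuple[str, str]]:
    """Profiles where neither player gains by unilaterally deviating."""
    eqs = []
    for dev, pm in product(STRATEGIES, repeat=2):
        d, p = payoffs(dev, pm, ai)
        best_dev = all(payoffs(alt, pm, ai)[0] <= d for alt in STRATEGIES)
        best_pm = all(payoffs(dev, alt, ai)[1] <= p for alt in STRATEGIES)
        if best_dev and best_pm:
            eqs.append((dev, pm))
    return eqs


if __name__ == "__main__":
    for ai in (0.0, 0.5, 1.0):
        print(f"AI adoption {ai:.1f}: equilibria -> {pure_nash_equilibria(ai)}")
```

With these illustrative numbers, the equilibrium is Hoard/Hoard at zero adoption and flips to Share/Share as transparency grows, which is exactly the incentive shift described above.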
Back to AI for SE: Attacking Principal–Agent Theory
Another way to view the incentive shift is through principal–agent theory. The principal–agent problem arises largely from misaligned goals and information gaps. AI4SE attacks this by closing the information gap. The “principal” (say, a product owner or engineering manager) can verify and even do parts of the “agent’s” work independently with AI’s aid, increasing transparency. The agent (engineer) knows that the principal has more visibility now (for example, the PM could run an independent AI analysis to review recent code changes), which discourages any temptation to shirk or mislead. In essence, LLMs act as a real-time monitoring and enabling mechanism; they reduce the need for heavy oversight or blind trust because the knowledge to evaluate work is accessible on demand. Monitoring costs drop and trust can build. Ideally, this leads to better alignment: everyone is working toward the same goal with the same understanding, rather than guarding their own turf.
Team Dynamics and Equilibria in the AI-Assisted Era
As AI levels the playing field, we may witness a shift toward flatter team structures and new collaborative equilibria. Practically, this means team interactions become more about jointly solving problems and less about negotiating hand-offs or protecting domains. Engineers, PMs, QA, and other roles find their day-to-day work involves more overlap and shared language.
- Convergence of Roles (Multi-skilled Teams): While each team member still focuses on their specialty, the skills and activities of different roles now overlap much more. Over time, each team becomes more T-shaped—deep in their specialty but able to contribute across a broad range of tasks. The equilibrium is a team of multi-skilled individuals each supported by AI, rather than strictly siloed specialists. This can increase mutual respect and understanding, as everyone has at least a basic grasp of others’ work (with the AI as their on-demand tutor).
- Incentives to Share Knowledge: With AI agents able to capture and distribute knowledge (e.g. summarizing a design into documentation, or answering questions in a chat), hoarding information makes much less sense. Teams will gravitate towards open information-sharing norms. I anticipate new incentives (perhaps set by management or company culture) that reward collaboration and teaching. In game-theory terms, cooperation becomes the dominant strategy: if one team member tries to keep critical knowledge to themselves, they’ll be quickly outpaced by teams that share and thus move faster—whether they like it or not.
- Leadership and Management Changes: As hierarchy is flattened by AI-enabled transparency, the role of a manager shifts from controlling information flow to enabling and coaching. Middle management, in particular, can be streamlined; fewer “translation layers” are needed when AI helps executives, managers, and engineers communicate directly and clearly. Industry observers have noted that AI-driven tools allow businesses to operate with leaner management structures by lowering the costs of acquiring, processing, and verifying information. I see a lot of potential for automating middle managers’ traditional tasks (gathering status updates, preparing reports, relaying information) through AI-generated reports and templates. Managers, freed from those duties, will focus more on setting direction, defining success metrics, and developing the team’s skills. The hierarchy becomes flatter as one manager can oversee a larger team with help from AI, and decision-making chains shorten. In essence, leadership becomes more about guiding a well-informed team than micromanaging tasks. (Indeed, Harvard Business Review and others have discussed how AI might redefine managerial roles, potentially eliminating some layers of hierarchy and transforming leadership into a more facilitative role.)
- Tension Shift: Ultimately, with LLMs integrated into workflows, the team achieves a more cooperative equilibrium. Everyone has access to the information they need (or can get it with AI), and everyone can contribute to solving the problem at hand, albeit in different ways. This changes old tensions; for example, the classic dev vs. test “us vs. them” mentality fades when developers use AI to generate tests and testers use AI to understand code. Ideally, the new equilibrium is a positive-sum game: the combined output of an AI-augmented, collaborative team is greater than before, which incentivizes continued cooperation. If any member deviates (say, an engineer refuses to use AI assistance and thus slows down the team), they’ll feel pressure to adapt because the rest of the team is moving faster with new tools. Over time, we expect norms to solidify around AI-augmented collaboration, much like how norms solidified around version control or agile ceremonies in earlier eras. Teams that embrace the technology and new ways of working will outperform those that don’t, reinforcing the trend (and likely forcing laggards to catch up).
Of course, challenges remain. There is a risk that if “anyone can code” with AI, then coding might become commoditized and the craft of software engineering could lose some status or bargaining power. One engineer-blogger mused whether widespread LLM adoption could lead to a form of de-skilling of programming. In other words, programmers wouldn’t become less skilled per se, but the job might be perceived as less of a specialized craft, potentially reducing its reward (pay, prestige) in the long run. The worry is that companies might hire more “prompt engineers” or citizen developers at lower cost, while expert software craftsmen become less differentiated. And if misused, an overdose of generated code can dramatically degrade software quality.
On the other hand, truly expert engineers may be even more in demand for the complex, critical, and highly technical tasks—analogous to how anyone can shoot a short video today, but not everyone can direct a blockbuster film. It is hard to predict exactly how the talent market will shape up.
Conclusion
AI4SE, driven by powerful LLMs, is transforming the software development landscape in ways that upend traditional models like the V-model. By automating and accelerating tasks, AI makes formerly sequential, costly activities (like black-box testing turned into shoot-and-forget BDD) continuous, cheap, and richly informative. By translating between natural language and code, AI enables people outside of engineering to contribute directly to technical work, reducing knowledge silos. Management theory suggests that this empowerment and transparency will flatten hierarchies and align incentives, while game theory implies that teams will settle into more cooperative and efficient patterns when information asymmetry is reduced. We are essentially witnessing the software process become more fluid and equitable—without losing the discipline of verification and validation that models like the V aimed to ensure.
In practical terms, embracing AI4SE means evolving how we collaborate and manage. Teams that leverage LLMs for testing can achieve higher quality at lower cost, with tests evolving alongside the software. Non-technical stakeholders, armed with AI copilots, can inject their domain knowledge directly into the development process, resulting in products that better fit user needs (and faster feedback loops when they don’t). Engineers and technical leads, rather than feeling undermined, must focus on the sophisticated challenges and act as coordinators of ideas coming from many quarters. The economics of development shift: mundane work is automated, so the scarce resource is no longer coding hours but human creativity and strategic thinking. This will likely change how we measure productivity and how we reward team contributions, putting more emphasis on design, innovation, and coordination.
While the transformation is still underway, the trajectory is clear. AI4SE is not just an efficiency booster or academic concept—it’s a catalyst for a more inclusive and collaborative engineering culture. Much like DevOps broke down silos between development and operations, AI-assisted development breaks down silos between technical and non-technical contributors, between planning and testing, and even between management and execution. Organizations that understand and harness this will foster teams that are both highly innovative and disciplined—a new competitive equilibrium in the industry.
The traditional V-model isn’t so much discarded as it is augmented and iterated upon. Verification and validation steps still happen, but they are woven throughout the process with AI as an ever-present assistant. Requirements, development, and testing all converse in real-time via natural language and code generation. This makes the development lifecycle more like a continuous loop than a strict V shape—perhaps an evolving spiral of constant refinement and feedback. For tech leaders and managers, the message is clear: leveraging AI4SE enables your teams to move faster and smarter. It means rethinking roles, training staff to work alongside AI, and fostering an environment where human creativity and AI capability unite. Those who embrace this transformation stand to deliver higher-quality software, align teams more closely with business goals, and ultimately create more value in a competitive marketplace. The future of software development will be written by human–AI teams—and those teams are already reshaping the process today.
Key Takeaways:
- Continuous, AI-Driven Testing: Black-box testing is becoming inexpensive and ongoing. LLMs can generate and even execute tests from natural language specs throughout development, improving quality and keeping requirements and tests in sync. Testing shifts from a late-phase cost to a continuous activity, catching issues early when they are cheaper to fix.
- Empowered Stakeholders: AI tools enable non-engineers to directly create or modify technical artifacts. Product managers can prototype features or derive tests from user stories, and domain experts can query system behavior without writing code. This flattens team structure and lets knowledge from any source flow into the product more easily.
- Reduced Knowledge Asymmetry: By providing on-demand expertise, LLMs reduce technical gatekeeping. Information once confined to specialists is now accessible to all team members (e.g. an LLM explaining a module in plain English). With less information asymmetry, team incentives shift toward transparency and trust—a more “one team” culture where everyone works with the same facts.
- New Team Equilibria: As AI levels the field, teams reach a new balance where collaboration is the norm. Engineers focus on complex problems and architecture, QA ensures AI-generated tests truly capture business intent, and managers orchestrate rather than dictate. The result is a highly collaborative, cross-functional workflow where AI handles grunt work and humans focus on creativity and strategy. Overall team productivity and innovation increase, benefiting everyone.
-
The first and the last surname gTLD, *.kim*
It has been a while since I bought the first top-level domain for a family name, .kim. I did some research, and .kim will most likely remain the only surname gTLD for a very long time. Obviously, I own milhan.kim.
For years, https://milhan.kim had redirected to my LinkedIn profile.
Today marks the first day this domain starts to play its role.
🎉 initial commit
Publications
- Code X-Ray: LG Electronics’ Static Analysis Platform, 2018 — Scalable tooling that uncovers code issues across large repositories.
- Applying Deep Learning Based Automatic Bug Triager to Industrial Projects, 2017 — Neural triaging that routes bug reports to the right engineers.
- Computational Fluid Dynamics Simulation Based on Hadoop Ecosystem and Heterogeneous Computing, 2015 — CFD simulations accelerated with Hadoop clusters and heterogeneous hardware.
- RETE-ADH: An Improvement to RETE for Composite Context-Aware Service, 2014 — Enhanced RETE algorithm for richer context-aware services.