Agentic Coding Workflow: Jules + Antigravity Case Study

Real case study: agentic coding workflow with Jules and Antigravity. Build sites in 1-2 days, migrate Node.js 16→24 in 8 hours. Practical playbook included.

Executive Summary

💡 Core Thesis: "Agentic coding removes friction, not responsibility - developers who can steer agents will define the next wave of engineering leverage."

Real proof, not theory: This article is hosted on parilsanghvi.in - the very site built using the workflow described below.

Two experiments, one pattern:
Greenfield build (parilsanghvi.in): Antigravity scaffold → Jules features → Vercel previews. Total time: 1-2 days vs. ~1 week manually.
Legacy MERN migration (Node.js 16 → 24): 8 hours autonomous agent work + 2-3 hours human verification. 22K lines written, 33K deleted.

What you'll learn:
Concrete workflows that work today (not vaporware)
Where agents stumble and how to guard against it
Cost economics and ROI thresholds
A practical playbook for teams of any size

Time investment: 10 minutes. Potential ROI: Hours or days saved on your next project.

Prerequisites

Basic setup required to replicate this success:
A GitHub repository with basic CI/CD configured
Vercel account with preview deployments enabled (free tier works)
Jules access with repository permissions properly scoped
Either an existing test suite OR willingness to write critical path tests before handing work to agents
Google AI Pro plan or equivalent (includes Antigravity, Jules, and other tools used in this workflow)

⚠️ Important: The workflow assumes you have preview environments - without them, the fast feedback loop that makes agentic coding safe disappears.

Greenfield Experiment

Project: parilsanghvi.in (personal site & technical blog)
Stack: Antigravity → Jules → Vercel

The Antigravity Scaffold

The Jules + Vercel Workflow

I treated Jules as a collaborative specialist. I described features and acceptance criteria; Jules implemented, committed, and opened a Pull Request. Vercel produced a live Preview Sandbox for each PR automatically.

Implementation: Jules writes code and opens a PR.
Verification: Vercel deploys a preview URL.
Review & Feedback: I test the feature in the sandbox and leave comments on the PR. Jules consumes PR feedback and iterates.
Self-Review: Jules runs internal performance & security modes before human review, leaving me with 1–2 strategic edits in most cases.

This workflow is fast, auditable, and keeps humans in control via PRs and previews.

Agentic Workflow for parilsanghvi.in - a greenfield build

Figure 1: Jules-generated pull request with Vercel preview deployment - instant sandbox testing for every feature.

Figure 2: Jules self-review mode flagging potential issues before human review, reducing back-and-forth iterations.

Legacy Migration

Project: Full MERN E-commerce Application (5 years old)
Challenge: Upgrade Node.js 16 → Node.js 24 and modernize React patterns

Major runtime upgrades are "digital surgery" - native modules must be rebuilt, TLS/crypto behavior can change, and subtle runtime differences can break production logic. I pointed Jules at a non-trivial, real repo (not a toy project) and gave a clear directive: upgrade Node to v24 and modernize frontend patterns.

What Jules did

Timeline: Jules ran unattended for about 5 hours and then continued with an overnight pass. Human verification the next morning took ~2-3 hours. The agent handled the bulk of the syntax and version fiddling; the human job was verification and strategic decisions.

The Numbers:
22,000 lines written (new dependencies, updated syntax, modernized patterns)
33,000 lines deleted (deprecated packages, old patterns, redundant code)
Net impact: Cleaner, more maintainable codebase with modern runtime
Autonomous execution time: ~8 hours of active work spanning overnight
Human verification: 2-3 hours the next morning testing features and fixing edge cases
Final touch-ups: One conversation session with Jules to handle the remaining 10%

What I tested during verification:
Core user flows (product browsing, cart, checkout)
Admin dashboard functionality
API endpoints and data persistence
Mobile responsiveness
Performance under load (basic smoke testing)

Bugs found: Several edge cases where React component lifecycle changes in newer versions broke existing assumptions. Jules had modernized the patterns, but a few components needed manual adjustment for state management timing.

Database migration: Handled separately. I created a copy of the production database and manually validated schema compatibility before allowing Jules to touch backend code. The agent never had direct database access.

Migration Visuals

Figure 3: Migration PR summary showing package upgrades and modernization notes - 22K lines added, 33K deleted.

Figure 4: Jules timeline showing overnight autonomous execution and morning verification.

Reality Check

Agentic tools are powerful but imperfect. In my runs, common friction points included:
Context overload: Using a single long thread for too many tasks can cause the agent to lose context or loop. Break tasks into smaller missions.
Vague requirements: Agents need clear acceptance criteria; ambiguity slows or derails runs.
No preview environments: Without preview sandboxes, you remove the fast feedback loop that makes agentic workflows safe.

A Real Failure: The Security Feature Gone Wrong

During the development of parilsanghvi.in, I asked Jules to implement a security feature. The agent made changes that seemed reasonable in isolation but broke existing functionality in subtle ways.

What went wrong:
The prompt was clear on the feature requirement but didn't specify constraints
Jules modified more files than expected, creating unintended side effects
The PR looked clean but testing revealed broken edge cases

How I recovered:
Deleted the feature branch entirely rather than trying to fix incrementally
Started a new Jules session with an updated, more constrained prompt
The second attempt with clearer boundaries succeeded and was merged

Lesson learned: When an agent goes off track, starting fresh with better context is often faster than debugging the divergence. Treat each Jules session as disposable - the real asset is your refined understanding of the requirement.
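One way to catch the "modified more files than expected" failure early is to compare a PR's changed files against the paths the task was supposed to touch. The following is a minimal sketch, not part of the original workflow: the function name, file paths, and glob patterns are all hypothetical, and the list of changed files is assumed to come from something like `git diff --name-only`.

```python
from fnmatch import fnmatchcase


def out_of_scope_files(changed_files, allowed_patterns):
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical example data: these paths and patterns are illustrative only.
changed = [
    "src/middleware/rate_limit.ts",
    "src/pages/blog/index.tsx",
    "package.json",
]
allowed = ["src/middleware/*", "package.json"]

unexpected = out_of_scope_files(changed, allowed)
print(unexpected)
```

An empty result suggests the change stayed within the intended boundaries; anything else is a cue to look closer before spending time on a full review.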

Practical Playbook

Step 1 - Pick a low-risk starter

Start with documentation/test generation to build confidence and improve the repo's testability - this dramatically reduces later surprises when agents touch code.

Step 2 - Prepare the repo & define clear acceptance criteria

Your job here is to refine the requirement and write it up in precise technical language, including measurable success criteria that capture what your instincts would otherwise check by hand.

Example: Prompt Engineering in Action

The Node 24 Migration Prompt (actual prompt used):

"Upgrade all packages to latest version. Ensure code is working fine and use best practices. Do it without breaking functionalities. Ask questions if any."

Why this deceptively simple prompt worked:

This isn't as detailed as the "clear acceptance criteria" advice earlier suggests - and that's the point. It worked because:
The task type (dependency upgrade) is well-understood by the agent
The constraint ("without breaking functionalities") set clear boundaries
The repository context (existing tests, CI setup) provided implicit success criteria
The agent's domain knowledge filled in the mechanical steps

What happened:
Jules executed overnight, writing 22K new lines and deleting 33K old lines
One massive PR with the complete migration
~90% success rate - remaining 10% was edge cases fixed through conversation the next morning

Key insight: Prompt complexity scales with task ambiguity. Well-bounded technical tasks (upgrades, migrations, refactors) need less detail than creative or product work. The agent doesn't need step-by-step instructions - it needs clear goals and constraints.

Compare this to a task that WOULD need more detail:

❌ Too vague for a UI feature: "Add a contact form"
✅ Better: "Add a contact form with name, email, and message fields. Validate email format client-side. Show success toast on submit. Store submissions in Firestore collection 'contact_submissions' with timestamp. Must work on mobile (320px+) and desktop. Match existing design system colors and spacing."

The difference: UI features have infinite valid implementations. Dependency upgrades have established patterns.
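The "better" prompt above has a recognizable shape: a goal, a list of acceptance criteria, and constraints. As a rough sketch of how a team might keep such prompts consistent, the snippet below renders that shape from structured fields. The `TaskBrief` class and its field names are hypothetical and are not part of Jules or any other tool; the example content is taken from the contact form prompt above.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the parts of an agent prompt."""
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the brief as plain text suitable for pasting into a prompt."""
        lines = [f"Goal: {self.goal}", "", "Acceptance criteria:"]
        lines += [f"- {item}" for item in self.acceptance_criteria]
        if self.constraints:
            lines += ["", "Constraints:"]
            lines += [f"- {item}" for item in self.constraints]
        return "\n".join(lines)


brief = TaskBrief(
    goal="Add a contact form with name, email, and message fields",
    acceptance_criteria=[
        "Validate email format client-side",
        "Show success toast on submit",
        "Store submissions in Firestore collection 'contact_submissions' with timestamp",
        "Must work on mobile (320px+) and desktop",
    ],
    constraints=["Match existing design system colors and spacing"],
)
print(brief.render())
```

Keeping constraints as a separate field makes it harder to forget them, which is the gap that caused the security feature failure described earlier.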

Step 3 - Run the pilot with strict human gates

Measure success by:
Time saved (developer-hours)
Fewer bugs in the pilot area (post-merge incidents)
Speed of PoC validation (time from idea → validated preview)

Step 4 - Scale & iterate

Decision Tree

Should I use agentic coding for this task?
├─ Is it well-scoped with clear acceptance criteria? 
│  └─ No → Break it down into smaller, measurable tasks first
│  └─ Yes → Continue ↓
│
├─ Do you have preview environments set up?
│  └─ No → Set up Vercel previews or equivalent first
│  └─ Yes → Continue ↓
│
├─ Do you have tests covering critical paths?
│  └─ No → Write tests first (or accept higher verification burden)
│  └─ Yes → Continue ↓
│
├─ Is this high-risk? (data migrations, security, auth, production hotfixes)
│  └─ Yes → Keep this human-led with agent assistance only
│  └─ No → Continue ↓
│
└─ All green? → Ready to pilot! Start with the playbook above.

⚠️ Note: This tree helps you avoid the most common pitfalls. Skip a step and you'll likely spend more time debugging than you saved.
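The decision tree above can also be written as a short function, which some teams may find handy as a checklist in tooling or PR templates. This is only a sketch of the same four checks in the same order; the `Task` class, its field names, and the `next_step` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical answers to the four questions in the decision tree."""
    well_scoped: bool
    has_preview_env: bool
    has_critical_path_tests: bool
    high_risk: bool


def next_step(task: Task) -> str:
    """Walk the tree top to bottom and return the first blocking action."""
    if not task.well_scoped:
        return "Break it down into smaller, measurable tasks first"
    if not task.has_preview_env:
        return "Set up Vercel previews or equivalent first"
    if not task.has_critical_path_tests:
        return "Write tests first (or accept higher verification burden)"
    if task.high_risk:
        return "Keep this human-led with agent assistance only"
    return "Ready to pilot"


# Example: a well-scoped, low-risk task that lacks tests.
print(next_step(Task(True, True, False, False)))
```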

Cost Economics

My Setup Costs:
Google AI Pro: ~$20/month (includes Antigravity, Jules, and other tools)
Vercel: Free tier (sufficient for personal projects and small teams)
GitHub: Free tier for public repos
Total monthly overhead: ~$20

Time Savings on parilsanghvi.in:
Traditional approach: ~1 week (as a DevOps engineer with basic React knowledge)
Agentic approach: 1-2 days (Antigravity scaffold + Jules features + iterations)
Net savings: ~5-6 days of development time

Time Savings on MERN Migration:
Traditional approach: ~3-5 days (researching breaking changes, updating code, testing)
Agentic approach: ~8 hours autonomous + 2-3 hours verification = ~1 day total
Net savings: ~2-4 days of focused work
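The net-savings figures in both lists are range subtractions: the smallest saving pairs the low traditional estimate with the high agentic estimate, and the largest saving does the opposite. Here is a small worked sketch, assuming "~1 week" means 7 days and "~1 day total" means exactly 1 day; the function name is hypothetical.

```python
def net_savings(traditional_days, agentic_days):
    """Subtract two (low, high) ranges and return the (low, high) saving."""
    t_low, t_high = traditional_days
    a_low, a_high = agentic_days
    return (t_low - a_high, t_high - a_low)


# parilsanghvi.in: ~1 week (assumed 7 days) vs. 1-2 days
site_savings = net_savings((7, 7), (1, 2))

# MERN migration: ~3-5 days vs. ~1 day total (assumed exactly 1 day)
migration_savings = net_savings((3, 5), (1, 1))

print(site_savings, migration_savings)
```

Under those assumptions the results match the stated figures of roughly 5-6 days and 2-4 days.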

ROI Threshold (when it's worth it):
You have multiple features to ship and limited time
The project involves patterns you understand but don't want to implement manually
You're dealing with mechanical upgrades (dependency bumps, migration work)
You're a specialist in one area (like DevOps) working in another (like frontend)

When it's NOT worth it:
One-off scripts or very simple tools
Projects where you're learning fundamentals (agent-assisted code skips learning opportunities)
Highly experimental work where requirements change every hour

Team Collaboration

This workflow was developed as a solo developer. Here's what changes when you scale to a team:

Multi-Developer Scenarios:

Merge conflicts are inevitable - just like traditional development, multiple developers using agents on overlapping parts of the codebase will create conflicts. Mitigation:
Assign clear ownership boundaries per feature/module
Use feature flags to isolate work-in-progress
Communicate in team channels when starting agent sessions on shared code

Code review burden shifts - You're not reviewing implementation details; you're reviewing:
Does the preview environment work as expected?
Does this meet acceptance criteria?
Are there obvious security/performance red flags?

Traditional PR review practices still apply - test the preview, check the diff for unexpected changes, approve or request changes.

Onboarding new team members:
3-4 iterations with Jules to understand prompting patterns
Pair one agent-experienced dev with one new dev for first few tasks
Create a team "prompt library" of successful prompts for common tasks

Recommended team guardrails:
Branch protection rules requiring human approval before merge
Mandatory preview environment checks before review
Agent sessions logged/audited (who requested what, when)
Post-mortems on agent-caused incidents (treat like any other bug source)
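For the "who requested what, when" guardrail, even a very small structured record goes a long way. The sketch below shows one possible shape for such a log entry, serialized as a single JSON line. The class, function, and field names are hypothetical, and where the lines get stored (a file, a database, a logging service) is left open.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AgentSessionRecord:
    """Hypothetical audit entry: who asked the agent for what, and when."""
    requested_by: str
    prompt: str
    repository: str
    started_at: str  # ISO 8601 timestamp in UTC


def new_record(requested_by, prompt, repository, now=None):
    """Create a record, stamping it with the current UTC time by default."""
    now = now or datetime.now(timezone.utc)
    return AgentSessionRecord(requested_by, prompt, repository, now.isoformat())


def to_log_line(record):
    """Serialize the record as one JSON line, ready to append to a log."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only.
record = new_record("alice", "Add a contact form", "example-org/example-repo")
print(to_log_line(record))
```

Passing `now` explicitly keeps the function easy to test and makes it possible to backfill entries for sessions that have already run.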

When team size matters:
1-2 developers: Coordination is informal, trust-based
3-5 developers: Need clear task boundaries and communication norms
6+ developers: Require formal processes, audit logs, and agent usage policies

Risks and Guardrails

Security-critical code:
Authentication logic and permission systems
Encryption/decryption implementations
API key management or secrets handling

Performance-sensitive paths:
Database query optimization
Rendering bottlenecks in high-traffic pages
Memory-intensive operations

Production hotfixes:
Critical bugs affecting live users
Rollback procedures under pressure

User-facing changes without validation:
UI/UX changes that affect conversion or engagement
Copy or messaging changes
Pricing or checkout flows

These aren't permanent exclusions - as tooling matures and audit trails improve, some of these may become safe for agent-assisted work. For now, keep humans in the driver's seat.

🗣️ The Human Role: "In an agentic workflow, the developer’s real responsibility is to refine requirements, write them in precise technical language, and define success criteria, not to fight boilerplate or wrestle with repetitive syntax fixes."

Conclusion

Measure pilot success by time saved, reduced bugs, and faster PoC validation. If those metrics improve consistently, you have a solid case to expand agentic workflows across your team.

The Strategic Shift

Your job as a developer becomes refining requirements and writing them in precise technical language with clear success criteria - not fighting boilerplate or wrestling with dependency hell.

Software development is shifting from "AI as autocomplete" to "AI as autonomous executor." This isn't about replacing developers - it's about elevating them from manual implementers to architects and orchestrators.

Ready to Try This Yourself?

Start small, stay safe, measure everything:

This week: Pick one low-risk task (documentation, test generation, or a small internal tool)
Set up guardrails: Preview environments + PR reviews mandatory
Run the pilot: Give Jules a clear prompt with acceptance criteria
Measure results: Track time saved, bugs found, preview iterations needed
Iterate: If it works, expand to more complex tasks. If it fails, refine your prompts and try again.

Share your results: I'm building in public and learning with the community. If you try this workflow, I'd love to hear what worked (or didn't) for you. Find me on LinkedIn or open an issue in the repo.

Questions This Blog Should Have Answered:

If you still have questions, drop a comment or reach out. This is an evolving workflow and your feedback shapes the next iteration.

🎨 Recursion Check: This blog post about agentic coding was enhanced using agentic coding (Antigravity IDE), published on a site built using agentic coding (Jules + Vercel), and documents the workflow used to build said site. We've achieved maximum meta.

Repository

The case study references this repository: Ecommerce Website

Check out my other projects

If you found this case study interesting, you might also like:

Terragrunt Modularization: My deep dive into AWS multi-region DR
Elastic Stack Observability: Building a monitoring system at scale (15M+ logs/week)

See more on the Projects page.

Read the fully interactive version at https://www.parilsanghvi.in/blog/sdlc_2_0_agentic_coding