Design a Frontier Project
You have spent eight lessons studying how frontier AI is actually built — the labs, the hardware, the data, the training pipeline, the economics, the evaluation process, and the competitive and policy dynamics around release. Now you will put that knowledge together. In this lesson, you play the role of a founding team at a new frontier AI lab and design a frontier project from first principles.
This is not a coding exercise. It is a strategic and analytical design exercise — the kind of planning work that real research directors, infrastructure leads, and policy teams at frontier labs engage in before committing hundreds of millions of dollars to a training run. Every decision you make connects to concepts from this module. Your job is to be specific, rigorous, and honest about tradeoffs.
Background: The Scenario
It is late 2026. You have secured $500 million in funding from a consortium of investors and a major cloud provider. Your lab has 150 employees: 40 ML researchers, 30 systems engineers, 20 data engineers, 15 safety researchers, 15 product engineers, 10 policy staff, and 20 in operations, legal, and administration. Your cloud provider partnership gives you access to a reserved cluster of 8,000 high-end AI accelerator GPUs at favorable rates. Your investors expect a commercially deployable model within 18 months. Your safety team believes you need at least 12 months of pre-deployment evaluation. Your research team wants to pursue a genuinely novel architecture. Your data team estimates it can assemble a high-quality training corpus of 2 trillion tokens within 6 months. You must design the project. Every major decision must be justified using concepts from this module.
Design Your Frontier AI Project
- Work through each section carefully. Be specific: vague answers miss the point of the exercise. Aim for roughly 475-650 words total, the sum of the section targets below.
- SECTION 1: Mission and Domain (50-75 words)
- What is your model's primary domain and use case? Examples: general-purpose assistant, coding assistant, medical reasoning, scientific research support, multilingual education. Explain why this domain is strategically justified given your funding, timeline, and team composition. What gap in the current market are you filling?
- SECTION 2: Compute Planning (75-100 words)
- You have 8,000 GPUs for 18 months. Not all of them can go to the main training run: you need capacity for pre-run experiments, post-training alignment stages, and inference serving during evaluation. Allocate your compute across these uses. Estimate how many GPU-months you will commit to the main pretraining run. Using rough public estimates of $2-3 per GPU-hour (one GPU-month is about 730 GPU-hours), estimate the compute cost of the main run. Is this consistent with your $500M total budget after accounting for salaries and infrastructure?
- SECTION 3: Data Strategy (75-100 words)
- Your data team can deliver 2 trillion tokens. Describe your data mix: which sources will you include (web crawl, books, code, scientific papers, synthetic data, domain-specific corpora for your chosen domain), and what approximate share of the mix will each take? What is your single biggest data quality risk, and how will you mitigate it? The Chinchilla scaling laws suggest roughly 20 training tokens per parameter for compute-optimal training. Given that ratio and your compute budget, is 2 trillion tokens approximately right for your expected model size, or would you adjust?
- SECTION 4: Training Pipeline (75-100 words)
- Outline your three-stage training pipeline: pretraining objective and duration, SFT data collection approach (who writes the examples? how many?), and your preference alignment approach (RLHF or DPO? why?). Identify the single biggest technical risk in your pipeline and how your team will monitor for it. What is your checkpointing strategy and why?
- SECTION 5: Evaluation Plan (75-100 words)
- Identify three capability benchmarks you will use and explain why each is appropriate for your domain. Describe two safety red-team scenarios that are specifically relevant to your chosen domain. Explain how you will avoid benchmark contamination in your capability evaluation. Who conducts your red-teaming: your internal safety team, external contractors, academic researchers, or some combination? Justify your answer.
- SECTION 6: Release Decision (75-100 words)
- Will you release model weights publicly, restrict to API access, or offer a hybrid model (open weights for research, API for commercial use)? Justify your decision using at least two arguments from the open-vs-closed debate you studied. Describe one condition under which you would change your release decision after six months of deployment. How will you handle a scenario where a security researcher privately reports a serious safety issue in your deployed model?
- SECTION 7: Honest Risk Assessment (50-75 words)
- Identify the one decision in your project plan you are least confident in. What information would you need to be more confident? Identify one external factor — a competitor action, a regulatory change, a hardware supply disruption — that could most threaten your plan, and describe your contingency.
- Share your plan with a partner or small group. Compare your domain choices, compute allocations, and release decisions. Where do your plans diverge most? What does that reveal about the tradeoffs in frontier AI development?
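The exercise itself involves no coding, but the arithmetic in Sections 2 and 3 is easy to get wrong by an order of magnitude, so an optional script can serve as a sanity check. The sketch below is a minimal back-of-envelope calculator under stated assumptions, not a planning tool. The 4,800-GPU, 4-month main run is a hypothetical allocation (60% of the cluster); the per-GPU peak throughput of 1e15 FLOP/s and the 40% model FLOPs utilization (MFU) are assumptions to replace with figures for your own hardware; and the helper names are illustrative. It uses the common approximation that training compute is C ≈ 6ND FLOPs for N parameters and D tokens, combined with the Chinchilla rule of thumb D ≈ 20N, which gives N ≈ sqrt(C / 120).

```python
"""Back-of-envelope compute and data sizing for Sections 2 and 3.

Every constant here is an illustrative assumption, not a recommendation.
"""
import math

HOURS_PER_MONTH = 730  # about 8,760 hours per year / 12 months


def gpu_hours(num_gpus: int, months: float) -> float:
    """GPU-hours for num_gpus running continuously for the given months."""
    return num_gpus * months * HOURS_PER_MONTH


def cost_usd(hours: float, usd_per_gpu_hour: float) -> float:
    """Rental cost at a flat per-GPU-hour rate."""
    return hours * usd_per_gpu_hour


def training_flops(hours: float, peak_flops_per_gpu: float, mfu: float) -> float:
    """Usable FLOPs: GPU-seconds x peak FLOP/s x model FLOPs utilization."""
    return hours * 3600 * peak_flops_per_gpu * mfu


def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Compute-optimal (params, tokens) from C ~= 6*N*D with D ~= 20*N."""
    n_params = math.sqrt(compute_flops / (6 * tokens_per_param))
    return n_params, tokens_per_param * n_params


if __name__ == "__main__":
    # Hypothetical allocation: 60% of 8,000 GPUs for a 4-month main run.
    run_hours = gpu_hours(num_gpus=4_800, months=4)
    print(f"Main run: {run_hours / HOURS_PER_MONTH:,.0f} GPU-months "
          f"({run_hours:,.0f} GPU-hours)")
    whole_hours = gpu_hours(num_gpus=8_000, months=18)
    for rate in (2.0, 3.0):
        print(f"At ${rate:.0f}/GPU-hour: main run ${cost_usd(run_hours, rate) / 1e6:,.1f}M, "
              f"whole cluster ${cost_usd(whole_hours, rate) / 1e6:,.1f}M")

    # Assumed ~1e15 FLOP/s peak per GPU at 40% MFU.
    compute = training_flops(run_hours, peak_flops_per_gpu=1e15, mfu=0.4)
    n, d = chinchilla_optimal(compute)
    print(f"{compute:.2e} FLOPs -> ~{n / 1e9:,.0f}B params on ~{d / 1e12:.1f}T tokens")

    # Reverse view: the model size 2T tokens supports at ~20 tokens/param.
    n_2t = 2e12 / 20
    print(f"2T tokens -> ~{n_2t / 1e9:,.0f}B params, {6 * n_2t * 2e12:.2e} FLOPs")
```

Under these assumptions the main run comes to 19,200 GPU-months, or roughly $28-42M, and its compute-optimal size is near 410B parameters trained on about 8.2T tokens, roughly four times what your data team can deliver. Read the other way, 2T tokens is compute-optimal for only about a 100B-parameter model. Either gap is a decision to justify in Section 3, not a result to adopt. If your reservation is billed whether or not the GPUs are busy, the whole-cluster figure (roughly $210-315M) is what counts against the $500M.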
Reflection: What the Exercise Reveals
Even with $500 million and 150 employees, you likely discovered that you cannot do everything. The timeline is tight. The compute is finite. The safety evaluation takes time that the investor timeline does not easily accommodate. The release decision involves genuine tradeoffs with no clean answer. This is what frontier AI development actually looks like from the inside — not a smooth optimization but a series of constrained decisions under uncertainty, made by real people balancing scientific ambition, commercial pressure, safety responsibility, and competitive dynamics. The concepts from this module are not abstract — they are the levers and constraints that define the shape of every frontier model that reaches your hands.
The ability to reason about AI development from a systems perspective — understanding compute, data, pipelines, economics, and policy as interconnected variables — is increasingly valuable in careers ranging from AI research to product management to policy and law. This module has given you a framework for that reasoning.
Suppose your frontier project design allocated 60% of GPU capacity to the main pretraining run. A competitor then announces it will release a model in six months, four months before your planned completion. Which response is most consistent with responsible frontier lab practice?