Interpretability: Opening the Box

If opacity is the disease, interpretability research is the medicine — or at least the attempt at medicine. Researchers have developed a growing toolkit of techniques designed to illuminate what happens inside a model: which inputs mattered, what concepts the model encoded, where in the network a capability lives, and when the model is likely to be wrong. This lesson surveys the major categories of that toolkit, what each can and cannot tell you, and why interpretability matters for AI safety.

Feature Attribution: What Drove This Prediction?

The most practical interpretability question is local: for this particular input, which features most influenced the output? Feature attribution methods assign a score, sometimes called an importance weight, to each input feature.

LIME (Local Interpretable Model-agnostic Explanations) answers this by perturbing an input in many small ways, observing how the output changes, and fitting a simple linear model to the perturbed samples, weighting each sample by how close it is to the original input. The linear model's coefficients become the feature importances. LIME is model-agnostic: it treats the underlying model as a black box and only observes input-output pairs.

SHAP (SHapley Additive exPlanations) is grounded in cooperative game theory. It computes how much each feature contributed to pushing the prediction away from the average prediction, using Shapley values. Shapley values satisfy several mathematical fairness axioms (for example, the attributions always sum to the gap between the prediction and the average prediction), which gives SHAP's attributions a stronger theoretical footing than LIME's.

Gradient-based methods such as integrated gradients, GradCAM, and Guided Backpropagation take a different approach: they propagate signal backward through the network to see which input dimensions the output is most sensitive to. These methods require access to the model's internals (its gradients), so they are not model-agnostic.
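To make the Shapley-value idea concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of features. It is not the SHAP library's API: `shapley_values`, `toy_model`, and the all-zeros `baseline` are hypothetical names and choices made for illustration, and a single baseline input stands in for the "average prediction" described above. Exact enumeration is only feasible for a handful of features, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input, by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value
    (an assumption of this sketch, standing in for "feature is absent").
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    # Hypothetical scoring function: features 0 and 1 interact, feature 2 is ignored.
    a, b, c = features
    return 2.0 * a + 1.0 * b + 3.0 * a * b + 0.0 * c


x = [1.0, 1.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print("attributions:", phi)
print("sum of attributions:", sum(phi))
print("prediction gap:", toy_model(x) - toy_model(baseline))
```

In this toy case the attributions sum to the gap between the model's output on `x` and on the baseline, the interaction term is split evenly between the two interacting features, and the feature the model ignores receives zero credit.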

Saliency Maps: Where Did the Model Look?

For image models, gradient-based attribution produces a saliency map: a heatmap overlaid on the image showing which pixels most influenced the prediction. For a tumor classifier, a saliency map that highlights the tumor itself is reassuring; one that highlights a ruler or a hospital watermark reveals a shortcut. Saliency maps have become standard diagnostic tools in medical AI, though they come with limitations: they show how sensitive the output is to small changes in each pixel, not each pixel's causal contribution to the decision.
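The shortcut scenario can be sketched with a toy example. The code below is an illustration under stated assumptions, not a real medical model: `toy_classifier` is a hypothetical scoring function on a 3x3 "image", and the sensitivity of the score to each pixel is estimated with finite differences rather than backpropagation, which keeps the sketch dependency-free.

```python
def toy_classifier(image):
    # Hypothetical shortcut model: the score leans on the top-left pixel
    # (standing in for a watermark) far more than on the center pixel.
    return 5.0 * image[0][0] + 0.5 * image[1][1]


def saliency_map(model, image, eps=1e-4):
    """Approximate |d score / d pixel| for every pixel by central differences."""
    rows, cols = len(image), len(image[0])
    heat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            up = [row[:] for row in image]
            down = [row[:] for row in image]
            up[r][c] += eps
            down[r][c] -= eps
            heat[r][c] = abs(model(up) - model(down)) / (2 * eps)
    return heat


image = [
    [0.2, 0.1, 0.1],
    [0.1, 0.9, 0.1],  # bright center pixel: the "tumor" in this toy image
    [0.1, 0.1, 0.1],
]
heat = saliency_map(toy_classifier, image)
hottest = max(
    ((r, c) for r in range(3) for c in range(3)),
    key=lambda rc: heat[rc[0]][rc[1]],
)
print("hottest pixel:", hottest)
```

Here the hottest pixel is the corner, not the bright center, which is the kind of pattern that should prompt a closer look at what the model has learned.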

Probing and Concept Activation Vectors

Feature attribution tells you about inputs. Probing tells you about the model's internal representations: the activations at each layer. In a probing experiment, a researcher extracts the hidden-layer activations of a trained model and asks whether a simple, usually linear, classifier trained on those activations can predict some interpretable property. For example, do a language model's 12th-layer activations encode grammatical subject-verb agreement? If a linear classifier trained on those activations predicts agreement with 95% accuracy on held-out examples, we have evidence that the model has encoded this grammatical concept internally, even though it was never explicitly trained to do so.

Testing with Concept Activation Vectors (TCAV) extends this idea to image classifiers. A researcher defines a concept, say 'striped texture', using a set of example images. TCAV finds the linear direction in the model's activation space corresponding to that concept and measures how much predictions in a class (e.g., 'zebra') depend on that direction. High TCAV scores for 'striped' in the 'zebra' class indicate that the model uses stripes as a key feature, a sanity check that builds confidence.

Mechanistic interpretability goes further: instead of finding what concepts a layer encodes, it attempts to reverse-engineer the computational algorithm the model implements. Researchers at Anthropic and elsewhere have identified individual attention heads and circuits within transformers that perform specific operations, such as induction heads that copy repeated sequences and name-retrieval circuits that look up facts about entities. This line of work aspires to provide a full account of a model's computation.
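A probing experiment can be sketched in a few dozen lines. This is a minimal illustration, not a recipe from any particular paper: the "activations" are synthetic vectors generated by the hypothetical `fake_activation` helper, in which one coordinate is constructed to carry the property, and the probe is a plain logistic regression trained by gradient descent.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def fake_activation(label, dim=4):
    # Synthetic stand-in for a hidden-layer activation vector. By construction,
    # coordinate 2 carries the property; the other coordinates are noise.
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    vec[2] += 3.0 if label == 1 else -3.0
    return vec


def make_dataset(n):
    labels = [rng.randint(0, 1) for _ in range(n)]
    return [fake_activation(y) for y in labels], labels


def train_probe(X, y, epochs=200, lr=0.1):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(w, b, X, y):
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        hits += pred == yi
    return hits / len(X)


X_train, y_train = make_dataset(400)
X_test, y_test = make_dataset(200)
w, b = train_probe(X_train, y_train)
probe_accuracy = accuracy(w, b, X_test, y_test)
print(f"held-out probe accuracy: {probe_accuracy:.2f}")
```

With real models the activations would come from a forward pass over labeled text, and accuracy is measured on held-out examples so that the probe cannot simply memorize its training set.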

Match each interpretability technique to its defining characteristic.

Terms

LIME

SHAP

GradCAM

Probing classifier

TCAV

Definitions

Uses gradients flowing back into a CNN's convolutional feature maps to produce a class-specific heatmap

Uses Shapley values from game theory to fairly distribute prediction credit

Tests whether a linear model can predict an interpretable property from hidden activations

Measures how much a human-defined concept influences a class prediction

Fits a simple model to local perturbations to approximate feature importance

Limits and Failure Modes of Interpretability Tools

Every interpretability method has known failure modes, and safety-conscious practitioners must know them.

Post-hoc explanations can be unfaithful. Research by Adebayo et al. showed that some saliency map methods produce visually plausible heatmaps even when the model's weights are randomized, meaning the maps look like they're highlighting meaningful features but contain no information about what the model actually computed. If an explanation looks reasonable regardless of what the model did, it is not a reliable window into the model.

SHAP and LIME can disagree substantially on the same model and input. Because they use different approximation strategies, they can produce contradictory attributions. There is no single ground truth about feature importance, and different tools embody different assumptions.

Probing measures correlation, not causation. A probing classifier showing that a layer encodes some concept does not prove the model uses that concept in its computation, only that the information is present. The model might use an entirely different path.

Interpretability tools can be gamed. Models can be constructed to produce friendly-looking explanations while still making decisions on hidden discriminatory features. A technically competent adversary can make a biased model appear to explain itself fairly.
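The weight-randomization finding suggests a simple check that can be run on any explanation method. The sketch below is a toy version of that idea, not the procedure from the original research: the "model" is a hypothetical linear scorer, and `gradient_times_input` and `edge_like_map` are illustrative stand-ins for an explanation that depends on the model and one that does not.

```python
import random


def gradient_times_input(weights, x):
    # For a linear model score = sum(w_i * x_i), the gradient with respect to
    # x is just the weight vector, so this map depends on the model.
    return [abs(w * v) for w, v in zip(weights, x)]


def edge_like_map(weights, x):
    # Deliberately ignores the weights: it highlights where neighbouring
    # inputs differ, much like an edge detector.
    return [0.0] + [abs(x[i] - x[i - 1]) for i in range(1, len(x))]


def max_abs_change(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


rng = random.Random(0)
x = [0.1, 0.9, 0.8, 0.2, 0.7]
trained_weights = [0.0, 2.0, 2.0, 0.0, 0.0]  # hypothetical "trained" model
random_weights = [rng.uniform(-1.0, 1.0) for _ in trained_weights]

changes = {}
for name, method in [("gradient x input", gradient_times_input),
                     ("edge-like map", edge_like_map)]:
    before = method(trained_weights, x)
    after = method(random_weights, x)
    changes[name] = max_abs_change(before, after)
    print(f"{name}: max change after randomization = {changes[name]:.3f}")
```

An explanation that does not move at all when the weights are scrambled cannot be telling you anything about the model, however plausible it looks.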

Explanations Can Create False Confidence

One of the subtler dangers of interpretability tools is that they make humans feel confident in a model's behavior even when the explanation is incomplete or wrong. A radiologist shown a saliency map highlighting tumor tissue may trust a model's diagnosis far more than warranted — because the explanation looks right, they stop scrutinizing. Explanations must be treated as hypotheses to verify, not certificates of correctness.

A researcher trains a probing classifier on layer 15 of a large language model and finds it predicts whether the next word is a noun with 91% accuracy. What does this tell us?

Research found that some saliency map methods produce nearly identical heatmaps even when model weights are replaced with random values. This finding implies:

Evaluate a Real Explanation

Access a public AI demo that provides explanations — many text sentiment classifiers, image classifiers, or SHAP-based tabular models are available online through tools like Hugging Face or Google's What-If Tool.
Step 1: Submit five different inputs and record the explanation the system provides for each.
Step 2: For two of the explanations, test the attribution by editing the input. For example, if the system says the word 'terrible' drove a negative sentiment prediction, remove that word and check whether the prediction changes or whether the model finds another driver. Then try the reverse: look for an edit that changes the attribution without changing the prediction.
Step 3: Form a hypothesis about whether the explanation is faithful to the model's actual behavior based on your experiments.
Step 4: Write a one-paragraph evaluation: 'Based on my experiments, I believe the explanations provided by this system are / are not trustworthy because...'
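If the demo you choose can be called from code, the deletion experiment in Step 2 can be automated. The sketch below shows the shape of such a test on a stand-in model only: `LEXICON`, `toy_sentiment`, and the leave-one-out attribution are hypothetical and far simpler than any real classifier or explanation tool.

```python
# Hypothetical word weights for a toy sentiment scorer; not a real model.
LEXICON = {"terrible": -2.0, "boring": -1.0, "great": 2.0, "fun": 1.0}


def toy_sentiment(words):
    return sum(LEXICON.get(w, 0.0) for w in words)


def label(words):
    return "negative" if toy_sentiment(words) < 0 else "non-negative"


def leave_one_out(words):
    """Attribution for each word = score with the word minus score without it."""
    base = toy_sentiment(words)
    return [(w, base - toy_sentiment(words[:i] + words[i + 1:]))
            for i, w in enumerate(words)]


def top_negative_driver(words):
    return min(leave_one_out(words), key=lambda pair: pair[1])[0]


original = "the plot was terrible and the pacing was boring".split()
driver = top_negative_driver(original)

# Deletion test: remove the claimed driver and re-run the model.
edited = [w for w in original if w != driver]
new_driver = top_negative_driver(edited)

print("original:", label(original), "| claimed driver:", driver)
print("edited:  ", label(edited), "| new driver:", new_driver)
```

In this toy run the prediction stays negative after the top-attributed word is removed and a second word takes over as the driver, which is the kind of observation to record as evidence in Steps 3 and 4.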