AI Foundations

⏱ About 15 min · 15 XP

What Is Data?

Before AI can learn anything, it needs raw material. That raw material is data. The word gets thrown around constantly — in the news, in school, in every tech conversation — but almost nobody defines it carefully. This lesson fixes that. By the end, you will know exactly what data is, how it differs from two related ideas, and why that distinction matters for everything AI does.

A Precise Definition

Data is recorded information about the world. Both parts of that definition matter.

Recorded means written down, stored, or captured in some persistent form: a number in a spreadsheet, a pixel in a photo, a sample in a sound file. A feeling in your head is not data until it is recorded. A conversation you had is not data until it is logged, transcribed, or saved.

About the world means data points at something real: a temperature, a word, a color, an event, a person's rating of a movie. Data is always a representation, a description of something rather than the thing itself.

Data comes in many forms: numbers (78.3 degrees, 2 million users), text (a tweet, a novel, a review), images (a photograph, an X-ray, a satellite image), audio (a voice recording, a bird call), and more. All of them fit the same definition: recorded information about the world.

Definition: Data

Data is recorded information about the world. It exists in many forms — numbers, text, images, audio — and always represents something outside itself. A piece of data is often called a data point or an observation.

Here is a concrete example. A weather station in Kansas City records the outside temperature every ten minutes. At 3:40 PM on a Tuesday, it logs 91.2°F. That single measurement is one data point. The station also records humidity (67%), wind speed (8 mph), and barometric pressure (29.92 inHg). These four measurements taken together form a single record: a snapshot of one moment. Over a week, the station produces about a thousand such records (6 per hour × 24 hours × 7 days = 1,008). That collection is a dataset. Notice what the data does not tell you by itself: whether Tuesday was pleasant or oppressive, whether local farmers were worried about drought, or whether the power grid was strained. Answering those questions requires more than data.
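
To make the vocabulary concrete, here is a minimal Python sketch of the weather-station example. The class name `WeatherRecord`, its field names, the helper `records_per_week`, and the calendar date are all hypothetical, chosen only for illustration; real stations use their own formats. Each field holds one data point, one `WeatherRecord` is one record, and a list of records is a dataset. The helper checks the weekly count: one reading every ten minutes works out to 6 × 24 × 7 = 1,008 records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# One record: four data points captured at the same moment.
# The class and field names are hypothetical, for illustration only.
@dataclass
class WeatherRecord:
    timestamp: datetime
    temperature_f: float   # outside temperature, degrees Fahrenheit
    humidity_pct: float    # relative humidity, percent
    wind_mph: float        # wind speed, miles per hour
    pressure_inhg: float   # barometric pressure, inches of mercury


def records_per_week(interval: timedelta) -> int:
    """Number of records a station produces in 7 days if it logs once per interval."""
    return int(timedelta(days=7) / interval)


# The 3:40 PM Tuesday reading from the lesson (the date itself is made up).
tuesday = WeatherRecord(datetime(2024, 7, 16, 15, 40), 91.2, 67.0, 8.0, 29.92)

# A dataset is a collection of records; this one holds just a single record.
dataset: list[WeatherRecord] = [tuesday]

print(len(dataset))                             # 1
print(records_per_week(timedelta(minutes=10)))  # 1008
```

Bundling the four measurements with their timestamp mirrors the idea that a record is a snapshot of one moment.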

Data vs. Information vs. Knowledge

Three words are constantly confused: data, information, and knowledge. They are related but not the same.

Data is the raw recorded facts themselves: 91.2°F, 67%, 8 mph. On their own, they just sit there.

Information is data that has been processed or organized so that it is meaningful in context. 'Tuesday afternoon in Kansas City was hot and humid, with light wind' is information. You took the raw numbers, put them in context, and produced a statement a human can understand.

Knowledge is what you build from information over time: understanding that summers in Kansas City are hot, that heat index values above 100°F are dangerous for outdoor workers, that this summer is running 3°F above the historical average. Knowledge comes from many pieces of information, processed and connected.

For AI purposes, the critical point is this: AI systems work primarily with data, processing raw recorded facts at enormous scale. Turning that data into something genuinely useful does not happen automatically; it depends on careful design at every step, from what gets recorded to how it is organized and interpreted.
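
One way to see the step from data to information is a small Python function that turns the raw Kansas City numbers into the sentence about Tuesday afternoon. This is a hypothetical sketch: the function name `describe` and its cutoffs (85°F for "hot", 60% for "humid", under 12 mph for "light wind") are illustrative assumptions, not meteorological standards. The point is that the context, including the place, the time, and what counts as "hot", comes from the person writing the code, not from the numbers.

```python
def describe(temp_f: float, humidity_pct: float, wind_mph: float,
             place: str, when: str) -> str:
    """Turn raw weather data into a human-readable statement (information)."""
    # These cutoffs are illustrative assumptions, not official definitions.
    if temp_f >= 85:
        heat = "hot"
    elif temp_f >= 60:
        heat = "mild"
    else:
        heat = "cool"
    air = "humid" if humidity_pct >= 60 else "dry"
    wind = "light wind" if wind_mph < 12 else "strong wind"
    # The place and time are context supplied from outside the measurements.
    return f"{when} in {place} was {heat} and {air}, with {wind}."


# Raw data points from the weather-station example.
print(describe(91.2, 67, 8, "Kansas City", "Tuesday afternoon"))
# Tuesday afternoon in Kansas City was hot and humid, with light wind.
```

Change a cutoff and the same data produces different information, which is one reason the design choices around data matter.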

The Hierarchy

Data → Information → Knowledge. Data is the raw ingredient. Information is data made meaningful by context. Knowledge is understanding built from information over time. AI starts at the very bottom: raw data.

Match each term to its correct description.

Terms

Data
Information
Knowledge
Data point
Dataset

Definitions

A single recorded measurement or observation
An organized collection of many data points
Raw recorded facts about the world
Understanding built from many pieces of information
Data organized and placed in a meaningful context


Why the Definition Matters for AI

You might wonder why being precise about the word 'data' matters. Here is why: when an AI is trained, it processes enormous amounts of data, but it has no automatic access to context, background knowledge, or common sense. It sees the recorded facts. It does not see the world. A facial recognition system trained on photos (image data) does not 'know' what a face is; it learns patterns in pixels. A language model trained on text (text data) does not 'understand' language; it learns statistical patterns in sequences of tokens (words and pieces of words). Keeping that distinction sharp (data is a representation, not reality itself) is the first step to thinking critically about what AI can and cannot do. Throughout this module, you will build on this foundation. Every lesson connects back to three questions: Where did the data come from? What does it actually represent? What happens when that representation is imperfect?
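
As a rough illustration of learning "statistical patterns in sequences", here is a toy Python bigram counter: it records which word tends to follow which in a made-up sentence. Real language models are neural networks trained on vastly more text and are far more sophisticated, so treat this as a simplified stand-in, not a description of how any actual model works. The function name `bigram_counts` and the corpus are hypothetical. What the toy does share with real models is its input: it only ever sees recorded text, never an actual cat or mat.

```python
from collections import Counter, defaultdict


def bigram_counts(text: str) -> dict[str, Counter]:
    """For each word, count which words follow it in the text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    # Pair every word with the word that comes right after it.
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


# A tiny, made-up corpus: this is all the "model" ever sees.
corpus = "the cat sat on the mat the cat ate the fish"
counts = bigram_counts(corpus)

# The most frequent word after "the" is a pattern in the data,
# not an understanding of what a cat is.
print(counts["the"].most_common(1))  # [('cat', 2)]
```

The counter can "predict" that "cat" usually follows "the" without any notion of what a cat is, which is the gap between data and reality this lesson describes.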

Data Is Not Reality

A photo of a cat is not a cat. A temperature reading of 91.2°F is not the heat you feel walking outside. Data is always a partial, imperfect representation of the world. This gap between data and reality is one of the most important ideas in all of AI.

Fill in the blanks to complete the key ideas from this lesson.

Data is ______ information about the world. A single measurement is called a data ______. The three-level hierarchy runs from data to ______ to knowledge.

Which of the following is the best example of data?

Why is it accurate to say that AI systems 'work with data, not with reality'?

Find Five Data Points Around You

  1. Look around the room or think about your day so far.
  2. Identify five things that could be recorded as data — not vague feelings, but specific, measurable facts.
  3. For each one, write it down in the exact form it would be recorded: a number with units, a category, a text snippet, etc.
  4. Then write one sentence turning each data point into information by adding context.
  5. Finally, ask yourself: what would someone need to know (knowledge) to truly understand what each data point means?

Example:

Data point: 68°F (room temperature).
Information: The classroom is cooler than outside today.
Knowledge: This room is kept 5–8°F below the outdoor temperature every summer to improve focus.