Data is a recorded piece of information about an event or thing. In its most basic sense, data is an observation. You can have one data point or many. Taken together to answer a question, many data points can provide insights or lead to information. Depending on what the data represents, looking at it might also raise new questions.
Some data represent basic facts or characteristics that describe a person, place, event, or thing. You might have a dataset that lists the age of every person who lives in a certain zip code. With that data you can use statistics to learn things like:
- What’s the average age?
- How young is the youngest person?
- How old is the oldest person?
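As a rough illustration, the three questions above can be answered with a few lines of Python. This is only a minimal sketch: the ages are made-up values, not real census or zip code data, and the `summarize_ages` helper is a hypothetical name introduced for this example.

```python
from statistics import mean

# Hypothetical ages for the residents of one zip code (made-up values).
ages = [34, 7, 62, 45, 19, 88, 25]


def summarize_ages(ages):
    """Return the average, youngest, and oldest age in a list of ages."""
    return {
        "average": mean(ages),   # What's the average age?
        "youngest": min(ages),   # How young is the youngest person?
        "oldest": max(ages),     # How old is the oldest person?
    }


summary = summarize_ages(ages)
print(summary)
```

Each value in the result is a single number derived from many individual observations, which is the step from raw data toward insight.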
You can have a dataset that represents a one-time collection effort or a dataset that’s collected on a recurring basis. The U.S. census is collected on a regular schedule, which allows for comparisons over time. Those comparisons are primarily used by governing bodies, but many other groups find clever ways to repurpose the data if it’s made available.
Combining datasets can produce new insights. Starting with the census data, you might be curious about the overlap between people who live in a city and people who have a certain characteristic.
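One simple way to picture this kind of overlap is as a set intersection. The sketch below assumes each dataset identifies people by a shared ID. The IDs, the two sets, and the `overlap` helper are all hypothetical, and real datasets rarely line up this cleanly.

```python
# Hypothetical person IDs; neither set comes from real census data.
city_residents = {"p01", "p02", "p03", "p04", "p05"}
has_characteristic = {"p02", "p05", "p09"}


def overlap(first, second):
    """Return the IDs that appear in both datasets."""
    return first & second


both = overlap(city_residents, has_characteristic)

# Share of city residents who also have the characteristic.
share = len(both) / len(city_residents)
print(sorted(both), share)
```

Neither dataset alone says how many city residents have the characteristic. That answer only appears once the two are combined.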
In the past, we had fewer concerns about data collection and use. Collecting data was a manual process, storing it was expensive, and our ability to use the data was limited by what was stored and indexed. Fewer devices were connected to the Internet, which meant most data stayed where it was collected. Today our reality is radically different. Sensors are widely available and easy to deploy, data storage is cheap and plentiful, computers are everywhere, and everything is connected, making sharing easy. Even data analysis is getting simpler. The limitations that used to prevent large-scale data collection and use simply aren’t relevant anymore.
We must change the way we think about data, the questions we ask, our expectations for what’s collected, who has access, what’s stored, and how data is used.
Further Reading: Data Footprint, Wikipedia