CSVs and pandas — Excel with a Python brain

The first time someone handed me a CSV file, I was sitting in a small office in Accra trying to help a friend make sense of her shop's sales. She'd been writing every transaction into a notebook for six months — date, item, price, customer if she remembered the name. A volunteer had typed the whole thing into a spreadsheet and emailed it to her. She forwarded it to me with one line: "Can you tell me which months were good?"

I opened the file. It ended in .csv. I had no idea what that meant. I right-clicked, opened it with a text editor by accident, and braced for something complicated. What I saw instead was the most boring thing in the world. Just rows. Just commas.

That moment — the moment a CSV stopped feeling like a mystery format and started feeling like a notebook page — is the moment I want to give you in this post.

A CSV is a market sales book at the end of the day

Picture a trader closing her stall in Makola. She has a small ruled book. Each row is one sale. The columns are the same every day: date, item, quantity, price. At the end of the week she flips through and adds things up by hand.

A CSV file is that book, typed out. Nothing more. The name even tells you the format — comma-separated values. Open one in any text editor and you get something like this:

student_id,name,gpa
1,Ama,3.6
2,Kojo,3.2
3,Yaa,3.9

The first row is the headers — the column names. Every row after that is one record. Commas mark where one cell ends and the next begins.

That is almost the entire specification. The one wrinkle is that a value containing a comma of its own gets wrapped in double quotes. There is no hidden formatting, no formulas, no fonts. The format is older than Excel, and that's exactly why it survived: almost every system on earth, from a government statistics office to a bank to a Python script on your laptop, knows how to read a CSV. It's the closest thing data has to a universal language.
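To see how little is going on, here is a minimal sketch that holds the three-student text in an ordinary Python string and cuts it up by hand. The variable names are my own, and splitting on commas like this is only safe for simple files where no value contains a comma itself.

```python
# The same text you would see in a text editor, held in a plain string.
raw_text = """student_id,name,gpa
1,Ama,3.6
2,Kojo,3.2
3,Yaa,3.9
"""

lines = raw_text.strip().split("\n")

# First line: the column names.
headers = lines[0].split(",")

# Every other line: one record, cut at the commas.
records = [line.split(",") for line in lines[1:]]

print(headers)     # ['student_id', 'name', 'gpa']
print(records[0])  # ['1', 'Ama', '3.6']
```

Notice that every value comes back as a string, including the GPA. A CSV has no idea what a number is; that decision belongs to whoever reads the file.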

So why not just read it line by line?

You could. Python's standard library has a csv module, and for a tiny file it does the job. But the moment your file grows past a few hundred rows, or you want to ask anything more interesting than "print every row," you start writing a lot of bookkeeping code. Open the file. Skip the header. Loop through. Convert strings to numbers. Track a running total. Handle the row that's missing a value.

That bookkeeping is not the work. The work is the question you wanted to ask.
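Here is roughly what that bookkeeping looks like when the question is "what is the average GPA?". Treat it as a sketch, not the one right way to do it. I use io.StringIO as a stand-in for opening a real file so the example runs on its own, and the fourth student, Esi, is made up purely to show a row with a missing value.

```python
import csv
import io

# Stand-in for open("students.csv"). The last row is invented for
# illustration and has no GPA.
csv_file = io.StringIO(
    "student_id,name,gpa\n"
    "1,Ama,3.6\n"
    "2,Kojo,3.2\n"
    "3,Yaa,3.9\n"
    "4,Esi,\n"
)

reader = csv.reader(csv_file)
next(reader)  # skip the header row

total = 0.0
count = 0
for row in reader:
    gpa_text = row[2]
    if gpa_text == "":        # handle the row that's missing a value
        continue
    total += float(gpa_text)  # convert the string to a number
    count += 1                # track how many values we actually used

average_gpa = total / count
print(round(average_gpa, 2))  # 3.57
```

About a dozen lines of housekeeping for one small question, and none of them is the question.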

This is where pandas comes in. Pandas gives you a thing called a DataFrame, and the simplest way to describe it is this: a DataFrame is an Excel sheet with a Python brain. It looks like a spreadsheet when you print it. But unlike a spreadsheet, you can talk to it in code — and it answers in one line what would take you twenty in plain Python.

Loading a file and taking a look

Two lines is all it takes.

import pandas as pd

df = pd.read_csv("students.csv")  # load the file into a DataFrame
df.head()  # show the first five rows

pd is the nickname everyone uses for pandas — keep the convention, it'll make every tutorial you read later feel familiar. df stands for DataFrame. .head() shows the first five rows so you can confirm the file loaded the way you expected.

In a notebook, the output is a neat table:

   student_id  name   gpa
0           1   Ama   3.6
1           2  Kojo   3.2
2           3   Yaa   3.9

That extra column on the far left — 0, 1, 2 — is the index. Think of it as the row number pandas adds for free, the way an attendance register has a numbered line for each student before you even write their names down.
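If you don't have a students.csv on disk yet, you can still try this. The sketch below assumes the same three rows and feeds them to read_csv through io.StringIO, which behaves like an open file. It also checks a few things worth looking at right after loading: the shape, the index, and the type pandas chose for each column.

```python
import io

import pandas as pd

# Stand-in for the students.csv file, so the example runs on its own.
csv_text = io.StringIO(
    "student_id,name,gpa\n"
    "1,Ama,3.6\n"
    "2,Kojo,3.2\n"
    "3,Yaa,3.9\n"
)

df = pd.read_csv(csv_text)

print(df.head())       # first five rows (here, all three)
print(df.shape)        # (3, 3): rows, then columns
print(list(df.index))  # [0, 1, 2]: the row numbers pandas added
print(df.dtypes)       # the type pandas inferred for each column
```

The last line is the quiet win. Pandas has already turned the GPA column into numbers, which is one piece of bookkeeping you no longer write yourself.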

Grabbing one column

If a DataFrame is a spreadsheet, a column is exactly what you'd think — the values stacked under one header. In Excel you'd click the letter at the top of column C. In pandas you write:

df["gpa"]

That gives you back every GPA in the file, in order. It's a single column you can now do things with — print it, sort it, count it, average it.
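A small sketch of those four verbs, under the assumption that the DataFrame holds the same three students. I build it by hand here instead of reading a file, only so the example is self-contained.

```python
import pandas as pd

# Same three students as the file, typed in directly for the example.
df = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Ama", "Kojo", "Yaa"],
    "gpa": [3.6, 3.2, 3.9],
})

gpas = df["gpa"]  # one column; pandas calls this a Series

print(gpas)                               # print it
print(gpas.sort_values(ascending=False))  # sort it, highest first
print(gpas.count())                       # count the non-missing values: 3
print(gpas.mean())                        # average it
```

Sorting returns a new column and leaves the original order in df untouched, so you can look at the data from several angles without changing it.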

The mental shift that makes pandas worth learning

Here is the part that took me the longest to internalise, so I'm going to say it plainly.

In normal Python, when you want to do something to every value in a list, you write a loop. You think one row at a time. Take this row. Do the thing. Move to the next row. That's a perfectly good way to think, and it's how you've been writing code so far.

In pandas, you stop thinking row by row. You start thinking a column at a time. You don't say "for each student, look at their GPA, add it to a total, then divide at the end." You say:

df["gpa"].mean()

One line. Pandas reads down the whole column and gives you the average. It works the same on three rows or three million.
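To convince yourself the two ways of thinking give the same answer, here is a side-by-side sketch. It assumes the same three students, typed in by hand so the example runs without a file.

```python
import pandas as pd

df = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Ama", "Kojo", "Yaa"],
    "gpa": [3.6, 3.2, 3.9],
})

# Row-by-row thinking: walk the values and keep a running total.
total = 0.0
for gpa in df["gpa"]:
    total += gpa
loop_average = total / len(df)

# Column-at-a-time thinking: ask the column directly.
column_average = df["gpa"].mean()

print(round(loop_average, 2))    # 3.57
print(round(column_average, 2))  # 3.57
```

Same number both times. The difference is how much of the code is about the question and how much is about getting through the rows.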

This is the entire point of the library. Less code. Bigger questions. You stop being a person who walks through rows with a calculator and you start being a person who asks the table a question and reads the answer.

Take this with you

A CSV is not a fancy format. It's a list of rows, one per line, with commas between the values, the same shape as a sales book or a class register. Pandas takes that flat little file and hands you a DataFrame, a spreadsheet you can speak to in Python. The trick to learning it is not memorising functions. It's making one mental switch: stop looping through rows, start asking columns questions. Once that switch flips, the rest is just vocabulary.
