Creating maps in Python with geopandas: a tutorial (2024)

Short tutorial on creating maps in Python using GeoPandas: great for geospatial analysis.
Author: Sid Metcalfe

Affiliation: Cartesian Mathematics Foundation

Published: January 9, 2024

Introduction

I have been using GeoPandas daily for some months now. It is a library for working with geospatial data in Python, and if you’re into data analysis or GIS, you might find it pretty handy. It’s built on top of pandas, the go-to data manipulation tool in Python, so it feels familiar to anyone who already wrangles data with pandas. In this article, I’ll walk through getting started with GeoPandas, from setting up your environment to creating your first map, and then I’ll discuss some more complex mapping and customization.

Introduction to GeoPandas and Geospatial Data

Geospatial data is essential for understanding the world around us. As a Python enthusiast, I’ve been fascinated by how this data can be manipulated and visualized using libraries like GeoPandas. This nifty library has become a go-to tool for anyone interested in geographic information systems (GIS) within Python. It’s built on top of pandas and integrates well with other Python libraries, making it incredibly powerful for geospatial analysis.

GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. It makes working with geospatial data in Python an enjoyable task by abstracting the complexities of manipulating geometries. Here’s how you might get started with a simple GeoPandas operation:

import geopandas as gpd

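# Load a GeoDataFrame containing regions (e.g., countries, states, etc.)
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0;
# on newer versions, download the Natural Earth file and pass its path instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))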

# Look at the first few rows to understand the data structure
print(world.head())

# Plot the GeoDataFrame
world.plot()

This snippet loads a dataset of the world’s countries and plots it. Simple, right? The power of GeoPandas is evident when you start interacting with different geometries. Say you want to get the centroid of each country. Here’s how I might do it:

# Calculate the centroid of each country
world['centroid_column'] = world.centroid

# Check the new GeoDataFrame
print(world[['geometry', 'centroid_column']].head())
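One caveat: when the data is in a geographic CRS (longitude/latitude), GeoPandas warns that centroids may be inaccurate. A minimal sketch of one common workaround follows, using two made-up rectangles rather than the world dataset. The choice of EPSG:6933 (an equal-area projection) and the column names are assumptions for illustration.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two hypothetical rectangular "regions" in longitude/latitude (EPSG:4326)
regions = gpd.GeoDataFrame(
    {"name": ["north_box", "south_box"]},
    geometry=[
        Polygon([(0, 10), (10, 10), (10, 20), (0, 20)]),
        Polygon([(0, -20), (10, -20), (10, -10), (0, -10)]),
    ],
    crs="EPSG:4326",
)

# Project to an equal-area CRS, compute centroids there, then convert back
projected = regions.to_crs(epsg=6933)
centroids = projected.centroid.to_crs(regions.crs)

regions["centroid_lon"] = centroids.x
regions["centroid_lat"] = centroids.y
print(regions[["name", "centroid_lon", "centroid_lat"]])
```

Storing the coordinates as plain numeric columns also keeps the GeoDataFrame down to a single geometry column, which makes it easier to export later.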

When digging into geospatial data, you’ll often work with shapefiles, GeoJSONs, and other geographic vector formats. GeoPandas easily reads from and writes to multiple file formats, which is a huge plus in my workflow. For example, reading a GeoJSON file is as simple as this:

gdf = gpd.read_file('path_to_your_geojson_file.geojson')

GeoPandas supports all the common spatial operations I need such as projection, spatial joins, and overlays. If I want to filter the dataset to include only those countries in the Southern Hemisphere, I can do so with a straightforward query:

southern_world = world[world['geometry'].centroid.y < 0]
southern_world.plot()
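Spatial joins follow a similar pattern. As a rough sketch, assuming two made-up square zones and three made-up sites, gpd.sjoin can attach to each point the attributes of the polygon that contains it:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two hypothetical adjacent square zones
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Three hypothetical sites; the last one lies outside both zones
sites = gpd.GeoDataFrame(
    {"site": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# A left join keeps every site; unmatched sites get NaN in the zone column
joined = gpd.sjoin(sites, zones, how="left", predicate="within")
zone_by_site = joined.set_index("site")["zone"]
print(zone_by_site)
```

Both layers need to share the same CRS before joining, which is why the example sets it explicitly on each.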

It’s also easy to combine GeoPandas with other libraries: Matplotlib and Plotly for visualization, and geospatial I/O libraries such as Rasterio (raster data) and Fiona (vector data).

When it comes to projecting your geospatial data (that is, transforming coordinates from one reference system to another), GeoPandas takes it in stride. The CRS (Coordinate Reference System) can be changed using the to_crs method:

# Change the CRS to EPSG:3395 (Mercator projection)
world = world.to_crs(epsg=3395)
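To see what to_crs actually does to the coordinates, here is a small sketch with two made-up points. It assumes nothing beyond GeoPandas and Shapely; the point locations are arbitrary.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two arbitrary points in longitude/latitude (EPSG:4326)
points = gpd.GeoSeries([Point(0, 0), Point(10, 45)], crs="EPSG:4326")

# Reproject to World Mercator; coordinates are now in meters
mercator = points.to_crs(epsg=3395)

print(points.crs.to_epsg(), "->", mercator.crs.to_epsg())
print(mercator)
```

Note that to_crs returns a new object and requires the source CRS to be set; if gdf.crs is None, it has to be assigned first (for example with set_crs).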

Interacting with such complex data types may seem daunting, but GeoPandas turns it into an approachable task, even for beginners like us. This brief intro should give you some insight into the capabilities of GeoPandas and geospatial data in Python. It serves as the basis for the exciting tasks ahead, where I’ll be guiding you through the creation of maps and delving into more sophisticated mapping techniques with GeoPandas. Get ready to manipulate and visualize spatial information like a pro!

Setting Up Your Environment for GeoPandas

Before we can begin creating beautiful maps with GeoPandas, we need to set up our environment properly. I’ll guide you through the process step by step, ensuring you’re equipped with the necessary tools to get started.

First and foremost, make sure Python is installed on your system. I recommend using Anaconda as it simplifies package management and deployment, especially for data science-related libraries. You can download Anaconda from the official website at https://www.anaconda.com/products/distribution.

Once you have Anaconda installed, setting up an environment is straightforward with the following commands in your terminal:

conda create --name geopandas_env python=3.9
conda activate geopandas_env

Remember to replace python=3.9 with the version of Python you prefer or need for your specific project.

With our environment active, we turn our attention to installing GeoPandas. This library has a few dependencies that can be tricky to install manually, but with the following command, they can be installed together without a hitch:

conda install --channel conda-forge geopandas

This command will install GeoPandas along with its dependencies, such as ‘shapely’ for geometric operations, ‘pyproj’ for projections and coordinate systems, and a file I/O backend (‘fiona’, or ‘pyogrio’ in GeoPandas 1.0 and later).

However, there might be additional packages you want for specific tasks, like ‘matplotlib’ for plotting your maps, or ‘rtree’ for spatial indexing on older GeoPandas versions (recent versions get their spatial index from Shapely 2). To install these, simply run:

conda install --channel conda-forge rtree matplotlib

If you encounter any trouble with these installs, the GeoPandas documentation at http://geopandas.org/install.html is an excellent resource. Additionally, GitHub issues (https://github.com/geopandas/geopandas/issues) can be a goldmine for finding solutions to common problems.
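Before reporting a problem, it usually helps to know which versions you actually have. A minimal check, assuming the packages above were installed into the active environment:

```python
import geopandas as gpd
import pyproj
import shapely

# Collect the versions of the core geospatial stack
versions = {
    "geopandas": gpd.__version__,
    "shapely": shapely.__version__,
    "pyproj": pyproj.__version__,
}

for package, version in versions.items():
    print(f"{package}: {version}")
```

GeoPandas also provides gpd.show_versions(), which prints a fuller report including the underlying system libraries.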

Furthermore, we can’t process geospatial data without data itself! I like to work with shapefiles (.shp), a popular vector data format for geospatial information. You can find free datasets at sites like Natural Earth (http://www.naturalearthdata.com/) or the US Census Bureau (https://www.census.gov/cgi-bin/geo/shapefiles/index.php).

Let’s load a shapefile to test our installation. After downloading your selected data, you can use the following Python code to load the data into a GeoDataFrame:

import geopandas as gpd

shapefile_path = 'path_to_your_shapefile.shp'
gdf = gpd.read_file(shapefile_path)

# Now let's take a quick look at the first few rows
print(gdf.head())

This code snippet imports the GeoPandas library, reads a shapefile into a GeoDataFrame, and prints its first few rows. If this works without hiccups, congrats, you’re all geared up to start creating maps!
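Beyond head(), a few attributes are worth checking on any freshly loaded file. The sketch below uses a tiny made-up GeoDataFrame in place of a real shapefile; the labels and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Stand-in for a loaded file: one point and one line
gdf = gpd.GeoDataFrame(
    {"label": ["station", "route"]},
    geometry=[Point(2.0, 48.0), LineString([(2.0, 48.0), (3.0, 49.0)])],
    crs="EPSG:4326",
)

print(gdf.crs)             # coordinate reference system of the layer
print(gdf.geom_type)       # geometry type of each row
print(gdf.total_bounds)    # array of minx, miny, maxx, maxy
print(gdf.is_valid.all())  # whether every geometry is valid
```

If gdf.crs prints None, the file had no projection information, and reprojection will fail until a CRS is assigned.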

Remember, setting up might be the least exciting part, but a robust environment is critical for project stability and reproducibility. While it might seem daunting at first, once you’ve gone through the process, the next time will be a breeze.

Now that we’ve set up our environment, installed GeoPandas, and tested it with some data, you’re prepared to move on to creating your first map. Trust me, the setup effort is worth the magic you’re about to conjure with your data!

Creating Your First Map with GeoPandas

Here’s a practical rundown on creating your first map with GeoPandas. Assuming you’ve got GeoPandas and its dependencies sorted, it’s time to get mapping!

First up, import GeoPandas along with matplotlib for plotting:

import geopandas as gpd
import matplotlib.pyplot as plt

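Let’s load a simple world map. GeoPandas versions before 1.0 ship with some built-in datasets to play with, one being a dataset of the world. On GeoPandas 1.0 and later, gpd.datasets no longer exists, so download the Natural Earth file and pass its path to read_file instead.

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))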

This ‘world’ GeoDataFrame now holds your countries and their geometry. Here’s how to plot it:

world.plot()
plt.show()

You’ll see a basic world map. Pretty straightforward, right?

What if you wanted to focus on a single country? Well, here’s how. Let’s say I want a map of Brazil:

brazil = world[world.name == 'Brazil']
brazil.plot()
plt.show()

But I know there’s more to a map than just shapes. You might want to see some details, like borders and cities. Let’s add those:

# Like naturalearth_lowres, this bundled dataset is only available before GeoPandas 1.0
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))

# Plotting with both the world's borders and the cities
fig, ax = plt.subplots()
world.plot(ax=ax)
cities.plot(ax=ax, marker='o', color='red', markersize=5)
plt.show()
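The same layering pattern works with any two layers that share a CRS. Here is a self-contained sketch with a made-up island and two made-up towns; zorder controls which layer is drawn on top.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon

# Hypothetical polygon layer and point layer in the same CRS
land = gpd.GeoDataFrame(
    {"name": ["island"]},
    geometry=[Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])],
    crs="EPSG:4326",
)
towns = gpd.GeoDataFrame(
    {"name": ["port", "summit"]},
    geometry=[Point(1, 1), Point(3, 2)],
    crs="EPSG:4326",
)

# Draw both layers on one Axes; higher zorder is drawn later (on top)
fig, ax = plt.subplots(figsize=(6, 4))
land.plot(ax=ax, color="lightgrey", edgecolor="black", zorder=1)
towns.plot(ax=ax, color="red", markersize=20, zorder=2)
ax.set_title("Towns on a hypothetical island")
# Call plt.show() to display the figure in an interactive session
```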

Now, let’s personalize it a bit. Suppose you’re curious about population and want the countries with a higher population to stand out in a darker color. The default colormap (viridis) runs from dark to bright, so pick a sequential colormap such as 'OrRd', where larger values are darker:

world.plot(column='pop_est', cmap='OrRd', legend=True,
           legend_kwds={'label': "Population by Country",
                        'orientation': "horizontal"})
plt.show()

There, countries with larger populations are now shaded darker.
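Which end of the scale looks dark depends on the colormap, and the range of the scale can be pinned down as well. A rough sketch with made-up population values, assuming a sequential colormap ("Blues") and fixed limits via vmin and vmax:

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Three hypothetical areas with made-up population figures
areas = gpd.GeoDataFrame(
    {"population": [1_000, 50_000, 900_000]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots()
areas.plot(
    column="population",
    cmap="Blues",      # sequential colormap: larger values are darker
    vmin=0,            # fix the color scale so maps are comparable
    vmax=1_000_000,
    edgecolor="black",
    legend=True,
    legend_kwds={"label": "Population (made-up values)",
                 "orientation": "horizontal"},
    ax=ax,
)
# Call plt.show() to display the figure in an interactive session
```

Fixing vmin and vmax is mostly useful when several maps need to share one scale, for example one map per year.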

If you’d like to save this map for your report or blog post, saving it is as easy as:

fig = world.plot(column='pop_est').get_figure()
fig.savefig('population_map.png')
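For print or publication you may want more control over the output. A small sketch, using made-up data and a temporary output path (both assumptions), that sets the figure size, hides the axes, and saves at a higher resolution:

```python
import tempfile
from pathlib import Path

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Two hypothetical areas with an arbitrary value column
gdf = gpd.GeoDataFrame(
    {"value": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 4))
gdf.plot(column="value", ax=ax)
ax.set_axis_off()  # hide the coordinate axes for a cleaner image

# Save at 300 dpi and trim surrounding whitespace
output_path = Path(tempfile.gettempdir()) / "example_map.png"
fig.savefig(output_path, dpi=300, bbox_inches="tight")
plt.close(fig)
print(output_path)
```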

That’s a basic introduction to creating a map with GeoPandas. Bear in mind GeoPandas is mighty, and with the right data, the sky’s the limit. There are vast amounts of public geospatial data available from sources like Natural Earth or the U.S. Geological Survey.

You could, for instance, get more granular and map specific streets in your city, if you found or created the right shapefile. The flow is the same: read your file, plot it, customize to your heart’s content, and display or save.

When plotting, GeoPandas is effectively leveraging Matplotlib, so a lot of customization comes from there. Tweaking your maps won’t differ much from tweaking a standard Matplotlib plot. And if you get stuck, their docs are a treasure trove of information.

There’s also a lively community around GeoPandas, so don’t hesitate to scour forums or GitHub repos for tips. Happy mapping!

Advanced Mapping Techniques and Customization with GeoPandas

In this final piece of our tutorial on creating maps with GeoPandas, let’s explore some advanced mapping techniques and customization tips that will help give your geospatial visualizations a professional touch.

First up, let’s talk about styling. One way to make your maps more informative is to use different colors to represent different values of a variable. For example, if you’re mapping population density, you could use a gradient of colors from light (low density) to dark (high density).

Here’s how I typically approach this in code:

import geopandas as gpd
import matplotlib.pyplot as plt

# Loading the GeoDataFrame
gdf = gpd.read_file("path_to_your_shapefile.shp")

# Setting the variable to be visualized
variable = 'population_density'

# Creating a figure and one subplot
fig, ax = plt.subplots(1, 1)

# Plotting the data
gdf.plot(column=variable, ax=ax, legend=True,
         legend_kwds={'label': "Population Density by Area",
                      'orientation': "horizontal"})
plt.show()

Custom legends and scales can really help your map tell a story. If you want to highlight specific ranges of values, you can define a custom color map (cmap):

# Defining a custom color map
import matplotlib.colors as colors

cmap = colors.LinearSegmentedColormap.from_list("custom", ["lightblue", "yellow", "red"])

# Draw on a fresh figure, since the previous one was already shown
fig, ax = plt.subplots(1, 1)
gdf.plot(column=variable, ax=ax, legend=True, cmap=cmap)
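To check what a custom color map will do before putting it on a map, you can evaluate it directly. This sketch assumes a density range of 0 to 1000, which is an arbitrary choice for illustration:

```python
import matplotlib.colors as colors

# Same three-color ramp: light blue for low values, red for high values
cmap = colors.LinearSegmentedColormap.from_list(
    "light_to_red", ["lightblue", "yellow", "red"]
)

# Map data values onto the 0..1 range the color map expects
norm = colors.Normalize(vmin=0, vmax=1000)

for density in (0, 500, 1000):
    print(density, colors.to_hex(cmap(norm(density))))
```

The lowest value prints the hex code for light blue and the highest the code for red, with the midpoint close to yellow.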

Labels and titles are important as they provide context. You can set a title and axis labels on the Matplotlib Axes object that holds your plot:

# Adding title and labels
ax.set_title('Population Density Map')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
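Text can also be placed on the map itself, for instance to name each area. A minimal sketch with two made-up districts, using Matplotlib's ax.annotate and Shapely's representative_point (which is guaranteed to fall inside the polygon):

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Two hypothetical districts stacked on top of each other
districts = gpd.GeoDataFrame(
    {"name": ["North", "South"]},
    geometry=[box(0, 1, 2, 2), box(0, 0, 2, 1)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots()
districts.plot(ax=ax, color="whitesmoke", edgecolor="black")

# Place each name at a point inside its polygon
for name, geom in zip(districts["name"], districts.geometry):
    anchor = geom.representative_point()
    ax.annotate(name, xy=(anchor.x, anchor.y), ha="center", va="center")

ax.set_title("Hypothetical districts")
# Call plt.show() to display the figure in an interactive session
```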

What about interactivity? Sometimes static maps don’t cut it. Integrating GeoPandas with tools like Folium can provide interactive capabilities, allowing for zooming and adding layers:

import folium

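# Folium expects latitude/longitude, so make sure the data is in EPSG:4326
# (this assumes the loaded file has a CRS defined)
gdf_wgs84 = gdf.to_crs(epsg=4326)

# Center the map on the centroid of the first feature
map_center = gdf_wgs84.geometry.centroid.iloc[0]

# Creating a Folium map
m = folium.Map(location=[map_center.y, map_center.x], zoom_start=5)

# Adding the GeoDataFrame as a layer to the Folium map
folium.GeoJson(gdf_wgs84).add_to(m)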

# Display the map
m
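Centering on the first feature can leave the rest of the layer off screen. One alternative, sketched here with made-up boxes and assuming the layer is already in EPSG:4326, is to take the middle of the layer's bounding box. Folium's location argument expects the order latitude, then longitude.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical layer with two areas in longitude/latitude
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[box(10, 40, 12, 42), box(14, 44, 16, 46)],
    crs="EPSG:4326",
)

# total_bounds is ordered minx, miny, maxx, maxy (lon, lat, lon, lat here)
min_lon, min_lat, max_lon, max_lat = gdf.total_bounds

# Midpoint of the bounding box, in the latitude/longitude order Folium uses
location = [float((min_lat + max_lat) / 2), float((min_lon + max_lon) / 2)]
print(location)
```

The resulting list can be passed as location when constructing the Folium map.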

There’s a lot to play with in terms of aesthetics. GeoPandas allows you to tweak almost every visual aspect of your map. For instance, you might want to modify the transparency of your plot for better visual appeal or to layer multiple maps:

# Plotting with adjusted transparency
gdf.plot(column=variable, alpha=0.5, legend=True)

For more intricate customizations, digging into the GeoPandas documentation and the examples in the GitHub repository (https://github.com/geopandas/geopandas) can help immensely.

Finally, remember that GeoPandas plotting is built on top of Matplotlib, so most Matplotlib techniques can be applied to a GeoPandas plot. This cross-compatibility significantly extends your toolkit for making compelling maps. Matplotlib’s gallery of examples is an excellent source of inspiration.

Creating these maps could be somewhat intimidating at first, but once you get the hang of it, the possibilities are endless. Keep experimenting with styles, layers, and interactivity options. The more you practice, the more you’ll discover just how flexible and powerful GeoPandas is for geospatial mapping.

And that’s a wrap on this tutorial! With this toolkit, I’m confident you can start crafting maps that not only serve their functional purpose but also engage and inform your audience effectively. Happy mapping!