## What's a Series?

Series is one of the fundamental data structures in pandas. It's essentially an array *with an index*. Because it's an array, every value in a Series must be of the same type. You can have a Series of ints, a Series of floats, or a Series of booleans, but you can't have a Series of ints, floats and booleans together.

## Series Documentation

You'll want to familiarize yourself with pandas' documentation. Here's the documentation for Series. It's the first place you should look when you have questions about a Series or Series method.

## Series Creation

### How to make a Series from a list

The easiest way to make a Series is from a list: pass the list to `pd.Series()`.

When you print the Series, it already looks a bit different from a NumPy array. The column of values on the left is the Series *index*, which you can use to access the Series elements in creative and meaningful ways. More on that later.

The output also includes a line like `dtype: int64`, which tells us the data type of the elements in the Series.
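As a minimal sketch of the steps above (the values here are made up for illustration), building and printing a Series from a list might look like this:

```python
import pandas as pd

# Hypothetical values; any list of same-typed items works
x = pd.Series([5, 10, 15, 20, 25, 30, 35])

# Printing shows the index on the left, the values on the right,
# and the dtype on the last line
print(x)
```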

### How to check if an object is a Series

You can use Python's `type()` function to check that `x` is indeed a Series object.

### How to check the type of data stored in a Series

If you want to check the internal data type of the Series elements without printing the whole Series, you can use the `Series.dtype` attribute.
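Here's a small sketch of both checks, using a made-up Series `x`. The `isinstance()` call is an extra we've added alongside `type()`; it's a common way to run the same check inside a condition.

```python
import pandas as pd

x = pd.Series([5, 10, 15])   # hypothetical example data

print(type(x))                    # the class of the object
print(isinstance(x, pd.Series))   # True
print(x.dtype)                    # dtype of the elements, without printing the values
```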

### How to access the underlying NumPy array

Most pandas Series store the underlying data as a NumPy array. You can access the underlying NumPy array via `Series.to_numpy()`.

You might also see people using the `Series.values` attribute here, but this technique is not recommended; the pandas documentation points to `Series.to_numpy()` instead.
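A quick sketch with made-up values, just to show what comes back from `Series.to_numpy()`:

```python
import numpy as np
import pandas as pd

x = pd.Series([5, 10, 15])   # hypothetical example data

arr = x.to_numpy()           # plain NumPy array, no index attached
print(type(arr))
print(arr)
```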

### How to access the first N elements of a Series

You can use the highly popular `Series.head()` method to pick out the first N elements of a Series. For example, `x.head(6)` returns the first 6 elements of `x` as a new Series.

### How to access the last N elements of a Series

You can use `Series.tail()` to pick out the last N elements of a Series. For example, `x.tail(3)` returns the last 3 elements of `x` as a new Series.
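To see `head()` and `tail()` side by side, here's a short sketch on a made-up 10-element Series. Notice that the results keep their original index labels.

```python
import pandas as pd

# Hypothetical data: ten values with the default index 0..9
x = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

first_six = x.head(6)    # elements at positions 0 through 5
last_three = x.tail(3)   # elements at positions 7 through 9

print(first_six)
print(last_three)
```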

### How to make a Series from a dictionary

You can make a Series from a Python dictionary by passing it to `pd.Series()`.

In this case, pandas uses the dictionary keys for the Series index and the dictionary values for the Series values. Again, we'll cover the index and its purpose shortly. For now, just know it's a thing.
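For instance, with a made-up dictionary (the keys and values below are purely illustrative):

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}   # hypothetical dictionary
y = pd.Series(data)

print(y.index.tolist())   # the dictionary keys
print(y.tolist())         # the dictionary values
```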

### How to make a Series of strings

If we wanted to make a Series of strings, we could do that too.

If we `print(z)`, notice the `dtype` is listed as "object".

*Why?*

The short answer is, this is *not* a Series of strings. Rather, this is a Series of *pointers*. Since strings are objects that vary in size, but arrays (and thus Series) use fixed-size memory blocks to store their data, pandas implements a common trick - store the strings elsewhere in memory and **put the address of each string in the underlying array**. (Memory addresses are fixed-size objects - usually just 64-bit integers.) If you're confused by this - don't worry, it's a tricky concept that'll make more sense later on.

The newer and better approach to creating a Series of strings is to specify `dtype='string'`.

Now when we `print(z)`, pandas reports the *dtype* as 'string'.

(There's a lot to discuss here, but we'll cover these things later.)
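Here's a rough sketch comparing the two approaches. The strings are made up, and the dtype pandas infers when you don't specify one can differ between pandas versions, so treat the first printout as something to check on your own install.

```python
import pandas as pd

words = ["cat", "dog", "gorilla"]   # hypothetical example data

z_default = pd.Series(words)                 # dtype inferred by pandas
z = pd.Series(words, dtype="string")         # dtype requested explicitly

print(z_default.dtype)   # "object" on the pandas versions this section describes
print(z.dtype)           # string
```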

### How to make a Series from a NumPy array

Perhaps the most powerful way to make a Series from scratch is to make it from a NumPy array.

If you have a NumPy array `x`, you can convert it to a Series just by passing `x` into `pd.Series()`.

*Why is this so "powerful"?*

Well, suppose you wanted to make a complex Series from scratch like a random sample of values from a normal distribution. The somewhat lame, but practical way to do this is to use NumPy. NumPy has lots of great tools for making arrays from scratch, and converting them into a Series is a piece of cake 🍰.
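As one possible sketch of the normal-distribution idea, using NumPy's `default_rng` generator (the seed, mean, standard deviation and size are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)           # seeded so the sample is reproducible
arr = rng.normal(loc=0.0, scale=1.0, size=5)  # 5 draws from a standard normal

s = pd.Series(arr)   # wrap the array in a Series with the default index
print(s)
```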

## Series Basic Indexing

Suppose we have the following Series, `x`.

If you wanted to access the ith element of the Series, you might be inclined to use square-bracket indexing notation just like accessing elements from a Python list or a NumPy array.

`x[0]` returns the 1st element, `x[1]` returns the 2nd element and so on.

This *appears* to work like list indexing, but don't be fooled! **`x[0]` actually returns the element(s) of the Series with index label 0.** In this example, that element *happens* to be the first element in the Series, but if we shuffle the index so that label 0 no longer comes first, `x[0]` returns 20 instead of 5.

*However*, if we change the index to `['a','b','c','d','e']`, there is no label 0 to find. This time, `x[0]` falls back to positional indexing and *does* return the first value in the Series. (Recent pandas releases deprecate this positional fallback, so don't rely on it.)

Caution

The takeaway here is that square-bracket indexing in pandas isn't straightforward. Its behavior changes depending on characteristics of the Series. For this reason, we recommend using more explicit indexing techniques - `Series.iloc` and `Series.loc`.
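To make the label-versus-position distinction concrete, here's a sketch with a shuffled integer index. Only the 5 and the 20 come from the discussion above; the other values and the particular shuffle order are assumptions for illustration.

```python
import pandas as pd

x = pd.Series([5, 10, 15, 20, 25])   # default index labels 0..4

# Shuffle the labels (hypothetical order); the values stay where they are
x.index = [3, 1, 4, 0, 2]

by_bracket = x[0]        # looks up *label* 0, which now sits at position 3
by_label = x.loc[0]      # explicit label lookup, same element
by_position = x.iloc[0]  # explicit positional lookup, the first element

print(by_bracket, by_label, by_position)   # 20 20 5
```

With `.loc` and `.iloc` there's no guessing about which kind of lookup you're getting.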

### Indexing by position

#### How to access the ith value of a Series

Use the `Series.iloc` property to access the ith value in a Series.

#### Negative Indexing

`Series.iloc` supports negative indexing like Python lists and NumPy arrays.
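A short sketch of single-element access by position, including negative positions. The values and the string labels are made up; the labels are there only to show that `iloc` ignores them.

```python
import pandas as pd

x = pd.Series([5, 10, 15, 20, 25], index=["a", "b", "c", "d", "e"])

first = x.iloc[0]          # 1st element
third = x.iloc[2]          # 3rd element
last = x.iloc[-1]          # last element
second_last = x.iloc[-2]   # second-to-last element

print(first, third, last, second_last)   # 5 15 25 20
```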

#### Positional Slicing

`Series.iloc` supports slicing like Python lists and NumPy arrays.

Notice the result is a Series object whereas in the previous examples the results were scalars.

#### How to select multiple elements by position

`Series.iloc` can receive a list, array, or Series of integers to select multiple values in `x`.
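Here's a sketch of positional slicing and multi-element selection on a made-up Series. Both return a Series rather than a scalar.

```python
import numpy as np
import pandas as pd

x = pd.Series([5, 10, 15, 20, 25], index=["a", "b", "c", "d", "e"])

sliced = x.iloc[1:4]                         # positions 1, 2, 3 (right endpoint excluded)
picked = x.iloc[[0, 2, 4]]                   # a list of positions
picked_arr = x.iloc[np.array([4, 0])]        # a NumPy array of positions, in that order

print(sliced)
print(picked)
print(picked_arr)
```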

### Indexing by label

Let's talk about the index. Every Series has an index and its purpose is to provide a label for each element in the Series. When you make a Series from scratch, it automatically gets an index of sequential values starting from 0.

For example, here we make a Series to represent the test grades of five students, and you can see how the index automatically gets created.

We can change the index pretty easily, just by setting it equal to another array, list, or Series of values with the proper length. The index values don't even need to be integers, and in fact, they're often represented as strings.
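A sketch of the grades example might look like the following. The grade values are invented, and the names other than homer, grandpa and bart are placeholders we've assumed to fill out five students.

```python
import pandas as pd

# Hypothetical test grades for five students
grades = pd.Series([82, 94, 77, 83, 97])
print(grades.index)   # automatically created index: 0, 1, 2, 3, 4

# Replace the index with string labels of the same length
grades.index = ["homer", "maggie", "grandpa", "bart", "lisa"]
print(grades)
```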

#### How to access the value of a Series with label

To fetch the Series value(s) with some specific label, use the `Series.loc` property.

For example, to get bart's grade in the Series above, we can do `grades.loc['bart']`.

#### Label Slicing

`Series.loc` supports slicing by label. For example, to fetch the grades between homer and grandpa, we could do `grades.loc['homer':'grandpa']`.

Warning

Notice that the slice `'homer':'grandpa'` includes homer *and grandpa*. By contrast, the equivalent positional slice `0:2` would exclude the right endpoint (grandpa).
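The following sketch shows a single label lookup and then contrasts the inclusive label slice with the exclusive positional slice. As before, the grade values and the extra student names are assumptions for illustration.

```python
import pandas as pd

grades = pd.Series(
    [82, 94, 77, 83, 97],
    index=["homer", "maggie", "grandpa", "bart", "lisa"],
)

bart_grade = grades.loc["bart"]               # single value by label

by_label = grades.loc["homer":"grandpa"]      # includes homer, maggie AND grandpa
by_position = grades.iloc[0:2]                # includes homer and maggie only

print(bart_grade)   # 83
print(by_label)
print(by_position)
```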

#### How to select multiple elements by label

Just like `Series.iloc[]`, we can pass a list, array, or Series of labels into `Series.loc[]` to retrieve multiple elements.
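For example, with the same made-up grades Series, a list of labels returns the matching elements in the order you ask for them:

```python
import pandas as pd

grades = pd.Series(
    [82, 94, 77, 83, 97],
    index=["homer", "maggie", "grandpa", "bart", "lisa"],
)

subset = grades.loc[["lisa", "homer"]]   # list of labels, in the requested order
print(subset)
```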

### RangeIndex

When you make a Series without specifying its index, pandas automatically gives it a RangeIndex.

By contrast, when you explicitly set the index as a list of integers, pandas gives it an Int64Index (in pandas 2.0 and later, a plain `Index` with dtype `int64`).

For most situations, the difference is irrelevant. However, note that the RangeIndex is more memory efficient and has faster access times.
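You can check which kind of index you have by inspecting the index object. This is a small sketch with made-up values; the class name printed for the explicit index depends on your pandas version.

```python
import pandas as pd

a = pd.Series([10, 20, 30])                    # no index specified
b = pd.Series([10, 20, 30], index=[0, 1, 2])   # same labels, given explicitly

print(type(a.index).__name__)   # RangeIndex
print(type(b.index).__name__)   # an integer index class (name varies by pandas version)
```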

### Modifying Series Data

Consider this Series `foo`.

#### Basic Series Modifications

We can change the second element to 200.

We can set the 1st, 2nd and 3rd elements to 99, either with a list of positions or with *slicing*.
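Here's a sketch of these modifications using `Series.iloc`. The starting values of `foo` are made up, and `foo2` is a second copy we've introduced just to show the slicing variant separately.

```python
import pandas as pd

foo = pd.Series([10, 20, 30, 40, 50])   # hypothetical starting values

foo.iloc[1] = 200                       # change the second element
after_scalar = foo.tolist()             # [10, 200, 30, 40, 50]

foo.iloc[[0, 1, 2]] = 99                # set 1st, 2nd and 3rd elements with a list

foo2 = pd.Series([10, 20, 30, 40, 50])
foo2.iloc[:3] = 99                      # same result with slicing

print(foo.tolist())    # [99, 99, 99, 40, 50]
print(foo2.tolist())   # [99, 99, 99, 40, 50]
```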

#### How to update a Series with an array

Suppose you have a Series `foo` and a NumPy array `bar` and your goal is to update `foo`'s values with `bar`. If you overwrite `foo`, you'll lose its index.

Instead, use slicing to overwrite `foo`'s values without overwriting its index.
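A minimal sketch of the difference, assuming a small `foo` with string labels and a `bar` of the same length (both invented here). The name `overwritten` is ours, used to show what rebuilding the Series from `bar` would give you.

```python
import numpy as np
import pandas as pd

foo = pd.Series([10, 20, 30], index=["a", "b", "c"])
bar = np.array([5, 6, 7])

# Rebuilding the Series from bar gives a fresh default index
overwritten = pd.Series(bar)
print(overwritten.index.tolist())   # [0, 1, 2]

# Slice assignment replaces the values in place and keeps foo's index
foo.iloc[:] = bar
print(foo)
```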

#### How to update a Series with another Series

Suppose you have a Series `x` and a Series `y` whose indices are different but share a few common values.

Predict the result of `x.loc[[0, 1]] = y`.

You may be surprised.

Index Alignment

When you assign a Series `y` to a Series `x`, pandas uses *index alignment* to insert values from `y` into `x` based on matching index labels.

In the previous example, pandas starts by searching `x` for the values with index labels 0 and 1. Then it looks for matching labels in `y` to use to overwrite `x`. Since `x`'s label 1 doesn't match any elements in `y`, pandas assigns it the value NaN. And since NaN only exists as a floating point value in NumPy, pandas casts the entire Series from ints to floats.
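Here's a sketch of index alignment during assignment. The values and `y`'s index labels are made up. We also build `x` as floats from the start, which is our own simplification: it sidesteps the int-to-float cast, whose handling on assignment may differ between pandas versions, so the example focuses purely on alignment.

```python
import pandas as pd

x = pd.Series([10.0, 20.0, 30.0, 40.0])                        # labels 0, 1, 2, 3
y = pd.Series([1.0, 11.0, 111.0, 1111.0], index=[7, 3, 5, 0])  # only label 0 overlaps the target labels

x.loc[[0, 1]] = y   # aligned by label, not by position

# label 0 takes y's value at label 0; label 1 has no match in y, so it becomes NaN
print(x)
```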

#### How to update a Series with a NumPy array

Given `x` and `y` from the previous section, if we do `x.loc[[0, 1]] = y.to_numpy()` we'll get the error:

`ValueError: cannot set using a list-like indexer with a different length than the value`

**When you assign a NumPy array to a Series, pandas assigns the ith element of the array to the ith value of the Series.**

In this case, `x.loc[[0, 1]] = y.to_numpy()` attempts to assign a 4-element array to a 2-element subseries, hence the error.

If we restrict the NumPy array to its first two elements, the assignment works.
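The sketch below walks through both cases with made-up values for `x` and `y`. The `length_mismatch` flag is a name we've introduced to record that the first attempt fails.

```python
import pandas as pd

x = pd.Series([10, 20, 30, 40])
y = pd.Series([1, 11, 111, 1111], index=[7, 3, 5, 0])

# Assigning 4 values to 2 selected elements fails
length_mismatch = False
try:
    x.loc[[0, 1]] = y.to_numpy()
except ValueError as err:
    length_mismatch = True
    print(err)

# Assigning exactly 2 values works, and they're placed by position
x.loc[[0, 1]] = y.to_numpy()[:2]
print(x.tolist())   # [1, 11, 30, 40]
```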

## Series Basic Operations

It's important to understand how pandas handles basic operations between arrays. Here we'll look at addition, although the core concepts apply to other operations such as subtraction, multiplication, etc.

### Adding a scalar to a Series

When you add a scalar to a Series, pandas uses broadcasting to add the scalar to each element of the Series.

### Adding a Series to a Series

Series arithmetic is fundamentally different from NumPy arithmetic. When you add two Series `x` and `y`, pandas only combines elements with the same index label.

In this example, `x` has index labels 0, 1, 2, 3, and `y` has index label 0.

The result of `x + y` will be a Series whose index is the union of `x`'s index labels and `y`'s index labels. In this case, the label 0 is in both Series, so the corresponding elements are added together. However, labels 1, 2, and 3 in `x` don't have matching elements in `y`, so pandas converts these to NaN in the result. Since NaN only exists as a floating point constant in NumPy (i.e. you can't have an integer array with NaNs), pandas casts the entire Series from `int64` to `float64`.
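To contrast scalar broadcasting with label-aligned addition, here's a sketch with made-up values. `x` has labels 0 through 3 and `y` has only label 0, matching the description above.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])   # labels 0, 1, 2, 3
y = pd.Series([10])           # label 0 only

plus_one = x + 1              # scalar is broadcast to every element
total = x + y                 # elements are matched by index label

print(plus_one.tolist())      # [2, 3, 4, 5]
print(total)                  # 11.0 at label 0, NaN at labels 1, 2, 3
print(total.dtype)            # float64
```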

### Add two Series' elements by position

If you want to add two Series' elements by position, convert them to NumPy arrays before adding them. For example, if we add `A + B`, pandas uses index alignment to add elements *by matching index label*.

If we add the NumPy arrays underlying each Series, their elements are added *by position*.

To convert the resulting NumPy array back to a Series, just wrap it with `pd.Series()`.

This technique drops `A`'s index labels. If you want to retain `A`'s labels, only convert `B` to an array.
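Here's a sketch of the three variants, using two made-up Series whose labels appear in opposite order so that the difference is easy to spot:

```python
import pandas as pd

A = pd.Series([10, 20, 30], index=["a", "b", "c"])
B = pd.Series([1, 2, 3], index=["c", "b", "a"])

by_label = A + B                                       # a: 10+3, b: 20+2, c: 30+1
by_position = pd.Series(A.to_numpy() + B.to_numpy())   # 10+1, 20+2, 30+3 with a default index
keep_labels = A + B.to_numpy()                         # by position, but A's labels survive

print(by_label)
print(by_position)
print(keep_labels)
```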

### Add Series by label, prevent NaNs in the result

If you add two Series by index label, you'll often get NaNs in the result where an index label didn't exist in both Series.

If you wish to add `y` to `x` by matching label without introducing NaNs in the result, you can use `x.loc[y.index]` to select elements of `x` with a matching index label in `y`, combined with `+= y`.
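A small sketch of this pattern with made-up values. It assumes every label in `y` also exists in `x`; if `y` had a label missing from `x`, the `.loc` lookup would fail.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])            # labels 0, 1, 2, 3
y = pd.Series([10, 20], index=[2, 0])  # labels 2 and 0, both present in x

# Only the elements of x whose labels appear in y are updated
x.loc[y.index] += y

print(x.tolist())   # [21, 2, 13, 4]
```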

## Boolean Indexing

You can use a boolean Series `x` to subset a different Series, `y`, via `y.loc[x]`.

For example, given a Series of integers, `foo`, you can set `mask = foo < 20` to build a boolean Series, `mask`, that identifies whether each element of `foo` is less than 20.

Then you can pass `mask` into `foo.loc[]` to select elements of `foo` which are less than 20.

Boolean Index Alignment

pandas uses *index alignment* here: it selects the elements of the target Series whose index labels match the labels of the `True` values in the boolean index Series.

For example, if we shuffle `mask`'s index (but not `mask`'s values), `foo.loc[mask]` produces a different result.
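Here's a sketch of both behaviors. The values of `foo` and the reversed index used for the shuffle are assumptions for illustration; `shuffled` is a name we've introduced for the re-labeled mask.

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])
mask = foo < 20                       # True at labels 2 and 4

selected = foo.loc[mask]
print(selected.tolist())              # [11, 17]

# Same True/False values, but attached to reversed labels:
# the True values now carry labels 3 and 1
shuffled = pd.Series(mask.to_numpy(), index=[5, 4, 3, 2, 1, 0])

realigned = foo.loc[shuffled]
print(realigned.tolist())             # [50, 45]
```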

### Boolean Indexing by Position

If you want to select elements from a Series based on the position of True values from another Series, convert the boolean index Series to a NumPy array.
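As a sketch, suppose the boolean Series carries labels that don't line up with the target (the values and labels below are made up). Converting it to a NumPy array makes pandas use the positions of the `True` values instead:

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])

# Boolean Series whose labels are in reverse order relative to foo
flags = pd.Series(
    [False, False, True, False, True, False],
    index=[5, 4, 3, 2, 1, 0],
)

by_position = foo.loc[flags.to_numpy()]   # True at positions 2 and 4
print(by_position.tolist())               # [11, 17]
```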

### Combining Boolean Series

You can combine two boolean Series to create a third boolean Series. For example, given a Series of person ages and a Series of person genders, you can create a boolean Series identifying males younger than 18 with `(genders == 'male') & (ages < 18)`.

Attention!

When you combine two logical expressions in this way, each expression **must** be wrapped in parentheses. Without them, `genders == 'male' & ages < 18` would raise an error, because `&` binds more tightly than `==` and `<`.
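Here's a sketch with made-up ages and genders. The name `young_males` is ours.

```python
import pandas as pd

ages = pd.Series([42, 15, 9, 31])
genders = pd.Series(["female", "male", "female", "male"])

# Each comparison is wrapped in parentheses before combining with &
young_males = (genders == "male") & (ages < 18)

print(young_males.tolist())            # [False, True, False, False]
print(ages.loc[young_males].tolist())  # [15]
```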

#### Logical Operators

## Missing Values (NaN)

You can use NaN to represent missing or invalid values in a Series.

### NaN before pandas 1.0.0

Prior to pandas version 1.0.0, if you wanted to represent missing or invalid data, you had to use NumPy's special floating point constant, `np.nan`. If you had a Series of integers and you set the second element to `np.nan`, the Series would get cast to floats because `NaN` only exists in NumPy as a floating point constant.
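The sketch below shows the underlying point, that integers and `np.nan` can't share a NumPy integer array. Rather than assigning into an existing integer Series (newer pandas releases may warn about or reject that kind of silent cast), it builds a second Series that contains `np.nan` from the start. The values are made up.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
with_nan = pd.Series([1, np.nan, 3])   # same values, but the second one is missing

print(ints.dtype)       # int64
print(with_nan.dtype)   # float64, because NaN is a floating point value
```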

### NaN after 1.0.0

pandas' release of version 1.0.0 included a nullable integer data type. If you want to make a Series of integers with NaNs, you can specify the Series `dtype` as "Int64" with a capital "I" as opposed to NumPy's "int64" with a lower case "i".

Now if you set the second element to `NaN`, the Series retains its Int64 data type.

Note

A better way to insert NaNs in modern pandas is to use `pd.NA`.
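A minimal sketch of the nullable integer type, with made-up values, using `pd.NA` for the missing entry:

```python
import pandas as pd

x = pd.Series([1, 2, 3], dtype="Int64")   # capital "I": pandas' nullable integer type

x.iloc[1] = pd.NA                         # mark the second element as missing

print(x)
print(x.dtype)                            # Int64
```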

#### Pandas Nullable Data Types

### NaN Tips and Tricks

Given a Series, `x`, with some NaN values, you can use `pd.isna()` to check whether each value is NaN.

You can use `pd.notna()` to check whether each value is not NaN.

If you want to replace NaN values in a Series with a fill value, you can use the `Series.fillna()` method.
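Here are the three helpers on a made-up Series of floats (the fill value of 0 is an arbitrary choice):

```python
import numpy as np
import pandas as pd

x = pd.Series([1.5, np.nan, 3.0, np.nan])

missing = pd.isna(x)     # True where the value is NaN
present = pd.notna(x)    # True where the value is not NaN
filled = x.fillna(0)     # new Series with NaNs replaced by 0

print(missing.tolist())  # [False, True, False, True]
print(present.tolist())  # [True, False, True, False]
print(filled.tolist())   # [1.5, 0.0, 3.0, 0.0]
```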

### Boolean Indexing with NaN

It's important to understand how NaNs work with boolean indexing.

Suppose you have a Series of integers, `goo`, and a corresponding Series of booleans, `choo`, with some NaN values.

If you attempt to index `goo` with `choo`, pandas raises an error.

"ValueError: Cannot mask with non-boolean array containing NA / NaN values"

Notice that `choo` has dtype 'object'.

This happens because pandas relies on NumPy's handling of NaNs by default, and NumPy doesn't "play nicely" with NaN values unless you happen to be working with an array of floats. In this case, dtype='object' is an indication that the underlying NumPy array is really just an array of *pointers*.

To overcome this issue, we can rebuild `choo` with `dtype = "boolean"`.

Now the boolean index `goo.loc[choo]` returns a 2-element subSeries as you might expect.

In this case, the NaN value in `choo` is essentially ignored.

Note that **the negation of NaN is NaN**, so `goo.loc[~choo]` does not return the complement of `goo.loc[choo]`.
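Here's a sketch of the nullable boolean approach. The values of `goo` and `choo` are made up, chosen so that `choo` has two `True` values and one missing value.

```python
import pandas as pd

goo = pd.Series([10, 20, 30, 40])
choo = pd.Series([True, False, pd.NA, True], dtype="boolean")

kept = goo.loc[choo]        # the missing entry is treated as "don't select"
negated = goo.loc[~choo]    # ~NA is still NA, so label 2 is skipped again

print(kept.tolist())        # [10, 40]
print(negated.tolist())     # [20]
```

Between them, `kept` and `negated` cover only three of the four elements: the one paired with the missing value shows up in neither.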