# Building Pandas Dataframes From Ndarrays

At work, I have to switch between numpy & pandas depending on the computational needs. Numpy is faster. Pandas is easier to work with.

One of the tasks I’ve faced often was to convert a 3-dimensional ndarray to a pandas dataframe. I will share my preferred technique in this post today.

For the purpose of this exercise, I’ll generate dummy sales data for a retail company. The dimensions include products, locations, and sales.

# Convert ndarray to pandas dataframe

``````import pandas as pd
import numpy as np
``````

## 1d : dimension = sales

``````arr_1d = np.random.randint(
low=1,
high=10,
size=3,
)
print(arr_1d)
``````
``````[9 3 6]
``````

That’s easy. Ideally, 1-d information should be represented as a Series.

``````df_1d = pd.DataFrame(arr_1d, columns=["sales"])
print(df_1d)
``````
``````   sales
0      9
1      3
2      6
``````

## 2d: dimension = sales * product

Let’s move on to 2 dimensions. Now, we have data corresponding to different products.

``````arr_2d = np.random.randint(
low=1,
high=10,
size=(3, 2),
)
print(arr_2d)
``````
``````[[4 6]
[8 1]
[2 7]]
``````

Pandas DataFrame can handle 2-D ndarrays out of the box.

``````df_2d = pd.DataFrame(arr_2d, columns=["product", "sales"]).set_index("product")
print(df_2d)
``````
``````         sales
product
4            6
8            1
2            7
``````

## 3d : dimension = location x product x sales

Now, what if we have a ndarray corresponding to all products for several locations?

``````# failure
arr_3d = np.random.randint(
low=1,
high=10,
size=(5, 3, 1),
)
print(arr_3d)
``````
``````[[

]

[

]

[

]

[

]

[

]]
``````
``````# the following raises ValueError
# pandas DataFrame expects a 2-d input
df_3d = pd.DataFrame(arr_3d, columns=["location", "product", "sales"])
``````

pandas won’t work out of the box. It cannot handle more than 2 dimensions. So, it raises a `ValueError`.

``````    ---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

/var/folders/jq/ksxbjg7d58g9v9rrcl0f38380000gn/T/ipykernel_12628/1531564731.py in <module>
1 # the following raises ValueError
2 # pandas DataFrame expects a 2-d input
----> 3 df_3d = pd.DataFrame(arr_3d, columns=["location", "product", "sales"])
.
.
.
ValueError: Must pass 2-d input. shape=(5, 3, 1)
``````

The solution?

`MultiIndex`.

Assuming that the ndarray is ordered by location/products, we could prepare a multi-index, flatten our ndarray and let Pandas reshape it according to the provided index.

Sweet!

``````index = pd.MultiIndex.from_product(
[range(dim) for dim in arr_3d.shape[:-1]],
names=["location", "product"],
)

df_3d = pd.DataFrame(arr_3d.flatten(), index=index, columns=["sales"])
print(df_3d)
``````
``````                  sales
location product
0        0            9
1            6
2            2
1        0            1
1            4
2            4
2        0            2
1            5
2            6
3        0            9
1            6
2            5
4        0            1
1            6
2            1
``````

We just have sales corresponding to each location and product. What if the final `sales` dimension includes sales for yesterday/today (or for every month, every week, etc.) ?

## 3d : dimension = location x product x sales (multi)

``````arr_3d = np.random.randint(
low=1,
high=10,
size=(5, 3, 2),
)
print(arr_3d)
index = pd.MultiIndex.from_product(
[range(dim) for dim in arr_3d.shape],
names=["location", "product", "sales"],
)
``````
``````[[[1 9]
[8 6]
[9 4]]

[[4 9]
[3 9]
[1 8]]

[[5 2]
[9 9]
[1 9]]

[[4 5]
[7 4]
[7 7]]

[[6 9]
[4 2]
[7 1]]]
``````

No major changes. Pandas should handle it just like before. Just unstack the sales dimension and rename the columns for readability.

``````df_3d = pd.DataFrame(
arr_3d.flatten(),
index=index,
columns=["sales"],
)
df_3d = df_3d.unstack(-1).rename(
columns={0: "yesterday", 1: "today"},
)
print(df_3d)
``````
``````    sales            yesterday today
location product
0        0               1     9
1               8     6
2               9     4
1        0               4     9
1               3     9
2               1     8
2        0               5     2
1               9     9
2               1     9
3        0               4     5
1               7     4
2               7     7
4        0               6     9
1               4     2
2               7     1
``````

Do you know of other ways to switch between ndarray and DataFrame? Comment below :)