Concatenate Excel Files
Have you ever been asked to analyze dozens of Excel files containing the same data structure? This is a common request that can easily be automated using a programming language such as Python. In this post, we will discuss how to concatenate Excel files into a single spreadsheet, which will result in increased productivity and fewer opportunities for errors.
- Risk Management – Data may be transferred in small files to limit exposure if a single file is intercepted
- Privacy Concerns – Data is reduced to a small files which accumulate over time (e.g.: HIPAA regulations)
- Data Storage – Data is simply stored in this fashion (not recommended)
Regardless of the cause, we now need to find a quick solution. If we imagine the Excel files as free-roaming piggies, as the data scientist cowboy/cowgirl, you could throw your Python lasso to wrangle them up!
Read Excel files into Python
- See more here
# Find file names in the specified directory loc = 'C:\Users\Filepath_to_Excel_Files\\' files = os.listdir(loc)
Wait, what if I don’t want to include the Facebook photos and my torrented Jay-Z album stored in this filepath in my analysis?
# Find the Excel files files_xlsx = [f for f in files if f[-4:] == 'xlsx']
This code iterates through each item stored in the files list, and saves any with an xlsx extension into the files_xlsx list. Now that we’ve identified the Excel file names, we can focus on concatenating Excel files.
Concatenate Excel files
# Create empty dataframe and read in new data sales = pd.DataFrame() for f in files_xlsx: New_data = pd.read_excel(os.path.normpath(loc + f), 'Sheet1') sales = sales.append(New_data)
Write to Excel
# Output new dataframe to Excel sales.to_excel(loc + '\\sales_merged.xlsx')