Extract compressed files in Python

In a previous article, I showed you how to compress files in Python. In this article, I will show you how to extract them.

Extract ZIP files

We used the zipfile module to create ZIP packages, and we use it again to extract them.

Extract full contents

To extract everything, call the extractall() method.

from zipfile import ZipFile

z = ZipFile('FILE.zip', 'r')
z.extractall()
z.close()

The script is simple: it extracts all contents of FILE.zip into the current directory.

To extract into a different directory, pass its path to extractall().

z.extractall('output')

This extracts everything into the output directory.
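A minimal sketch of the same extraction written with a with-statement, which closes the archive even if extraction fails. It first builds a small demo FILE.zip in a temporary directory so it runs on its own; the member names and the output folder name are made up for illustration.

```python
import os
import tempfile
from zipfile import ZipFile

# Build a small demo archive so the example runs on its own
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "FILE.zip")
with ZipFile(archive, "w") as z:
    z.writestr("readme.txt", "hello")
    z.writestr("data/US.txt", "United States")

out_dir = os.path.join(workdir, "output")
# The with-statement closes the archive automatically, even on errors
with ZipFile(archive, "r") as z:
    members = z.namelist()  # names of everything that will be extracted
    z.extractall(out_dir)   # missing directories are created as needed

print(members)  # ['readme.txt', 'data/US.txt']
```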

Extract a single file

For selective extraction, use the extract() method to extract a single member.

z.extract('data/US.txt')

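This extracts data/US.txt from the ZIP package into the current directory. Note that extract() keeps the member's folder structure, so the file ends up at data/US.txt, not US.txt.

To extract into a different directory, pass it with the path= argument, e.g. z.extract('data/US.txt', path='output').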
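The sketch below shows what extract() actually writes when you pass path=, and one way to drop the folder structure if you only want the file itself. The demo archive and its members are assumptions for illustration; copying via ZipFile.open() and shutil.copyfileobj() is a swapped-in technique, not something extract() does for you.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile

# Demo archive with a member inside a "data" folder
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "FILE.zip")
with ZipFile(archive, "w") as z:
    z.writestr("data/US.txt", "United States")

out_dir = os.path.join(workdir, "output")
with ZipFile(archive) as z:
    # extract() recreates the folder structure and returns the written path:
    # <out_dir>/data/US.txt
    kept = z.extract("data/US.txt", path=out_dir)

    # To drop the folders, stream the member's bytes into a path you choose
    flat_path = os.path.join(out_dir, "US-flat.txt")
    with z.open("data/US.txt") as src, open(flat_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```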

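Extract with a password

A ZIP file may be password-protected. Pass the password with the pwd= option when calling extract() or extractall(). In Python 3 the password must be bytes, not str. Also note that zipfile only supports the legacy ZipCrypto encryption, so it cannot decrypt AES-encrypted archives.

z.extractall(pwd=b'SECRET2019')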

Alternatively, call setpassword() once to set a default password, so you don't need to pass pwd= on every call.

z.setpassword(b'SECRET2019')
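Because zipfile cannot create encrypted archives, a fully runnable password demo isn't possible with the standard library alone. The hypothetical helper below is a sketch: it checks each member's encryption flag (bit 0 of ZipInfo.flag_bits) and only sets a password when one is needed, converting it to bytes first. Encoding the password as UTF-8 is an assumption; archives created by other tools may use a different encoding.

```python
import os
import tempfile
import zipfile

def extract_zip(path, dest, password=None):
    """Extract a ZIP archive, supplying a password only when one is needed.

    extract_zip is a hypothetical helper for illustration. Returns True if
    the archive contained encrypted members.
    """
    with zipfile.ZipFile(path) as z:
        # Bit 0 of a member's general-purpose flags marks it as encrypted
        encrypted = any(info.flag_bits & 0x1 for info in z.infolist())
        if encrypted:
            if password is None:
                raise ValueError(f"{path} is encrypted; a password is required")
            # zipfile wants bytes; UTF-8 is an assumed encoding
            z.setpassword(password.encode("utf-8"))
        z.extractall(dest)
    return encrypted

# Demo on a plain archive, since zipfile cannot create encrypted ones
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "FILE.zip")
with zipfile.ZipFile(archive, "w") as z:
    z.writestr("notes.txt", "no secrets here")

was_encrypted = extract_zip(archive, os.path.join(workdir, "out"), password="SECRET2019")
```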

Extract tarball

Tarballs are another common archive format. To extract one, use the tarfile module, which is similar to zipfile and is used the same way.

import tarfile

# tarfile.open() detects gzip, bz2 and xz compression automatically
t = tarfile.open("FILE.tar.gz")
t.extractall()
t.close()
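Here is a sketch of the tarball case written with a with-statement, mirroring the zipfile example. It builds a small FILE.tar.gz first; the member name is an assumption. The filter="data" argument, available in Python 3.12 and in recent security releases of older versions, rejects absolute paths, members that would land outside the output directory, and special files such as devices, so the sketch only passes it when tarfile.data_filter exists.

```python
import io
import os
import tarfile
import tempfile

# Build a small demo tarball so the example runs on its own
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "FILE.tar.gz")
with tarfile.open(archive, "w:gz") as t:
    data = b"hello"
    info = tarfile.TarInfo("docs/readme.txt")
    info.size = len(data)
    t.addfile(info, io.BytesIO(data))

out_dir = os.path.join(workdir, "extracted")
with tarfile.open(archive) as t:  # compression is detected automatically
    names = t.getnames()  # the tarfile counterpart of zipfile's namelist()
    if hasattr(tarfile, "data_filter"):
        # Safer extraction on Python versions that support filters
        t.extractall(out_dir, filter="data")
    else:
        t.extractall(out_dir)
```

getnames() plays the role of zipfile's namelist(), and extractall() takes the output directory as its first argument in both modules.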

Extract RAR files

No article on extracting archives would be complete without RAR files.

To handle the RAR format, we will use the rarfile module.

Although rarfile is not part of the standard library (install it with pip install rarfile), its author has made its interface as close to zipfile's as possible.

This is how you unrar files:

import rarfile

r = rarfile.RarFile('FILE.rar')
r.extractall()
r.close()

Likewise, you can extract an encrypted RAR file by passing the password through the same pwd= option:

r.extract('FILE', pwd='SECRET2019')

r.extractall(pwd='SECRET2019')

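There is one catch with rarfile: it relies on an external tool to decompress, so you need to install the unrar executable and, if necessary, tell rarfile where to find it.

rarfile.UNRAR_TOOL = "unrar"  # or the full path to the unrar executable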
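Since rarfile shells out to unrar, it can help to check that the tool is installed before configuring the module. A minimal sketch, assuming unrar is the tool you want: find_unrar is a hypothetical helper built on shutil.which(), and rarfile is imported only if the executable was found.

```python
import shutil

def find_unrar(name="unrar"):
    """Return the path to the unrar executable, or None if it is not on PATH.

    find_unrar is a hypothetical helper, not part of rarfile.
    """
    return shutil.which(name)

unrar_path = find_unrar()
if unrar_path is None:
    print("unrar not found; install it before using rarfile")
else:
    try:
        import rarfile  # third-party package: pip install rarfile
    except ImportError:
        print("rarfile is not installed: pip install rarfile")
    else:
        # Point rarfile at the exact executable found on PATH
        rarfile.UNRAR_TOOL = unrar_path
```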

Extract other archive formats

To extract other formats such as 7-Zip, ACE and BZip2, you can use the pyunpack module.

Under the hood, pyunpack uses patool, a command-line utility for handling common archive formats. Because patool delegates the work to external programs, you may need to install additional tools before pyunpack can handle a given format. Check its documentation for installation instructions.

Usage is the same for every archive format:

from pyunpack import Archive

Archive('FILE.zip').extractall('OUTPUT_DIR')

The input archive can be in any format supported by zipfile or patool.
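If your code has to accept many archive types, one option is to try the standard library first and fall back to pyunpack only when needed. The extract_any helper below is hypothetical, a sketch under that assumption: it relies on zipfile.is_zipfile() and tarfile.is_tarfile() to detect the format and imports pyunpack lazily, so ZIP and tar archives work even when pyunpack and patool are not installed.

```python
import os
import tarfile
import tempfile
import zipfile

def extract_any(path, dest):
    """Extract an archive into dest, preferring the standard library.

    extract_any is a hypothetical helper. Returns the backend that was used.
    """
    os.makedirs(dest, exist_ok=True)  # pyunpack does not create it by default
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            z.extractall(dest)
        return "zipfile"
    if tarfile.is_tarfile(path):
        # Use the safer "data" filter where this Python version supports it
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(path) as t:
            t.extractall(dest, **kwargs)
        return "tarfile"
    # Anything else goes to pyunpack/patool (pip install pyunpack patool)
    from pyunpack import Archive
    Archive(path).extractall(dest)
    return "pyunpack"

# Demo: one ZIP and one gzipped tarball, both handled by the standard library
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "FILE.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("a.txt", "from zip")

src = os.path.join(workdir, "b.txt")
with open(src, "w") as f:
    f.write("from tar")
tar_path = os.path.join(workdir, "FILE.tar.gz")
with tarfile.open(tar_path, "w:gz") as t:
    t.add(src, arcname="b.txt")

zip_backend = extract_any(zip_path, os.path.join(workdir, "zip_out"))
tar_backend = extract_any(tar_path, os.path.join(workdir, "tar_out"))
```

Checking ZIP before tar matters little in practice, but the standard-library paths avoid any external programs for the two most common formats.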

Conclusion

There are many ways to extract compressed files in Python, so pick the one that fits your situation: zipfile and tarfile from the standard library for ZIP and tar archives, rarfile for RAR, and pyunpack for the rest.