How to define and interact with Modules and Packages in Python
In software development, a package is a collection of files and directories that groups programs and other entities serving a similar purpose.
Python provides its own way of packaging application code: modules arranged in a tree of directories.
What is a Python Module?
It's a regular .py file that contains variables, functions, and classes that can be imported into another Python script.
Suppose we want to build a scientific Python package for mathematical computations. To start, we define a class named Vector with two methods, dot_product and vector_by_scalar, inside vector.py.
class Vector:
    def __init__(self, vector: list):
        self.vector = vector

    def dot_product(self, other: list) -> float:
        return sum(a * b for a, b in zip(self.vector, other))

    def vector_by_scalar(self, scalar: float) -> list:
        return [scalar * x for x in self.vector]
Now, if we want to use the Vector class inside another script, we import the vector module, which takes its name from the file, using the import keyword.
import vector
vect = vector.Vector([8.1, -9.3, -1.1])
Or we can import the Vector class directly from the vector module, in which case the class is used without the module prefix.
from vector import Vector
vect = Vector([8.1, -9.3, -1.1])
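To see the two styles side by side, here is a minimal sketch that writes a cut-down vector.py into a temporary directory and imports it both ways. The temporary directory and the sys.path tweak are only scaffolding for the demo, not something a real project needs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a cut-down vector.py, written to a temporary directory for the demo.
MODULE_SOURCE = '''
class Vector:
    def __init__(self, vector: list):
        self.vector = vector

    def dot_product(self, other: list) -> float:
        return sum(a * b for a, b in zip(self.vector, other))
'''

demo_dir = tempfile.mkdtemp()
Path(demo_dir, "vector.py").write_text(MODULE_SOURCE)
sys.path.insert(0, demo_dir)  # make the temporary directory importable
importlib.invalidate_caches()

import vector                 # style 1: binds the module name
from vector import Vector     # style 2: binds the class name directly

# Both names refer to the same class object.
same_class = vector.Vector is Vector
result = Vector([1.0, 2.0, 3.0]).dot_product([4.0, 5.0, 6.0])
print(same_class, result)
```

With the first style every use is prefixed by the module name; with the second, only the names we asked for are bound in the script.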
How does Python locate Modules?
When we import the module named vector, the Python interpreter searches these locations in order:
- Built-in modules compiled into the interpreter; the full list is available as sys.builtin_module_names.
- The directory that contains the input script.
- Directories stored in the PYTHONPATH environment variable.
- The installation-dependent default, which includes the site-packages directory handled by the site module.
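The search locations listed above can be inspected from a running interpreter. The sketch below uses only the standard sys and importlib.util modules; the name module_that_does_not_exist_12345 is made up for the example.

```python
import importlib.util
import sys

# Modules compiled into the interpreter itself (no file on disk).
sys_is_builtin = "sys" in sys.builtin_module_names
print(sys_is_builtin)

# sys.path is the ordered list of directories searched for everything else:
# the script directory, PYTHONPATH entries, then the installation defaults.
for entry in sys.path:
    print(repr(entry))

# find_spec reports where a module would be loaded from, without importing it.
json_spec = importlib.util.find_spec("json")
print(json_spec.origin)

# A name that no search location provides yields None.
missing_spec = importlib.util.find_spec("module_that_does_not_exist_12345")
print(missing_spec)
```

The exact paths printed depend on the installation, so the output will differ from one machine to another.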
Going back to our scientific package, we need not only vectors but also scalars, matrices, tensors, and more.
The most straightforward approach is to put all the files in the current directory, but as the project grows this becomes a mess. Instead, we reorganize it into subdirectories, each holding a collection of modules that serve the same purpose.
How to turn this project into a Python Package?
A Python package is simply a directory containing modules and subdirectories, with an additional __init__.py file at each level.
The __init__.py file is executed implicitly when the package is imported, and it serves to:
- Mark the directory as a Python package.
- Specify the submodules to be exported.
- Run initialization code, for example assigning a value to a global variable (at the directory level) or creating an instance of a class.
- Define the __all__ variable, a list of the public objects written inside the current directory or a sublevel.
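As a rough illustration of these roles, the sketch below builds a tiny package on disk and imports it. The package name demo_scalars and the names PRECISION and ZERO are hypothetical, chosen only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_scalars"          # hypothetical package name
pkg.mkdir()

# A module inside the package.
(pkg / "scalar.py").write_text(
    "class Scalar:\n"
    "    def __init__(self, value=0.0):\n"
    "        self.value = value\n"
)

# __init__.py: executed the first time the package is imported.
(pkg / "__init__.py").write_text(
    "from .scalar import Scalar      # export an object from a submodule\n"
    "PRECISION = 1e-9                # package-level variable\n"
    "ZERO = Scalar(0.0)              # instance created at import time\n"
    "__all__ = ['Scalar', 'ZERO']\n"
)

sys.path.insert(0, str(root))        # make the temporary directory importable
importlib.invalidate_caches()

import demo_scalars

print(demo_scalars.PRECISION)
print(demo_scalars.ZERO.value)
print(demo_scalars.__all__)
```

Everything assigned in __init__.py becomes an attribute of the package object, which is why PRECISION and ZERO are reachable right after the import.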
About __all__
It's useful when we want to import * from a package. For example, let's look at the __init__.py of the matrices subpackage.
from .basis_change import basis_change
from .dim_reduction import dim_reduction
from .gradient import gradient
from .matrix import Matrix
__all__ = ['basis_change', 'dim_reduction', 'Matrix']
Only the names listed in __all__ are imported by a from package import * statement. Here gradient is left out of the list, although it can still be imported explicitly by name.
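The effect is easy to check with a throwaway package. In this sketch, demo_matrices is a hypothetical stand-in for the matrices subpackage, reduced to two modules, and the star import is run through exec so that the resulting namespace can be inspected.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_matrices"         # hypothetical stand-in for the matrices subpackage
pkg.mkdir()

(pkg / "matrix.py").write_text("class Matrix:\n    pass\n")
(pkg / "gradient.py").write_text("def gradient():\n    return 'gradient'\n")
(pkg / "__init__.py").write_text(
    "from .gradient import gradient\n"
    "from .matrix import Matrix\n"
    "__all__ = ['Matrix']\n"         # gradient is deliberately left out
)

sys.path.insert(0, str(root))        # make the temporary directory importable
importlib.invalidate_caches()

# Run the star import in a fresh namespace so the result is easy to inspect.
namespace = {}
exec("from demo_matrices import *", namespace)
star_imported = sorted(name for name in namespace if not name.startswith("__"))
print(star_imported)

# Names outside __all__ are still reachable through an explicit import.
from demo_matrices import gradient
print(gradient())
```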
Relative imports
When a package is composed of subpackages, as in our case, relative imports let modules and subpackages share objects using dots: one dot refers to the current package and two dots to its parent. For example, to use Scalar and Matrix inside the gradient module, we write:
from .matrix import Matrix
from ..scalars import Scalar
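To make the dot notation concrete, the standard library function importlib.util.resolve_name translates a relative name into the absolute name it stands for. The sketch assumes the gradient module lives in scientific_computations.matrices, as in the layout described in this article; nothing is actually imported.

```python
import importlib.util

# Package that contains the gradient module in the article's layout.
current_package = "scientific_computations.matrices"

# One dot: stay in the current package.
sibling = importlib.util.resolve_name(".matrix", current_package)

# Two dots: go up one level, then down into scalars.
cousin = importlib.util.resolve_name("..scalars", current_package)

print(sibling)  # scientific_computations.matrices.matrix
print(cousin)   # scientific_computations.scalars
```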
Absolute imports
Absolute imports are required in a script that is run directly. For example, suppose a main script at the level of the scalars package uses a relative import, as follows:
from .scalar import Scalar
if __name__ == '__main__':
    scalar = Scalar()
This fails with an ImportError, because a file run as a script has no parent package for the relative import to resolve against.
An absolute import avoids the problem:
from scientific_computations.scalars.scalar import Scalar
if __name__ == '__main__':
    scalar = Scalar()
This version works, provided the directory that contains scientific_computations is on the module search path.
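Both behaviours can be reproduced in a small experiment. The sketch below creates a hypothetical demo_sci package in a temporary directory and runs two scripts in child processes; the PYTHONPATH setting is an assumption of the demo, standing in for however the package is made importable in a real project.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_sci" / "scalars"   # hypothetical package layout
pkg.mkdir(parents=True)
(root / "demo_sci" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
(pkg / "scalar.py").write_text("class Scalar:\n    pass\n")

(pkg / "relative_main.py").write_text("from .scalar import Scalar\nprint('ok')\n")
(pkg / "absolute_main.py").write_text(
    "from demo_sci.scalars.scalar import Scalar\nprint('ok')\n"
)

# Make the directory that contains demo_sci importable in the child processes.
env = {**os.environ, "PYTHONPATH": str(root)}

def run(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, *args], capture_output=True, text=True, env=env
    )

relative = run(str(pkg / "relative_main.py"))             # script, relative import
absolute = run(str(pkg / "absolute_main.py"))             # script, absolute import
as_module = run("-m", "demo_sci.scalars.relative_main")   # same file, run as a module

print(relative.returncode, relative.stderr.strip().splitlines()[-1])
print(absolute.returncode, absolute.stdout.strip())
print(as_module.returncode, as_module.stdout.strip())
```

The third run shows a way around the restriction: started with python -m, the file is loaded as part of its package, so the relative import has a parent to resolve against.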
Nested imports
Once the package is ready, users, test engineers, or anyone else who wants to use an object from the scientific_computations package in a main script outside the package will use an absolute import. For example:
from scientific_computations.matrices.matrix import Matrix
if __name__ == '__main__':
    matrix = Matrix()
But imagine we had five more levels of subpackages: the import statement would become far too long, and pushing that burden onto the programmer is bad practice. A better approach is to use the __init__.py files we have already defined to build a bridge between the user and the object, by re-exporting the object at the top level of the package.
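One possible way to build that bridge is for each __init__.py to re-export Matrix from the level below it. The sketch uses a hypothetical demo_computations package as a stand-in for scientific_computations, with a single subpackage, and checks that the short and long imports give the same class.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
top = root / "demo_computations"     # hypothetical stand-in for scientific_computations
sub = top / "matrices"
sub.mkdir(parents=True)

(sub / "matrix.py").write_text("class Matrix:\n    pass\n")

# Each level re-exports Matrix from the level below it.
(sub / "__init__.py").write_text("from .matrix import Matrix\n")
(top / "__init__.py").write_text("from .matrices import Matrix\n")

sys.path.insert(0, str(root))        # make the temporary directory importable
importlib.invalidate_caches()

from demo_computations import Matrix                              # short form
from demo_computations.matrices.matrix import Matrix as LongForm  # full dotted path

print(Matrix is LongForm)
```

The class still lives in matrix.py; the __init__.py files only add shorter names that point to it.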
Now we can import Matrix directly, as follows:
from scientific_computations import Matrix
if __name__ == '__main__':
    matrix = Matrix()
This way, we can expose the most frequently used objects directly to the programmer, sparing them long dotted imports and keeping private objects, which are used only within or between modules, out of the public interface.
Thanks for reading. Any feedback is welcome, you can DM me on LinkedIn.