When I tried to run my gopher daemon, pygopher, in a chroot, I faced some problems and thus tried to understand how Python manages its modules in memory.
This page describes my observations; it is neither a correct nor a complete description of memory management in Python.
These observations are probably incomplete and wrong, and they are not based on the official documentation (which I probably can’t understand).
When a Python program is started, I can observe that Python:
- loads a module when the line import <module> is reached (it looks the module up in the known paths, such as those of the PYTHONPATH variable)
- loads all the modules requested by a module loaded in the previous case
- does not load a module at startup when the line import <module> is only conditionally executed (dynamic loading); the module is loaded when that line actually runs
- possibly unloads the modules that are not used
- keeps in memory the absolute path of every module already loaded
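These observations can be checked from within a program. The following is a minimal sketch, assuming a standard CPython installation; the helper names `module_path` and `load_on_demand` are hypothetical, and `colorsys` is only picked as a small standard module that is usually not loaded at startup.

```python
import importlib
import sys


def module_path(name):
    """Return the path recorded for an already loaded module, or None."""
    module = sys.modules.get(name)
    # Built-in modules such as sys have no __file__ attribute.
    return getattr(module, "__file__", None)


def load_on_demand(name):
    """Import a module only when this function is called (dynamic loading)."""
    return importlib.import_module(name)


# Was the module already loaded before the explicit request?
loaded_before = "colorsys" in sys.modules
load_on_demand("colorsys")
print(loaded_before, "colorsys" in sys.modules)

# The path was recorded at import time and is kept in memory.
print(module_path("colorsys"))
```

The table of loaded modules is `sys.modules`; the path stored in `__file__` is simply a string and is not refreshed when the file system view changes later.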
In the case of a chrooted process, the last point is problematic: Python does not update the paths of already loaded modules when the process enters the chroot.
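One way to see which modules are affected is to list the loaded modules whose recorded path no longer exists. This is only a diagnostic sketch: the function `stale_modules` and the `ghost` module are hypothetical names used for illustration, and the path `/nonexistent/prefix` is assumed not to exist.

```python
import os
import sys
import types


def stale_modules(modules=None):
    """Return {name: path} for modules whose recorded file is missing."""
    modules = sys.modules if modules is None else modules
    stale = {}
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if isinstance(path, str) and not os.path.exists(path):
            stale[name] = path
    return stale


# Hypothetical module whose file "disappeared", as it would after a chroot.
ghost = types.ModuleType("ghost")
ghost.__file__ = "/nonexistent/prefix/ghost.py"

# sys is a built-in module without a path, so it is never reported.
print(stale_modules({"ghost": ghost, "sys": sys}))
```

Calling `stale_modules()` without argument right after entering the chroot would inspect `sys.modules` itself.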
I tried to force the unload/reload (importlib.reload) of a module but:
- this is neither fully functional nor recommended for core modules (that are used everywhere)
- a module can produce a huge dependency tree, so it is not possible to reload all modules
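A short sketch of why reloading is fragile, using `colorsys` as a stand-in for any module: `importlib.reload` re-executes the module code in the same module object, so references taken before the reload keep pointing at the old objects.

```python
import colorsys
import importlib

# Keep a reference to a function, as "from colorsys import rgb_to_hsv" would.
old_function = colorsys.rgb_to_hsv

reloaded = importlib.reload(colorsys)

# The module object itself is reused...
print(reloaded is colorsys)                    # True
# ...but its functions are re-created, so the old reference is now stale.
print(colorsys.rgb_to_hsv is old_function)     # False
```

Every module that imported names from the reloaded one would have to be reloaded too, which is where the dependency tree becomes unmanageable.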
virtualenv won’t help: it only copies a subset of the
You would have to juggle symbolic links and absolute paths that point:
- in a folder inside the
- in the system folders
- in the chroot, making it believe it is not in a
virtualenv doesn’t let you manage different versions of Python.
The solution is to use pyenv.
It can build any version of Python in any directory.
This simplifies the management of the paths.
To overcome the problem of the paths kept in memory for core modules, I simply created a symbolic link inside the chroot in order to fool the chrooted process and make it believe the path still exists.
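The symbolic link trick can be sketched as follows. This is an illustration under assumptions, not the exact layout used for pygopher: the function `link_original_prefix`, the prefix `/opt/python/lib` and the location `/lib/python` are all hypothetical, and a POSIX system is assumed.

```python
import os
import tempfile


def link_original_prefix(jail, original_prefix, location_in_jail):
    """Make original_prefix resolvable inside jail through a symbolic link.

    jail:             directory that becomes "/" after the chroot
    original_prefix:  absolute path Python recorded before the chroot
    location_in_jail: absolute path, seen from inside the jail, where the
                      files really are
    """
    link = os.path.join(jail, original_prefix.lstrip("/"))
    real = os.path.join(jail, location_in_jail.lstrip("/"))
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # A relative target resolves the same way before and after the chroot.
    target = os.path.relpath(real, start=os.path.dirname(link))
    os.symlink(target, link)
    return link


# Demonstration in a temporary directory standing in for the chroot.
with tempfile.TemporaryDirectory() as jail:
    os.makedirs(os.path.join(jail, "lib", "python"))
    open(os.path.join(jail, "lib", "python", "os.py"), "w").close()
    link = link_original_prefix(jail, "/opt/python/lib", "/lib/python")
    # The old absolute path now exists again, relative to the jail.
    print(os.path.exists(os.path.join(link, "os.py")))
```

Using a relative link target is a design choice of this sketch: an absolute target would only resolve correctly once the process is inside the chroot.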
Differences with Perl (or others)
I designed my finger daemon, pfinger, with the same goals.
I didn’t face the same problems with Perl because it loads all modules into memory once and for all (except for dynamically loaded ones): the memory management is totally different (no GC).
I don’t have a lot of experience with Python, but this problem helped me learn quite useful things.
I also had to give my code a more reusable design:
- modules for important functionalities
- good timing to load modules
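The "good timing" point can be illustrated by importing everything the daemon needs before entering the chroot. This is a minimal sketch with hypothetical helper names (`preload`, `enter_jail`); the module list is arbitrary, and `enter_jail` is only defined, not called, because `os.chroot` needs root privileges.

```python
import importlib
import os
import sys


def preload(names):
    """Import every module in names now, while its files are reachable."""
    return {name: importlib.import_module(name) for name in names}


def enter_jail(jail, names):
    """Load the modules first, then confine the process (requires root)."""
    modules = preload(names)
    os.chroot(jail)
    os.chdir("/")
    return modules


loaded = preload(["colorsys", "textwrap"])
print(sorted(loaded))    # ['colorsys', 'textwrap']
```

Modules imported this way stay in `sys.modules`, so a later import statement inside the chroot finds them without touching the file system.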