Saturday, March 7, 2009

Limiting memory usage of a python script

I've recently been running some Python and Java experiments with heavy memory requirements on a machine shared with other people, some of whom had the same needs on that machine.
There were no restrictions on resource usage per user or per process, so keeping memory usage under control was each user's responsibility.

Java has a parameter to limit the amount of memory the virtual machine can use, specified as java -Xmx<value>. If the program exceeds that value, the virtual machine exits and the running program fails.
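For example, to cap the heap at 512 MB (MyExperiment is just a placeholder class name here):
$ java -Xmx512m MyExperiment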

I couldn't find anything similar for Python, so I procrastinated a bit and wrote a bash script to control the processes.
I wrote a loop that monitors the values in /proc/meminfo and kills the running process if some limit is reached.
The problem with that solution was that my experiments were not just one program but many, controlled by a bash script, so it would only kill the bash script and not the processes actually consuming the memory. So I added some extra code borrowed from the web to also kill the subprocesses. That made the code longer and more complex, leading to this:

control_run.sh:
#!/bin/bash
PROGRAM=$*

# Also kill the process's children, and their children, and so on
KILL_CHILDREN=1

# Minimum free memory allowed (in KB, as reported by /proc/meminfo)
MEM_FREE_LIMIT=300000

# Minimum free swap allowed (in KB)
SWAP_FREE_LIMIT=500000

# Run the program in the background and get its PID
$PROGRAM &
PID=$!

# While the program is alive...
while ps $PID > /dev/null; do

    MEM_FREE=$(grep MemFree: /proc/meminfo | grep -o -E '[0-9]+')
    #echo "Free Memory: ${MEM_FREE}"

    SWAP_FREE=$(grep SwapFree: /proc/meminfo | grep -o -E '[0-9]+')
    #echo "Free Swap: ${SWAP_FREE}"

    if [ "${MEM_FREE}" -le "${MEM_FREE_LIMIT}" ] || [ "${SWAP_FREE}" -le "${SWAP_FREE_LIMIT}" ]; then

        if [ "${MEM_FREE}" -le "${MEM_FREE_LIMIT}" ]; then
            echo "free memory limit reached, exiting..."
        else
            echo "free swap limit reached, exiting..."
        fi

        if [ "${KILL_CHILDREN}" -eq 0 ]; then
            echo "killing process with id $PID"
            kill $PID
        else
            # Walk the process tree level by level, collecting the children,
            # their children, and so on.
            # Based on http://www.unix.com/unix-dummies-questions-answers/5245-script-kill-all-child-process-given-pid.html
            KILL_PIDS=$PID

            CHILDREN=$(ps -ef | awk '$3 == '$PID' { print $2 }')
            while [ "$CHILDREN" != "" ]; do
                KILL_PIDS="$KILL_PIDS $CHILDREN"
                OLD_CHILDREN=$CHILDREN
                CHILDREN=''
                for i in $OLD_CHILDREN; do
                    CHILDREN="$CHILDREN $(ps -ef | awk '$3 == '$i' { print $2 }')"
                done
            done

            echo "killing process with id $PID and its children"

            for i in $KILL_PIDS; do
                echo "killing $i"
                kill $i
            done
        fi
        exit 1
    fi

    sleep 1
done

echo "program finished"


I tested it with a small Python script that just consumes memory:

memory_consumer.py:
import time
a = range(100000)
while True:
    a += range(100000)
    print "printing something"
    time.sleep(1)


and some bash scripts to execute three instances concurrently:


memory_consumer_main.sh:
#!/bin/sh
./memory_consumer_child.sh


memory_consumer_child.sh:
#!/bin/sh
python memory_consumer.py &
python memory_consumer.py &
python memory_consumer.py



We would run:
$./control_run.sh ./memory_consumer_main.sh
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
printing something
free memory limit reached, exiting...
killing process with id 9348 and its children
killing 9348
killing 9349
killing 9350
killing 9351
killing 9353



Another possibility I found was setting the memory limit with ulimit.
We can set it from the command line, so it applies to the whole bash session, or include the statements in the first lines of a bash script.

That will limit the memory allowed for every process started from that shell.

For example:
$ ulimit -v 40000
$ ulimit -H -v 40000


There, we limited the virtual memory available to a process to 40000 KB (about 40 MB); the -H variant sets the hard limit, which cannot be raised again once lowered.
Then,
$ python memory_consumer.py
printing something
printing something
printing something
printing something
Traceback (most recent call last):
  File "memory_consumer.py", line 4, in <module>
    a += range(100000)
MemoryError


While simple, the problem with this approach is that it only stops the process that exceeds the memory limit. If we are running just one isolated process, that's fine. But in my case, the rest of the batch would continue in an erroneous state and produce wrong results.
Any comment or suggestion, including "hey stupid, Python has the option -blabla to do exactly that", will be appreciated.
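Update: as davi points out in the comments below, Python can set these limits on itself through the standard resource module. Building on that, here is a minimal sketch of a batch driver that sets the limit per child and aborts the whole batch as soon as any child dies, instead of letting it run on in an erroneous state. It assumes Linux; the command list, the 40000 KB cap, and the helper name limit_memory are just placeholders:

import resource
import subprocess
import sys

# Placeholder cap: 40000 KB of address space per child, like the
# ulimit example above (RLIMIT_AS is measured in bytes)
LIMIT_BYTES = 40000 * 1024

def limit_memory():
    # Runs in each child just before exec, so every child gets its own cap
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

commands = [["python", "memory_consumer.py"] for _ in range(3)]
procs = [subprocess.Popen(cmd, preexec_fn=limit_memory) for cmd in commands]

for p in procs:
    if p.wait() != 0:
        # A child died (e.g. from an uncaught MemoryError): kill the rest
        # of the batch instead of letting it continue with wrong state
        for other in procs:
            if other.poll() is None:
                other.kill()
        sys.exit(1)

Note that wait() blocks on the children in launch order, so the abort may be delayed until earlier children finish; a polling loop would react faster, but this keeps the sketch short.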

1 comment:

davi said...

Probably too late :) But I just stumbled upon this entry, and I've also just found out you can make the script kill itself when it exceeds some amount of memory (at least on Linux) with:

import resource
resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

I don't really understand what the soft and hard limits are, but I just use the same value for both.
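For the record: the soft limit is the value the kernel actually enforces, and allocations beyond it fail, which Python surfaces as a MemoryError. The hard limit is the ceiling up to which the soft limit may later be raised; an unprivileged process can lower its hard limit but never raise it back, so using the same value for both, as above, works fine. A minimal self-contained version of that suggestion, assuming Linux and a placeholder cap of about 40 MB:

import resource

# About 40 MB of address space; RLIMIT_AS is measured in bytes
soft_limit = hard_limit = 40000 * 1024
resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

# Same consumer as above: the growing list eventually raises MemoryError
a = range(100000)
while True:
    a += range(100000)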