Method and Object Serialization in Python
While working on my personal general-purpose task distribution system, (antnest), I came across the problem of serializing the methods of the task units to be assigned to the slave nodes for them to work on. The purpose is to implement something similar to Remote Procedure Calls but without having to define all the callable procedures ahead of time or statically.
Then I found the inspect
library that lets us inspect live objects. It provides the
getsource(object)
method which, given an object, retrieves the source for
the object (Duh!). So let’s put this code in a file, save it, and run it
with python inspect-test.py
:
1
2
3
4
5
6
import inspect
def square(num):
return num * num
print(inspect.getsource(square))
This prints the source of the function square
as expected.
So what if we try to getsource()
for square
, exec()
the source and run
getsource()
on it? It should work and print the source for square
right?
1
2
3
4
5
6
7
8
import inspect
def square(num):
return x * x
exec(inspect.getsource(square))
print(inspect.getsource(square))
1
2
3
4
5
6
7
8
9
10
Traceback (most recent call last):
File "inspect-test.py", line 8, in <module>
print(inspect.getsource(square))
File "/usr/lib/python3.4/inspect.py", line 830, in getsource
lines, lnum = getsourcelines(object)
File "/usr/lib/python3.4/inspect.py", line 819, in getsourcelines
lines, lnum = findsource(object)
File "/usr/lib/python3.4/inspect.py", line 667, in findsource
raise OSError('could not get source code')
OSError: could not get source code
Nope! Turns out you need to have the source loaded from a file somehow
(imports etc.) for getsource()
to work. The reason might be that it cannot
keep all the source code in memory. Either way, this doesn’t work for us because
we will be sending the source over the network, receive it on the other end,
somehow execute the source and add it to the local/global scope/namespace.
So the option I went with is to kind of cache the source (put it in a file
under the cache
directory) and __import__
it from there. Having taken care
of this annoyance, we move on to using it.
Pickle, meet steroids
What we want is something that is similar to the python pickle module, but
more generic. What we can do is iterate the object’s attributes, and recursively
call our serialize()
function on it. Iterating over the object’s attributes
can be easily done with the built-in dir()
function.
So consider the object cat
, an instance of class Cat
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import types
class Cat:
num_legs = 4
cuteness = 9001
def __init__(self, fur_color, eyes_color):
self.fur_color = fur_color
self.eyes_color = eyes_color
def cute(self):
print("Meow!")
cat = Cat('brown', 'blue')
cat.cuteness = 9100
def cuter(self):
print("*rolls over*")
print("Meow!")
setattr(cat, 'cute', types.MethodType(cuter, cat))
We want to serialize cat
it to its JSON representation which might look
something like this:
1
2
3
4
5
6
7
{
'class': 'Cat',
'num_legs': 4,
'cuteness': 9100,
'fur_color': 'brown',
'eyes_color': 'blue',
}
But what if we also wanted to be able to serialize the cute()
method? What if
some cat also rolls over and then “Meow!”s to look cute? We can use types.MethodType
class to create a bound instance of a new method cuter()
which we can
then inject into the cat
(so morbid!).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import types
class Cat:
num_legs = 4
cuteness = 9001
def __init__(self, fur_color, eyes_color):
self.fur_color = fur_color
self.eyes_color = eyes_color
def cute(self):
print("Meow!")
cat = Cat('brown', 'blue')
cat.cuteness = 9100
def cuter(self):
print("*rolls over*")
print("Meow!")
setattr(cat, 'cute', types.MethodType(cuter, cat))
Now calling cat.cute()
prints “rolls over” and then “Meow!” as expected.
But since the cute()
method is now instance-specific, we would also like to
be able to serialize it to get something like the following (after injecting
the new method):
1
2
3
4
5
6
7
8
{
'class': 'Cat',
'num_legs': 4,
'cuteness': 9100,
'fur_color': 'brown',
'eyes_color': 'blue',
'cute': 'def cuter(self):\n print("*rolls over*")\n print("Meow!")\n'
}
This can be easily done if we run inspect.getsource(cat.cute)
. This looks a lot
like pickling then but the methods bound to the class instances are also serialized.
The __init__
method wasn’t serialized because it isn’t instance-dependant. There
are also a number of other methods that the class inherits, none of which are
serialized either.
Unpickle, meet steroids
On the receiving end, we can check deserialize the methods by caching and
__import__
ing them, as shown above. The other attributes types (like strings,
numerical etc.) can be trivially deserialized with the python json
library.
The deserialized methods can then be injected into a new instance of the
class.
But what if we also want a new class on the receiving end? We can easily extend
this method to cache and __import__
a new class if we wanted although my
project (antnest) doesn’t need or use this.
Conclusion
Even though this method seems cumbersome, it can be made more robust and systematic then the ad-hoc overview of the method described here. It exploits the dynamic nature of python to add arbitrary new procedures to RPC systems which is what my project aims to achieve. It takes away the hassle of setting up RPC systems, defining interfaces on both ends, implementing the interfaces, updating the implementations etc.. All that needs to be done is that we write the simple python code that we are all familiar with and love and simply have the system take care of sending over the new implementations and “defining interfaces” (or something like that) on both the client and the serving ends.