While working on antnest, my personal general-purpose task distribution system, I ran into the problem of serializing the methods of the task units that are assigned to the slave nodes to work on. The goal is to implement something similar to Remote Procedure Calls, but without having to define all the callable procedures statically ahead of time.

Then I found the inspect module, which lets us inspect live objects. It provides the getsource(object) function which, given an object, retrieves the source for the object (Duh!). So let’s put this code in a file, save it, and run it with python inspect-test.py:

import inspect

def square(num):
    return num * num

print(inspect.getsource(square))

This prints the source of the function square as expected.

So what if we call getsource() on square, exec() the resulting source, and then call getsource() on square again? It should work and print the source for square, right?

import inspect

def square(num):
    return num * num

exec(inspect.getsource(square))

print(inspect.getsource(square))

Running it gives:

Traceback (most recent call last):
  File "inspect-test.py", line 8, in <module>
    print(inspect.getsource(square))
  File "/usr/lib/python3.4/inspect.py", line 830, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python3.4/inspect.py", line 819, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python3.4/inspect.py", line 667, in findsource
    raise OSError('could not get source code')
OSError: could not get source code

Nope! It turns out that getsource() needs the source to have been loaded from a file (through an import, for example). A function object does not carry its own source text; its code object only records a filename and line numbers, and getsource() reads the text back from that file. Code compiled by exec() from a string has no such file, so the lookup fails. Either way, this doesn’t work for us as is, because we will be sending the source over the network, receiving it on the other end, and then somehow executing it and adding it to the local or global namespace. So the option I went with is to cache the source (put it in a file under the cache directory) and __import__ it from there. Having taken care of this annoyance, we move on to using it.
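
The caching trick can be sketched roughly as follows. This is not the antnest code: load_from_source and the module name square_task are names made up for this example, and it uses importlib.util (Python 3.5 or newer) in place of a bare __import__ so that the cache directory does not have to be on sys.path.

```python
import importlib.util
import inspect
import os
import tempfile


def load_from_source(source, name, cache_dir):
    """Cache `source` as <cache_dir>/<name>.py and import it as a module.

    Hypothetical helper for illustration only.
    """
    path = os.path.join(cache_dir, name + ".py")
    with open(path, "w") as handle:
        handle.write(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the cached file's top-level code
    return module


# Pretend this string just arrived over the network.
received = "def square(num):\n    return num * num\n"

# exec() alone gives a function whose code object points at no real file.
namespace = {}
exec(received, namespace)
print(namespace["square"].__code__.co_filename)  # <string>

# Going through a cached file keeps getsource() working.
module = load_from_source(received, "square_task", tempfile.mkdtemp())
print(module.square(7))  # 49
print(inspect.getsource(module.square) == received)
```

The temporary directory stands in for the cache directory; a real system would presumably pick a stable location and unique module names.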

Pickle, meet steroids

What we want is something similar to the Python pickle module, but more generic. What we can do is iterate over the object’s attributes and recursively call our serialize() function on each of them. Iterating over the object’s attributes is easily done with the built-in dir() function.
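
A minimal sketch of that recursive walk, handling data attributes only, might look like this. The serialize() body is my guess at the shape of such a function, the Collar class is a made-up nested object, and the sketch assumes there are no reference cycles (which would recurse forever).

```python
import json


def serialize(value):
    """Recursively turn `value` into JSON-compatible data (no methods yet)."""
    if isinstance(value, (str, int, float, bool, type(None))):
        return value
    if isinstance(value, (list, tuple)):
        return [serialize(item) for item in value]
    if isinstance(value, dict):
        return {str(key): serialize(item) for key, item in value.items()}
    # Any other object: walk its attributes with dir().
    data = {"class": type(value).__name__}
    for name in dir(value):
        if name.startswith("__"):
            continue  # skip the dunder attributes every object inherits
        attribute = getattr(value, name)
        if callable(attribute):
            continue  # methods are ignored in this sketch
        data[name] = serialize(attribute)
    return data


class Collar:  # hypothetical nested object
    def __init__(self, color):
        self.color = color


class Cat:
    num_legs = 4

    def __init__(self, fur_color, collar):
        self.fur_color = fur_color
        self.collar = collar


print(json.dumps(serialize(Cat("brown", Collar("red"))), sort_keys=True))
```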

So consider the object cat, an instance of class Cat:

class Cat:
  num_legs = 4
  cuteness = 9001

  def __init__(self, fur_color, eyes_color):
    self.fur_color = fur_color
    self.eyes_color = eyes_color

  def cute(self):
    print("Meow!")


cat = Cat('brown', 'blue')
cat.cuteness = 9100

We want to serialize cat to its JSON representation, which might look something like this:

{
  "class": "Cat",
  "num_legs": 4,
  "cuteness": 9100,
  "fur_color": "brown",
  "eyes_color": "blue"
}

But what if we also wanted to be able to serialize the cute() method? What if some cat also rolls over and then “Meow!”s to look cute? We can use the types.MethodType class to turn a new function cuter() into a method bound to cat, which we can then inject into the cat (so morbid!).

import types

class Cat:
  num_legs = 4
  cuteness = 9001

  def __init__(self, fur_color, eyes_color):
    self.fur_color = fur_color
    self.eyes_color = eyes_color

  def cute(self):
    print("Meow!")


cat = Cat('brown', 'blue')
cat.cuteness = 9100

def cuter(self):
  print("*rolls over*")
  print("Meow!")

setattr(cat, 'cute', types.MethodType(cuter, cat))

Now calling cat.cute() prints “*rolls over*” and then “Meow!” as expected. But since the cute() method is now instance-specific, we would also like to be able to serialize it to get something like the following (after injecting the new method):

{
  "class": "Cat",
  "num_legs": 4,
  "cuteness": 9100,
  "fur_color": "brown",
  "eyes_color": "blue",
  "cute": "def cuter(self):\n  print(\"*rolls over*\")\n  print(\"Meow!\")\n"
}

This can be easily done by running inspect.getsource(cat.cute). The result looks a lot like pickling, except that the methods bound to individual instances are serialized too. The __init__ method wasn’t serialized because it isn’t instance-dependent. There are also a number of other methods that the class inherits, none of which are serialized either.
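
Putting this together, a serializer that also picks up instance-specific methods could be sketched as below. It is an illustration, not the antnest implementation: it assumes every data attribute is already JSON-compatible, it treats a method as instance-specific when it lives in the instance’s own __dict__, and DEMO_SOURCE and cat_demo are names made up so that the example’s code sits in a real file where getsource() can find it.

```python
import importlib.util
import inspect
import json
import os
import tempfile
import types


def serialize(obj):
    """Return a JSON-ready dict of data attributes and instance-bound methods."""
    data = {"class": type(obj).__name__}
    for name in dir(obj):
        if name.startswith("__"):
            continue  # skip inherited dunder attributes, including __init__
        value = getattr(obj, name)
        if inspect.ismethod(value):
            # Only methods stored on the instance itself are instance-specific;
            # methods found on the class are left to the class definition.
            if name in vars(obj):
                data[name] = inspect.getsource(value)
        else:
            data[name] = value  # assumed to be JSON-compatible already
    return data


# The demo code is written to a temporary module (hypothetical name cat_demo)
# because getsource() can only read source that lives in a file.
DEMO_SOURCE = '''\
class Cat:
  num_legs = 4
  cuteness = 9001

  def __init__(self, fur_color, eyes_color):
    self.fur_color = fur_color
    self.eyes_color = eyes_color

  def cute(self):
    print("Meow!")


def cuter(self):
  print("*rolls over*")
  print("Meow!")
'''

path = os.path.join(tempfile.mkdtemp(), "cat_demo.py")
with open(path, "w") as handle:
    handle.write(DEMO_SOURCE)
spec = importlib.util.spec_from_file_location("cat_demo", path)
cat_demo = importlib.util.module_from_spec(spec)
spec.loader.exec_module(cat_demo)

cat = cat_demo.Cat("brown", "blue")
cat.cuteness = 9100
cat.cute = types.MethodType(cat_demo.cuter, cat)

payload = json.dumps(serialize(cat), indent=2)
print(payload)
```

If the injection line is left out, the "cute" key disappears from the output, since the class-level cute() is not stored on the instance.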

Unpickle, meet steroids

On the receiving end, we can deserialize the methods by caching and __import__ing them, as described above. The other attribute types (strings, numbers, etc.) can be trivially deserialized with the Python json library. The deserialized methods can then be injected into a new instance of the class.
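
As a rough sketch of the receiving side: the deserialize() function below, its classes lookup table, and the cached_ file naming are all made up for this example. It also assumes a convention the format above does not strictly guarantee, namely that any string value starting with "def " is method source, and it uses importlib.util (Python 3.5 or newer) rather than a bare __import__.

```python
import importlib.util
import inspect
import json
import os
import tempfile
import types


class Cat:
    num_legs = 4
    cuteness = 9001

    def __init__(self, fur_color, eyes_color):
        self.fur_color = fur_color
        self.eyes_color = eyes_color

    def cute(self):
        print("Meow!")


def deserialize(payload, classes, cache_dir):
    """Rebuild an object from its JSON payload (hypothetical helper)."""
    data = json.loads(payload)
    cls = classes[data.pop("class")]
    obj = cls.__new__(cls)  # skip __init__; attributes come from the payload
    for name, value in data.items():
        if isinstance(value, str) and value.startswith("def "):
            # Assumed convention: such strings are method source. Cache the
            # source in a file and import it so getsource() keeps working.
            module_name = "cached_" + name
            path = os.path.join(cache_dir, module_name + ".py")
            with open(path, "w") as handle:
                handle.write(value)
            spec = importlib.util.spec_from_file_location(module_name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            # The cached module defines a single function; bind it to obj.
            functions = [f for _, f in inspect.getmembers(module, inspect.isfunction)]
            setattr(obj, name, types.MethodType(functions[0], obj))
        else:
            setattr(obj, name, value)
    return obj


payload = json.dumps({
    "class": "Cat",
    "num_legs": 4,
    "cuteness": 9100,
    "fur_color": "brown",
    "eyes_color": "blue",
    "cute": 'def cuter(self):\n  print("*rolls over*")\n  print("Meow!")\n',
})

cat = deserialize(payload, {"Cat": Cat}, tempfile.mkdtemp())
cat.cute()
print(inspect.getsource(cat.cute))
```

Because the method was loaded from a cached file, the rebuilt cat can itself be serialized again and passed on to another node.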

But what if we also want a new class on the receiving end? We could easily extend this method to cache and __import__ a new class as well, although my project (antnest) doesn’t need or use this.

Conclusion

Even though this method seems cumbersome, it can be made more robust and systematic than the ad-hoc overview given here. It exploits the dynamic nature of Python to add arbitrary new procedures to RPC systems, which is what my project aims to achieve. It takes away the hassle of setting up RPC systems, defining interfaces on both ends, implementing the interfaces, updating the implementations, etc. All we need to do is write the simple Python code that we are all familiar with and love, and have the system take care of sending over the new implementations and “defining interfaces” (or something like that) on both the client and the server ends.