Embedding Python in my App, Part II: The Desolation of Documentation

In our last thrilling adventure, I mentioned a few ways I didn’t put Python into my app.  Having settled on Boost.Python, I found that the documentation contained a few simple examples to get me started and some very obscure reference material, but relatively little of the intermediate stuff.  You know, like basically all documentation for anything with nonobvious gotchas, it never makes it easy to figure out YOUR problem when you don’t know enough to understand what’s going on.  This post covers a few specific warts that I lost time on.  They won’t necessarily be what you lose time on, but they are the examples that would have shaved a week off of my project if I had found them all in one place at the right moment, so maybe you will find them useful.  Here’s the simple example from the docs:

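#include <boost/python.hpp>

// The function being exposed (the docs define it alongside the module block).
char const* greet()
{
    return "hello, world";
}

BOOST_PYTHON_MODULE(hello_ext)
{
    using namespace boost::python;
    def("greet", greet);
}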

So, let’s start with BOOST_PYTHON_MODULE(foo).  According to module.hpp, that resolves to BOOST_PYTHON_MODULE_INIT, which can have a slightly different definition depending on platform.  And that definition contains some other preprocessor macros, which contain some other macros.  It’s quite a few layers to pick apart to see what’s actually going on, even if you are willing to sit down and try to read Boost code, which is something you should generally try to avoid doing.  So, what does it do that you actually care about?  It creates a function called void initfoo() that executes the stuff in the module block at run time.  (That’s the Python 2 name, which is what I’m building against.  Python 3 expects PyInit_foo() instead.)  Initially, after a brief glance over the docs, I had assumed that “boost::python::def()” was doing some magic to generate classes at compile time, and I didn’t realise that it was just a function that executed at runtime.  So, you can stick any C++ in there that you like, such as ‘std::cout << “When does this happen?” << std::endl;’ when you are trying to figure out what is going on, or do something “interesting.”

And, if you are embedding Python in your app, rather than building a standalone module, you will need to deal with that function yourself.  Which isn’t so bad, except that since Python is written in C, the function initfoo() is declared as ‘extern “C”’ to mesh with what the Python libs expect.  You need to be aware of that, and declare it the same way, if the module declaration and your actual embedding setup happen in different places.  That is never the case in the simple examples in the docs, but in the context of a larger application it will definitely be an issue.

Okay, so moving on to dealing with classes.  The next slightly fancier mini example shows exposing a simple class to Python.

#include <boost/python.hpp>
using namespace boost::python;

BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World")
        .def("greet", &World::greet)
        .def("set", &World::set);
}

But if you have, for example, a Node class that can reference a Graph, and a Graph class that can also reference a Node, you might try this:

namespace bpy = boost::python;  // alias used in the rest of the snippets
bpy::class_<Node>("node").def("getGraph", &playback::node::getGraph);
bpy::class_<Graph>("nodalGraph").def("getNode", &playback::nodalGraph::getNode);

And then you’ll spend some time angry about a rather vague compiler error, and lose half a day after you thought things were going swimmingly based on how easily the trivial example built.  You may even start looking at all the other options I looked at in the first blog post to see if maybe they will work better than Boost.  But fear not.  The problem is that we can’t create a method of Node that refers to a Graph until we first create the Python binding for Graph.  Of course, by that logic we also can’t do the binding for Graph first, since it has a method that deals with Node, which would need to be bound first.

Continue to fear not!  It’s a little nonobvious in the trivial example, but binding a class and binding the methods of the class don’t need to happen at the same time.  boost::python::class_<foo>(“bar”) constructs an instance of the class template named “class_”, instantiated with type “foo” and exposed to Python with the name “bar”.  (In practice you’ll usually want foo and bar to be the same name.)  That instance is something that we can hang on to.  Which means we can make a bunch of them for all the various classes that we need to wrap, and then add the methods afterward:

auto nodeBinding = bpy::class_<NodeWrap, boost::noncopyable>("node", bpy::no_init);
auto graphBinding = bpy::class_<GraphWrap, boost::noncopyable>("nodalGraph", bpy::no_init);

nodeBinding.def("getGraph", &playback::node::getGraph, bpy::return_internal_reference<>());
graphBinding.def("getNode", &playback::nodalGraph::getNode, bpy::return_internal_reference<>());

See, that’s not so bad, right?  I am using noncopyable and no_init because I want to create these things only on the well manicured C++ side.  A Node should ALWAYS be made by the Graph, and the Graph should always live in a project, etc.  I just wanted Python to have access to existing objects, so that was the easiest way to do it given things like private constructors, and no copy constructor.  I’m only about 10% sure that return_internal_reference is actually correct for my use of returning bare pointers.  I’m still slogging my way through some understanding there.

So, what about when the methods you are trying to bind are ambiguous?  Or you need to return arrays of stuff?  Well, when you have methods with the same name, you need to cast the member function pointer to the exact overload you mean.  Member function pointers are something I use rarely enough that I think they look really weird.  This is another thing that took me a while to figure out because I got confused by the docs, and I couldn’t find a useful example.  BOOST_PYTHON_MEMBER_FUNCTION_OVERLOADS sure sounds like it would make the pain go away, but it’s aimed at default arguments, where the overloads differ in how many arguments they take, so that didn’t turn out to be what I needed.  So, we’ll start with the declaration of a generic parameter class in the header:

class Parameter : public SomeBase {
public:
    explicit Parameter(OutputType T, std::string parameterName, QObject *parent);
    explicit Parameter(OutputType T, std::string parameterName);
    virtual void setValue(int newValue);
    virtual void setValue(double newValue);
    virtual void setValue(Color newValue);
    virtual void setValue(std::string newValue);
    std::vector<std::string> getTags();
private:
    // Some implementation nonsense too arcane to consider exposing to mere Python scripters.
};

Then this is the Python binding for that class.  Note that one of those returns a vector of strings.  You’ll need to bind every template variation of vector that you use, even for standard stuff like strings, so I include my binding of a vector of strings using the vector indexing suite for the sake of a slightly more complete example.


// vector_indexing_suite is not pulled in by <boost/python.hpp>; it needs
// #include <boost/python/suite/indexing/vector_indexing_suite.hpp>
// StringList is a typedef for std::vector<std::string>.
auto parameterBinding = bpy::class_<core::Parameter, boost::noncopyable>("Parameter", bpy::init<core::OutputType, std::string>());
bpy::class_<StringList>("StringList")
    .def(bpy::vector_indexing_suite<StringList>());
// Each cast picks one specific overload of setValue by its argument type.
parameterBinding
    .def("setValue", static_cast<void (core::Parameter::*)(int)>(&core::Parameter::setValue))
    .def("setValue", static_cast<void (core::Parameter::*)(double)>(&core::Parameter::setValue))
    .def("setValue", static_cast<void (core::Parameter::*)(core::Color)>(&core::Parameter::setValue))
    .def("setValue", static_cast<void (core::Parameter::*)(std::string)>(&core::Parameter::setValue))
    .def("getTags", &core::Parameter::getTags);

Note that the original C++ class has two constructors, but I only expose one to Python.  That’s fine.  So far, I haven’t really needed to expose every last constructor that seems useful in C++ to the scripting engine. YMMV, just pick the constructor you like based on the types it uses as above.  Then wait to see if anybody files any support tickets about actually needing another constructor.  Then I have a bunch of different versions of setValue() which take different types and do different things and have the same name.  It’s admittedly not something to hold up as a great example of perfect design, but you will see similar things in real C++ projects, so now you have an example.  Doesn’t that pointer to member function syntax seem gross?  I think “static_cast<void (core::Parameter::*)(double)>(&core::Parameter::setValue)” belongs in a god damned zoo.  But it does work.

And since we have figured out how to do bindings for a bunch of cool stuff now, that brings us to actually doing the Python embedding.  All of the trivial examples are based on building a module to run from standalone Python.  But if you want to script an existing app, it’s not always practical to build your app as a giant Python module and have the Python side do all the driving.  In my case, I had an existing native C++ app that I wanted to add scripting features to.  So, the next post will be about some of that magic, in the context of an app with multiple modules to be exposed, and how to layer things so that the UI can trigger Python, but Python can also expose the UI in a cross-platform way.  You knew it was going to be spread across three posts when you saw the first one was titled “An Unexpected Journey!”

Embedding Python in my App, Part I: An Unexpected Journey

Boost Python is pretty awesome.  It’s a way to wire up your C++ code to Python without having to go all the way into the fairly low level Python C API.  I have been working on a C++ application using Qt, and I wanted to embed Python in the app to allow user scripting.  There’s a lot I could write about that, but I’m lazy, so maybe some of the subtleties of allowing users to script your app will wind up in another post.

I looked at a few alternatives to Boost before I settled on it.  Since I am working on a Qt app, I looked at PyQt and PySide and how they do things first.  Both of them are bindings for the whole Qt framework with a bunch of UI and miscellaneous stuff, but they also have systems for wrapping that stuff which you can use for your own code and classes.  Exposing Qt to my users would have been pretty neat, so they could build their own UIs, or talk to databases or network services with the Qt classes pretty much “for free.”

PyQt licensing is a bit tricky, and I’m not certain if my app will eventually be a commercial product, so I didn’t want to deal with the complexities.  On a personal project like this one, sometimes laziness is the best way forward.  That said, it works well, and the license cost is not at all unreasonable.  If I had a boss on this project who handed me PyQt as the way forward, I probably would have been able to make it work.

PySide is similar to PyQt.  Instead of being managed by a third party like PyQt, PySide is developed in-house by the maintainers of Qt, Nokia erm I mean Digia.  Nope, I think it’s the Qt Company for the moment.  That said, I need Qt 5.4 support, and PySide seems to have given up at Qt 4.x.  There may be some work maintaining it someplace that I am not looking, but the official release doesn’t seem to be very current.  They also don’t support Python 3, which I want to support going forward.  My current build is done with Python 2.7, but I’d rather support Python 3 from the get-go than have a big panic when I decide I can’t possibly live without the latest greatest Python next month.  Shiboken, the wrapper generator used with PySide, is also terribly documented.  In theory it supports doing your own stuff without involving PySide, but finding a simple guide to doing it was frustrating, and I had no idea what I was doing.

I tried SWIG, but it hates namespaces.  Dealing with it in a real live modern-ish C++ code base proved to be really annoying.  I could wrap simple classes, but wrapping QObject derived classes spread across multiple modules in different namespaces using SWIG proved to be remarkably similar to bashing my face against the mouth of an angry lion.  It’s not that it can’t be done, but you’ll frequently find your face being ripped off if you do.  Also, SWIG is very much targeted toward making Python modules, rather than embedding Python.  While I did make simple Python modules in my experiments, I never did manage to get it embedded in my app properly.

Which brings me to Boost Python.  It builds as a part of my existing code, so it works fine with whatever my real app is.  On the other hand, while SWIG and Shiboken are generators for the binding, Boost requires me to actively expose stuff by hand.  I would have much preferred to have my Python bindings ‘just work’, which is basically impossible with Boost.  In an ideal world, I’d just write Doxygen comments in my header files, run a generator, and magically have a fully documented, working Python API for my app.  Oh well.  Programming sucks, and you have to do work to make the program that you want.  Now that I have established the rationale that led me down this path, I’ll dig more into the nitty gritty of using Boost.Python in my next post.

A small weapon in the war on redundant data

I recently discovered a neat utility called rdfind that searches a path for duplicate files.  Dedup can be super useful when you realise you have hundreds of GB of redundant data floating around your PC.  (I had about 400 GB after moving a bunch of scattered data from several smaller hard drives to a new 3TB drive I just bought.  A lot of the drives had copies of some of the same data.)  It’s pretty easy to install (it’s in the standard repositories for apt-get on Ubuntu) and to use:

will@will-desktop:/storage/test$ sudo rdfind /mnt/Quantum2/
 Now scanning "/mnt/Quantum2", found 259947 files.
 Now have 259947 files in total.
 Removed 0 files due to nonunique device and inode.
 Now removing files with zero size from list...removed 651 files
 Total size is 1445615350230 bytes or 1 Tib
 Now sorting on size:removed 72229 files due to unique sizes from list.187067 files left.
 Now eliminating candidates based on first bytes:removed 117614 files from list.69453 files left.
 Now eliminating candidates based on last bytes:removed 5872 files from list.63581 files left.
 Now eliminating candidates based on md5 checksum:removed 5166 files from list.58415 files left.
 It seems like you have 58415 files that are not unique
 Totally, 394 Gib can be reduced.
 Now making results file results.txt

It has some other options to do things like delete duplicate files automatically.  But, that terrifies me.  So, I don’t use that feature.  The result after it cranks away for quite a while is a file called results.txt with everything you need to know to go wastehunting.  Unfortunately, the output format is a bit obscure if you just want to know what will free up the most space with the least effort:

# Automatically generated
 # duptype id depth size device inode priority name
 DUPTYPE_FIRST_OCCURRENCE 58483 3 1 2082 1710946 1 /home/will/Downloads/pattern-2.6/Pattern.egg-info/not-zip-safe
 DUPTYPE_WITHIN_SAME_TREE -58483 3 1 2082 1710957 1 /home/will/Downloads/pattern-2.6/Pattern.egg-info/dependency_links.txt

It doesn’t directly give you the count of a given file, or the total waste for a given file.  It just gives you a file ID for each file, and the size of each copy.  You have to count, multiply, and sort by yourself to understand where your worst offenders are.  So, I wrote a little Python script to process that file and save me counting file IDs on my fingers.  It’s not anything fancy, but running it looks like this:

will@will-desktop:~$ python Documents/rdproc.py /storage/test/results.txt
...
(4618027008, 1539342336, 3, '/mnt/Quantum2/vidprod/libby/TableRead/2013.09.17/PRIVATE/AVCHD/BDMV/STREAM/00000.MTS', 152486)
(4633560010, 2316780005, 2, '/mnt/Quantum2/vidprod/mdwm/WholeDriveBackup/Caitlin/MDWM/Transcoded', 162706)
(4927753612, 2463876806, 2, '/mnt/Quantum2/vidprod/mdwm/WholeDriveBackup/Caitlin/MDWM/Transcoded', 162593)
(5807821978, 2903910989, 2, '/mnt/Quantum2/vidprod/mdwm/WholeDriveBackup/MDWM', 160710)
(7562474188, 3781237094, 2, '/mnt/Quantum2/vidprod/mdwm/WholeDriveBackup/Caitlin/MDWM/Transcoded', 162601)


The order of output is explained at the GitHub link.  But, it lets me easily see that my biggest waste comes from having a bunch of footage from My Dinner With Megatron, as well as a backup of a Whole Drive that was used during production.  Hence, 2 copies of a bunch of that stuff that I can merge back down quite easily.  I also have no fewer than 3 copies of a Table Read that I shot for a friend quite a while ago because I never cleared that memory card, and wound up re-importing it a few extra times after I shot more stuff on it.  As you can see, having everything sorted and summed makes it a lot easier to understand than if you were to try and use the results.txt file yourself.  So, feel free to use the Python script I wrote.  It’s not complicated or fancy, but I figure it may be useful enough to save somebody from having to reinvent it for themselves.  Let me know if you find it useful.
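If you would rather see the idea than go grab the script, the count, multiply, and sort step can be sketched in a few lines of Python.  To be clear, this is a minimal sketch and not the script itself: the function name summarize is made up for illustration, the column layout is taken from the header comment in results.txt, and it assumes (going by the sample above) that each duplicate carries the negated ID of its first occurrence.  The tuples it produces follow what the output above appears to be, namely total bytes, bytes per copy, number of copies, a path, and what looks like the file ID, since the first number in each row is the second times the third.

```python
import sys
from collections import defaultdict


def summarize(lines):
    """Group rdfind result lines by file ID.

    Returns one tuple per duplicate set, laid out as
    (total_bytes, size, count, name, file_id) and sorted ascending,
    so the biggest offenders end up at the bottom.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        # Columns: duptype id depth size device inode priority name.
        # maxsplit=7 keeps file names containing spaces in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # not a well-formed record
        file_id = abs(int(fields[1]))  # assumption: duplicates use -id
        groups[file_id].append((int(fields[3]), fields[7]))

    rows = []
    for file_id, copies in groups.items():
        size, name = copies[0]  # report the first path seen for this ID
        count = len(copies)
        rows.append((size * count, size, count, name, file_id))
    return sorted(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as results:
        for row in summarize(results):
            print(row)
```

Saved under a name of your choosing, it takes the path to results.txt as its only argument.  Sorting ascending is a deliberate choice for terminal use: the worst offenders are the last thing printed, so they are still on screen when it finishes.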