Tuesday 4 September 2012

One of the most nefarious language features I've ever seen

Take a look at the following code:

file1.py:
from package.file2 import do_something_which_raises
from otherpackage.myexception import MyException
try:
  do_something_which_raises()
except MyException:
  print 'Oops, raised!'

file2.py:
from otherpackage.myexception import MyException
def do_something_which_raises():
  raise MyException()

What would you say if I told you that this code didn't print 'Oops, raised!', but instead bubbled MyException all the way up to terminate execution?

"Sure, you have two classes named MyException".

You can see that both files import exactly the same exception class from exactly the same package.  I promise you, there is only one file named myexception.py, which contains exactly one class named MyException.  If you walk all the entries of sys.path, you will only reach one definition of MyException.
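
Here's roughly what I mean by that walk; a quick sketch, not a standard tool:

import os
import sys

# Look for every file on sys.path that could satisfy
# 'from otherpackage.myexception import MyException':
for entry in sys.path:
  candidate = os.path.join(entry or '.', 'otherpackage', 'myexception.py')
  if os.path.exists(candidate):
    print candidate  # exactly one hit in my tree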

"That's not possible!" I hear you say.

That's certainly what I thought.  But I was faced with the fact that I'd moved a couple of files around in my source tree, fixed up the package references, and three quarters of my tests were failing. I had no bloody clue why.

It turns out that class equality in Python depends not on whether two types were defined the same, but on whether they were loaded exactly the same way.

You see, file1.py and file2.py are in different directories; the structure is actually:

./file1.py
./package/__init__.py
./package/file2.py
./package/otherpackage/__init__.py
./package/otherpackage/myexception.py

but both . and ./package are in my sys.path: the directory of the script you run is always implicitly in sys.path, and I had added /path/to/package (the absolute path of ./package) to my $PYTHONPATH.

So when, in file1, I import from otherpackage.myexception, Python actually goes "otherpackage... Well, I don't have a file named that in the current directory... Or a folder named that... Let's start going through $PYTHONPATH".

For file2, which lives inside package, Python 2 first tries an implicit relative import, so it goes "Aha! The folder this module lives in has a folder named otherpackage! And inside it is a file named myexception! And that defines a class named MyException! You're sorted!"  That second lookup loads the very same file again, this time under the name package.otherpackage.myexception.

But as far as Python knows, these files were reached differently.  They have different paths to themselves (one is "/path/to/package" + "otherpackage/myexception.py", the other is "./package" + "otherpackage/myexception.py"); the fact that "./package" happens to be the same directory as "/path/to/package" is neither here nor there.  Python's module cache, apparently, doesn't do that level of resolution1 (I guess it assumes no one would be silly enough to have overlapping sys.path entries).

So when I said "you will only reach one definition of MyException", I was only telling half a truth.  You will only reach one definition, but you will reach it twice, under two different module names.  And that confuses Python.
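
You can watch the double load happen.  A minimal sketch, assuming the tree above, run from the top-level directory:

import sys
sys.path.insert(0, './package')  # stand-in for my $PYTHONPATH entry

# Load one: found via ./package, becomes module 'otherpackage.myexception'.
from otherpackage.myexception import MyException as ViaPythonpath
# Load two: found via '.', becomes module 'package.otherpackage.myexception'.
from package.otherpackage.myexception import MyException as ViaDot

print ViaPythonpath.__module__  # otherpackage.myexception
print ViaDot.__module__         # package.otherpackage.myexception
print ViaPythonpath is ViaDot   # False: same file, two distinct classes

try:
  raise ViaDot()
except ViaPythonpath:
  # Never reached: the except clause matches against the class object,
  # and these are two different class objects.
  print 'caught'
except Exception:
  print 'not caught by ViaPythonpath!'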

So here's a warning for you.  Don't have overlapping sys.path entries.  And if you do, always reference every definition therein from a consistent top-level folder.  Because if you get this wrong, equality isn't equality, and everything goes to hell.
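
If you're worried you've already gotten it wrong, one way to spot the damage at runtime is to look for modules in sys.modules which share a file but not a name.  Again, a sketch, not a standard tool:

import os
import sys

by_file = {}
for name, module in sys.modules.items():
  path = getattr(module, '__file__', None)
  if not path:
    continue  # skip built-ins (and Python 2's None placeholders)
  path = os.path.realpath(path)
  if path.endswith(('.pyc', '.pyo')):
    path = path[:-1]  # normalise compiled files back to their source
  by_file.setdefault(path, []).append(name)

for path, names in sorted(by_file.items()):
  if len(names) > 1:
    print '%s loaded as: %s' % (path, ', '.join(sorted(names)))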



1: This is conjecture; I've looked through the source of the equality functions and the exception machinery in both 2.7 and 3.3 to see whether anything funky was going on there, but I've not yet delved into the module cache.  It seems like reasonable conjecture, though.  I'll probably get around to reading the module cache source at some point, and may follow up then with a blog post and/or a patch to Python.

Monday 27 August 2012

Python, driving you to do the right thing, sometimes

This week I've been writing my first real Python.  I mean, I've hacked together 100-line scripts before, but I've been writing real code from scratch, structured, with tests and everything.

I've noticed some really nice things, and some really horrible things.

Nice thing: Python doesn't let you hash its built-in mutable collections:

>>> hash(set())  # mutable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> hash(frozenset())  # immutable
133156838395276

>>> hash([1,2,3])  # mutable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> hash((1,2,3))  # immutable
2528502973977326415

This is great, because if you hash something, it tends to be in order to store it in a set or dict or something, and if you change the thing after hashing it, you've broken the contract of the set/dict.

Awesome.  Guiding you toward doing the right thing.
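
For the curious, here's the failure mode that restriction protects you from.  Python will happily let you write your own broken __hash__, so this sketch uses a deliberately bad hand-rolled class:

class BadPoint(object):
  # Deliberately broken: its hash depends on mutable state.
  def __init__(self, x):
    self.x = x
  def __hash__(self):
    return hash(self.x)
  def __eq__(self, other):
    return isinstance(other, BadPoint) and self.x == other.x

p = BadPoint(1)
s = set([p])
print p in s  # True
p.x = 2       # mutate after hashing: p is now filed under the wrong bucket
print p in s  # False - the set is silently broken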

Horrible thing: unittest1 has an assertRaises method.  Great: a simple, concise way of asserting that a single call raises an exception.

Except I wanted to assert that something is raised, without specifying what, so I tried omitting the exception type from the call to assertRaises.


>>> import unittest
>>>
>>> def does_raise():
...   raise Exception()
...
>>> def does_not_raise():
...   return 1
...
>>> class TC(unittest.TestCase):
...   def test_raises(self):
...     self.assertRaises(Exception, does_raise)
...   def test_does_not_raise(self):
...     self.assertRaises(does_not_raise)
...
>>> unittest.main()
..
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

WAIT! I called self.assertRaises with a method which *does not raise*! It's in the name and everything! It feels like if the first arg to assertRaises is a callable, and not a type, unittest could perhaps at least warn, if not throw.

If I hadn't run this test before implementing the backing code, expecting to see it fail, and seen it pass instead, I would blindly have assumed that my code correctly raised an exception (as my test showed!) when it didn't!
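
For the record, what's happening (in 2.7's unittest) is that a one-argument call to assertRaises treats that argument as the exception class and returns a context manager for use in a with block; discard the return value, and nothing is asserted at all.  Reusing does_raise and does_not_raise from above, here are two spellings that really do assert "raises anything":

class TC2(unittest.TestCase):
  def test_vacuous(self):
    # With one argument, assertRaises returns an (unused) context
    # manager and asserts nothing, so this passes:
    self.assertRaises(does_not_raise)
  def test_catch_all_callable_form(self):
    # To assert "raises anything", pass Exception explicitly:
    self.assertRaises(Exception, does_raise)
  def test_catch_all_context_manager_form(self):
    # Or use the context-manager form (new in 2.7):
    with self.assertRaises(Exception):
      does_raise()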


1: unittest as of Python 2.7; available to earlier versions as the unittest2 backport.

Saturday 25 August 2012

Your app isn't all UI, don't test it like it is!

There are two common pieces of advice I give about testing software:
  1. Get a walking skeleton up quickly, and drive your development from user-acceptance tests (testing from the UI down), inspired by the fantastic Growing Object-Oriented Software, Guided by Tests
  2. You should have very, very few UI tests1.
These two pieces of advice seem to directly contradict each other.  They do and they don't.

The idea behind 1 is that you start with a single UI test, and write many smaller tests as you implement the sub-features required to make the UI test pass.  Having the UI tests gives you the really useful confidence that comes from actually being able to prove your app does all the things you say it does, and discourages you from writing code not required to make the UI test pass.

The idea behind 2 is that UI tests are slow, give you poorly localised feedback, are often flaky, and require writing a lot more test infrastructure and harnesses than smaller tests.  So you want to avoid them.

My experience with trying to follow both pieces of advice at the same time has been pretty poor.  Driving from UI tests encourages me to write UI tests for every single thing.  Which is useful, but: a) slow, b) means I'm distracted from getting things done at a lower level, because I'm busy writing UI tests and infrastructure, and c) means I end up with a lot of UI tests.

There's a side-project I've been meaning to write for a while now, and a few times I've started it, gotten bogged down with UI writing and UI testing, gotten bored, and stopped.  Yesterday, I started it again, but with a different approach, and it feels much better:

Don't have a UI.

UIs are great; they let people interact with your system.  But that's not the thing I want to be testing when I'm implementing logic, or interacting with a data store, or doing any of the many other things which are nothing to do with the UI.  When developing core functionality, the UI is really just a way to trigger calls into your core library.  UI tests are great for showing that your UI makes the right calls, and everything is wired up properly, but I definitely don't want them giving me confidence that my core library works! I want much, much smaller tests for that!

So what have I been doing? Well, I've been writing the library first.  I know roughly what actions my UI needs to perform in my backend, so I can write code to perform those actions.  My code is unit-tested and has some larger tests showing that it interacts with a (fake, in-memory) data store correctly and such, but there are absolutely no UI tests, because there is absolutely no UI! A whole host of things not to distract me!
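
To make that concrete, here's the rough shape, with all names hypothetical (my actual code isn't shown here): the library takes the data store as a constructor argument, and the tests hand it an in-memory fake:

import unittest

class InMemoryStore(object):
  # Fake data store: a dict standing in for the real backend.
  def __init__(self):
    self._items = {}
  def put(self, key, value):
    self._items[key] = value
  def get(self, key):
    return self._items.get(key)

class Library(object):
  # Core logic; knows nothing about any UI.
  def __init__(self, store):
    self._store = store
  def rename(self, key, new_name):
    if self._store.get(key) is None:
      raise KeyError(key)
    self._store.put(key, new_name)

class RenameTest(unittest.TestCase):
  def test_rename_updates_store(self):
    store = InMemoryStore()
    store.put('id1', 'old name')
    Library(store).rename('id1', 'new name')
    self.assertEqual('new name', store.get('id1'))

When the UI shows up, it will call the same Library methods, and the UI tests only need to prove that wiring.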

Now, at some point I'm going to have a problem.  One of the drivers for the Walking Skeleton is that integrations are hard and should be done early, so that you find integration problems sooner; at some point I'm going to need to integrate my UI (when I write it) with my library.  But I'm OK with leaving that to a later date.

I like that my UI is largely independent of the implementation.  It means my UI tests will really only be testing the UI, and how it integrates with the backend.  It means that when I'm writing the UI using whatever web framework I happen to choose, I can really focus on the UI and the web framework, and not be thinking about a database, or whether my loop can really always terminate.  And hopefully it means that I can write just a few UI tests, which exercise a few useful paths through my system and show it works, without needing to test every little thing.  Because I know every little thing works; I just don't know whether the UI pokes it correctly.  So my UI tests can test exactly that.


There are other benefits too. My UI should be more independent of my backend, so I should be able to more easily switch out one or the other for a whole new implementation. Because my library has been designed to be an API for a UI, rather than being the implementation of the UI, opening up that API in other ways should be easier too, as I have a well-defined interface. But mostly, it lets me focus on the thing I'm trying to do right now, without being distracted by a largely separate part of the system.

This is something I've found works great for me. How do you manage the trade-off between the confidence UI tests give you and the pain they entail? How do you keep your test pyramid bottom-heavy?



1: People are often surprised that, given most of my day job involves making UI testing easier, I don't think people should be doing it. I think of it as a necessary evil, which should be minimised. I really liked what someone said at CITCON in London this year, roughly: "Every time someone on my team wants to add a UI test, they have to justify to the team why the thing they're testing can't possibly be tested at a lower level. As a result, we don't have many UI tests, and that's just fine." - sorry I can't attribute the line!