Geoff Chappell - Software Analyst
A gauche attempt on Twitter to suggest that the poor performance of Control Flow Guard, a relatively new Windows feature that Microsoft describes as “a highly-optimized platform security feature”, may have explanations other than an incompetent choice of algorithm led me to think that I might usefully get up to date with the feature, and especially that I should put my money where my mouth is about how observation is not explanation.
One of the things that consistently surprises me about that side of the software industry that examines technical shortcomings is that we easily enough prize observation but see little commercial imperative in converting our observation of a fault into an understanding of it. Much of the reason, of course, is that when the fault is in somebody else’s software (as it always is), the cost of understanding it falls to that somebody else. Fair enough, too! Indeed, disciplined observation is not cost-free either, and I leave for another time the point that even though you save the somebody else the initial grunt work of pinpointing where to start looking, you may yet be brushed off countless times by several levels of obtuse screeners before you get the attention of any programmer who actually knows the subject.
But I think all this disguises something: the cost of understanding, as most of us perceive it, is higher than it would be if we were more skilled at the understanding. Perhaps because we tend to think of this work as something to push off to others or to brush away, we tend not to cultivate the necessary investigative skills, or to support the maintenance of the huge amounts of background knowledge that give those skills more to feed on. Anything that we’re not well practised at inevitably seems more difficult and costly than it really is.
Now, this is not the place to talk in detail of the particular observations that I investigated for an explanation, nor for me to write up the explanation. What Bruce Dawson observed was O(n²) behaviour in CreateProcess, arising from crazy amounts of time spent in one routine that he described, rather too casually for my taste, as “too large to be easily reverse engineered”. To me, explaining Bruce’s observations was incidental work that very plausibly took me less time than Bruce spent making them, or even writing them up. This is not at all to say that Bruce’s observations weren’t worth making or that they weren’t made well: disciplined observation is no small skill. Nor is it to say that an explanation isn’t worth having. Microsoft will easily enough fix its bug. Sufficiently interested “security researchers” will then pick apart what’s changed and get some idea of the problem’s cause. But I, for one, do count it as valuable to have an explanation in advance. A relatively slight coding error is credibly the cause of numerous reports, over roughly five years, that this highly promoted security feature performs poorly despite what its manufacturer claims is high optimisation. We’re unlikely ever to get an account from Microsoft. Our society needs more resources for independent evaluation of what our software does. Our industry ought to be better at delivering this.
Of course, I too have no end of excuses about how I could do this or that study except that it’s impractical in competition with everything else that seems more important. I very likely never will write up this passing curiosity about Control Flow Guard’s poor performance in a case that doesn’t directly affect me. I did, however, take it as a spur to catch up on the feature overall, as an important development in Windows that my busy life had somehow led me to neglect. In the process I had to draw on a surprisingly wide range of notes on memory management in the Windows kernel, both old and new, some in good shape, most not, including many that I started piecing together in 2016 but did not publish before moving on to other things. Let’s see how far I can get with a publication effort this time.
It will be some time—a few weekends, anyway—before this work settles. It will be even longer before I return to the documentation I started of Image File Execution Options, including the Global Flags. That was all supposed to be new for April!