Geoff Chappell - Software Analyst
A detailed note about missing icons in the notification area eventually got prepared in January 2009 from an analysis done two years earlier. Sorry, but I get round to writing up only a fraction of what I discover, and even then it often takes a while. However, at the time of my investigation into this problem, I figured that having myself been so irritated by the problem, the least I should do is explain it for the many others who had been similarly affected. I therefore published elsewhere a brief explanation of the problem’s cause, with the observation that the problem is amenable to patching. Unfortunately, the website that I published to seems to have disappeared. It was not the most prominent that had turned up in a search at Google, but I do not contribute to sites that simply assume I accept cookies and allow ActiveX controls, and I’m anyway not very keen on having to register various amounts of personal information as a price for providing answers.
A search of Google two years after my first work on this problem reveals that the same cause and solution have been found by another, which I take to be independent discovery despite experience that many people’s notion of “serious research” is actually nothing more than collating other people’s work from the Internet. Anyway, my own research is nothing more than the intellectual slumming of reading Microsoft’s work as binary code in the software they sell. Whatever the means, the more who find what a mess Microsoft has made of this and, more importantly, what little Microsoft cares to do about it, the better.
What follows was posted as a comment to http://pointerx.net/blogs/glozano/archive/2006/08/29/Missing-Windows-XP-Taskbar-Notification-Area-icons.aspx on 15th January 2007. I believe it was then the world’s first assertion in public that the problem is caused by hard-coded timeouts in the Windows API function Shell_NotifyIcon.
As for apparently many others, this problem of missing icons has been irritating me for some years. It strikes more often than not on my notebook - usually, but not only, for the power icon. But since it hardly ever affects my main work machine, I have not cared too much. It is anyway so obvious a defect that I expected the "fix" to be to upgrade Windows. (I still use Windows XP Service Pack 1a.) Still, from your site and countless others that turn up in a search on Google, I gather the problem is one for which some proper attention might benefit very many people.
It really is a wonder to me that so many have apparently looked into this and not found the cause. The obvious starting point for any investigation would be the shell API function with which applications (understood broadly) may add an icon to the taskbar's notification area, namely Shell_NotifyIcon (in SHELL32.DLL). Even brief inspection of this function suggests an obvious hypothesis.
Though Microsoft's documentation of Shell_NotifyIcon doesn't say so, the function is subject to two time-related conditions that can cause failure. (Note that everyone's observations of the problem strongly suggest variability from the relative timing of probably unrelated activities.) One of these conditions is an explicit timeout for the sending of a message to the target window. The timeout is 4 seconds. Of course, the insertion of an icon into a toolbar is hardly likely to take anything like that time. However, the target (EXPLORER.EXE) must first get the message. The system may be so busy that it doesn't deliver the message in time. The target may be so busy that it doesn't pick up the message in time, e.g., because it has been flooded with other messages. The other condition is that the message is not to be sent if the target appears to be hung. According to Microsoft's documentation of SendMessageTimeout, this means that 5 seconds have elapsed since the target was last seen to try picking up any messages.
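The two conditions can be put into a small model. What follows is only a sketch of the reasoning, not the code in SHELL32.DLL: the function and type names are hypothetical, the two durations are inputs invented for illustration, and only the 4-second and 5-second figures come from the text above. The "hung" test is presumably what SendMessageTimeout calls SMTO_ABORTIFHUNG, but that is an assumption here.

```c
#include <assert.h>

/* Figures from the text; all names below are hypothetical. */
#define SEND_TIMEOUT_MS   4000u  /* explicit timeout for sending the message */
#define HUNG_THRESHOLD_MS 5000u  /* documented threshold for "appears hung" */

typedef enum {
    NOTIFY_OK,           /* target answered within the timeout */
    NOTIFY_TARGET_HUNG,  /* message was never sent */
    NOTIFY_TIMED_OUT     /* message was sent but not answered in time */
} notify_result;

/*
 * idle_ms:     how long since the target last tried to pick up a message
 * delivery_ms: how long the target would take to receive and answer this one
 */
notify_result model_notify(unsigned idle_ms, unsigned delivery_ms)
{
    if (idle_ms >= HUNG_THRESHOLD_MS)
        return NOTIFY_TARGET_HUNG;
    if (delivery_ms > SEND_TIMEOUT_MS)
        return NOTIFY_TIMED_OUT;
    return NOTIFY_OK;
}

/* Three illustrative cases, written as assertions. */
void model_examples(void)
{
    assert(model_notify(0, 50) == NOTIFY_OK);              /* quiet system */
    assert(model_notify(0, 4500) == NOTIFY_TIMED_OUT);     /* busy startup */
    assert(model_notify(6000, 50) == NOTIFY_TARGET_HUNG);  /* target not polling */
}
```

In this model, either failure leaves the caller without an icon even though adding the icon would itself have taken next to no time.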
Clearly, during startup, a lot of activity by many hands must be expected, such that someone may call Shell_NotifyIcon but EXPLORER doesn't get the message. Of course, it takes much more time to confirm this in a debugger than to hypothesise it. Even so, by patching SHELL32.DLL to extend the timeout and remove the "abort if hung" test, I was able to observe how long the message delivery was taking. On my notebook, which is admittedly one of those neat, small things and thus underpowered (933MHz), I had round-trip times for message delivery up to 12.658 seconds. No, I don't know how many of those digits actually are significant.
As I see it, the patch presents no danger. A longer timeout (than 4 seconds) clearly is needed during startup. I set it to 60 seconds for my patch, and I have no problem of missing icons. My view is that if, at other times, EXPLORER needs 60 seconds to attend to messages, then something has already gone very, very wrong, and this patch would be the least of the trouble. There's really not much to be done - even by Microsoft in its source code - except to lengthen the timeout and remove the "abort if hung" test.
On this point, it must be noted again that Microsoft's documentation of Shell_NotifyIcon does mention that the function can fail but omits to mention the significant cause of such failure. I would be surprised if many programmers think to test for failure: they'll think that the function is just about inserting an icon into a toolbar, what can go wrong with that, and this function is therefore one of those for which failure is merely a theoretical possibility and needn't be defended against in the real world. There will already exist a large body of code, both by Microsoft (e.g., STOBJECT.DLL) and by others, in which Shell_NotifyIcon is called and is assumed to have succeeded, even though there is no icon in the notification area. This is probably too late to change.
The only thing to be done is to make failure less likely, i.e., by lengthening the timeout. Whoever has looked at this at Microsoft and not got anywhere with it should find another job. Indeed, the vast amount of supposed analysis that this problem has attracted on the Internet ought to be to the embarrassment of everyone. As an industry, and even as consumers, we really ought to be able to do a lot better at studying the problems we observe in this pervasive software. Note that we don't need to wait for Microsoft to acknowledge a problem, or to explain it, or to confirm an explanation, or to devise a fix, though I do admit that we do need them when it comes to getting a fix implemented in their source code - and in that sense, the ball is now in their court.
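For code that can still be changed, the remark about programmers not testing for failure suggests the obvious defence: check the result and try again. The following is a minimal sketch under stated assumptions, not anything the comment above prescribes. The callback types are hypothetical stand-ins, so that the logic can be read without Windows headers: add_icon would wrap a call such as Shell_NotifyIcon with NIM_ADD, and pause would wrap Sleep. Whether a retry is harmless when the first message was merely late, rather than lost, is not something the text establishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: add_icon returns nonzero on success (like BOOL),
   pause waits for the given number of milliseconds. */
typedef int  (*add_icon_fn)(void *ctx);
typedef void (*pause_fn)(unsigned ms);

/* Call add_icon up to max_attempts times, pausing between failed attempts.
   Returns the number of attempts used on success, or 0 if every one failed. */
unsigned add_icon_with_retry(add_icon_fn add_icon, void *ctx,
                             pause_fn pause, unsigned pause_ms,
                             unsigned max_attempts)
{
    assert(add_icon != NULL);
    for (unsigned attempt = 1; attempt <= max_attempts; ++attempt) {
        if (add_icon(ctx))
            return attempt;
        if (pause != NULL && attempt < max_attempts)
            pause(pause_ms);
    }
    return 0;
}

/* Illustrative fake for experiments: fails while the counter that ctx
   points to is above zero, decrementing it on each failure. */
int fail_n_times(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        return 0;
    }
    return 1;
}
```

A caller that gets 0 back at least knows there is no icon, which is more than code that ignores the result can say. This does nothing for the existing programs that never check, for which lengthening the timeout remains the only remedy.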