Missing Icons in Notification Area

A detailed note about missing icons in the notification area was eventually prepared in January 2009 from an analysis done two years earlier. Sorry, but I get round to writing up only a fraction of what I discover, and even then it often takes a while. However, at the time of my investigation into this problem, I figured that having been so irritated by the problem myself, the least I should do is explain it for the many others who had been similarly affected. I therefore published elsewhere a brief explanation of the problem’s cause, with the observation that the problem is amenable to patching. Unfortunately, the website that I published to seems to have disappeared. It was not the most prominent of those that had turned up in a search at Google, but I do not contribute to sites that simply assume I accept cookies and allow ActiveX controls, and I’m anyway not very keen on having to register various amounts of personal information as a price for providing answers.

A search of Google two years after my first work on this problem reveals that the same cause and solution have been found by someone else, which I take to be an independent discovery despite experience that many people’s notion of “serious research” is actually nothing more than collating other people’s work from the Internet. Anyway, my own research is nothing more than the intellectual slumming of reading Microsoft’s work as binary code in the software they sell. Whatever the means, the more people who find what a mess Microsoft has made of this and, more importantly, what little Microsoft cares to do about it, the better.

What follows was posted as a comment to http://pointerx.net/blogs/glozano/archive/2006/08/29/Missing-Windows-XP-Taskbar-Notification-Area-icons.aspx on 15th January 2007. I believe it was then the world’s first assertion in public that the problem is caused by hard-coded timeouts in the Windows API function Shell_NotifyIcon.

As for apparently many others, this problem of missing icons has been
irritating me for some years. It strikes more often than not on my
notebook - usually, but not only, for the power icon. But since it hardly
ever affects my main work machine, I have not cared too much. It is anyway
so obvious a defect that I expected the "fix" to be to upgrade Windows. (I
still use Windows XP Service Pack 1a.)

Still, from your site and countless others that turn up in a search on 
Google, I gather the problem is one for which some proper attention might 
benefit very many people. 

It really is a wonder to me that so many have apparently looked into this 
and not found the cause. The obvious starting point for any investigation 
would be the shell API function with which applications (understood 
broadly) may add an icon to the taskbar's notification area, namely 
Shell_NotifyIcon (in SHELL32.DLL). Even brief inspection of this function 
suggests an obvious hypothesis. 

Though Microsoft documentation of Shell_NotifyIcon doesn't say so, the 
function is subject to two time-related conditions that can cause failure. 
(Note that everyone's observations of the problem strongly suggest 
variability from the relative timing of probably unrelated activities.) 

One of these conditions is an explicit timeout for the sending of a message 
to the target window. The timeout is 4 seconds. Of course, the insertion of 
an icon into a toolbar is hardly likely to take anything like that time. 
However, the target (EXPLORER.EXE) must first get the message. The system 
may be so busy that it doesn't deliver the message in time. The target may 
be so busy that it doesn't pick up the message in time, e.g., because it 
has been flooded with other messages. 

The other condition is that the message is not to be sent if the target 
appears to be hung. According to Microsoft's documentation of 
SendMessageTimeout, this means that 5 seconds have elapsed since the target
was last seen to try picking up any messages. 

Clearly, a lot of activity by many hands must be expected during startup,
such that someone may call Shell_NotifyIcon but EXPLORER doesn't get the
message. Of course, it takes much more time to confirm this in a
debugger than to hypothesise it. Even so, by patching SHELL32.DLL to extend 
the timeout and remove the "abort if hung" test, I was able to observe how 
long the message delivery was taking. On my notebook, which is admittedly 
one of those neat, small things and thus underpowered (933MHz), I had 
round-trip times for message delivery up to 12.658 seconds. No, I don't know 
how many of those digits actually are significant. 

As I see it, the patch presents no danger. A longer timeout (than 4 seconds) 
clearly is needed during startup. I set it to 60 seconds for my patch, and I 
have no problem of missing icons. My view is that if, at other times, 
EXPLORER needs 60 seconds to attend to messages, then something has already 
gone very, very wrong, and this patch would be the least of the trouble. 
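
The two conditions and the figures quoted above can be put into a small
model. The following C is only an illustrative sketch, not code taken from
SHELL32.DLL. The names send_attempt and message_delivered are hypothetical,
and the model assumes that the hung test is applied once, before sending,
and that the timeout covers the whole round trip.

```c
#include <stdbool.h>

/* Figures quoted in the text above. */
#define STOCK_TIMEOUT_MS    4000u   /* timeout the text attributes to Shell_NotifyIcon */
#define PATCHED_TIMEOUT_MS  60000u  /* timeout chosen for the patch */
#define HUNG_THRESHOLD_MS   5000u   /* "hung" threshold cited from the SendMessageTimeout documentation */

/* Hypothetical description of one attempt to notify the taskbar. */
struct send_attempt {
    unsigned response_ms;     /* time the target needs to receive and answer the message */
    unsigned target_idle_ms;  /* time since the target last tried to pick up messages */
};

/* Returns true if, under this simplified model, the message gets through. */
static bool message_delivered(struct send_attempt a, unsigned timeout_ms,
                              bool abort_if_hung)
{
    if (abort_if_hung && a.target_idle_ms >= HUNG_THRESHOLD_MS)
        return false;                    /* target looks hung: message is not sent */
    return a.response_ms <= timeout_ms;  /* otherwise it must complete in time */
}
```

In this model, the 12.658 second round trip measured on the notebook fails
with the 4 second timeout but succeeds with the 60 second timeout and no
"abort if hung" test.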

There's really not much to be done - even by Microsoft in its source code - 
except to lengthen the timeout and remove the "abort if hung" test. On this 
point, it must be noted again that Microsoft's documentation of 
Shell_NotifyIcon does mention that the function can fail but omits to 
mention the significant cause of such failure. I would be surprised if many 
programmers think to test for failure: they'll think that the function is 
just about inserting an icon into a toolbar, what can go wrong with that, 
and this function is therefore one of those for which failure is merely a 
theoretical possibility and needn't be defended against in the real world. 
There will already exist a large body of code, both by Microsoft (e.g., 
STOBJECT.DLL) and by others, in which Shell_NotifyIcon is called and is 
assumed to have succeeded, even though there is no icon in the notification 
area. This is probably too late to change. The only thing to be done is to 
make failure less likely, i.e., by lengthening the timeout. 
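
For new code that calls Shell_NotifyIcon and cannot rely on a patched
SHELL32.DLL, one defensive possibility is to test the result and try again.
This is not the patch described above, only a sketch of a caller-side
alternative. It is written as portable C so that it can be compiled
anywhere. The names add_icon_with_retry, try_add_fn, pause_fn and
fake_try_add are hypothetical. In a real Windows program, the try_add
callback would wrap Shell_NotifyIcon(NIM_ADD, &nid) and the pause callback
would wrap Sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types standing in for the real Windows calls. */
typedef bool (*try_add_fn)(void *context);        /* true if the icon was added */
typedef void (*pause_fn)(unsigned milliseconds);  /* waits between attempts */

/* Calls try_add until it succeeds or max_attempts is exhausted, pausing for
   delay_ms between attempts. Returns the number of the attempt that
   succeeded, or 0 if every attempt failed. */
static unsigned add_icon_with_retry(try_add_fn try_add, void *context,
                                    unsigned max_attempts,
                                    pause_fn pause, unsigned delay_ms)
{
    for (unsigned attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_add(context))
            return attempt;
        if (attempt < max_attempts && pause != NULL)
            pause(delay_ms);   /* give the target time to catch up */
    }
    return 0;
}

/* Demonstration stand-in for the real call: fails until the counter that
   context points to has been counted down to zero. */
static bool fake_try_add(void *context)
{
    unsigned *remaining_failures = (unsigned *)context;
    if (*remaining_failures > 0) {
        --*remaining_failures;
        return false;
    }
    return true;
}
```

With a counter of 2, add_icon_with_retry(fake_try_add, &counter, 5, NULL, 0)
succeeds on the third attempt. Nothing above settles whether an attempt
that is reported as failed might still be acted on later, so a real caller
may need to allow for the icon turning up anyway.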

Whoever at Microsoft has looked at this and not got anywhere with it should
find another job. Indeed, the vast amount of supposed analysis that this
problem has attracted on the Internet ought to be to the embarrassment of
everyone. As an industry, and even as consumers, we really ought to be able 
to do a lot better at studying the problems we observe in this pervasive 
software. 

Note that we don't need to wait for Microsoft to acknowledge a problem, or 
to explain it, or to confirm an explanation, or to devise a fix, though I do 
admit that we do need them when it comes to getting a fix implemented in 
their source code - and in that sense, the ball is now in their court.