I'm afraid that for this blog entry you're going to have to sit through some "bullshit-back-story".
For whatever reason, our live cluster, running under System Center Virtual Machine Manager (SCVMM) 2008 R2, had gained duplicate copies of three virtual machines (VMs), and all of the duplicates were marked as missing.
SCVMM is a relatively new "toy" for me, but I can already feel a love-hate relationship growing, probably indicated by the fact that I refer to it as "scum" to my co-workers and friends. My point being: forgive me if this is common knowledge. It doesn't appear to be.
Delving deeper, the Failover Cluster Manager MMC was showing only a single copy of each of the virtual machines in question, which narrowed it down to SCVMM being the problem child. Every attempt to perform a repair, or even a removal (despite my better judgement), in SCVMM itself resulted in a bullshit error message. Frankly, a restore at 23:00 isn't what I'd want to have been doing anyway, so I was somewhat relieved that the removal didn't go through.
Poking further, I found an MS KB article (KB2308590) that appeared to address this exact problem. No joy.
So, doing what I always do in these situations, I started prodding at the database that powers SCVMM using SQL Server Management Studio. Yeah, it's a GUI, but it was nearly midnight and it was easier.
If you're using the default database you'll want to connect to COMPUTERNAME\MICROSOFT$VMM$; otherwise it'll be wherever you specified at install. The table of most interest is tbl_WLC_VObject. If you run a SELECT with a WHERE clause to find your problem machines, you should find that you have duplicate entries. Carefully choose the entry that is not running, and delete it. Luckily our duplicates had no tags and no owners, so it was fairly easy to figure out which ones to remove. Close and reload the SCVMM console and you should find a far less scary-looking administration console.
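A minimal sketch of that clean-up, using Python's built-in sqlite3 as an in-memory stand-in for the SCVMM SQL Server instance so it can be tried safely. Only the table name comes from above; the columns (ObjectId, Name, State, Owner) and the sample rows are hypothetical placeholders, so check the real schema in Management Studio before adapting any of it. The GROUP BY ... HAVING COUNT(*) > 1 query should carry over to T-SQL largely unchanged.

```python
import sqlite3

# In-memory stand-in for the SCVMM database. The real table is tbl_WLC_VObject
# on SQL Server; these columns are HYPOTHETICAL placeholders, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_WLC_VObject "
    "(ObjectId TEXT PRIMARY KEY, Name TEXT, State TEXT, Owner TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_WLC_VObject VALUES (?, ?, ?, ?)",
    [
        ("a1", "web01", "Running", "admin"),
        ("a2", "web01", "Missing", None),  # orphaned duplicate: no owner
        ("b1", "sql01", "Running", "admin"),
    ],
)


def find_duplicate_names(conn):
    """Return VM names that appear more than once in the table."""
    rows = conn.execute(
        "SELECT Name FROM tbl_WLC_VObject GROUP BY Name HAVING COUNT(*) > 1"
    ).fetchall()
    return [name for (name,) in rows]


def stale_copies(conn, name):
    """List the non-running copies of a VM: the candidates for deletion."""
    return conn.execute(
        "SELECT ObjectId, State, Owner FROM tbl_WLC_VObject "
        "WHERE Name = ? AND State <> 'Running'",
        (name,),
    ).fetchall()


for name in find_duplicate_names(conn):
    for object_id, state, owner in stale_copies(conn, name):
        print(name, object_id, state, owner)
        # Only delete once you're sure this is the orphaned copy.
        conn.execute("DELETE FROM tbl_WLC_VObject WHERE ObjectId = ?", (object_id,))
conn.commit()
```

Listing the candidates before deleting them mirrors the "carefully choose" step: on the real database you would eyeball the SELECT output first and only then run the DELETE.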
I'm sure there are some bits left behind in other tables that reference the VMs, but I'm going to bed. It can be tidied in the morning.
Unfortunately this isn't one of those success stories. But then again, if I wrote about those I'd be hitting a few thousand posts a year, and besides, they're really boring to write about.
We began the project by powering up some virtual machines and test importing the configuration from ISA 2006 to Forefront TMG 2010, and all appeared fine. The ruleset was there, the VPN configurations were there, and so on. Test data seemed to pass through nicely.
The migration went through and we put the box live, decommissioning the old ISA 2006 hardware. Everything seemed fine until larger quantities of traffic started passing through the box. The logs showed a lot of packets being dropped on the floor with no source, destination or protocol recorded; active FTP and SIP traffic was also problematic; and the box would randomly stop passing anything at all, as if the service had stopped. The irritating thing was that none of it was consistent.
After poking into the configuration we started noticing a number of problems:
- The domain controllers computer set had entries that were flat out wrong and not present in the ISA configuration
- The Web Proxy Auto-Discovery Protocol (WPAD) file was wrong
- DNS traffic was being sent down VPN tunnels, even though no DNS server addresses were configured on those interfaces
- And a whole host of other niggly issues
After fixing these, the box was still randomly dropping things, and as the data flow increased (not to extreme levels; we're talking a 10Mbit/s leased line here) so did the drop-outs. At this point it was becoming less an irritation and more a service-affecting problem. I elected to rebuild it on non-R2 Windows Server 2008 and to recreate the configuration manually from documentation. Although I would've loved to get to the bottom of the problem, rolling back would've been just as much of a pain at this point, and the customer was rightly getting fidgety.
So why non-R2 Windows Server 2008? A few reasons: all our other deployments of TMG 2010 are on non-R2 and are stable, our original test box for this project was non-R2, and there are rumblings on a couple of TechNet threads of other people having issues with R2. Although I'm not 100% convinced that R2 is to blame here, frankly we didn't need R2, and I only wanted to do this once, as the whole job had to be done out of working hours.
Since the OS rebuild and manual build of the configuration, touch wood, it seems to be a lot more stable. No more weird packets getting logged, no more weird FTP or SIP problems, no more random drop outs.
My thoughts on TMG 2010 aren't favourable at this point, and not just because of these problems. Ostensibly it feels like ISA 2006 with a few interesting bits bolted on, and unless you specifically require ISA or TMG in your environment I wouldn't recommend it. There's still no real IPv6 support, without SP1 it feels very wobbly, and for a few features you might not need it's an expensive upgrade.
Realistically you can pull off the same feature set with a different combination of products: a "real" firewall and an internal proxy server, for example. This isn't to say that you should never put TMG 2010 in anywhere. It does have some very useful features, but look at your options carefully. Perhaps you don't need to upgrade at all; perhaps you'll find a better-fitting solution.
If you're using a combination of a scripting language, diskshadow and Task Scheduler to back up your Hyper-V machines, take special care to make sure that Task Scheduler cannot cut the job off under any circumstances. If it does, the host server can crash out. It doesn't seem to be perfectly repeatable, but I've been able to track down an issue we were having at work: the power at a customer's site was blipping very briefly, causing Task Scheduler to stop the job, which immediately crashed the host box. Unfortunately it only seems to crash in this circumstance when backing up certain virtual machines, and I've yet to figure out a pattern.
Unticking the "Stop if the computer switches to battery power" condition, and then ensuring that the UPS interface software takes care of shutting down when the battery runs low, is a good enough solution for us, for now.
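If you'd rather check this than trust the GUI, `schtasks /Query /TN "<task name>" /XML` dumps a task's definition. Below is a sketch that parses such an export with Python's ElementTree and flags the two settings that let Task Scheduler kill a running job: StopIfGoingOnBatteries and ExecutionTimeLimit. It assumes the Task Scheduler 2.0 schema defaults (true and PT72H respectively) apply when an element is missing, and that PT0S means "no time limit"; treat both as assumptions to verify against your own exports.

```python
import xml.etree.ElementTree as ET

# Namespace used by Task Scheduler 2.0 task definitions.
NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}


def risky_settings(task_xml):
    """Flag settings that could let Task Scheduler cut off a backup job mid-run.

    task_xml may be a str or raw bytes from a schtasks /Query /XML export.
    """
    root = ET.fromstring(task_xml)
    settings = root.find("t:Settings", NS)

    def value(name, default):
        # A missing element falls back to the (assumed) schema default.
        if settings is None:
            return default
        el = settings.find(f"t:{name}", NS)
        if el is None or not el.text:
            return default
        return el.text.strip()

    problems = []
    if value("StopIfGoingOnBatteries", "true") == "true":
        problems.append("task stops if the computer switches to battery power")
    limit = value("ExecutionTimeLimit", "PT72H")
    if limit != "PT0S":
        problems.append(f"task is stopped after ExecutionTimeLimit {limit}")
    return problems
```

An empty list means neither a power blip nor the runtime limit should be able to stop the job; anything else is worth fixing before the next backup window.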
- Dec 18, 2009 by the_angry_angel
- Geek, Work and Mindless Hatred
If you're even slightly geeky you will have seen at least one of the several articles in the last two years declaring that "the URL is dead". With search boxes now built into many browsers this is starting to become true, and it is presenting some interesting support challenges.
Every now and then you will need someone to visit a specific site, and you might not be able to connect to the user's device to assist. The solution in most cases is to politely educate the user (or get another user to help) and move on, but I have had a few users who have been unable to grasp that the address/location bar is where we want them to type. Perhaps the user has removed or shrunk the location bar so that it's really insignificant, or perhaps they're just too stressed to follow simple instructions.
For publicly accessible websites the answer is to ensure that your site can be reliably found via all the major search engines, and to provide a link if necessary. This means that SEO becomes an important part of your support framework. This is scary, but it may very well become a genuine systems and support concern.
Things get worse for internal-only addresses. In theory you shouldn't be in a position where you can't remotely assist a user inside your own network, but let's face it, shit does happen, or it might be a guest or embedded device (such as a WiFi-enabled phone). What's the answer in this instance? Application-level filtering and redirection in your proxy server(s)?
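Purely as a thought experiment, that redirection idea might look something like the sketch below: a function a proxy's filtering hook could call to spot a search-engine query matching a known internal site name, returning the internal address to redirect to instead. The site names, URLs and search hosts are hypothetical, as is the assumption that the search term arrives in a `q` query parameter. Note that a proxy can only see the query string on plain HTTP (or with HTTPS inspection in place).

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# HYPOTHETICAL mapping of words a user might type into a search box to the
# internal sites they actually meant.
INTERNAL_SITES = {
    "helpdesk": "http://helpdesk.example.local/",
    "intranet": "http://intranet.example.local/",
}

# HYPOTHETICAL set of search hosts the proxy would watch.
SEARCH_HOSTS = {"www.google.com", "www.bing.com"}


def redirect_for(url: str) -> Optional[str]:
    """Return an internal URL to redirect to, or None to pass the request through."""
    parts = urlsplit(url)
    if parts.hostname not in SEARCH_HOSTS:
        return None
    # parse_qs decodes "+" to a space, so "Intranet+" becomes "Intranet ".
    term = parse_qs(parts.query).get("q", [""])[0].strip().lower()
    return INTERNAL_SITES.get(term)
```

Anything that doesn't match is passed through untouched, so ordinary searches are unaffected.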
- Dec 13, 2009 by the_angry_angel
- Geek, Personal, Work and Mindless Hatred
In the last six months I've seen three companies that I've used, both professionally and personally, for various services use CC to mass-email their clients. This is not acceptable. As a result, one of my personal accounts is on various lists and has seen a marked increase in junk mail.
The latest cock-up came from MessageLabs, a company that provides email services. If they can't get this right, what hope is there for anyone else? If you're in the business of mass-emailing your customers, please, please either send individual mails or use BCC, and make sure that your staff understand why. It's not just a case of your customers' privacy; it's your company's too. Who's to say that someone on your list doesn't want to steal your business?
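For anyone scripting their mailouts, a minimal sketch of the "individual mails" approach using Python's standard email and smtplib modules; the addresses and the relay host are hypothetical. Each recipient gets their own message with only their own address in the To header. For the BCC route, smtplib's send_message also accepts a to_addrs argument, so the list can live in the SMTP envelope rather than in any visible header.

```python
import smtplib
from email.message import EmailMessage


def individual_messages(sender, recipients, subject, body):
    """Build one message per recipient so nobody sees anyone else's address."""
    messages = []
    for rcpt in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = rcpt  # only this recipient appears in the headers
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


def send_all(messages, host="localhost"):
    """Send each message through an SMTP relay at `host` (hypothetical default)."""
    with smtplib.SMTP(host) as smtp:
        for msg in messages:
            smtp.send_message(msg)
```

Usage would be along the lines of `send_all(individual_messages("news@example.com", customer_list, "Service update", text))`, with `customer_list` and `text` being whatever your own data is.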
This whole cock-up doesn't fill me with confidence in MessageLabs, which is unfortunate, as Symantec has bought Softscan, which we use for mail filtering at work, and they're now pushing new contracts onto the MessageLabs system instead. It raises the question of whether MessageLabs is actually a technically competent solution by comparison. In the past I've only had bad experiences. Anyone want to weigh in?