On IIS6: A process serving application pool ‘MyAppPool’ terminated unexpectedly. The process id was '1234'. The process exit code was '0xffffffff'

January 13, 2010 00:18 by Marc

Some apparently healthy IIS servers recently started reporting this error regularly. Some of them were also running SharePoint or OWA. All of them were configured to use Integrated Windows Authentication (IWA) as the authentication mechanism.

The problem with an IIS worker process crash is that it can have so many explanations, depending on the code executed, that you can easily waste a week before finding a reasonable one. In this case, all servers were affected, regardless of the application they run. So my first idea was “they might be under attack”. But that was not the case: performance counters related to the worker process did not give any sign of that, and this was confirmed by the IIS logs. Next “usual suspect”: a recently installed patch. Bingo, that was it. Here are the details:

  • The Security Update implementing “Extended Protection” for authentication in IIS (KB973917) had just been deployed on all servers
  • All impacted servers are running Windows Server 2003 Service Pack 2
  • One or more applications served by that application pool/worker process have IWA enabled
  • After intensive file version analysis, it appeared that numerous IIS-related files (EXEs, DLLs…) had a pre-SP2 version

Due to the inconsistency of IIS files in combination with that extra hot fix, the worker process keeps crashing –> root cause found!

Now how to fix it:

  1. Perform an inventory of the currently installed post-SP2 fixes. I personally do it in a very straightforward way using psinfo, but I am sure you’ll find plenty of methods to do it the way you like
  2. Reinstall Service Pack 2
  3. Redeploy the post-SP2 hot fixes inventoried in step 1
  4. Check the installed IIS file versions
  5. If the file versions are OK, install KB973917
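The file version check in step 4 can be sketched in a few lines of Python. This is only a minimal illustration, not the method used on the affected servers: the function names, the file inventory and the baseline version string are all hypothetical placeholders, and in practice the inventory would come from whatever tool you use to read file versions.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '6.0.3790.3959' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def find_stale_files(installed: Dict[str, str], baseline: str) -> List[str]:
    """Return the (sorted) names of files whose version is lower than the baseline."""
    minimum = parse_version(baseline)
    return sorted(
        name for name, version in installed.items()
        if parse_version(version) < minimum
    )


# Hypothetical inventory and baseline, for illustration only.
inventory = {
    "w3wp.exe": "6.0.3790.3959",
    "w3core.dll": "6.0.3790.1830",
    "httpext.dll": "6.0.3790.3959",
}
baseline_version = "6.0.3790.3959"

stale = find_stale_files(inventory, baseline_version)
print(stale)  # files that should be refreshed before installing the update
```

Comparing versions as tuples of integers rather than as strings avoids the classic trap where "6.0.3790.999" sorts after "6.0.3790.3959".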

Additional information:

Note: Make sure you pay attention to the process exit code which is always 0xffffffff. If you see another code, it might of course have another cause.

Marc


Best Wishes, MVPx2, Congrats and FAQ

January 6, 2010 11:00 by Marc

First, my best wishes to all of you visiting this blog and site, wherever you are. The number of visits as well as the time spent reading was really incredible in 2009, and all I can hope is for 2010 to be equally good or better.

I was also pleased to see my MVP status renewed for another year. I’d like to thank my MVP Lead Martine T for her nearly real-time support, Prasad G for the great chat and collaboration and Jose B (and his team of course) for the excellent work they delivered last year.

Besides this, I’d like to sincerely congratulate two other MVPs on their nomination. First, a newcomer named Tonino Bruno, who’s also a very good friend and part-time colleague. He is, with his mates, the active hand behind the Belgian MS Exchange User Group http://www.proexchange.be/. Congrats Toni, and remember you owe me a beer now ;) Second, I’d like to congratulate a “serial” French MVP (7 times nominated!) named JC Bellamy, who is not only knowledgeable regarding Windows but also continuously reinventing the French vocabulary related to IT ;)

Now for a Silly Little FAQ: since I got these questions numerous times, the time has come to reply, at least as seriously as possible ;)

Q: Where does the sentence “Happy is the one who could enter the secret causes of things” come from?

A: This is the direct translation of the French sentence present on a large pillar at the entrance of the University of Guernon (France), a fictional school depicted in the movie “The Crimson Rivers”. It totally reflects my (compulsive?) willingness to understand the bits and bytes, and therefore never to be satisfied with the marketing or technical “claims” that are so fashionable nowadays. Note: although I picked up this sentence, needless to say I strongly disagree with the philosophy in place at Guernon ;). I am not the one who shouts loudest; I prefer to say it right, because real facts mean real safety.

Q: Where is Anthony, what is he doing?

A: At home or at work, doing fine! No, seriously: although we started this blog together, he also has to deal with other personal and professional challenges that keep him away from this online activity, though I have not totally given up hope of seeing some contributions from him one day ;)

Q: Who is reading the blog, what are the readers looking for and where are they from?

A: Who: I have very few names to give out ;) but mostly IT people of course!

A: What: mostly hands-on problem-solving information, although I got a lot of positive feedback from more pro-active posts

A: Where: everywhere, but with more than 50% located in North America

Q: Why aren’t you talking about movies anymore?

A: Things tend to change. I prefer not to confuse the audience too much with cinema-related content and to focus more on technique. Besides this: no, I did not particularly like Cameron’s Avatar, which indeed blew my senses away but miserably failed to touch my soul. Did all talented storytellers definitely leave Hollywood?

Q: What about the long-time promised posts on Federated Search, people-picker and so on?

A: Still coming (but not as soon as I promised I’m afraid)

Marc


Enterprise File Services using Windows Server 2008 R2 – Comprehensive Guidance: Implementing an End-User Data Centralization Solution

November 9, 2009 17:44 by Marc

Microsoft finally released a truly great document in late October: Implementing an End-User Data Centralization Solution. This document covers, in a very comprehensive yet practical manner, everything related to storing, protecting and making accessible user data using Windows File Sharing.

Unlike many white papers out there, this one can be directly used for planning, building, implementing and operating with no room for improvisation. It comes with real-life metrics, group policy settings as well as scripts and tools.

It is also worth mentioning that it is 2008 R2/Windows 7-ready and covers the use of FSCT.

Download

Marc


Disabling PAC Validation II: Won't Get Fooled Again

November 4, 2009 15:13 by Marc

I did not expect to receive so much feedback by mail regarding this (not so fascinating) topic, not to mention the referring sites and so on… This gave me the motivation to close the loop by testing in depth on Windows Server 2008 (SP2) as well as on 2008 R2, in order to cover the whole subject.

So in summary, when will PAC signature verification actually occur?

The table hereunder summarizes the possible scenarios:

Server OS / Target Application or Service | Server 2003 pre SP2 | Server 2003 SP2 and above with extra registry configuration | Server 2008, Server 2008 R2
----------------------------------------- | ------------------- | ----------------------------------------------------------- | ---------------------------
File & Print Sharing | NO Validation | NO Validation | NO Validation
Exchange Server | Validation | NO Validation | NO Validation
SQL Server | Validation | NO Validation | NO Validation
IIS with application pool identity set to Local System or Network Service | Validation | NO Validation | NO Validation
IIS with application pool identity set to a domain account | Validation | Validation | Validation

So in short, the only difference between Server 2003 and 2008/2008 R2 is that, from 2008 onwards, you do not need to modify the registry anymore since the default value is inverted.

Once again, the important point here is: if you configure Kerberos on an IIS farm (SharePoint or “simple” ASP.Net), PAC Validation will ALWAYS occur, regardless of what you do to prevent it, UNLESS the application is granted and makes use of the right “Act as part of the operating system”.

If the target application is granted seTCB and makes use of it:

Granting the seTCB privilege is not sufficient, because it remains disabled by default until the application effectively requests it. But why would it need it? This privilege might be needed by the server application for various reasons. Two common usages are described in the sections below.
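For those who prefer to read the table above as data, here is a minimal Python sketch that encodes it as a lookup. The names (`PLATFORMS`, `PAC_VALIDATION`, `pac_validation_occurs`, `always_validated`) are purely illustrative assumptions; only the True/False values come from the table.

```python
from typing import Dict, List, Tuple

# Column order of the table above.
PLATFORMS: Tuple[str, ...] = (
    "Server 2003 pre SP2",
    "Server 2003 SP2 and above with extra registry configuration",
    "Server 2008 / 2008 R2",
)

# True means "Validation", False means "NO Validation".
PAC_VALIDATION: Dict[str, Tuple[bool, bool, bool]] = {
    "File & Print Sharing": (False, False, False),
    "Exchange Server": (True, False, False),
    "SQL Server": (True, False, False),
    "IIS, application pool identity Local System or Network Service": (True, False, False),
    "IIS, application pool identity set to a domain account": (True, True, True),
}


def pac_validation_occurs(service: str, platform: str) -> bool:
    """Look up one cell of the table."""
    return PAC_VALIDATION[service][PLATFORMS.index(platform)]


def always_validated() -> List[str]:
    """Services for which PAC validation occurs on every platform column."""
    return [service for service, flags in PAC_VALIDATION.items() if all(flags)]


print(always_validated())
print(pac_validation_occurs("SQL Server", "Server 2008 / 2008 R2"))
```

The only entry returned by `always_validated()` is the IIS application pool running under a domain account, which is the case the rest of this post focuses on.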

Protocol transition

Protocol transition is the ability for a server application to delegate user credentials to a back-end service using Kerberos while they were not initially provided under that form by the client.

Put plainly, this means that a user may be authenticated by a service using a non-Kerberos protocol such as Basic, NTLM or Digest, and that this service, making use of the feature, will transform the credentials in order to propagate them to another server. Example: a user authenticates against SharePoint using NTLM and wants to use Reporting Services running on a second server; the SharePoint server will perform the necessary transition to push (aka “delegate”) the user’s credentials to the SRS server using Kerberos.

IIS MVP Ken Schaefer gives an excellent overview on his blog: IIS and Kerberos Part 5 - Protocol Transition, Constrained Delegation, S4U2S and S4U2P.

Services For User

S4U extensions are tightly linked to Protocol Transition. In a nutshell, they allow, under certain conditions, an application to perform a logon on behalf of a user without knowing his/her password.

This feature is, for example, used in IAM/SSO products such as IBM TAM/WebSEAL or CA SiteMinder.

For both technologies, since the user does not initially authenticate using Kerberos, there is no PAC to validate.

OK but finally, why is disabling PAC validation so important?

Well, I won’t say it is “so important”. It might help improve performance under some circumstances.

Since, in short, the PAC is verified by the server application before granting a Ticket-Granting-Service (TGS) to the client, it does not occur at every request as long as the TGS remains valid (note: there are some exceptions to this rule). BUT in some cases, this initial verification can take some time because 1) the client’s AD is far (in terms of network, hops, latency, bandwidth…) from the server’s AD or 2) the client’s AS is too busy. This could therefore give the wrong impression that client-to-server authentication is slow, while you expected a big boost from switching to Kerberos.

Additional Resources

Marc


Enterprise File Services using Windows Server 2008 R2 - Tools you can’t afford to miss Part I: FSCT (Overview)

October 7, 2009 09:51 by Marc

Since I do not like one-liner posts, I recently started writing an (ambitious?) series of posts covering enterprise file services on Windows Server 2008 R2, based on a presentation I gave to certain customers of mine. The first “shot” is dedicated to a new Swiss Army knife-like tool from Microsoft: FSCT, standing for File Server Capacity Tool.

Until now, validating a Windows file server setup has always been a difficult task, since very few tools were available on the market to adequately simulate a realistic user load. In the past, tools like NetBench were considered references. Nowadays, if you’re lucky, you can rely on your own scripting toolkit; if you’re not, you may have to use Intel’s NAS Performance Toolkit, which is not bad, but far from being “enterprise” ready. I’ve even seen some people trying to benchmark file services using SQLIO…

Architecturally speaking, FSCT is similar to other load-test tools you would use for web applications, for example: it is made of a controller, a server (the one to be benchmarked) and one or multiple clients. Optionally, you can also include an AD domain controller in the picture in order to simulate AD-based authentication. Nevertheless, FSCT is also compatible with workgroup environments, but in a degraded manner.

Note: Combining roles is a possibility but as you expect, it may negatively affect the tests. So if you’re short on machines, combine wisely and keep the other roles off the “server” role.

On the other hand, you can conduct test campaigns from more than one client simultaneously. That’s where the architectural choice pays off: to my knowledge, no other tool can do that today.

Plan and deploy your test environment carefully…

  • First, take time to read the whole paper included in the package carefully. Everything you need to know about the tool is in there
  • Practise before conducting the “real” tests: since the tool is command-line based and due to the way it is distributed among systems, you may not get the result you expect from your first try (it’s not point-and-click)
  • Make sure all components involved are healthy: server & clients (network configuration, drivers…), but also network components (switches and, if applicable, routers or access points…). A single improperly working component may severely affect the result of the tests (argh, hard-coded duplexing/link speed)
  • Copy FSCT to all systems involved (unless the server is not Windows-based) and build your own batch files to speed up the configuration, the execution and finally the cleanup
  • Unless you want to reach the limits of an HP ProLiant DL 58x, do not bump the client + user count to the maximum; plan them realistically. And in any case, it is not advisable to conduct your tests in production, for the sake of the network in general as well as for the sanity of the AD you would populate with users...

Don’t be afraid of command-line based execution

Okay, there is no UI, but who cares? Me? Yes, I’ve made my own little WinForms app to save me time (I’ll post it in the coming weeks) but frankly, once your config files and batches are ready (it takes 2 hours max), the command line rules supreme over the mouse ;)

And before you ask: no, there is no PowerShell support. It sounds a little old-fashioned, I admit.

Plan multiple test scenarios, keeping in mind important factors such as:

  • CIFS/SMB version: depending on the client and server OS version and configuration, the usage of SMB2 will greatly improve performance under virtually any circumstances. If you plan to use pre-Vista client OSes, or maybe have a mix of them, take this into account in your scenarios
  • SMB-related security settings like signing and so on also affect performance
  • Other security configuration like TCP/IP stack hardening or IPSec
  • The presence of a file-based anti-virus: it is wise to test with and without. You might be surprised by the performance loss an A-V implies, particularly on heavily used servers of course. BTW, since most (if not all) A-V products are implemented as file system filter drivers, do not simply disable it during the tests; uninstall it, to be certain
  • Take into account the other side activities, particularly server-side ones like backups (using shadow copies or third-party solutions), monitoring or other background processing tasks that may affect tests (reporting and so on…)
  • So-called “performance-boost” tweaks like cache manager, NTFS tweaks, disk alignment, cluster sizes… All in all, they may greatly affect the results. BTW, I will dedicate another post to those tweaks and debunk some myths at the same time

What do you get from the tests?

Besides generating the load itself, FSCT, assuming you’re working in a standard setup, will provide detailed test results containing the following useful information retrieved from the server and client(s):

Data collected from the following performance counters:

  • \Processor(_Total)\% Processor Time
  • \PhysicalDisk(_Total)\Disk Write Bytes/sec
  • \PhysicalDisk(_Total)\Disk Read Bytes/sec
  • \Memory\Available Mbytes
  • \Processor(_Total)\% Privileged Time
  • \Processor(_Total)\% User Time
  • \System\Context Switches/sec
  • \System\System Calls/sec
  • \PhysicalDisk(_Total)\Avg. Disk Queue Length
  • \TCPv4\Segments Retransmitted/sec
  • \PhysicalDisk(_Total)\Avg. Disk Bytes/Read
  • \PhysicalDisk(_Total)\Avg. Disk Bytes/Write
  • \PhysicalDisk(_Total)\Disk Reads/sec
  • \PhysicalDisk(_Total)\Disk Writes/sec
  • \PhysicalDisk(_Total)\Avg. Disk sec/Read
  • \PhysicalDisk(_Total)\Avg. Disk sec/Write

As well as the following metrics (correlated to the number of users simulated):

  • % Overload
  • Throughput
  • # Errors
  • % Errors
  • Duration in ms

The first user count at which % Overload rises above 0% indicates the threshold beyond which your file server infrastructure no longer scales.
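Reading that threshold out of a series of runs is easy to automate. The sketch below is a minimal Python illustration under stated assumptions: the `(users, % Overload)` pairs are made-up numbers, and the function name is hypothetical. It does not parse FSCT's actual report format; you would feed it values copied from your own results.

```python
from typing import List, Optional, Tuple


def first_overloaded_user_count(results: List[Tuple[int, float]]) -> Optional[int]:
    """Return the smallest simulated user count whose % Overload is above 0.

    `results` holds (user count, % Overload) pairs in any order.
    Returns None when no run shows any overload.
    """
    for users, overload_pct in sorted(results):
        if overload_pct > 0.0:
            return users
    return None


# Made-up results for illustration only.
runs = [(500, 11.0), (100, 0.0), (200, 0.0), (400, 2.5), (300, 0.0)]

threshold = first_overloaded_user_count(runs)
print(threshold)
```

With these sample numbers the server copes with 300 users and starts to overload at 400, so the realistic capacity lies somewhere between the two and is worth refining with smaller steps.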

Special Cases

Using FSCT against DFS-N

Can FSCT work against DFS-N? Yes, it can. But it will not allow you to stress-test the DFS part of your design, since it has no knowledge of it and does not embed any technology to capture DFS behavior during the load test. Moreover, it may require configuring the “server” part as if it were a non-Microsoft file server (see below for details). In addition, capturing performance counters on the server using FSCT itself might be an issue, the workaround being the good old “manual” capture using perfmon or logman.
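As an illustration of the manual capture, the Python sketch below prepares a counter file and the matching `logman create counter` command line for a few of the counters listed earlier. It only builds the strings and does not run anything. The collection name, file paths and sampling interval are assumptions chosen for the example, so check them (and the logman options) against your own environment before use.

```python
from typing import List

# A subset of the counters FSCT normally collects (see the list above).
COUNTERS: List[str] = [
    r"\Processor(_Total)\% Processor Time",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Write",
    r"\TCPv4\Segments Retransmitted/sec",
]


def counter_file_text(counters: List[str]) -> str:
    """Content of a counter file: one counter path per line."""
    return "\n".join(counters) + "\n"


def logman_create_args(name: str, counter_file: str, output: str,
                       interval_seconds: int = 5) -> List[str]:
    """Argument list for creating a counter collection that logs to CSV."""
    return [
        "logman", "create", "counter", name,
        "-cf", counter_file,           # read counter paths from a file
        "-si", str(interval_seconds),  # sample interval in seconds
        "-f", "csv",                   # log format
        "-o", output,                  # output file path
    ]


# Hypothetical names and paths, for illustration only.
args = logman_create_args("fsct_manual", r"C:\fsct\counters.txt",
                          r"C:\fsct\logs\fsct_manual")
print(counter_file_text(COUNTERS))
print(" ".join(args))
```

Once the collection exists, `logman start fsct_manual` and `logman stop fsct_manual` would bracket the test run on the server.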

Using FSCT against a failover cluster

Using FSCT against a failover cluster works perfectly, but with the same limitation as above: the tool will not be able to collect performance counters directly. Instead, you will have to plan for manual capture on the node designated as owner of the file share resource, or on both nodes if you wish to perform failovers during the tests.

Using FSCT against non-Microsoft File Server or NAS

Assuming you can live with the same limitations as stated above, FSCT will work like a charm against non-MS file servers, including SOHO devices. Depending on the server’s or device’s capabilities, you might be able to collect a reduced set of performance indicators using SNMP polling, for example. Of course, FSCT itself does not include any SNMP support, but there are plenty of tools available, and during the tests I led Cacti was very helpful.

Ready, Go?

Well, not 100% ready yet. Since the only workload scenario available at RTM release time is “Home Folders”, you might not be able to validate your setup realistically. But according to MS, an SDK is on the way in order to allow the creation of custom workloads. In the meantime, you can already start playing with the tool itself and with the customization of the “HomeFolder” profile config file, but you will not go far with that.

Additional Resources

In a coming post I will cover practical usage of FSCT.

Marc