Friday, 22 August 2014

Creating a Report of Broken Web Pages in SharePoint

Recently I presented a session at SharePoint Saturday Melbourne on "Being Crafty with PowerShell". One of the examples showed how to audit all the aspx pages in a large farm. I want to share the first part of that script here.

The example came from a large migration project I was working on. I needed to give the client confidence that site pages were working OK before business users started testing. The challenges were:

1. How to check a large farm with tens of thousands of pages
2. How to determine if the page is ok?

I addressed the issue using a PowerShell script that did the following for each page in the farm:

1. Hit the page, and get the HTTP response code
2. If the response code was 200, then download the page as HTML text, and check the contents for an occurrence of "Correlation Id"
3. Log the results in a custom PSObject that could be saved and analysed later

I'll add the full script in a follow-up post. In this post, I just want to demonstrate how to achieve these objectives for a single page.

Step 1. Create a custom PSObject to store the results.

For each page that I hit, I'm going to store the results in one of these objects. Each object will then be added to a collection (or an array). The object will record the web URL, the page URL, the HTTP response, the numeric response code, and whether the page contains "Correlation Id".

$PageInfo = New-Object PSObject            
$PageInfo | Add-Member -MemberType NoteProperty -Name "WebUrl" -value ""            
$PageInfo | Add-Member -MemberType NoteProperty -Name "PageUrl" -value ""            
$PageInfo | Add-Member -MemberType NoteProperty -Name "Response" -value ""            
$PageInfo | Add-Member -MemberType NoteProperty -Name "ResponseCode" -value ""
$PageInfo | Add-Member -MemberType NoteProperty -Name "MatchFound" -value ""
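
On PowerShell 3.0 or later, the same object shape could also be built in a single expression with the [PSCustomObject] type accelerator. This is only a minimal sketch and not part of the original script; the helper name New-PageInfo and the example URL are hypothetical.

```powershell
# Hypothetical helper (not part of the original script): returns a fresh
# result object with the same five properties as $PageInfo above.
function New-PageInfo {
    [PSCustomObject]@{
        WebUrl       = ""
        PageUrl      = ""
        Response     = ""
        ResponseCode = ""
        MatchFound   = ""
    }
}

# Each call returns a new, independent object.
$example = New-PageInfo
$example.PageUrl = "http://sharepoint.example/SitePages/Home.aspx"  # placeholder URL
$example
```

Because every call returns a new object, there is no need to copy a template object before filling it in.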

Step 2. Get the response code for the page.

#url to check            
$urlToCheck = ""            
# Create a credential object that can be used to authenticate against SharePoint            
$credentials = Get-Credential -UserName $env:USERNAME -Message "Enter credentials for SharePoint"
# Use the System.Net.WebRequest class to create a web request for the URL
$wrq = [System.Net.WebRequest]::Create($urlToCheck);

# Set the Credentials property of the WebRequest object            
$wrq.Credentials = $credentials;            
# The GetResponse method returns a System.Net.WebResponse object, which has a property called StatusCode.
# One thing to note is that if the request fails (e.g. with HTTP 401), an exception is raised. So you need to catch the exception, and then use the Response property of the exception (System.Net.WebException) to get the status code
try {
    # GetResponse() returns a System.Net.WebResponse object
    $wrp = $wrq.GetResponse()
} catch [System.Net.WebException] {
    # If an exception was thrown, you can get the System.Net.WebResponse object from the exception's Response property
    $wrp = $_.Exception.Response
}
# Get the status code
$wrp.StatusCode
# The StatusCode is an enum, and can be cast to an Int to get the HTTP response number
[int]$wrp.StatusCode
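
The enum cast can be tried on its own, without making a web request. The sketch below uses System.Net.HttpStatusCode, the enum type behind the StatusCode property; the variable names are only for illustration.

```powershell
# HttpStatusCode is the enum type behind the StatusCode property.
$ok           = [System.Net.HttpStatusCode]::OK
$unauthorized = [System.Net.HttpStatusCode]::Unauthorized

[int]$ok            # 200
[int]$unauthorized  # 401

# The cast also works in the other direction, which can help when
# reading a log that only contains the numbers.
[System.Net.HttpStatusCode]404   # NotFound

# If no response object is available at all, casting $null gives 0,
# so a ResponseCode of 0 can be read as "no HTTP response received".
[int]$null          # 0
```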

Step 3. If the page returned an HTTP 200 response, download the page's contents as text (HTML) and check it for the presence of "Correlation Id"

if (([int]$wrp.StatusCode) -eq 200) {
    # The System.Net.WebClient class has a nice method called DownloadString(). We'll call it, passing in the same url, then check to see if the string returned contains "Correlation Id"
    $wc = New-Object System.Net.WebClient;
    # Reuse the credential object we created earlier
    $wc.Credentials = $credentials;
    # Store the downloaded string into a variable
    $content = $wc.DownloadString($urlToCheck);
    # To find if the page has an error, search the html content for the text "Correlation Id"
    $content.Contains("Correlation Id");
    # Finally, dispose of the WebClient object
    $wc.Dispose();
}
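
One thing to keep in mind is that Contains() is case-sensitive, so "Correlation ID" would not match "Correlation Id". If a case-insensitive check is preferred, the test could be wrapped in a small function like the sketch below. The function name Test-ErrorMarker and the sample HTML strings are made up for illustration and are not part of the original script.

```powershell
# Hypothetical helper: returns $true when the HTML contains the marker text,
# ignoring differences in upper and lower case.
function Test-ErrorMarker {
    param(
        [string]$Html,
        [string]$Marker = "Correlation Id"
    )
    # IndexOf returns -1 when the marker is not found.
    return $Html.IndexOf($Marker, [System.StringComparison]::OrdinalIgnoreCase) -ge 0
}

# Made-up sample strings standing in for downloaded page content.
Test-ErrorMarker -Html "<p>Sorry, something went wrong. Correlation ID: 1234</p>"  # True
Test-ErrorMarker -Html "<p>Welcome to the team site</p>"                           # False
```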

Step 4. Put all the information into the custom PSObject we created, and add it to an array.

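# Create an instance of the object
$pi = $PageInfo | Select-Object *
# Set the properties of the custom object (like you set properties on any other object)
# Note: $Web is not defined in this single-page example; it is assumed to be the web that contains the page in the full script
$pi.WebUrl = $Web.Url;
$pi.PageUrl = $urlToCheck;
$pi.Response = $wrp.StatusCode
$pi.ResponseCode = [int]$wrp.StatusCode
# $content is only set when the page returned HTTP 200, so guard against a null value
$pi.MatchFound = ($null -ne $content) -and $content.Contains("Correlation Id");
# Print the contents
$pi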
#Create an array to store the object(s) in            
$results = @();            
#Add the current custom object to the array.             
$results += $pi;
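
Once the array holds one object per page, it can be filtered and saved for later analysis. The sketch below is one possible way to do that, not the approach from the full script. The sample rows, the rule for what counts as a broken page, and the output file name are all assumptions for illustration.

```powershell
# Made-up sample rows standing in for the $results array built above.
$sample = @(
    [PSCustomObject]@{ PageUrl = "/SitePages/Home.aspx";  ResponseCode = 200; MatchFound = $false }
    [PSCustomObject]@{ PageUrl = "/SitePages/Old.aspx";   ResponseCode = 404; MatchFound = $false }
    [PSCustomObject]@{ PageUrl = "/SitePages/Error.aspx"; ResponseCode = 200; MatchFound = $true }
)

# Assumption: a page counts as broken if it did not return 200,
# or if it returned 200 but contains the "Correlation Id" text.
$broken = @($sample | Where-Object { $_.ResponseCode -ne 200 -or $_.MatchFound })

# Save the broken pages to a CSV file (placeholder path in the temp folder).
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) "BrokenPages.csv"
$broken | Export-Csv -Path $csvPath -NoTypeInformation
```

The CSV file can then be opened in Excel or loaded back into PowerShell with Import-Csv.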