Understanding processing time

December 7th, 2011 No comments

I want to give a brief tutorial on understanding CPU processor utilization since it is commonly an area of confusion during performance analysis.

CPU time

CPU time is the percentage of clock ticks than a processor spends waiting for instructions. This is opposed to wall-clock time, which is the total time the whole computer takes to perform an operation. The “load” on a system is the ratio of clock ticks which are performing operations versus the click ticks spent waiting for instructions over a given time period. Thus, load only makes sense as an average over a particular time period. During any single clock tick, the CPU is either processing an instruction or a HLT.

Idle time

The CPU processes a set of instructions fed from memory. The memory contains a set of opcodes (commands) which tell the CPU which operation to perform and the memory where it is located. The rate of clock ticks is constant – several gigahertz in a modern CPU – and during each clock tick, the CPU processes either an instruction to do some work, or a HLT – an opcode which tells the CPU to turn keep components idle until the next cycle.

Monitoring CPU time

So if we were to “observe” a CPU at the clock-tick level of detail, we would see that it’s always either working or waiting. But we can’t actually do that, since the more closely we monitor clock cycles, the more clock cycles are needed to do the monitoring. So to get an accurate picture of activity, we have to step back to a level where the monitoring tool does not interfere too much with the process being observed.

Input/output overhead

Any given computer task has several components: CPU instructions, memory IO, disk IO, network IO, and many others. Only extremely simple tasks (like calculating π) can fit in the small (but very fast) memory buffers on the CPU itself. Real-world tasks will almost always require the CPU to wait for other components to finish shuffling data back and forth in a state where it is ready for the CPU to work on it. But the ideal scenario is for the CPU to be kept as busy as possible (constant 100% load) until the task is completed. This is the minimum possible time in which a task can be completed. If we were to monitor CPU activity during such a task, we’d see the load jump to 100% during the task than back to the baseline when it’s done.  But if the task is shorter than the measurement frequency of the CPU monitoring tool, it would be an average over two periods, or it might not be detected at all.

Multitasking

The situation is more complicated when there are multiple tasks to run, each off which requires some fraction of CPU time. The operating system will run the tasks concurrently. Both the processing time and the IO of tasks will overlap, but because the operating system takes turns running each task, the total CPU load will be some combination which may be higher or lower than the total of the individual tasks – depending on the other competing resources the tasks use. For example, two disk-intensive tasks will use less CPU than the sum of both running individually because the disk IO will be the bottleneck. But two CPU-intensive tasks would more than double total load because the CPU will have to run both tasks and have to handle the context switching between concurrent tasks.

Further reading

Thoughts on ORMs & Entity Framework performance

December 3rd, 2011 No comments

An ORM provides value by doing a lot of things for you – virtualizing databases as native objects and converting types automatically. But the work the ORM does to reduce developer work has a cost too – it has an inherent performance penalty and may encourage some bad development practices.

An ORM by definition has to do some work to convert a database schema into a native view. ORM’s can try to minimize this performance penalty by defining the transformation at design time or caching the database to native object transformation at different points.

Entity Framework is built on top of ADO.Net, which is just an API that does not “know” anything about the database. To convert .Net code into SQL queries and back into CLR objects, Entity Framework performs a set of operations at different stages: (1) compile time (2) first run (3) each execution. To improve performance, EF allows you to shift some work from step 3 to 1 or 2. But the details can be tricky and understanding the best optimization strategy requires understanding the EF query execution pipeline.

The EF execution pipeline

The following information is based on this MSDN EF Performance page.

There are six steps I want comment on in EF execution:

  1. loading metadata
  2. generating views
  3. preparing the query
  4. executing the query
  5. tracking
  6.  materializing objects

1: Loading metadata: The metadata is the mapping defined in your EDMX file. This is a very expensive operation (see the breakdown here), but it only happens once. EF applications use more memory and have a warm-up penalty which should be ignored in performance analysis if you are not concerned with startup times.  Models with with very large (200+) number of entities have a number of problems – see this and this.

2: Generating views: The local query views are static objects which are cached per application domain. This is also an expensive operation but it can be pre-generated and embedded in the application if you care about first run times.

3: Preparing the query: Each unique query must be compiled into the EF version of a stored procedure before it is executed. Microsoft says that the commands are cached for later executions, but I’m confused about what exactly is cached because profiling shows that the query is compiled every. In any case, Microsoft suggest caching the compiled query to avoid this penalty.

Caching precompiled queries is somewhat unwieldy, so it is only advisable in performance-critical contexts, but it makes a big difference for frequently executed queries.

(Note: Take care when using .Count() or Any() to avoid unnecessary query recompilation and avoid unnecessary enumeration when a simple boolean check will do.)

By the way, all the steps above can be skipped by using direct SQL queries or stored procedures with EF. The performance of direct entity SQL falls between pure ADO.Net and Entity Framework queries, so it is only useful as the occasional exception to LINQ queries against a data store.

4: Executing the query: Query execution time depends on the underlying data source. Entity Framework is just a library built on top of ADO.Net, so pure ADO.Net queries are the a benchmark for any ORM’s built on top of it. Because, ADO.Net will always be faster than any framework built on top of it, it can be used as a fallback when other options have been exhausted.

5: Tracking: tracking is used to track changes for updates. If you only need to read data, you can get a small performance boost by disabling it.

6: Materializing objects: each object returned from the database must be converted into a class instance to be used. There is no way to avoid this penalty, but it is worthwhile to keep in mind that the less data there is, the faster it will materialized – not to mention transferred over the wire.

For best performance, queries should be as specific as possible. I have noticed that ORM’s encourage the bad habit of always selecting an entire row. This is because developers using an ORM tend to think of the database as an object repository rather than as a relational store, whereas raw SQL queries encourage manually selecting the needed rows. (Unless one has the awful habit of “select *”.) So to minimize overhead, pull just the data you need (the MVVM pattern is helpful in this regard.)

Final thoughts:

As the ADO.NET program manager himself has pointed out, Entity Framework is inherently slower than ADO.Net SQL queries. But performance should always be balanced against productivity. There are some applications which are definitely not suitable for an ORM and many others that are. Some operations can only be done using raw SQL and can take forever using an ORM (“truncate table”, temp tables, etc).  I think the best approach is to use an ORM where appropriate and optimize in the specific scenarios where performance is inadequate. I do think that Entity Framework is the best, simplest, and the safest choice out of all .Net ORM’s.

These are just a few tips on Entity Framework performance. There are other tips and much more information in the pages below and the links therein.

More reading:

Categories: development Tags: ,

The agile sprint cycle & team roles

July 27th, 2011 No comments

 

From a recent presentation: 

Categories: development Tags:

Optimal compression for IIS7

June 23rd, 2011 No comments

httpCompression in applicationhost.config should look like this:

<httpcompression directory="%SystemDrive%\inetpub\temp\IIS Temporary Compressed Files" maxDiskSpaceUsage="1024" noCompressionForHttp10="false" noCompressionForProxies="false">
      <scheme name="gzip" dll="%Windir%\system32\inetsrv\gzip.dll" dynamicCompressionLevel="9" />
      <scheme name="deflate" doStaticCompression="true" dll="%windir%\system32\inetsrv\gzip.dll" dynamicCompressionLevel="9" />
      <dynamictypes>
        <add mimeType="text/*" enabled="true" />
        <add mimeType="message/*" enabled="true" />
        <add mimeType="application/x-javascript" enabled="true" />
        <add mimeType="*/*" enabled="true" />
      </dynamictypes>
      <statictypes>
        <add mimeType="text/*" enabled="true" />
        <add mimeType="message/*" enabled="true" />
        <add mimeType="application/javascript" enabled="true" />
        <add mimeType="application/pdf" enabled="false" />
        <add mimeType="application/x-zip-compressed" enabled="false" />
        <add mimeType="text/css" enabled="true" />        
        <add mimeType="*/*" enabled="true" />
      </statictypes>
    </httpcompression>

For some reason text/css was not compressed by default for me and I had to add it.

Categories: code Tags:

Render tags in templates as a partial view with MVC3

May 18th, 2011 No comments

Suppose you are rendering templated content. Sometimes you want to reference partial views (or actions) in your templates and have them render with attributes provided by your template. One option is to use a Razor templating engine. But I just needed to render partial views based on a custom tag format, so I came up with my own solution:

        /// <summary>
        /// Supports parameterized tags such as <view:leadform city=NewYork name="Susie Q" country=USA />
        /// Call with @Html.ParseCmsContent(yourTemplateText)
        /// </summary>
        /// <param name="helper"></param>
        /// <param name="content">Your template</param>
        /// <returns></returns>
        private static string RenderPartialViewTagsInTemplate(HtmlHelper helper, string content)
        {
            const string controlPrefix = "<view: ";             
           foreach (var index in content.IndexOfAll(controlPrefix))        
            {      
           string attributeString = content.Substring(index, content.IndexOf(">", index) - index);
                var attributes = attributeString.Split(' ').ToList();
 
 
                string viewName = attributes[0].Substring(controlPrefix.Length);
                var properties = new ViewDataDictionary();
 
                attributes.Where(a => a.Contains("=")).ToList().ForEach(attribute =>
                                                                            {
                             string key = attribute.Substring(0, attribute.IndexOf('='));
                             string value = attribute.Substring(attribute.IndexOf('=') + 1).Trim('"');
                             properties.Add(key, value);
                                                                            });
 
                var leadForm = Html.PartialExtensions.Partial(helper, viewName, properties); // tags are added to the ViewBag
// By the way, you can render an Action tag with "var tagHtml = Html.ChildActionExtensions.Action(helper, viewName, properties);        "                
                content= content.Insert(index, leadForm.ToHtmlString());
            }
            return content;
        }
 
</view:>

Categories: code Tags: , ,

Protected: Getting around packet-inspecting firewalls with free VPN+proxy tools

May 6th, 2011 Enter your password to view comments.

This post is password protected. To view it please enter your password below:


Categories: tutorials Tags:

Reading Excel files in .Net

May 6th, 2011 No comments

This should work for most Excel versions including both xls and xslx:

 
const string ExcelConnString =
                @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES"";";
 
            var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", String.Format(ExcelConnString, physicalPath));
            var ds = new DataSet();
 
            adapter.Fill(ds, "anyNameHere");
            var data = ds.Tables["anyNameHere"].AsEnumerable();
 
            EnumerableRowCollection<etag> tags =
                data.Where(x => x.Field<string>("tag") != string.Empty).Select(x =>
                                                                                 new Tag()
                                                                                     {
                                                                                         Description = x.Field</string><string>("Description")
                                                                                     });
 
 
</string></etag>

Categories: code, development Tags:

Cross browser multi-columns with JQuery and CSS3

March 23rd, 2011 1 comment

column-count was proposed in January 2001, candidate in December 2009. It’s supported in WebKit and Mozilla via extensions and Opera directly, but it’s not in IE9. Y U no support columns IE9? That’s OK, we can work around this with columnizer:

<script type="text/javascript">
if ($.browser.msie && $.browser.version < 10) { // am I a hopeless romantic for assuming that IE10 will support it?
        $('.multicolumn').columnize({
            width: 600,
            columns: 3
        });
    }
</script></script>
<style type="text/css">
/* Support for Webkit, Mozilla, Opera */
div#multicolumn, .multicolumn {
	-moz-column-count: 3;
	-moz-column-gap: 20px;
	-webkit-column-count: 3;
	-webkit-column-gap: 20px;
	column-count: 3;
	column-gap: 20px;
	width:600px;
}
</style>

Caching with a SHA1 hash (C# 4.0)

March 22nd, 2011 No comments

Here is the cache code:

#region don't process the same address twice
            if (newEmail.EmailHash == 0) newEmail.EmailHash = newEmail.Email.GetSHA1Hash();
            if (emailsToAdd.Contains(newEmail.EmailHash)) 
            { return false; }
            emailsToAdd.Add(newEmail.EmailHash);
#endregion

For this implementation, I am returning the hash as a 32-bit integer because I am using it for caching, not security. It’s a little faster to use an int32 index than the default 160 hex digest. If avoiding collisions is important (only 4,294,967,295 values in an int32), remove the BitConverter.ToInt32 call and return a string.

It’s for C# 4.0 because it’s an extension method.

Here is the hash class. Call with STRING.GetSHA1Hash().

public static class SHA1Hash
    {
        private static SHA1CryptoServiceProvider _cryptoTransformSHA1;
 
        public static SHA1CryptoServiceProvider CryptoProvider
        {
            get { return _cryptoTransformSHA1 ?? (_cryptoTransformSHA1 = new SHA1CryptoServiceProvider()); }
        }
 
        public static int GetSHA1Hash(this string stringToHash)
        {
            return string.IsNullOrWhiteSpace(stringToHash) ? 0 : (Hash(stringToHash, Encoding.Default));
        }
 
        public static int Hash(string stringToHash, Encoding enc)
        {
            return BitConverter.ToInt32(CryptoProvider.ComputeHash(enc.GetBytes(stringToHash)), 0);
        }
    }

Categories: code Tags:

Basic AJAX with MVC3 & Razor

March 14th, 2011 2 comments

Suppose you have a page which requires you to load content inline in response to some action.  In the below example, you have a filter and a “preview” button which needs to show some sample data.

You could have the button do a post and have the MVC handler parse the form fields and render the view with the data you want to show.  But, you may have a complex page with many tabs and want to optimize form performance.    In such a case, you want to send just the form data for the current tab and inject the response into the current page.  Here is how to do that:

Here is the HTML of the relevant section of the “Export” tab:

<fieldset id="previewFld" title="Preview Results">        
 
<div class="btn green preview">
<a href="#">Preview</a></div>
<div class="clear"></div>
 
<span id="preview">&nbsp;</span>    
</fieldset>
 
<div class="btn orange submit">
<a href="#">Export to CSV</a></div>
<div class="clear"></div>

Note the two href’s with attributes of btn.  Here is the relevant JavaScript:

$(document).ready(function () {
 
    $('.preview').click(function () {
        $('#preview').html('<strong>Loading...</strong>');
        $.post('Home/Preview/?' + $(this).closest("form").serialize(), function (data) {
            $('#preview').html(data);
        });
    });
 
    $('.submit').click(function () {
        $('.submit a').html("Loading...");
        $(this).closest("form").submit();
        return false;
    });
 
});

The “Export” button submits the entire form.  The “Preview” button on the other hand:

  1. Shows the “loading” text.
  2. Serializes the content of the parent form
  3. Posts the serialized content to a url
  4. Renders the response from that URL in the designated div

Here is Preview.cshtml:

@{
    Layout =null;
}
@model  Models.ExportFilter
 
<h2>@ViewBag.Message</h2>
 
<strong>Name, Source, Email, DateAdded</strong>
<ul>
    @foreach (var item in Model.MarketingList)
    {
	<li>@item.FirstName, @item.Source, @item.Email, @item.DateAdded.ToShortDateString()</li>
    }&nbsp;
</ul>&nbsp;

Note that I am overriding the default Layout because I don’t want to show the normal header/footer. _Blank.cshtml contains solely “@RenderBody()”

The handler for the /Preview target is:

[HttpPost]
        public ActionResult Preview(ExportFilter filter)
        {
            var export = new EmailExporter(filter);
            List emailList = export.Process();
            ViewBag.Message = string.Format("{0:##,#} records to be exported", emailList.Count); 
            return View(filter); 
        }

Now when I click “preview”, I get a momentary “loading” screen and then the rendered view.

Categories: development Tags: , , ,