I wrote a table-data editor using ProfGrid.  It takes advantage of ProfGrid's ability to copy and paste data between a ProfGrid and Excel.  A bottleneck appeared when copying some of the larger tables (1221x120) to the Windows Clipboard.  SaveToStrings() took about a second, but CopyToClipboard() took over 26 minutes.

   Granted, I don't have the fastest machine in the world.  But still, this suggests a problem.  And a closer look confirms it.
   CopyToClipboard() is showing an O(n2) pattern in the total number of cells copied.  When I cut the number of cells copied in half, the execution time dropped to about a quarter of its original time.  Halving it again, cut the time to about 1/16.
   I strongly suspect that the bottleneck can be traced to use of a single, growing buffer (a string) for the clipboard text.  Before long, the string must be reallocated, which means copying the entire text (so far) into a new buffer.  Each time this happens, the text is longer, and takes longer to copy.

   The performance is consistent with a buffer that grows by the same number of bytes each time.  If N is the text length, then, on average, N/2 bytes get copied N/c times, c being some constant.  This yields an estimate of N2/(2c) as the number of bytes copied.

   But we can do much better.

   SaveToStrings() seems to avoid this by not copying older parts of the buffer.  Each row is saved as an independent string, which thereafter needs copying only once, at the end of the process.

   In my case, I have 1221 rows, so my N (for the row) is 1/1221 of the total, taking (1/1221)2 as much time.  Of course, I still have 1221 of these rows.  Still, the overall factor is 1/1221, which is a good fit for the execution time I observed.

   If all 1221x120 cells were in the same row, however, SaveToStrings() would see no benefit.

   One suggestion, then, to drastically improve the speed of CopyToClipboard(), is to have it build a sequence of strings, limiting the size of any one chunk of the whole.  Chunk size should be picked so that reallocation (and its attendant copying) are eliminated, or at least drastically reduced. 

   Once you have a complete set of chunks, summing their lengths gives you the size of the final buffer you need.  This, too, can be allocated and filled in a time proportional to N, not N2

   Of course, if you could calculate, at the start, a worst-case buffer size, then you could fill it in a single pass, and never copy any text more than once. 

   Such a size might be impractically large.  However, if you're willing to grow the buffer at a fixed percentage rate (e.g., by 100% of its then-current size), instead of a fixed number of bytes, then you could start with a somewhat smaller buffer, and still drastically reduce copying costs.  The number of copies would be proportional to log(N), not N.

   I tend to favor the "chunk" approach as being less memory-hungry.

   Comments, anyone?