C# - How to make XML to CSV parsing/conversion faster?


I'm using the snippet below to convert XML data (not well-formed) to .csv format after doing some processing in between. It only converts elements in the XML data that contain an integer from the list testlist (List<int> testlist). It converts and writes to the file only once a match has been made. I need to use this algorithm on files that are several GB in size. Currently it processes a 1 GB file in ~7.5 minutes. Can anyone suggest changes I could make to improve performance? I've fixed everything I could, but it won't get any faster. Any help would be appreciated!

Note: Message.TryParse is an external parsing method that I have to use and can't exclude or change.
Note: StreamElements is a customized XmlReader that improves performance.

    foreach (var element in StreamElements(p, "xml"))
    {
        string joined = string.Concat(element.ToString().Split().Take(3))
            + string.Join(" ", element.ToString().Split().Skip(3));
        List<string> listx = new List<string>();
        listx.Add(joined.ToString());
        Message msg = null;
        if (Message.TryParse(joined.ToString(), out msg))
        {
            var values = element.DescendantNodes().OfType<XText>()
                .Select(v => Regex.Replace(v.Value, "\\s+", " "));

            foreach (var val in values)
            {
                for (int i = 0; i < testlist.Count; i++)
                {
                    if (val.ToString().Contains("," + testlist[i].ToString() + ","))
                    {
                        var line = string.Join(",", values);
                        sss.WriteLine(line);
                    }
                }
            }
        }
    }

I'm seeing a few things you could improve:

  • You're calling .ToString() on joined a couple of times, when joined is already a string.
  • You may be able to speed up the regex replace by compiling the regex first, outside of the loop.
  • You're iterating over values multiple times, and each time it has to re-evaluate the LINQ query that makes up the definition of values. Try calling .ToList() before saving the result of the LINQ statement to values.
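The last two points can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement: the class name ValueExtraction, the field Whitespace, and the helper NormalizedValues are hypothetical names introduced here, and the sketch assumes the elements are XElement instances, as the use of DescendantNodes() and XText in the question suggests.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ValueExtraction
{
    // Built once and reused for every element. RegexOptions.Compiled trades
    // a one-time construction cost for faster matching afterwards.
    private static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical helper: returns the whitespace-normalized text nodes
    // of one element as a materialized list.
    public static List<string> NormalizedValues(XElement element)
    {
        return element.DescendantNodes()
            .OfType<XText>()
            .Select(t => Whitespace.Replace(t.Value, " "))
            .ToList(); // run the query once instead of on every enumeration
    }
}
```

Inside the loop, `var values = ValueExtraction.NormalizedValues(element);` would then replace the original LINQ statement, so that both the foreach over values and the string.Join(",", values) call read from the same list rather than re-running the regex each time. Whether this is measurable on a multi-GB file depends on where the time actually goes.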

But before focusing on stuff like this, you need to identify what's taking the most time in your code. My guess is that it's mostly spent in these two places:

  1. Reading from the XML stream
  2. Writing to sss

If I'm right, then anything else you focus on is going to be premature optimization. Spend some time testing what happens if you comment out various parts of your for loop, to see where the time is being spent.
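As an alternative to commenting code out, one way to get rough numbers is to accumulate elapsed time per phase with System.Diagnostics.Stopwatch. The sketch below is only an illustration; PhaseTimer and Measure are hypothetical names, not part of the original code.

```csharp
using System;
using System.Diagnostics;

public static class PhaseTimer
{
    // Runs the action and adds its duration to the given stopwatch.
    // Stopwatch.Start() resumes counting, so Elapsed accumulates
    // across repeated calls with the same stopwatch.
    public static void Measure(Stopwatch watch, Action action)
    {
        watch.Start();
        try
        {
            action();
        }
        finally
        {
            watch.Stop();
        }
    }
}
```

For example, with one Stopwatch named writeWatch created before the loop, the write becomes `PhaseTimer.Measure(writeWatch, () => sss.WriteLine(line));`, and printing writeWatch.Elapsed after the loop shows the total time spent writing. The same pattern can wrap the Message.TryParse call and the value extraction. The XML reading happens inside the foreach enumerator, so its cost is roughly the total run time minus the measured phases.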

