C# - How to make XML to CSV parsing/conversion faster?
I'm using the snippet below to convert XML data (not well-formed) to .csv format after doing some processing in between. It only converts elements in the XML data that contain an integer from the list testlist (List<int> testlist). It converts and writes to the file once a match has been made. I need to use this algorithm on files that are several GB in size. Currently it processes a 1 GB file in ~7.5 minutes. Can someone suggest changes I could make to improve performance? I've fixed what I could, but it won't get any faster. Any help is appreciated!
Note: Message.TryParse is an external parsing method that I have to use and can't exclude or change.
Note: StreamElements is a customized XmlReader that improves performance.
    foreach (var element in StreamElements(p, "xml"))
    {
        string joined = string.Concat(element.ToString().Split().Take(3)) + string.Join(" ", element.ToString().Split().Skip(3));
        List<string> listx = new List<string>();
        listx.Add(joined.ToString());
        Message msg = null;
        if (Message.TryParse(joined.ToString(), out msg))
        {
            var values = element.DescendantNodes().OfType<XText>()
                .Select(v => Regex.Replace(v.Value, "\\s+", " "));
            foreach (var val in values)
            {
                for (int i = 0; i < testlist.Count; i++)
                {
                    if (val.ToString().Contains("," + testlist[i].ToString() + ","))
                    {
                        var line = string.Join(",", values);
                        sss.WriteLine(line);
                    }
                }
            }
        }
    }
I'm seeing a few things you could improve:
- You're calling .ToString() on joined a couple of times, when joined is already a string.
- You may be able to speed up the Regex replace by compiling the regex first, outside of the loop.
- You're iterating over values multiple times, and each time it has to re-evaluate the LINQ query that makes up the definition of values. Try using .ToList() before saving the result of the LINQ statement to values.
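As a rough illustration of the last two points, here is a minimal sketch. The class name ValueExtraction, the helper NormalizedValues and the sample XML are made up for the example; only the Regex and LINQ to XML calls come from the question. Whether RegexOptions.Compiled actually helps is something to measure, since the static Regex.Replace already caches recently used patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ValueExtraction
{
    // Built once and reused for every element, instead of passing the
    // pattern string to the static Regex.Replace inside the loop.
    private static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical helper: returns the whitespace-normalized text nodes
    // as a List, so later passes do not re-run the LINQ query.
    public static List<string> NormalizedValues(XElement element)
    {
        return element.DescendantNodes()
                      .OfType<XText>()
                      .Select(t => Whitespace.Replace(t.Value, " "))
                      .ToList();
    }

    public static void Main()
    {
        // Made-up sample element standing in for one item from StreamElements.
        var element = XElement.Parse("<row><a>1,  42 ,x</a><b>foo\n bar</b></row>");

        List<string> values = NormalizedValues(element);

        // The list can now be enumerated as often as needed without
        // re-evaluating the query or re-running the regex.
        Console.WriteLine(string.Join(",", values));
    }
}
```

With the list in hand, string.Join(",", values) inside the inner loop no longer triggers a fresh pass over the XML nodes each time a match is found.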
But before focusing on stuff like this, you need to identify what's taking the most time in your code. My guess is that it's mostly spent in these two places:
- Reading from the XML stream
- Writing to sss
If I'm right, then anything else you focus on is going to be premature optimization. Spend some time testing what happens if you comment out various parts of your for loop, to see where the time is being spent.
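If commenting parts out is awkward, another option is to time the stages directly with System.Diagnostics.Stopwatch. The following is only a sketch: StageTiming and Timed are hypothetical names, Enumerable.Range stands in for StreamElements(p, "xml"), and a StringWriter stands in for sss.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class StageTiming
{
    // Wraps a sequence so that only the time spent producing each item
    // (the MoveNext call) is added to the given stopwatch.
    public static IEnumerable<T> Timed<T>(IEnumerable<T> source, Stopwatch clock)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (true)
            {
                clock.Start();
                bool hasNext = e.MoveNext();
                clock.Stop();
                if (!hasNext) yield break;
                yield return e.Current;
            }
        }
    }

    public static void Main()
    {
        var read = new Stopwatch();
        var write = new Stopwatch();

        // Stand-in for StreamElements(p, "xml").
        IEnumerable<int> source = Enumerable.Range(0, 100000);

        // Stand-in for the output writer sss.
        using (var sink = new StringWriter())
        {
            foreach (int item in Timed(source, read))
            {
                write.Start();
                sink.WriteLine(item);
                write.Stop();
            }
        }

        Console.WriteLine("read:  {0} ms", read.ElapsedMilliseconds);
        Console.WriteLine("write: {0} ms", write.ElapsedMilliseconds);
    }
}
```

Starting and stopping a stopwatch per item adds some overhead of its own, so treat the numbers as a rough split between stages rather than exact costs. A profiler would give a more detailed picture.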