Regex: grep a string between two other strings used as delimiters
I have to report on how many times a CSS class appears in the content of our pages (over 10k pages). The trouble is, the header and footer contain the class, so grep returns every single page.
So, how do I grep only the content?
Edit: I am looking for whether a page has `list-unstyled` between `<main>` and `</main>`.
So should I use a regular expression with grep, or do I need to use PowerShell to get more functionality?
I have grep and PowerShell at my disposal, and I can use portable software if that is an option.
Ideally, I would get a report (.txt or .csv) of the pages and the line numbers where the class shows up, but a list of pages would suffice.
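A minimal sketch of that report, assuming each page has a single `<main>` element whose opening and closing tags sit on different lines: it uses `Select-String` line numbers to locate the `<main>` and `</main>` lines and keeps only `list-unstyled` hits that fall between them. `Get-ClassLinesInMain`, the `LineNumbers` column and the demo folder are hypothetical names, and a line-based check like this can be fooled by unusual markup.

```powershell
# Hypothetical helper: reports, per file, the line numbers where $Pattern
# occurs between the first <main ...> line and the last </main> line.
# Assumes one <main> element per page, opened and closed on different lines.
function Get-ClassLinesInMain {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Pattern
    )
    Get-ChildItem -Path $Path -Recurse -Filter *.html | ForEach-Object {
        $file = $_
        # 1-based line numbers of the opening and closing tags
        $open  = Select-String -LiteralPath $file.FullName -Pattern '<main[\s>]' | Select-Object -First 1
        $close = Select-String -LiteralPath $file.FullName -Pattern '</main>' | Select-Object -Last 1
        if ($open -and $close) {
            # Keep only hits inside the <main> ... </main> line range
            $lines = Select-String -LiteralPath $file.FullName -Pattern $Pattern -SimpleMatch |
                Where-Object { $_.LineNumber -ge $open.LineNumber -and $_.LineNumber -le $close.LineNumber } |
                ForEach-Object { $_.LineNumber }
            if ($lines) {
                # Emit an object (not console text) so Export-Csv can use it
                [pscustomobject]@{
                    File        = $file.FullName
                    LineNumbers = $lines -join ';'
                }
            }
        }
    }
}

# Example: a made-up page with the class in header, main and footer
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'class-scan-demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
@(
    '<header><ul class="list-unstyled"></ul></header>'
    '<main>'
    '<p>intro</p>'
    '<ul class="list-unstyled">'
    '</ul>'
    '</main>'
    '<footer><ul class="list-unstyled"></ul></footer>'
) | Set-Content -Path (Join-Path $demo 'page.html')

$report = Get-ClassLinesInMain -Path $demo -Pattern 'list-unstyled'
$report | Export-Csv -Path (Join-Path $demo 'report.csv') -NoTypeInformation
```

Only line 4 of the sample page is reported, because the header (line 1) and footer (line 7) hits fall outside the `<main>` range.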
Edit: progress.
This is what I have in PowerShell:
```powershell
$files = Get-ChildItem -Recurse -Path w:\test\york\ -Filter *.html
foreach ($file in $files) {
    $htmlfile = [System.IO.File]::ReadAllText($file.FullName)
    # [\w\W]* matches any character, including line breaks
    $regex = "(?m)<main([\w\W]*)</main>"
    if ($htmlfile -match $regex) {
        $middle = $matches[1]
        [regex]::Matches($middle, "list-unstyled")
        Write-Host "$($file.FullName) has matches in middle:"
    }
}
```
I run it with the command `.\findstr.ps1 | Export-Csv c:\tools\text.csv`.
It outputs the file name and path string in the console, but does not put them in the CSV. How can I get them added?
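`Write-Host` writes straight to the console rather than to the pipeline, so `Export-Csv` never receives the file names. A sketch that keeps the same `<main>` regex idea but outputs one object per matching file instead; `Find-ClassInMain`, its parameters and the `Occurrences` column are hypothetical, and the regex assumes a single `<main>` element per page.

```powershell
# Hypothetical helper: same <main> regex idea as the script above, but it
# emits one object per matching file so Export-Csv has something to write.
function Find-ClassInMain {
    param([string] $Path, [string] $ClassName = 'list-unstyled')

    # [\s\S]*? matches any character, including newlines, lazily,
    # so the match stops at the first </main> after <main
    $mainRegex = '<main[\s\S]*?</main>'
    Get-ChildItem -Path $Path -Recurse -Filter *.html | ForEach-Object {
        $html = [System.IO.File]::ReadAllText($_.FullName)
        $main = [regex]::Match($html, $mainRegex, 'IgnoreCase')
        if ($main.Success) {
            # Count literal occurrences of the class name inside <main> only
            $hits = [regex]::Matches($main.Value, [regex]::Escape($ClassName)).Count
            if ($hits -gt 0) {
                # Output the object (not Write-Host) so it goes down the pipeline
                [pscustomobject]@{ File = $_.FullName; Occurrences = $hits }
            }
        }
    }
}

# Example with a made-up single-line page
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'main-regex-demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
'<header class="list-unstyled"></header><main><ul class="list-unstyled"></ul><ol class="list-unstyled"></ol></main><footer class="list-unstyled"></footer>' |
    Set-Content -Path (Join-Path $demo 'page.html')

$result = Find-ClassInMain -Path $demo
$result | Export-Csv -Path (Join-Path $demo 'text.csv') -NoTypeInformation
```

The header and footer hits are ignored, so the sample page is reported with two occurrences.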
What Ansgar Wiechers' answer says is good advice: don't string-search HTML files. I don't have a problem with it here, but it is worth noting that not all HTML files are the same, and regex searches can produce flawed results. If tools exist that are aware of the file's content structure, you should use them.
I would take a simple approach that reports the files with enough occurrences of the text `list-unstyled` among the HTML files in a given directory. You expect there to be 2? If there are more, that shows there are enough. I could have done a more complicated regex solution, but since you want the line number, I came up with this compromise.
```powershell
$pattern = "list-unstyled"
Get-ChildItem c:\temp -Recurse -Filter *.html |
    Select-String $pattern |
    Group-Object Path |
    Where-Object { $_.Count -gt 2 } |
    ForEach-Object {
        $props = @{
            File         = $_.Group | Select-Object -First 1 -ExpandProperty Path
            PatternFound = ($_.Group | Select-Object -ExpandProperty LineNumber) -join ";"
        }
        New-Object -TypeName PSCustomObject -Property $props
    }
```
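`Select-String` is PowerShell's counterpart to `grep`: a tool that can search files for a string. It reports where the string was found, including the line number in the file, which is why I am using it here.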
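To see what `Select-String` hands down the pipeline, a small sketch: each matching line becomes a `MatchInfo` object with `Path`, `LineNumber` and `Line` properties, and `LineNumber` is what ends up in the `PatternFound` column. The sample file name is made up.

```powershell
# Write a tiny made-up HTML file to the temp folder
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-demo.html'
@(
    '<header class="list-unstyled">'
    '<main>'
    '<ul class="list-unstyled">'
    '</main>'
) | Set-Content -Path $sample

# One MatchInfo object per matching line
$hits = Select-String -LiteralPath $sample -Pattern 'list-unstyled'
$hits | Select-Object Path, LineNumber, Line
```

Lines 1 and 3 of the sample match, so `$hits` holds two objects.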
You should get output that looks like this on the PowerShell console:
```
File                 PatternFound
----                 ------------
c:\temp\content.html 4;11;54
```
Here 4, 11 and 54 are the lines where the text was found. The code filters out results where fewer than 3 lines match, so if you expect the class once in the header and once in the footer, those pages should be excluded.
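A sketch of the count threshold in action, using two made-up pages: one with the class only in its header and footer (2 matching lines, dropped) and one with an extra hit inside `<main>` (3 lines, kept). It counts matching lines, not occurrences, so two hits on one line count once. The result can go straight to `Export-Csv` because the pipeline emits objects rather than console text; the folder, file names and `$threshold` variable are illustrative assumptions.

```powershell
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'threshold-demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
@('<header class="list-unstyled">', '<main>', '</main>', '<footer class="list-unstyled">') |
    Set-Content -Path (Join-Path $demo 'chrome-only.html')
@('<header class="list-unstyled">', '<main>', '<ul class="list-unstyled">', '</main>', '<footer class="list-unstyled">') |
    Set-Content -Path (Join-Path $demo 'with-content.html')

$threshold = 2   # matching lines expected from header + footer alone
$report = Get-ChildItem -Path $demo -Recurse -Filter *.html |
    Select-String -Pattern 'list-unstyled' -SimpleMatch |
    Group-Object Path |
    Where-Object { $_.Count -gt $threshold } |
    ForEach-Object {
        # A [pscustomobject]@{...} literal keeps File before PatternFound
        [pscustomobject]@{
            File         = $_.Name
            PatternFound = $_.Group.LineNumber -join ';'
        }
    }
$report | Export-Csv -Path (Join-Path $demo 'report.csv') -NoTypeInformation
```

Only `with-content.html` survives the filter, with `PatternFound` set to `1;3;5`.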