iOS - Differentiate between background text (watermark) and real text in PDF


I have a PDF with a watermark in its background. When I start scanning to highlight a word, the watermark or annotation in the background gets selected instead, because it is found first in the touch area.

I am using CGPDFScanner to scan the text.

My question is: how can I detect whether the scanned text is background text or real text in the PDF? How can I differentiate between standard text and annotation text?

Thanks.

In general you have no chance to reliably differentiate between "background" and "real" text. Text is drawn somewhere on the page in some order, and whether it is foreground, background, normal text, ..., is a matter of human perception and may not be reflected at all in the structure of the PDF content stream.

You can try some educated guesswork, e.g. assuming that "real" text is in strong colors while background text is in lighter colors, or that "real" text is arranged in horizontal lines while background text is more diagonal, etc. But this is guesswork after all, nothing you can rely on for sure.
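
As a rough illustration of such guesswork, here is a minimal sketch in plain C. It assumes you already have the fill color as RGB components and the effective text matrix at the moment the text is shown. The `Matrix` and `FillColor` types, the function names and both thresholds are hypothetical and would need tuning per document; combining the two signals with "or" is also just one possible choice.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PDF matrix [a b c d tx ty]
   (same component layout as CGAffineTransform). */
typedef struct { double a, b, c, d, tx, ty; } Matrix;

/* Fill color as RGB components in the range 0..1. */
typedef struct { double r, g, b; } FillColor;

static double abs_d(double x) { return x < 0 ? -x : x; }

/* Perceived brightness using Rec. 601 luma weights: 0 = black, 1 = white. */
static double luma(FillColor c) {
    return 0.299 * c.r + 0.587 * c.g + 0.114 * c.b;
}

/* The matrix maps the baseline direction (1, 0) to (a, b).
   Returns the tangent of the angle between that direction and the
   nearest axis: 0 for horizontal or vertical text, 1 for 45 degrees. */
static double off_axis_tangent(Matrix m) {
    double x = abs_d(m.a), y = abs_d(m.b);
    double lo = x < y ? x : y;
    double hi = x < y ? y : x;
    return hi == 0.0 ? 0.0 : lo / hi;
}

/* Guess: text is "background" if it is light or noticeably diagonal. */
static bool looks_like_background(FillColor color, Matrix m) {
    const double LIGHT_THRESHOLD = 0.6;     /* assumed cut-off, tune it */
    const double TANGENT_TOLERANCE = 0.0875; /* roughly tan(5 degrees) */
    bool light = luma(color) > LIGHT_THRESHOLD;
    bool diagonal = off_axis_tangent(m) > TANGENT_TOLERANCE;
    return light || diagonal;
}
```

Note that this treats text rotated by exactly 90 degrees as "real", and it would wrongly flag light text on a dark page, which is the kind of case where the guesswork breaks down.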

On the other hand, in the case of tagged PDFs you might have a chance, as the watermark there may be tagged as artifact data.
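
In a content stream, artifacts are enclosed in marked-content sequences whose tag is `Artifact` (opened by the `BMC` or `BDC` operator, closed by `EMC`). The following is only a sketch of the bookkeeping, in plain C; the type and the `mc_*` function names are made up for illustration. The idea is that you would call them from the callbacks you register for those three operators, passing the tag name operand to `mc_begin`.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 64

/* Hypothetical bookkeeping for nested marked-content sequences. */
typedef struct {
    bool is_artifact[MAX_DEPTH]; /* one flag per open sequence */
    int depth;                   /* number of currently open sequences */
    int artifact_depth;          /* how many of them are tagged Artifact */
} MarkedContentState;

/* Call for BMC and BDC with the tag name operand (without the slash). */
static void mc_begin(MarkedContentState *s, const char *tag) {
    bool artifact = strcmp(tag, "Artifact") == 0;
    if (s->depth < MAX_DEPTH) {
        s->is_artifact[s->depth] = artifact;
        if (artifact) s->artifact_depth++;
    }
    s->depth++; /* sequences nested deeper than MAX_DEPTH are only counted */
}

/* Call for EMC. Unbalanced EMC operators are ignored. */
static void mc_end(MarkedContentState *s) {
    if (s->depth == 0) return;
    s->depth--;
    if (s->depth < MAX_DEPTH && s->is_artifact[s->depth]) s->artifact_depth--;
}

/* Call when text is shown: true if any enclosing sequence is an artifact. */
static bool mc_in_artifact(const MarkedContentState *s) {
    return s->artifact_depth > 0;
}
```

This only helps if the producer of the PDF actually marked the watermark as an artifact, which many tools do not.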

PS: I saw your shared file again. In the case of that document the heuristics mentioned above work, as the background text is greyish and printed diagonally.

Thus, while scanning you have to keep track of the fill color and/or the transformation matrices. When the scanner finds text, you then know whether it is background or foreground based on the current color and/or matrix value.
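
A minimal sketch of that bookkeeping in plain C, without any Core Graphics calls: each `op_*` function is meant to be called from the scanner callback registered for the PDF operator of the same name, after popping its operands. All type and function names here are hypothetical. Only the `g` and `rg` fill color operators are handled; other color spaces (`k`, `sc`, `scn`) and the font size are left out.

```c
#define MAX_SAVED 32

/* Hypothetical stand-in for a PDF matrix [a b c d tx ty]. */
typedef struct { double a, b, c, d, tx, ty; } Matrix;

typedef struct {
    Matrix ctm;      /* current transformation matrix, modified by cm */
    double fill[3];  /* fill color as RGB in 0..1, set by g or rg */
} GraphicsState;

typedef struct {
    GraphicsState current;
    GraphicsState saved[MAX_SAVED]; /* states pushed by q */
    int saved_count;
    Matrix text_matrix;             /* reset by BT, set by Tm */
} Tracker;

static const Matrix IDENTITY = {1, 0, 0, 1, 0, 0};

/* Product m x n in the PDF row-vector convention. */
static Matrix concat(Matrix m, Matrix n) {
    Matrix r;
    r.a = m.a * n.a + m.b * n.c;
    r.b = m.a * n.b + m.b * n.d;
    r.c = m.c * n.a + m.d * n.c;
    r.d = m.c * n.b + m.d * n.d;
    r.tx = m.tx * n.a + m.ty * n.c + n.tx;
    r.ty = m.tx * n.b + m.ty * n.d + n.ty;
    return r;
}

static void tracker_init(Tracker *t) {
    t->current.ctm = IDENTITY;
    /* The default fill color in PDF is black. */
    t->current.fill[0] = t->current.fill[1] = t->current.fill[2] = 0.0;
    t->saved_count = 0;
    t->text_matrix = IDENTITY;
}

/* q and Q: save and restore the graphics state.
   Nesting deeper than MAX_SAVED is not handled in this sketch. */
static void op_q(Tracker *t) {
    if (t->saved_count < MAX_SAVED) t->saved[t->saved_count++] = t->current;
}
static void op_Q(Tracker *t) {
    if (t->saved_count > 0) t->current = t->saved[--t->saved_count];
}

/* cm: the new CTM is the operand matrix times the old CTM. */
static void op_cm(Tracker *t, Matrix m) {
    t->current.ctm = concat(m, t->current.ctm);
}

/* g and rg: set the fill color in DeviceGray or DeviceRGB. */
static void op_g(Tracker *t, double gray) {
    t->current.fill[0] = t->current.fill[1] = t->current.fill[2] = gray;
}
static void op_rg(Tracker *t, double r, double g, double b) {
    t->current.fill[0] = r; t->current.fill[1] = g; t->current.fill[2] = b;
}

/* BT starts a text object with an identity text matrix; Tm replaces it. */
static void op_BT(Tracker *t) { t->text_matrix = IDENTITY; }
static void op_Tm(Tracker *t, Matrix m) { t->text_matrix = m; }

/* Call when text is shown (Tj, TJ, ...): text matrix times CTM. */
static Matrix tracker_text_transform(const Tracker *t) {
    return concat(t->text_matrix, t->current.ctm);
}
```

When a text-showing operator arrives, `t.current.fill` and the result of `tracker_text_transform` give you the color and orientation to base your decision on.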

Be aware, though, that it is not that easy for all documents.

