Are there any command-line tools to find this out on Linux? (NB: the values are numerically sorted.)
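A minimal sketch of the usual approach, assuming the goal is to count how often each line occurs and list the counts in numeric order (the file name is hypothetical):

    sort file.txt | uniq -c | sort -n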
This answer starts producing output more quickly and works more efficiently.
In particular, redirection is necessary with tr, as it does not accept file arguments.
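For illustration (the file name is an assumption; tr reads only from standard input):

    tr -s '[:space:]' '\n' < file.txt    # works: input is redirected
    # tr -s '[:space:]' '\n' file.txt    # fails: tr takes no file operands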
I'm working on files that are several gigabytes in size, so performance is a key issue.
I wonder whether there is a tool that does just the counting in a single pass, using a prefix tree (in my case strings often have common prefixes) or something similar; that should do the trick in O(n · avg_line_len).
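Not a prefix tree, but a single-pass, hash-based count can be sketched in awk (the file name is hypothetical); it keeps each distinct line in memory once:

    awk '{count[$0]++} END {for (line in count) print count[line], line}' file.txt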
You can use grep to filter out unique lines: sort FILE | uniq -c | grep -v '^ *1 '.
So, for the first column of each line in the data file, the corresponding element of the array named dups is incremented.
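A minimal sketch of that approach, assuming the aim is to print each first-column value with its count (the file name is hypothetical):

    awk '{dups[$1]++} END {for (key in dups) print key, dups[key]}' data.txt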
It is linear in the number of lines and the space usage is linear in the number of different lines.
OTOH, the awk solution needs to keep all the different lines in memory, while (GNU) sort can resort to temp files.
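For very large inputs, GNU sort's buffer-size and temporary-directory options can help; a hedged example (the buffer size, directory, and file name are assumptions):

    sort -S 2G -T /mnt/bigdisk/tmp file.txt | uniq -c | sort -n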