Listing Terms and Acronyms in text or LaTeX


I often need to check my TeX files for inconsistent capitalization. Listing every run of consecutive capitalized words, and every acronym, helps me decide which capitalizations are intentional and which are not. The following bash script defines two functions that list all terms (capitalized phrases) and all acronyms used in the input file, each with its number of occurrences.

To reuse it, save the code shown at the end as $HOME/shortcuts.sh and run source $HOME/shortcuts.sh. Then use the terms and acronyms functions as shown below.

$ terms filename.tex 
     19 Cloud Station
      9 Sensor Gateway
      7 Sensor Cloud Infrastructure
      7 Resource Registry
      ...
$ acronyms filename.tex      
     34 VM
     13 PM
     13 IaaS
     13 CPU
     ...
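
Once a phrase such as Cloud Station shows up in the list, it can help to count how often each capitalization of that one phrase occurs. The following is a minimal sketch of such a check. The variants function is a hypothetical helper and not part of the original script. It assumes the phrase sits on a single line of the file.

```shell
#!/bin/bash
# variants PHRASE FILE  (hypothetical helper, not part of shortcuts.sh)
# Counts each capitalization of PHRASE exactly as it appears in FILE.
variants(){
    # -o prints only the matched text, -i ignores case,
    # -F treats PHRASE as a fixed string rather than a regex.
    grep -o -i -F -- "$1" "$2" | sort | uniq -c | sort -nr
}
```

It would be called as variants "cloud station" filename.tex. A phrase that is capitalized consistently produces a single output line, and every extra line is a variant worth checking.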

And here is shortcuts.sh:

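#!/bin/bash
# Usage:
#   source shortcuts.sh
#   terms filename.tex
#   acronyms filename.tex
# Note: -P (Perl-compatible regex) is a GNU grep option.

# List runs of two or more capitalized words, most frequent first.
terms(){
    grep -o -P '(?:[A-Z][a-z]+)\s+(?:\s*[A-Z][a-zA-Z]+)+' "$1" | sort | uniq -c | sort -nr
}

# List words that start with a capital and contain at least two capitals
# (VM, IaaS, CPU), most frequent first.
acronyms(){
    grep -o -P '\b(?:[A-Z][a-z]*){2,}\b' "$1" | sort | uniq -c | sort -nr
}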
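
To see what the two patterns actually pick up, they can be run on a single made-up sentence. This is only a sketch: the sample sentence and the variable names are assumptions for illustration, and it relies on GNU grep for -P.

```shell
#!/bin/bash
# Made-up sample sentence (not taken from any real file).
sample='The Sensor Gateway sends VM and CPU data to the Cloud Station over IaaS.'

# Same pattern as terms(): a capitalized word followed by more capitalized words.
term_matches=$(printf '%s\n' "$sample" | grep -o -P '(?:[A-Z][a-z]+)\s+(?:\s*[A-Z][a-zA-Z]+)+')

# Same pattern as acronyms(): a word built from two or more capital-led chunks.
acronym_matches=$(printf '%s\n' "$sample" | grep -o -P '\b(?:[A-Z][a-z]*){2,}\b')

printf 'terms:\n%s\n\nacronyms:\n%s\n' "$term_matches" "$acronym_matches"
```

On this sentence the first pattern returns The Sensor Gateway and Cloud Station, and the second returns VM, CPU and IaaS. The leading The shows that a capitalized word at the start of a sentence gets pulled into the term, so some entries in the terms list may be the same phrase with an article in front.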