Not logged in. Login

Intro to Parabix Programming - wc (Word Count)

The Word Count Utility

A standard utility on Posix platforms is wc a program to count the lines, words and characters in files. This makes a simple and useful example to illustrate basic concepts of Parabix programming.

Word Count Performance

Before we get started, let's just take a look at the performance of wc.

Performance on Mac OS X

First we apply the Mac OS x version of wc to a 650MB collection of Wikibooks documents in different languages.

Robs-MacBook-Pro:~ cameron$ time wc Wikibooks/*xml
  149060 2193307 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
 3822595 22409963 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml
  149198 1664829 19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml
 1421406 9094432 73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml
  256564 2175758 20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml
  380247 1783033 19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml
 1726088 10712579 91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml
  281479 1585857 15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml
  916939 1853282 56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml
  207600 1300598 12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml
  634492 4425740 59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml
  133998  419752 11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml
  196735 1241781 12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml
  245338 1266055 11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml
  390368 1055539 20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml
 10912107 63182505 649901238 total

real	0m4.588s
user	0m4.417s
sys	0m0.142s

Now let us try the Parabix version of wc:

Robs-MacBook-Pro:~ cameron$ time icgrep-devel/icgrep-build/wc Wikibooks/*xml
   149060   1799447  19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
  3822595  22396483 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml
   149198   1485655  19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml
  1421406   9089650  73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml
   256564   1882940  20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml
   380247   1782667  19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml
  1726088  10693871  91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml
   281479   1584729  15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml
   916939   1444253  56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml
   207600   1119971  12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml
   634492   4309975  59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml
   133998    410393  11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml
   196735   1239293  12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml
   245338   1236504  11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml
   390368    912902  20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml
 10912107  61388733 649901238 total

real	0m0.625s
user	0m0.439s
sys	0m0.180s

Performance on Linux

Now let's look at the performance on Linux. In this case we will also use the perf stat tool that gives more performance information.

cameron@cs-osl-10:~$ perf stat wc Wikibooks/*s.xml
   149060   1799442  19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
...

 Performance counter stats for 'wc Wikibooks/arwikibooks-20150102-pages-articles.xml ...':

      12392.055025      task-clock (msec)         #    1.000 CPUs utilized          
                 5      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                70      page-faults               #    0.006 K/sec                  
    24,724,090,853      cycles                    #    1.995 GHz                    
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
    68,722,674,380      instructions              #    2.78  insns per cycle        
    16,032,433,867      branches                  # 1293.767 M/sec                  
        85,758,878      branch-misses             #    0.53% of all branches        

      12.391519289 seconds time elapsed


Using the Parabix version, we have:

cameron@cs-osl-10:~$ perf stat icgrep-devel/icgrep-build/wc Wikibooks/*s.xml
   149060   1799447  19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
...

 Performance counter stats for 'icgrep-devel/icgrep-build/wc Wikibooks/arwikibooks-20150102-pages-articles.xml ...':

       3099.410393      task-clock (msec)         #    1.000 CPUs utilized          
                 6      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            10,635      page-faults               #    0.003 M/sec                  
     6,184,014,465      cycles                    #    1.995 GHz                    
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
     8,218,706,189      instructions              #    1.33  insns per cycle        
        80,762,055      branches                  #   26.057 M/sec                  
           352,384      branch-misses             #    0.44% of all branches        

       3.099577407 seconds time elapsed

Here the Parabix version is 4 times faster, but executes 8x fewer instructions! There is still room for improvement.

The Parabix wc Implementation

Now let's take a look under the hood and see how the Parabix wc application (http://parabix.costar.sfu.ca/browser/icGREP/icgrep-devel/icgrep/wc.cpp) is put together.

  • The word count main program has two steps
    • Create the word count pipeline and compile it.
    • Apply the compiled pipeline to each of the input files.
  • The wcPipelineGen function builds up the pipeline in 4 stages:
    • mmap: set up the input byte stream using memory-mapped IO.
    • s2p: transpose the byte stream to eight basis bit streams using the serial to parallel (s2p) kernel.
    • wc: apply the logic of a word-count kernel to calculate the counts using Parabix methods.
    • Use the recordCounts procedure to return the counts to the main program.
  • The wc kernel is written using the Pablo bitwise processing language.
    • The input stream set u8bit has the 8 basis bit streams.
    • An instance of the character class compiler cc::CC_Compiler is initialized.
    • Variables to hold the output counts lineCount, wordCount and charCount are set up.
    • Lines are counted by making a character class for the LF character and using the Pablo Count operation (population count of a bit stream.
    • Word counting is somewhat more complex: we count the transitions between word and WS characters.
    • Characters are counted by counting only the ASCII bytes and prefixes of UTF-8 sequences.
Updated Sun Jan. 21 2018, 09:35 by cameron.