Not logged in. Login

Parabix-Wc

The Word Count Utility

A standard utility on Posix platforms is wc a program to count the lines, words and characters in files. This makes a simple and useful example to illustrate basic concepts of Parabix programming.

Word Count Performance

Before we get started, let's just take a look at the performance of wc.

Performance on Mac OS X

First we apply the Mac OS x version of wc to a 650MB collection of Wikibooks documents in different languages.

cameron@Robs-iMac-Pro ~ % time wc Wikibooks/*wikib*xml 
  149060 2193307 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
 3822595 22409963 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml
  149198 1664829 19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml
 1421406 9094432 73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml
  256564 2175758 20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml
  380247 1783033 19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml
 1726088 10712579 91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml
  281479 1585857 15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml
  916939 1853282 56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml
  207600 1300598 12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml
  634492 4425740 59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml
  133998  419752 11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml
  196735 1241781 12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml
  245338 1266055 11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml
  390368 1055539 20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml
 10912107 63182505 649901238 total
wc Wikibooks/*wikib*xml  2.24s user 0.08s system 99% cpu 2.330 total

Now let us try the Parabix version of wc:

cameron@Robs-iMac-Pro ~ % time parabix-devel/build9/bin/wc Wikibooks/*wikib*xml
   149060   1799446  19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
  3822595  22397478 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml
   149198   1485690  19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml
  1421406   9090066  73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml
   256564   1882967  20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml
   380247   1782683  19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml
  1726088  10702791  91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml
   281479   1584727  15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml
   916939   1478026  56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml
   207600   1120153  12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml
   634492   4319980  59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml
   133998    410390  11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml
   196735   1239209  12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml
   245338   1236528  11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml
   390368    915449  20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml
 10912107  61445583 649901238 total
parabix-devel/build9/bin/wc Wikibooks/*wikib*xml  2.49s user 0.14s system 705% cpu 0.373 total

Performance on Linux

Now let's look at the performance on Linux. In this case we will also use the perf stat tool that gives more performance information.

cameron@cs-osl-11:~$ perf stat wc Wikibooks/*s.xml
   149060   1799442  19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
  3822595  22396747 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml
   ...
 10912107  61426221 649901238 total

 Performance counter stats for 'wc Wikibooks/arwikibooks-20150102-pages-articles.xml Wikibooks/dewikibooks-20141216-pages-articles.xml ...':

       8512.296572      task-clock (msec)         #    1.000 CPUs utilized          
                10      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                74      page-faults               #    0.009 K/sec                  
    32,864,698,940      cycles                    #    3.861 GHz                    
     8,462,253,767      stalled-cycles-frontend   #   25.75% frontend cycles idle   
    70,780,555,696      instructions              #    2.15  insn per cycle         
                                                  #    0.12  stalled cycles per insn
    16,197,188,530      branches                  # 1902.799 M/sec                  
        85,007,661      branch-misses             #    0.52% of all branches        

       8.512993252 seconds time elapsed

Using the Parabix version, we have:

cameron@cs-osl-11:~$ perf stat parabix-devel/build8/bin/wc Wikibooks/*s.xml
   149060   1799446  19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml
  3822595  22397478 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml
   ...
 10912107  61445583 649901238 total

 Performance counter stats for 'parabix-devel/build8/bin/wc Wikibooks/arwikibooks-20150102-pages-articles.xml Wikibooks/dewikibooks-20141216-pages-articles.xml ...':

       1155.681628      task-clock (msec)         #    3.008 CPUs utilized          
                50      context-switches          #    0.043 K/sec                  
                 1      cpu-migrations            #    0.001 K/sec                  
            13,670      page-faults               #    0.012 M/sec                  
     4,226,676,458      cycles                    #    3.657 GHz                    
     1,534,466,490      stalled-cycles-frontend   #   36.30% frontend cycles idle   
    11,759,089,025      instructions              #    2.78  insn per cycle         
                                                  #    0.13  stalled cycles per insn
     1,701,110,069      branches                  # 1471.954 M/sec                  
         3,795,727      branch-misses             #    0.22% of all branches        

       0.384153114 seconds time elapsed


Notice that the Parabix version is 20 times faster! It executes about 6x fewer instructions and uses 4 cores automatically.

The Parabix wc Implementation

Now let's take a look under the hood and see how the Parabix wc application (https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/-/blob/master/tools/wc/wc.cpp) is put together.

  • The word count main program has two steps
    • Create the word count pipeline and compile it.
    • Apply the compiled pipeline to each of the input files.
  • The wcPipelineGen function builds up the pipeline in 4 stages:
    • mmap: set up the input byte stream using memory-mapped IO.
    • s2p: transpose the byte stream to eight basis bit streams using the serial to parallel (s2p) kernel.
    • wc: apply the logic of a word-count kernel to calculate the counts using Parabix methods.
    • Use the recordCounts procedure to return the counts to the main program.
  • The wc kernel is written using the Pablo bitwise processing language.
    • The input stream set u8bit has the 8 basis bit streams.
    • An instance of the character class compiler cc::CC_Compiler is initialized.
    • Variables to hold the output counts lineCount, wordCount and charCount are set up.
    • Lines are counted by making a character class for the LF character and using the Pablo Count operation (population count of a bit stream.
    • Word counting is somewhat more complex: we count the transitions between word and "space" characters. In order to make this version Unicode-compliant, we create and compile a Unicode property expression "space" using the Parabix built-in database for Unicode properties.
    • Characters are counted by counting only the ASCII bytes and prefixes of UTF-8 sequences.
Updated Fri May 28 2021, 11:48 by cameron.