Parabix-Wc
The Word Count Utility
A standard utility on Posix platforms is wc
a program
to count the lines, words and characters in files. This makes
a simple and useful example to illustrate basic concepts of Parabix programming.
Word Count Performance
Before we get started, let's just take a look at the performance of wc.
Performance on Mac OS X
First we apply the Mac OS x version of wc to a 650MB collection of Wikibooks documents in different languages.
cameron@Robs-iMac-Pro ~ % time wc Wikibooks/*wikib*xml 149060 2193307 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml 3822595 22409963 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml 149198 1664829 19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml 1421406 9094432 73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml 256564 2175758 20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml 380247 1783033 19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml 1726088 10712579 91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml 281479 1585857 15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml 916939 1853282 56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml 207600 1300598 12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml 634492 4425740 59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml 133998 419752 11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml 196735 1241781 12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml 245338 1266055 11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml 390368 1055539 20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml 10912107 63182505 649901238 total wc Wikibooks/*wikib*xml 2.24s user 0.08s system 99% cpu 2.330 total
Now let us try the Parabix version of wc:
cameron@Robs-iMac-Pro ~ % time parabix-devel/build9/bin/wc Wikibooks/*wikib*xml 149060 1799446 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml 3822595 22397478 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml 149198 1485690 19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml 1421406 9090066 73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml 256564 1882967 20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml 380247 1782683 19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml 1726088 10702791 91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml 281479 1584727 15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml 916939 1478026 56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml 207600 1120153 12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml 634492 4319980 59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml 133998 410390 11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml 196735 1239209 12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml 245338 1236528 11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml 390368 915449 20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml 10912107 61445583 649901238 total parabix-devel/build9/bin/wc Wikibooks/*wikib*xml 2.49s user 0.14s system 705% cpu 0.373 total
Performance on Linux
Now let's look at the performance on Linux. In this case we will also use the perf stat
tool that gives more performance information.
cameron@cs-osl-11:~$ perf stat wc Wikibooks/*s.xml 149060 1799442 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml 3822595 22396747 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml ... 10912107 61426221 649901238 total Performance counter stats for 'wc Wikibooks/arwikibooks-20150102-pages-articles.xml Wikibooks/dewikibooks-20141216-pages-articles.xml ...': 8512.296572 task-clock (msec) # 1.000 CPUs utilized 10 context-switches # 0.001 K/sec 0 cpu-migrations # 0.000 K/sec 74 page-faults # 0.009 K/sec 32,864,698,940 cycles # 3.861 GHz 8,462,253,767 stalled-cycles-frontend # 25.75% frontend cycles idle 70,780,555,696 instructions # 2.15 insn per cycle # 0.12 stalled cycles per insn 16,197,188,530 branches # 1902.799 M/sec 85,007,661 branch-misses # 0.52% of all branches 8.512993252 seconds time elapsed
Using the Parabix version, we have:
cameron@cs-osl-11:~$ perf stat parabix-devel/build8/bin/wc Wikibooks/*s.xml 149060 1799446 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml 3822595 22397478 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml ... 10912107 61445583 649901238 total Performance counter stats for 'parabix-devel/build8/bin/wc Wikibooks/arwikibooks-20150102-pages-articles.xml Wikibooks/dewikibooks-20141216-pages-articles.xml ...': 1155.681628 task-clock (msec) # 3.008 CPUs utilized 50 context-switches # 0.043 K/sec 1 cpu-migrations # 0.001 K/sec 13,670 page-faults # 0.012 M/sec 4,226,676,458 cycles # 3.657 GHz 1,534,466,490 stalled-cycles-frontend # 36.30% frontend cycles idle 11,759,089,025 instructions # 2.78 insn per cycle # 0.13 stalled cycles per insn 1,701,110,069 branches # 1471.954 M/sec 3,795,727 branch-misses # 0.22% of all branches 0.384153114 seconds time elapsed
Notice that the Parabix version is 20 times faster! It executes about 6x fewer instructions and uses 4 cores automatically.
The Parabix wc Implementation
Now let's take a look under the hood and see how the Parabix wc application (https://cs-git-research.cs.surrey.sfu.ca/cameron/parabix-devel/-/blob/master/tools/wc/wc.cpp) is put together.
- The word count main program has two steps
- Create the word count pipeline and compile it.
- Apply the compiled pipeline to each of the input files.
- The
wcPipelineGen
function builds up the pipeline in 4 stages:mmap
: set up the input byte stream using memory-mapped IO.s2p
: transpose the byte stream to eight basis bit streams using the serial to parallel (s2p
) kernel.wc
: apply the logic of a word-count kernel to calculate the counts using Parabix methods.- Use the
recordCounts
procedure to return the counts to the main program.
- The
wc
kernel is written using the Pablo bitwise processing language.- The input stream set
u8bit
has the 8 basis bit streams. - An instance of the character class compiler
cc::CC_Compiler
is initialized. - Variables to hold the output counts
lineCount
,wordCount
andcharCount
are set up. - Lines are counted by making a character class for the
LF
character and using the PabloCount
operation (population count of a bit stream. - Word counting is somewhat more complex: we count the transitions between word and "space" characters. In order to make this version Unicode-compliant, we create and compile a Unicode property expression "space" using the Parabix built-in database for Unicode properties.
- Characters are counted by counting only the ASCII bytes and prefixes of UTF-8 sequences.
- The input stream set