Intro to Parabix Programming - wc (Word Count)
The Word Count Utility
A standard utility on Posix platforms is wc
a program
to count the lines, words and characters in files. This makes
a simple and useful example to illustrate basic concepts of Parabix programming.
Word Count Performance
Before we get started, let's just take a look at the performance of wc.
Performance on Mac OS X
First we apply the Mac OS x version of wc to a 650MB collection of Wikibooks documents in different languages.
Robs-MacBook-Pro:~ cameron$ time wc Wikibooks/*xml 149060 2193307 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml 3822595 22409963 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml 149198 1664829 19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml 1421406 9094432 73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml 256564 2175758 20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml 380247 1783033 19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml 1726088 10712579 91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml 281479 1585857 15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml 916939 1853282 56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml 207600 1300598 12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml 634492 4425740 59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml 133998 419752 11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml 196735 1241781 12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml 245338 1266055 11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml 390368 1055539 20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml 10912107 63182505 649901238 total real 0m4.588s user 0m4.417s sys 0m0.142s
Now let us try the Parabix version of wc:
Robs-MacBook-Pro:~ cameron$ time icgrep-devel/icgrep-build/wc Wikibooks/*xml 149060 1799447 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml 3822595 22396483 204979131 Wikibooks/dewikibooks-20141216-pages-articles.xml 149198 1485655 19608623 Wikibooks/elwikibooks-20141226-pages-articles.xml 1421406 9089650 73912792 Wikibooks/eswikibooks-20141223-pages-articles.xml 256564 1882940 20153325 Wikibooks/fawikibooks-20141217-pages-articles.xml 380247 1782667 19968387 Wikibooks/fiwikibooks-20141221-pages-articles.xml 1726088 10693871 91820621 Wikibooks/frwikibooks-20150106-pages-articles.xml 281479 1584729 15202378 Wikibooks/idwikibooks-20141221-pages-articles.xml 916939 1444253 56483233 Wikibooks/jawikibooks-20150103-pages-articles.xml 207600 1119971 12608713 Wikibooks/kowikibooks-20141223-pages-articles.xml 634492 4309975 59197467 Wikibooks/ruwikibooks-20150123-pages-articles.xml 133998 410393 11582333 Wikibooks/thwikibooks-20150104-pages-articles.xml 196735 1239293 12358807 Wikibooks/trwikibooks-20141227-pages-articles.xml 245338 1236504 11846638 Wikibooks/viwikibooks-20141221-pages-articles.xml 390368 912902 20515544 Wikibooks/zhwikibooks-20141225-pages-articles.xml 10912107 61388733 649901238 total real 0m0.625s user 0m0.439s sys 0m0.180s
Performance on Linux
Now let's look at the performance on Linux. In this case we will also use the perf stat
tool that gives more performance information.
cameron@cs-osl-10:~$ perf stat wc Wikibooks/*s.xml 149060 1799442 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml ... Performance counter stats for 'wc Wikibooks/arwikibooks-20150102-pages-articles.xml ...': 12392.055025 task-clock (msec) # 1.000 CPUs utilized 5 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 70 page-faults # 0.006 K/sec 24,724,090,853 cycles # 1.995 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 68,722,674,380 instructions # 2.78 insns per cycle 16,032,433,867 branches # 1293.767 M/sec 85,758,878 branch-misses # 0.53% of all branches 12.391519289 seconds time elapsed
Using the Parabix version, we have:
cameron@cs-osl-10:~$ perf stat icgrep-devel/icgrep-build/wc Wikibooks/*s.xml 149060 1799447 19663246 Wikibooks/arwikibooks-20150102-pages-articles.xml ... Performance counter stats for 'icgrep-devel/icgrep-build/wc Wikibooks/arwikibooks-20150102-pages-articles.xml ...': 3099.410393 task-clock (msec) # 1.000 CPUs utilized 6 context-switches # 0.002 K/sec 0 cpu-migrations # 0.000 K/sec 10,635 page-faults # 0.003 M/sec 6,184,014,465 cycles # 1.995 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 8,218,706,189 instructions # 1.33 insns per cycle 80,762,055 branches # 26.057 M/sec 352,384 branch-misses # 0.44% of all branches 3.099577407 seconds time elapsed
Here the Parabix version is 4 times faster, but executes 8x fewer instructions! There is still room for improvement.
The Parabix wc Implementation
Now let's take a look under the hood and see how the Parabix wc application (http://parabix.costar.sfu.ca/browser/icGREP/icgrep-devel/icgrep/wc.cpp) is put together.
- The word count main program has two steps
- Create the word count pipeline and compile it.
- Apply the compiled pipeline to each of the input files.
- The
wcPipelineGen
function builds up the pipeline in 4 stages:mmap
: set up the input byte stream using memory-mapped IO.s2p
: transpose the byte stream to eight basis bit streams using the serial to parallel (s2p
) kernel.wc
: apply the logic of a word-count kernel to calculate the counts using Parabix methods.- Use the
recordCounts
procedure to return the counts to the main program.
- The
wc
kernel is written using the Pablo bitwise processing language.- The input stream set
u8bit
has the 8 basis bit streams. - An instance of the character class compiler
cc::CC_Compiler
is initialized. - Variables to hold the output counts
lineCount
,wordCount
andcharCount
are set up. - Lines are counted by making a character class for the
LF
character and using the PabloCount
operation (population count of a bit stream. - Word counting is somewhat more complex: we count the transitions between word and WS characters.
- Characters are counted by counting only the ASCII bytes and prefixes of UTF-8 sequences.
- The input stream set