---
layout: post
title: DistCC and CCache on Linux
---

![xkcd knows this](https://imgs.xkcd.com/comics/compiling.png)

After a project reaches a certain size, compiling the code becomes a task in and
of itself and is a source of wasted time the world over. Recently I started
looking at tools to reduce the amount of time I waste waiting on code to compile
and I found a few things that can help with that.

### ccache

The first tool I found was [ccache](https://ccache.dev). It works almost like a
web browser cache: it stores the output of every compile operation in a cache
directory under my home directory, and when the same job runs again with the
same input it returns the stored result instead of compiling again.

All one needs to do to get it working is either prefix every call to the
compiler with `ccache` or, and this is what I recommend, prefix the PATH
environment variable with `/usr/lib/ccache/bin`. I have a line in my `.zshenv`
file that does exactly that:

        export PATH="/usr/lib/ccache/bin:$PATH"
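
If the shell config gets sourced more than once, that line prepends the same
directory again each time. Here is a minimal sketch of an idempotent variant.
The `prepend_path` helper name is my own invention, and the directory is simply
the one used above; it may live somewhere else on other distributions.

```shell
# Prepend a directory to PATH only if it is not already present.
# `prepend_path` is a hypothetical helper name, not part of ccache.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;       # otherwise put it in front
    esac
}

prepend_path /usr/lib/ccache/bin
export PATH

# Show which gcc the shell resolves now. With the ccache directory first,
# this should be the symlink inside it (if that directory exists).
command -v gcc || true
```

Calling `prepend_path` a second time with the same directory leaves PATH
unchanged.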

Now be warned: the first build of a large project will take *longer* than a
build without ccache enabled. The cache starts out empty, so every call is a
miss, and on top of the normal compile ccache has to hash the input and store
the result.

You can check the ccache statistics with the command `ccache -s`. Here's an
example:

        $ ccache -s
        cache directory                     /home/dholman/.ccache
        primary config                      /home/dholman/.ccache/ccache.conf
        secondary config      (readonly)    /etc/ccache.conf
        stats updated                       Fri Nov  6 13:31:48 2020
        cache hit (direct)                 16980
        cache hit (preprocessed)            2373
        cache miss                         76011
        cache hit rate                     20.29 %
        called for link                     2154
        called for preprocessing            2603
        multiple source files                  2
        compiler produced stdout               4
        compiler produced empty output       652
        compile failed                       657
        preprocessor error                  1541
        bad compiler arguments               260
        unsupported source language           10
        autoconf compile/link               3087
        unsupported code directive           123
        could not write to output file       110
        no input file                       2858
        cleanups performed                    65
        files in cache                      6387
        cache size                         586.9 MB
        max cache size                      20.0 GB
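
The hit rate in that output is just the hits divided by hits plus misses. As a
quick sanity check, here is a small sketch that recomputes it from the counters
shown above (the variable names are mine, and the numbers are copied by hand
rather than parsed from `ccache -s`):

```shell
# Counters copied from the `ccache -s` output above.
direct_hits=16980
preprocessed_hits=2373
misses=76011

# hit rate = hits / (hits + misses), as a percentage with two decimals
hit_rate=$(awk -v d="$direct_hits" -v p="$preprocessed_hits" -v m="$misses" \
    'BEGIN { printf "%.2f", 100 * (d + p) / (d + p + m) }')

echo "cache hit rate: ${hit_rate} %"
```

This prints `cache hit rate: 20.29 %`, matching the figure ccache reports.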

### distcc

The next tool was a bit more fiddly to get working and may or may not work for
everyone, so be warned. Also, in order to get any benefit out of this, you need
a separate computer attached to the same network as your development
workstation.

Distcc, according to the [website](https://distcc.github.io), is a distributed
compiler. That's not *really* true: it does not compile anything itself, but
functions as an extension to the existing compiler, handing jobs to the
compiler installed on each machine. In my case I have a Dell R710
dual Xeon system running in the basement, and I have distcc set up as a daemon
listening on TCP port 3632 on that system. Here's an excerpt from
`/etc/default/distcc` allowing all hosts on my network to use it:

        #
        # Which networks/hosts should be allowed to connect to the daemon?
        # You can list multiple hosts/networks separated by spaces.
        # Networks have to be in CIDR notation, f.e. 192.168.1.0/24
        # Hosts are represented by a single IP Adress
        #
        # ALLOWEDNETS="127.0.0.1"

        ALLOWEDNETS="127.0.0.1 192.168.1.0/24"

The next step was to install distcc on my workstation and configure it to send
jobs to the server downstairs:

        # --- /etc/distcc/hosts -----------------------
        # See the "Hosts Specification" section of
        # "man distcc" for the format of this file.
        #
        # By default, just test that it works in loopback mode.
        192.168.1.26/24,cpp,lzo
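
Note that the `/24` in that hosts entry is not a netmask. In distcc's host
syntax the number after the slash limits how many jobs this client sends to
that host. The sketch below pulls that limit out of an entry so it can be
compared against the `-j` value passed to `make`. `distcc_job_limit` is a
hypothetical helper of mine, and the fallback of 4 for remote hosts is what I
understand the default to be; check `man distcc` for your version.

```shell
# Print the job limit ("/N") of a single distcc host entry.
# `distcc_job_limit` is a hypothetical helper, not a distcc command.
distcc_job_limit() {
    spec=${1%%,*}                   # drop options such as ",cpp,lzo"
    case "$spec" in
        */*) printf '%s\n' "${spec##*/}" ;;   # explicit limit after the slash
        *)   printf '4\n' ;;                  # assumed default for remote hosts
    esac
}

limit=$(distcc_job_limit "192.168.1.26/24,cpp,lzo")
echo "this host accepts up to ${limit} jobs from this client"
```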

With all of this configured I could then build a project with
`make -j<number of parallel jobs> CC=distcc`. Here are a few results from some
of my personal projects and other things.

Forge game engine:

        $ time make -j20 CC=distcc
        [ 23%] Building C object CMakeFiles/forge.dir/src/ui/button.c.o
        [ 23%] Building C object CMakeFiles/forge.dir/src/data/stack.c.o
        [ 23%] Building C object CMakeFiles/forge.dir/src/data/list.c.o
        [ 30%] Building C object CMakeFiles/forge.dir/src/ui/spinner.c.o
        [ 38%] Building C object CMakeFiles/forge.dir/src/ui/text.c.o
        [ 46%] Building C object CMakeFiles/forge.dir/src/engine.c.o
        [ 53%] Building C object CMakeFiles/forge.dir/src/ui/rect.c.o
        [ 61%] Building C object CMakeFiles/forge.dir/src/graphics.c.o
        [ 69%] Building C object CMakeFiles/forge.dir/src/input.c.o
        [ 76%] Building C object CMakeFiles/forge.dir/src/entity.c.o
        [ 92%] Building C object CMakeFiles/forge.dir/src/sprite.c.o
        [ 92%] Building C object CMakeFiles/forge.dir/src/tmx.c.o
        [100%] Linking C shared library libforge.so
        [100%] Built target forge
        make -j20 CC=distcc  0.03s user 0.02s system 97% cpu 0.046 total

Linux kernel with default config:

        $ time make -j20 CC=distcc
        ...
        Kernel: arch/x86/boot/bzImage is ready  (#1)
        make -j20 CC=distcc  463.45s user 116.23s system 120% cpu 8:01.29 total

Astute readers might notice that these results seem a little far-fetched for
*just* distcc, and they would be right. Because I set ccache to be called
*before* the compiler, distcc is only invoked from within ccache, when a result
is not already cached. I do not recommend doing it the other way around, and
neither do the developers of ccache and distcc. I also recommend *not* routing
*every* compiler call through distcc, as shipping source code over TCP/IP like
this can be very time-consuming on networks slower than gigabit. One other
thing to note is that the compiler versions on all machines *must* match, or
calls to distcc will fail.
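
One documented way to get the ccache-first ordering is ccache's `CCACHE_PREFIX`
setting, which tells ccache to run the real compiler through a prefix command
when it has a cache miss. The sketch below assumes that mechanism; it is not
necessarily how the setup described above was wired together.

```shell
# Ask ccache to run the real compiler through distcc on a cache miss.
export CCACHE_PREFIX="distcc"

# With the ccache directory first on PATH, a plain build then goes
# ccache -> distcc -> compiler. Left as a comment since it needs a project:
#   make -j20
echo "CCACHE_PREFIX=${CCACHE_PREFIX}"
```

Cache hits never reach distcc this way, so only the misses travel over the
network.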

You can watch what distcc is currently doing behind the scenes with
`distccmon-text <refresh seconds>`, or if a GUI is preferred, you can use
`distccmon-gnome`.

![distcc monitor](/assets/distcc.png)