- every entry stood in for a library tsan could not see into; with libzmq,
libsodium and libstdc++ now tsan-instrumented in the tsan CI job, the
happens-before edges they establish are visible and nothing is left to
suppress
- suppressions were blunt (a `race:` entry matches any frame in the stack),
so they could also mask real races passing through those frames
- build with the spack gcc toolchain like every other job: no clang++, no
-fuse-ld=lld and no lld install step (the GNU BFD failure on tsan objects
was specific to clang's tsan runtime)
- use gcc 15 for the freshest libtsan runtime; the asan entry stays on
gcc 14, so the matrix now carries per-entry gcc and spack env names
- consume the tsan spack env and load the instrumented libstdc++ into each
test's environment via FAIRMQ_TEST_LD_LIBRARY_PATH (it shares the soname
of the compiler's own libstdc++, so the dynamic loader substitutes it
process-wide at load time)
- use -fno-omit-frame-pointer for readable reports; optimization comes
from the project's Debug -Og
- verify the wiring: assert the test environment resolves libstdc++ to the
instrumented copy and that libzmq is tsan-instrumented, since both
failure modes are silent (the suite still passes, with reduced race
coverage)
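The two wiring assertions above could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual check: the helper names `resolved_libstdcxx` and `has_tsan_symbols` are hypothetical, and the sketch assumes the checks are done by parsing `ldd` and `nm -D` output.

```shell
# Hypothetical helper: print the path that an `ldd` listing (read from
# stdin) resolves libstdc++.so.6 to.
resolved_libstdcxx() {
    awk '$1 == "libstdc++.so.6" && $2 == "=>" { print $3 }'
}

# Hypothetical helper: succeed if a dynamic symbol listing (read from stdin,
# e.g. produced by `nm -D`) references the tsan runtime interface.
has_tsan_symbols() {
    grep -q '__tsan_'
}

# Possible use in the job, shown as comments so the sketch has no side
# effects (paths are placeholders):
#   LD_LIBRARY_PATH="$FAIRMQ_TEST_LD_LIBRARY_PATH" ldd ./some-test \
#       | resolved_libstdcxx      # expect the instrumented copy's path
#   nm -D /path/to/libzmq.so | has_tsan_symbols || exit 1
```

Both helpers read from stdin, so they can be exercised against captured output without the instrumented libraries being present.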
- mirror spack-latest.yaml, with -fsanitize=thread on the libzmq and
libsodium nodes so tsan can observe the happens-before edges established
inside libzmq's lock-free queues, plus the libstdcxx-tsan root spec
- flags are applied per node instead of via the propagating '==' operator,
which could reach the gcc node and trigger a compiler rebuild
- unchanged roots (fairlogger, boost, ninja, cmake) keep their spec hashes,
so they are shared with the regular buildcache entries; the instrumented
nodes hash differently and coexist in the content-addressed cache
- exclude libstdcxx-tsan from concretizer reuse so recipe changes always
take effect; unchanged recipes still hit the buildcache because the spec
hash is identical
- add the tsan env to the buildcache matrix (rebuilding also on spack_repo
changes) so the instrumented binaries are cached instead of rebuilt on
every CI run
- gcc ships no supported switch to build libstdc++ with -fsanitize=thread,
and spack's gcc recipe filters all flags out of the target-library build
(CXXFLAGS_FOR_TARGET is owned by its generated --with-build-config=spack
makefile), so provide a dedicated libstdcxx-tsan package in a custom repo
- build only the libstdc++-v3 subtree from the matching gcc release tarball,
configured standalone against the already-installed toolchain (recipe
modeled on https://iree.dev/developers/debugging/sanitizers/), instead of
rebuilding all of gcc
- the result is a drop-in runtime replacement for the compiler's libstdc++
(same soname and symbol versions), to be loaded only by the instrumented
test executables
- normalize the install layout after make install: the standalone build puts
the runtime libraries into the multilib os dir (lib64 on x86_64) regardless
of --libdir, and --with-toolexeclibdir only applies to cross builds
- register the repo in the setup-deps action before creating the env
- buildcache push expands its selection into the full dependency
closure, build-time dependencies included; specs that were satisfied
from the buildcache do not have those installed locally, and the push
fails with PackageNotInstalledError
- both push sites (the early gcc node push and the env-level push) only
ever ran in fresh-build scenarios before, so the failure surfaced once
the cache was warm
- pass --allow-missing to skip what is not installed (a best-effort push
of everything that is); a freshly built gcc thus still uploads its
build-time dependencies, which a future gcc rebuild can then pull as
binaries
- libzmq is not tsan-instrumented, so tsan cannot see the happens-before
edges its queues establish between user threads and libzmq I/O threads,
and reports false-positive data races on message buffers
- add test/thread_sanitizer_suppressions.txt and point TSAN_OPTIONS at it
via the sanitizers job env so it reaches the tests and their device
subprocesses
- suppress: accesses made directly from libzmq, the zero-copy message
deleters libzmq runs from msg_t::close, shmem receive-side metadata
reads, and std::regex/locale lazy-init races in libstdc++
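For illustration only, a suppressions file of this kind might be generated and wired up as sketched below. Every pattern shown is an assumption chosen to mirror the categories listed above, not the content of test/thread_sanitizer_suppressions.txt, and `write_tsan_suppressions` is a hypothetical helper.

```shell
# Hypothetical helper: write an illustrative tsan suppressions file to $1.
# All patterns are assumptions; a `race:` entry is matched against the
# function, file and module names of the frames in a report.
write_tsan_suppressions() {
    cat > "$1" <<'EOF'
# accesses made directly from libzmq
race:libzmq.so
# zero-copy message deleters run from msg_t::close
race:zmq::msg_t::close
# std::regex/locale lazy initialization in libstdc++
race:std::locale
EOF
}

# Possible use (comments only, so the sketch has no side effects); tsan
# reads the file named by the suppressions= option:
#   write_tsan_suppressions "$PWD/tsan.supp"
#   export TSAN_OPTIONS="suppressions=$PWD/tsan.supp"
```

Exporting TSAN_OPTIONS at job level, rather than per test, is what lets child processes inherit it.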
- the gate did `grep -q warning: build.log`, but build.log was never
produced by the cmake-action build; grep therefore exited non-zero on the
missing file, which the `if` condition treated like "no match" (`set -e`
does not apply to a condition), and the job always passed
- as a result ~4961 clang-tidy warnings were silently ignored
- build manually and capture output to build.log with pipefail, and
fail explicitly if the log is missing or contains a warning
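A gate with the behaviour described above could look roughly like this. It is a minimal sketch, assuming the build output is captured with `tee`; the function name `check_build_log` and its exit codes are hypothetical, not taken from the workflow.

```shell
# Hypothetical helper: return 2 if the log is missing, 1 if it contains a
# warning, 0 otherwise.
check_build_log() {
    log=$1
    if [ ! -f "$log" ]; then
        echo "error: $log was not produced" >&2
        return 2
    fi
    if grep -q 'warning:' "$log"; then
        echo "error: warnings found in $log" >&2
        return 1
    fi
    return 0
}

# Possible use in the job (bash), shown as comments so the sketch has no
# side effects; pipefail keeps a failing build from being hidden by tee:
#   set -o pipefail
#   cmake --build build 2>&1 | tee build.log
#   check_build_log build.log
```

Treating a missing log as its own failure, distinct from "no warnings", is what closes the hole the old `grep` gate had.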
- tsan build failed at link with GNU ld:
"failed to set dynamic section sizes: bad value" (known binutils +
ThreadSanitizer incompatibility); install lld and select it via
-fuse-ld=lld for the tsan job only
- pass -fuse-ld=lld through cxx-flags so it reaches the link line,
avoiding the semicolon-list pitfall of
list(APPEND CMAKE_EXE_LINKER_FLAGS ...)
- build the bundled googletest with CMAKE_POSITION_INDEPENDENT_CODE=ON:
lld rejects R_X86_64_32 relocations from the non-PIC libgtest.a when
producing the position-independent tsan executable; the bundle is built
by a separate cmake invocation, so the flag must be set there
- Region.PreallocateInsideSession.shmem and Example.region.shmem failed
because mlock() of the managed segment/region hit RLIMIT_MEMLOCK on the
runner ("Cannot allocate memory"); the region example then hung until
its 30s timeout
- raise the limit via `sudo prlimit` in the same shell that launches
ctest (the limit is per-process and inherited only by child processes,
so it must be raised here, not in a prior step)
- replace threeal/ctest-action with the equivalent ctest invocation
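The memlock handling above might be sketched as follows. This is an illustration under assumptions: `memlock_at_least` is a hypothetical helper, the 65536 threshold is made up, and the exact prlimit and ctest arguments used by the job are not stated in this message.

```shell
# Hypothetical helper: succeed if this shell's soft RLIMIT_MEMLOCK is at
# least $1, in the units reported by `ulimit -l` (KiB in bash).
memlock_at_least() {
    limit=$(ulimit -l)
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}

# Possible step body, shown as comments because prlimit needs root here.
# $$ is the current shell, so ctest and the tests it spawns inherit the
# raised limit:
#   sudo prlimit --pid $$ --memlock=unlimited:unlimited
#   memlock_at_least 65536 || echo "memlock limit still too low" >&2
#   ctest --test-dir build --output-on-failure
```

Checking the limit before launching ctest turns a 30s hang into an immediate, readable diagnostic.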
Committed lockfiles pinned gcc as a host-path external (from spack compiler
find), which is not portable across runners and broke CI. Cache the gcc
compiler itself as a buildcache node instead, so CI pulls it (~1 min) rather
than building it from source (~1 h).
- push the freshly-built gcc node in setup-deps BEFORE spack compiler find
(which marks it external and excludes it from buildcache push), gated behind
a push-gcc input used only by the buildcache workflow
- drop the committed-lockfile approach: remove test/ci/locks, the lockfile
install path in setup-deps, and the lockfile export in the buildcache workflow
- drop the ignored ref input from setup-spack (v3 renamed it to spack_ref)
Reusing concretization between the weekly buildcache (fresh) and weekday CI
(reuse) can drift if runner externals change, causing avoidable cache misses.
- setup-deps installs from test/ci/locks/<env>-gcc<N>.lock when it exists,
skipping concretization for byte-identical hashes; falls back to the spec
yaml otherwise
- buildcache exports each env's spack.lock as a downloadable artifact so the
lockfiles can be regenerated on the ubuntu-24.04 runner and committed
- document the manual regeneration flow in test/ci/locks/README.md
FairMQ's own sources (library, examples, tests) were recompiled from scratch
in every matrix job on every push.
- add hendrikmuhs/ccache-action to build, sanitizers and static-analysis jobs
- set CMAKE_C/CXX_COMPILER_LAUNCHER=ccache so cmake routes through it
- key the cache per (job, env, gcc) since ccache hashes the compiler
The push trigger had a path filter but no branch filter, so any PR-branch push
touching those paths (e.g. a dependabot rebase pulling in setup-deps changes)
launched the full fresh buildcache matrix concurrently with CI.
- restrict the push trigger to branches [dev, master]
- frees runners for CI; the cache still refreshes via cron and on dev/master
gcc was built from source (~58 min/job) because the FairMQ buildcache mirror
is only configured inside the env yaml, while gcc is installed before the env
is created.
- register the mirror globally after spack setup so "Install GCC" pulls the
compiler as a binary
- pin runners to ubuntu-24.04 so the weekly buildcache and weekday CI share an
image and concretize to matching hashes
- bump setup-spack to v3 to match the update-index job
Use JSON output and jq to select the newest installed gcc version
when multiple versions match, avoiding conflicts between system and
spack-installed compilers.
- rename mirror to ghcr-buildcache
- find system compiler before building gcc
- separate update-index job to avoid race condition
- always attempt push even on partial failure