- hendrikmuhs/ccache-action: the v1 major tag also lagged behind the
latest release (v1.2.20 instead of v1.2.23), so this is a bump too
- threeal/cmake-action: v2 == v2.1.0, pinned as-is
- ENABLE_SANITIZER_UNDEFINED_BEHAVIOUR never matched the CMake option
ENABLE_SANITIZER_UNDEFINED_BEHAVIOR (cmake/FairMQProjectSettings.cmake),
so the asan+lsan+ubsan job has never actually enabled UBSan
- every entry stood in for a library tsan could not see into; with libzmq,
libsodium and libstdc++ now tsan-instrumented in the tsan CI job, the
happens-before edges they establish are visible and nothing is left to
suppress
- suppressions were blunt (a race: entry matches any frame in the stack),
so they could also mask real races passing through those frames
- build with the spack gcc toolchain like every other job: no clang++, no
-fuse-ld=lld and no lld install step (the GNU BFD failure on tsan objects
was specific to clang's tsan runtime)
- use gcc 15 for the freshest libtsan runtime; the asan entry stays on
gcc 14, so the matrix now carries per-entry gcc and spack env names
- consume the tsan spack env and load the instrumented libstdc++ into each
test's environment via FAIRMQ_TEST_LD_LIBRARY_PATH (it shares the soname
of the compiler's own, so it substitutes process-wide at load time)
- use -fno-omit-frame-pointer for readable reports; optimization comes
from the project's Debug -Og
- verify the wiring: assert the test environment resolves libstdc++ to the
instrumented copy and that libzmq is tsan-instrumented, since both
failure modes are silent (the suite still passes, with reduced race
coverage)
- mirror spack-latest.yaml, with -fsanitize=thread on the libzmq and
libsodium nodes so tsan can observe the happens-before edges established
inside libzmq's lock-free queues, plus the libstdcxx-tsan root spec
- flags are applied per node instead of via the propagating '==' operator,
which could reach the gcc node and trigger a compiler rebuild
- unchanged roots (fairlogger, boost, ninja, cmake) keep their spec hashes,
so they are shared with the regular buildcache entries; the instrumented
nodes hash differently and coexist in the content-addressed cache
- exclude libstdcxx-tsan from concretizer reuse so recipe changes always
take effect; unchanged recipes still hit the buildcache because the spec
hash is identical
- add the tsan env to the buildcache matrix (rebuilding also on spack_repo
changes) so the instrumented binaries are cached instead of rebuilt on
every CI run
- gcc ships no supported switch to build libstdc++ with -fsanitize=thread,
and spack's gcc recipe filters all flags out of the target-library build
(CXXFLAGS_FOR_TARGET is owned by its generated --with-build-config=spack
makefile), so provide a dedicated libstdcxx-tsan package in a custom repo
- build only the libstdc++-v3 subtree from the matching gcc release tarball,
configured standalone against the already-installed toolchain (recipe
modeled on https://iree.dev/developers/debugging/sanitizers/), instead of
rebuilding all of gcc
- the result is a drop-in runtime replacement for the compiler's libstdc++
(same soname and symbol versions), to be loaded only by the instrumented
test executables
- normalize the install layout after make install: the standalone build puts
the runtime libraries into the multilib os dir (lib64 on x86_64) regardless
of --libdir, and --with-toolexeclibdir only applies to cross builds
- register the repo in the setup-deps action before creating the env
- buildcache push expands its selection into the full dependency
closure, build-time dependencies included; specs that were satisfied
from the buildcache do not have those installed locally, and the push
fails with PackageNotInstalledError
- both push sites (the early gcc node push and the env-level push) only
ever ran in fresh-build scenarios before, so the failure surfaced once
the cache was warm
- pass --allow-missing to skip what is not installed (a best-effort push
of everything that is); a freshly built gcc thus still uploads its
build-time dependencies, which a future gcc rebuild can then pull as
binaries
- introduce FAIRMQ_TEST_LD_LIBRARY_PATH, which prepends a directory to
each test's environment via ctest, so the tests can run against an
alternative runtime library (e.g. a tsan-instrumented libstdc++)
- LD_LIBRARY_PATH rather than an injected rpath: an rpath added via the
linker flags cannot precede the rpath spack's gcc adds through its
specs file, so the compiler's own libstdc++ would keep winning the
runtime search order
- scoped per test on purpose: an instrumented library has unresolved
__tsan_* symbols and must not be loaded into uninstrumented tools
like cmake, ctest or ninja
- fail the configuration instead of silently dropping the injection on
CMake < 3.22 (ENVIRONMENT_MODIFICATION)
- cover the example tests too; they share the instrumented runtime but
not the locale-cache warmup (their main() comes from the installed public
header). The custom-controller env block was dead before: it tested
lsan_options, which only ever existed in the add_example() function
scope, so the test also never received the LSan suppressions
- CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS are command-line
strings; list(APPEND) inserts a semicolon once the variable is
non-empty, which splits the link command at the shell level
- latent until now because nothing passed linker flags on the cmake
command line
- std::ctype<char> caches narrow()/widen() results per character in
plain char arrays of the global classic-locale facet, written without
synchronization from header-inlined code (locale_facets.h); two
threads exercising an uncached character concurrently (e.g. compiling
a std::regex in Channel::Validate) constitute a true data race that
ThreadSanitizer rightfully reports
- the stores are real and unsynchronized, so a tsan-instrumented
libstdc++ cannot help here; instead fill the caches before any thread
is spawned, which turns every later access into a pure read
- warm the lazily-installed num_put/num_get caches used by stream
insertion/extraction as well, via a small format/parse round-trip
- wire the warm-up into the gtest runner main() and, via a static
initializer, into the test device runner
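A minimal sketch of such a warm-up, assuming it runs while the process is
still single-threaded. The helper name WarmLocaleCaches is hypothetical, and
which caches get filled is a libstdc++ implementation detail, so treat this
as an illustration rather than the code that was committed:

```cpp
#include <locale>
#include <sstream>

// Hypothetical helper (illustrative name): fill the per-character
// narrow()/widen() caches of the classic locale's ctype<char> facet and
// trigger the lazily installed numeric formatting/parsing caches.
// Returns true if the numeric round-trip produced the expected values.
inline bool WarmLocaleCaches()
{
    auto const& ct = std::use_facet<std::ctype<char>>(std::locale::classic());
    for (int c = 0; c < 256; ++c) {
        auto const ch = static_cast<char>(static_cast<unsigned char>(c));
        (void)ct.widen(ch);        // touches the widen cache slot for ch
        (void)ct.narrow(ch, '\0'); // touches the narrow cache slot for ch
    }

    // Small format/parse round-trip through stream insertion/extraction.
    std::stringstream ss;
    ss.imbue(std::locale::classic());
    ss << 42 << ' ' << 3.5;
    int i = 0;
    double d = 0.0;
    ss >> i >> d;
    return i == 42 && d == 3.5;
}
```

A static initializer such as `static const bool warmed = WarmLocaleCaches();`
would run it before main(), which is one way to reach a runner that has no
main() of its own to edit.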
- the pattern is constant; compiling it on every Validate() call is
wasted work and, when channels are validated from multiple threads,
needlessly exercises libstdc++'s lazily-populated ctype caches
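One way to compile a constant pattern only once is a function-local static,
sketched below. The function name and the pattern are assumptions made for
illustration and are not taken from Channel::Validate:

```cpp
#include <regex>
#include <string>

// Hypothetical validator; the pattern is illustrative only.
inline bool IsValidAddressFormat(std::string const& address)
{
    // Constructed on first use; initialization of function-local statics
    // is thread-safe since C++11, and later calls only read the object.
    static std::regex const pattern("^[a-z]+://.+$");
    return std::regex_match(address, pattern);
}
```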
- the subscriber threads captured the loop counter by reference while
the spawning loop kept incrementing it: a genuine data race
- depending on timing, threads could also end up with duplicate
subscriber names; capture the counter by value instead
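The by-value capture can be sketched as follows. SpawnSubscribers and the
"subscriber_" naming scheme are hypothetical stand-ins for the test code:

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical reproduction of the spawning loop (names are illustrative).
inline std::vector<std::string> SpawnSubscribers(std::size_t count)
{
    std::vector<std::string> names(count);
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // [i] copies the counter when the thread is created. With [&i] the
        // thread would read i while the loop increments it (a data race),
        // and two threads could observe the same value.
        threads.emplace_back([i, &names] {
            names[i] = "subscriber_" + std::to_string(i);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return names;
}
```

Each thread writes to its own element of names, so the shared vector needs no
further synchronization in this sketch.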
- concurrent execute() calls print captured subprocess lines to
std::cout from multiple threads; the standard allows that, but
libstdc++ maintains the formatted-output state (ios_base::width)
with plain reads and writes -- a data race ThreadSanitizer reports
once libstdc++ itself is instrumented
- serialize the insertion with a mutex, which also keeps whole lines
from interleaving
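A sketch of the guarded insertion, assuming one process-wide mutex. PrintLine,
OutputMutex and DemoConcurrentPrint are hypothetical names, and the demo
writes to a string stream instead of std::cout so the result can be inspected:

```cpp
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helpers (illustrative names, not the project's API).
inline std::mutex& OutputMutex()
{
    static std::mutex m;
    return m;
}

inline void PrintLine(std::ostream& os, std::string const& line)
{
    // One lock per line: the stream state is never touched concurrently
    // and each line is written as a whole.
    std::lock_guard<std::mutex> lock(OutputMutex());
    os << line << '\n';
}

// Two threads print through the guarded helper into one shared stream.
inline std::string DemoConcurrentPrint()
{
    std::ostringstream out;
    std::thread a([&out] { for (int i = 0; i < 50; ++i) { PrintLine(out, "aaaa"); } });
    std::thread b([&out] { for (int i = 0; i < 50; ++i) { PrintLine(out, "bbbb"); } });
    a.join();
    b.join();
    return out.str();
}
```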
A failed region lookup was inserted into the thread-local cache as
nullptr, making the failure permanent for the lifetime of the cache
generation: retrying never healed, because the fast path returned
nullptr without calling GetRegion again. Skip the cache insert on
failure so subsequent calls retry the slow path.
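The behaviour can be illustrated with a simplified model. Region, RegionSource
and LookupRegion below are hypothetical stand-ins; the real cache is
thread-local and tied to a cache generation, which this sketch leaves out:

```cpp
#include <cstdint>
#include <unordered_map>

struct Region
{
    std::uint16_t id;
};

// Hypothetical slow path: counts calls and fails until `available` is set.
struct RegionSource
{
    bool available = false;
    int calls = 0;
    Region region{7};

    Region* GetRegion(std::uint16_t id)
    {
        ++calls;
        return (available && id == region.id) ? &region : nullptr;
    }
};

inline Region* LookupRegion(std::unordered_map<std::uint16_t, Region*>& cache,
                            RegionSource& source,
                            std::uint16_t id)
{
    auto it = cache.find(id);
    if (it != cache.end()) {
        return it->second; // fast path: only successful lookups live here
    }
    Region* r = source.GetRegion(id); // slow path
    if (r != nullptr) {
        cache.emplace(id, r); // cache successes only, so failures are retried
    }
    return r;
}
```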
Add two public entry points needed by the ALICE use case, where shmem
messages are allocated via a transport but never sent. Their metadata
is instead serialised into Arrow tables and delivered over a separate
channel, allowing consumer devices to resolve the payload pointer
without taking ownership.
shmem::Message::GetMeta() returns the MetaHeader of the message,
mirroring the positional-init pattern already used in Socket.h.
shmem::GetDataAddressFromHandle(TransportFactory&, const MetaHeader&)
is a free function declared in Common.h and defined in Manager.cxx.
Keeping it out of the TransportFactory class body means callers only
need to include Common.h (available transitively via Message.h) and do
not drag in Socket.h or zmq.h. The implementation handles both managed
segments and unmanaged regions, and throws SharedMemoryError with a
typed message on a bad segment or region id. TransportFactory also
gains a same-named member for callers that already have the concrete
type. Lifetime of the returned pointer is the caller's responsibility;
the cache device is expected to hold the messages alive.
A SideChannel test covers the GetMeta/GetDataAddressFromHandle
round-trip for both standard and expanded-metadata configurations.
- libzmq is not tsan-instrumented, so tsan cannot see the happens-before
edges its queues establish between user threads and libzmq I/O threads,
producing false-positive data race reports on message buffers
- add test/thread_sanitizer_suppressions.txt and point TSAN_OPTIONS at it
via the sanitizers job env so it reaches the tests and their device
subprocesses
- suppress: accesses made directly from libzmq, the zero-copy message
deleters libzmq runs from msg_t::close, shmem receive-side metadata
reads, and std::regex/locale lazy-init races in libstdc++
- glibc declares pclose with function attributes (nothrow/leaf), so
decltype(pclose)* carries attributes that gcc ignores on the unique_ptr
template argument, emitting -Wignored-attributes
- spell the deleter as a plain int(*)(FILE*) instead; pclose converts to
it silently and the deleter behaves identically (both popen branches)
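A sketch of the plain function-pointer deleter on a POSIX system. PipePtr and
ReadCommandOutput are hypothetical names chosen for illustration:

```cpp
#include <cstdio>
#include <memory>
#include <string>
#include <stdio.h> // popen/pclose (POSIX)

// The deleter type is spelled out as a plain function pointer, so no
// attributes from the pclose declaration end up in a template argument.
using PipePtr = std::unique_ptr<FILE, int (*)(FILE*)>;

// Hypothetical helper: run a command and collect its standard output.
inline std::string ReadCommandOutput(char const* cmd)
{
    PipePtr pipe(popen(cmd, "r"), pclose); // pclose converts implicitly
    if (!pipe) {
        return {};
    }
    std::string out;
    char buf[256];
    while (std::fgets(buf, sizeof(buf), pipe.get()) != nullptr) {
        out += buf;
    }
    return out; // pclose runs when pipe goes out of scope
}
```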
- clang-tidy enables the clang-analyzer-* group by default; in this
codebase it only yields false positives (intentional moved-from
asserts, a bitmask enum cast) and noise inside third-party boost
headers that HeaderFilterRegex does not filter
- prefix the check list with -* so only the explicitly curated checks run
- drop the broad `cppcoreguidelines-*` glob: it produced ~4500 findings
(magic numbers, non-private members, owning-memory, pointer arithmetic,
...) that are aspirational and out of scope for the warning gate
- drop modernize-use-equals-default: in this codebase it only yields
false/unsafe positives, e.g. `= default` on a constructor that
explicitly initializes atomic members (which default-init leaves
indeterminate in C++17), and invalid output on constructors with a
member-init list
- drop modernize-pass-by-value: it rewrites constructor parameters to
by-value + std::move, changing public constructor signatures, which is
an ABI-relevant change unsuitable for a library's public headers
- keep the deliberately-listed modernize/readability/performance checks
- the gate did `grep -q warning: build.log`, but build.log was never
produced by the cmake-action build; grep therefore failed on the missing
file, and because a failing command in an `if` condition is exempt from
`set -e`, the failure read as "no match" and the job always passed
- as a result ~4961 clang-tidy warnings were silently ignored
- build manually and capture output to build.log with pipefail, and
fail explicitly if the log is missing or contains a warning
- tsan build failed at link with GNU ld:
"failed to set dynamic section sizes: bad value" (known binutils +
ThreadSanitizer incompatibility); install lld and select it via
-fuse-ld=lld for the tsan job only
- pass -fuse-ld=lld through cxx-flags so it reaches the link line,
avoiding the semicolon-list pitfall of
list(APPEND CMAKE_EXE_LINKER_FLAGS ...)
- build the bundled googletest with CMAKE_POSITION_INDEPENDENT_CODE=ON:
lld rejects R_X86_64_32 relocations from the non-PIC libgtest.a when
producing the position-independent tsan executable; the bundle is built
by a separate cmake invocation, so the flag must be set there
- Region.PreallocateInsideSession.shmem and Example.region.shmem failed
because mlock() of the managed segment/region hit RLIMIT_MEMLOCK on the
runner ("Cannot allocate memory"); the region example then hung until
its 30s timeout
- raise the limit via `sudo prlimit` in the same shell that launches
ctest (per-process, so it must be done here, not in a prior step)
- replace threeal/ctest-action with the equivalent ctest invocation
- LSan now symbolizes the leak as the C++ method `zmq::msg_t::init_size`
in the Debug sanitizer build, no longer as the C wrapper
`zmq_msg_init_size`
- substring match failed (`_` vs `::`), so the suppression no longer
applied and the asan+lsan+ubsan job failed in Pair/PubSub/Poller tests
- add the demangled frame, keep the old pattern for older libzmq
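Why the old pattern stopped matching can be shown with a deliberately
simplified model that treats each suppression as a plain substring of the
symbolized frame. IsSuppressed is a hypothetical helper; the sanitizer
runtime's own matcher also understands wildcards:

```cpp
#include <string>
#include <vector>

// Simplified model: a frame is suppressed if any pattern occurs in it.
inline bool IsSuppressed(std::string const& frame,
                         std::vector<std::string> const& patterns)
{
    for (auto const& pattern : patterns) {
        if (frame.find(pattern) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

With only `zmq_msg_init_size` listed, the frame `zmq::msg_t::init_size` does
not match; listing both patterns covers the new and the old symbolization.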
Committed lockfiles pinned gcc as a host-path external (from spack compiler
find), which is not portable across runners and broke CI. Cache the gcc
compiler itself as a buildcache node instead, so CI pulls it (~1 min) rather
than building it from source (~1 h).
- push the freshly-built gcc node in setup-deps BEFORE spack compiler find
(which marks it external and excludes it from buildcache push), gated behind
a push-gcc input used only by the buildcache workflow
- drop the committed-lockfile approach: remove test/ci/locks, the lockfile
install path in setup-deps, and the lockfile export in the buildcache workflow
- drop the ignored ref input from setup-spack (v3 renamed it to spack_ref)
Reusing concretization between the weekly buildcache (fresh) and weekday CI
(reuse) can drift if runner externals change, causing avoidable cache misses.
- setup-deps installs from test/ci/locks/<env>-gcc<N>.lock when it exists,
skipping concretization for byte-identical hashes; falls back to the spec
yaml otherwise
- buildcache exports each env's spack.lock as a downloadable artifact so the
lockfiles can be regenerated on the ubuntu-24.04 runner and committed
- document the manual regeneration flow in test/ci/locks/README.md
FairMQ's own sources (library, examples, tests) were recompiled from scratch
in every matrix job on every push.
- add hendrikmuhs/ccache-action to build, sanitizers and static-analysis jobs
- set CMAKE_C/CXX_COMPILER_LAUNCHER=ccache so cmake routes through it
- key the cache per (job, env, gcc) since ccache hashes the compiler
The push trigger had a path filter but no branch filter, so any PR-branch push
touching those paths (e.g. a dependabot rebase pulling in setup-deps changes)
launched the full fresh buildcache matrix concurrently with CI.
- restrict the push trigger to branches [dev, master]
- frees runners for CI; the cache still refreshes via cron and on dev/master
gcc was built from source (~58 min/job) because the FairMQ buildcache mirror
is only configured inside the env yaml, while gcc is installed before the env
is created.
- register the mirror globally after spack setup so "Install GCC" pulls the
compiler as a binary
- pin runners to ubuntu-24.04 so the weekly buildcache and weekday CI share an
image and concretize to matching hashes
- bump setup-spack to v3 to match the update-index job
Boost 1.88 replaced Boost.Process with v2, breaking the v1 API.
Boost 1.89 restores v1 compatibility via <boost/process/v1.hpp>.
- Fail configuration if Boost 1.88 is detected
- Define FAIRMQ_BOOST_PROCESS_V1_HEADER for Boost >= 1.89
- Use conditional includes to select v1.hpp or process.hpp
- Add namespace aliases (bp, bp_this) for portable API access
Use JSON output and jq to select the newest installed gcc version
when multiple versions match, avoiding conflicts between system and
spack-installed compilers.