A failed region lookup was inserted into the thread-local cache as
nullptr, making the failure permanent for the lifetime of the cache
generation: retrying never recovered, because the fast path returned
nullptr without calling GetRegion again. Skip the cache insert on
failure so subsequent calls retry the slow path.
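
A minimal sketch of the "cache successes only" idea described above. The
names Region, RegionCache and the injected slow-lookup callback are
hypothetical stand-ins, not the actual shmem Manager code, and the sketch
leaves out the thread-local storage and cache-generation handling.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a shared-memory region (not the FairMQ type).
struct Region
{
    std::uint16_t id;
};

// Hypothetical cache that remembers only successful lookups.
class RegionCache
{
  public:
    using SlowLookup = std::function<Region*(std::uint16_t)>;

    explicit RegionCache(SlowLookup slow) : fSlow(std::move(slow)) {}

    Region* Get(std::uint16_t id)
    {
        auto it = fCache.find(id);
        if (it != fCache.end()) {
            return it->second;   // fast path: entries are never nullptr
        }
        Region* region = fSlow(id);   // slow path
        if (region != nullptr) {
            fCache.emplace(id, region);   // cache successes only
        }
        return region;   // a failure is reported but not remembered
    }

  private:
    SlowLookup fSlow;
    std::unordered_map<std::uint16_t, Region*> fCache;
};
```

With this shape, a lookup that fails once and succeeds later goes through
the slow path twice and is served from the cache afterwards.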
Add two public entry points needed by the ALICE use case where shmem
messages are allocated via a transport but never sent. Their metadata
is instead serialised into Arrow tables and delivered over a separate
channel, allowing consumer devices to resolve the payload pointer
without taking ownership.
shmem::Message::GetMeta() returns the MetaHeader of the message,
mirroring the positional-init pattern already used in Socket.h.
shmem::GetDataAddressFromHandle(TransportFactory&, const MetaHeader&)
is a free function declared in Common.h and defined in Manager.cxx.
Keeping it out of the TransportFactory class body means callers only
need to include Common.h (available transitively via Message.h) and do
not drag in Socket.h or zmq.h. The implementation handles both managed
segments and unmanaged regions, and throws SharedMemoryError with a
typed message on a bad segment or region id. TransportFactory also
gains a same-named member for callers that already have the concrete
type. The lifetime of the returned pointer is the caller's responsibility;
the cache device is expected to keep the messages alive.
A SideChannel test covers the GetMeta/GetDataAddressFromHandle
round-trip for both standard and expanded-metadata configurations.
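
A rough sketch of the handle-to-address resolution described above, under
heavy simplification. MetaSketch and ResolverSketch are hypothetical: the
real MetaHeader has more fields, the real function takes a
TransportFactory, handles unmanaged regions as well, and throws
SharedMemoryError rather than std::runtime_error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, simplified metadata: which segment, and where inside it.
struct MetaSketch
{
    std::uint16_t segmentId;
    std::ptrdiff_t handle;   // offset relative to the segment base address
};

// Hypothetical resolver that maps segment ids to mapped base addresses.
class ResolverSketch
{
  public:
    void AddSegment(std::uint16_t id, char* base) { fBases[id] = base; }

    // Returns a non-owning pointer; the caller must keep the message alive.
    char* GetDataAddress(const MetaSketch& meta) const
    {
        auto it = fBases.find(meta.segmentId);
        if (it == fBases.end()) {
            throw std::runtime_error("unknown segment id "
                                     + std::to_string(meta.segmentId));
        }
        return it->second + meta.handle;
    }

  private:
    std::map<std::uint16_t, char*> fBases;
};
```

The point of the sketch is only that the consumer needs the metadata plus
access to the mapped segment; it never takes ownership of the payload.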
- libzmq is not tsan-instrumented, so tsan cannot see the happens-before
relationships its queues establish between user threads and libzmq I/O
threads, producing false-positive data races on message buffers
- add test/thread_sanitizer_suppressions.txt and point TSAN_OPTIONS at it
via the sanitizers job env so it reaches the tests and their device
subprocesses
- suppress: accesses made directly from libzmq, the zero-copy message
deleters libzmq runs from msg_t::close, shmem receive-side metadata
reads, and std::regex/locale lazy-init races in libstdc++
- glibc declares pclose with function attributes (nothrow/leaf), so
decltype(pclose)* carries attributes that gcc ignores on the unique_ptr
template argument, emitting -Wignored-attributes
- spell the deleter as a plain int(*)(FILE*) instead; pclose converts to
it silently and the deleter behaves identically (both popen branches)
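
A minimal sketch of the deleter spelling described above. ReadCommandOutput
and PipeCloser are hypothetical names chosen for illustration; only popen,
pclose and std::unique_ptr are real. It assumes a POSIX system.

```cpp
#include <cassert>
#include <memory>
#include <stdio.h>   // popen/pclose are POSIX, declared in <stdio.h>
#include <string>

// Plain function-pointer type: no attributes attached, so nothing for gcc
// to ignore when it is used as a template argument.
using PipeCloser = int (*)(FILE*);

// Hypothetical helper: run a command and return everything it wrote to stdout.
inline std::string ReadCommandOutput(const char* cmd)
{
    // pclose converts implicitly to PipeCloser; unique_ptr does not call
    // the deleter when popen returned nullptr.
    std::unique_ptr<FILE, PipeCloser> pipe(popen(cmd, "r"), pclose);
    if (!pipe) {
        return {};
    }
    std::string out;
    char buf[256];
    while (fgets(buf, sizeof(buf), pipe.get()) != nullptr) {
        out += buf;
    }
    return out;
}
```

Spelling the type as decltype(pclose)* instead is what triggered
-Wignored-attributes in the case described above.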
- clang-tidy enables the clang-analyzer-* group by default; in this
codebase it only yields false positives (intentional moved-from
asserts, a bitmask enum cast) and noise inside third-party boost
headers that HeaderFilterRegex does not filter
- prefix the check list with -* so only the explicitly curated checks run
- drop the broad `cppcoreguidelines-*` glob: it produced ~4500 findings
(magic numbers, non-private members, owning-memory, pointer arithmetic,
...) that are aspirational and out of scope for the warning gate
- drop modernize-use-equals-default: in this codebase it only yields
false/unsafe positives, e.g. `= default` on a constructor that
explicitly initializes atomic members (which default-init leaves
indeterminate in C++17), and invalid output on constructors with a
member-init list
- drop modernize-pass-by-value: it rewrites constructor parameters to
by-value + std::move, changing public constructor signatures, which is
an ABI-relevant change unsuitable for a library's public headers
- keep the deliberately-listed modernize/readability/performance checks
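
To illustrate the modernize-use-equals-default point above, a small sketch
with a hypothetical Counters type (not taken from the codebase): the
constructor looks trivially replaceable, but it is what gives the atomic
members a defined value in C++17.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical type for illustration only.
struct Counters
{
    // Explicit initialization. With `Counters() = default;` a plain
    // `Counters c;` would leave both atomics indeterminate in C++17.
    Counters() : fSent(0), fReceived(0) {}

    std::atomic<std::uint64_t> fSent;
    std::atomic<std::uint64_t> fReceived;
};
```

Reading either counter right after `Counters c;` is well defined here only
because of the member-init list.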
- the gate did `grep -q warning: build.log`, but build.log was never
produced by the cmake-action build; grep on the missing file exits
non-zero and `set -e` does not apply to an `if` condition, so the
condition was simply false and the job always passed
- as a result ~4961 clang-tidy warnings were silently ignored
- build manually and capture output to build.log with pipefail, and
fail explicitly if the log is missing or contains a warning
- tsan build failed at link with GNU ld:
"failed to set dynamic section sizes: bad value" (known binutils +
ThreadSanitizer incompatibility); install lld and select it via
-fuse-ld=lld for the tsan job only
- pass -fuse-ld=lld through cxx-flags so it reaches the link line,
avoiding the semicolon-list pitfall of
list(APPEND CMAKE_EXE_LINKER_FLAGS ...)
- build the bundled googletest with CMAKE_POSITION_INDEPENDENT_CODE=ON:
lld rejects R_X86_64_32 relocations from the non-PIC libgtest.a when
producing the position-independent tsan executable; the bundle is built
by a separate cmake invocation, so the flag must be set there
- Region.PreallocateInsideSession.shmem and Example.region.shmem failed
because mlock() of the managed segment/region hit RLIMIT_MEMLOCK on the
runner ("Cannot allocate memory"); the region example then hung until
its 30s timeout
- raise the limit via `sudo prlimit` in the same shell that launches
ctest (per-process, so it must be done here, not in a prior step)
- replace threeal/ctest-action with the equivalent ctest invocation
- LSan now symbolizes the leak as the C++ method `zmq::msg_t::init_size`
in the Debug sanitizer build, rather than the C wrapper
`zmq_msg_init_size`
- the substring match failed (`_` vs `::`), so the suppression no longer
applied and the asan+lsan+ubsan job failed in Pair/PubSub/Poller tests
- add the demangled frame, keep the old pattern for older libzmq
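
A simplified sketch of why the old pattern stopped matching. IsSuppressed
is a hypothetical helper that treats each pattern as a plain substring of
the symbolized frame; the sanitizer runtime's real matching is more
involved, and the exact pattern text added to the suppressions file is an
assumption here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: true if any pattern occurs as a substring of the frame name.
inline bool IsSuppressed(const std::string& frame,
                         const std::vector<std::string>& patterns)
{
    for (const auto& pattern : patterns) {
        if (frame.find(pattern) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

Keeping both spellings in the list covers the C wrapper frame reported
with older libzmq builds as well as the demangled C++ frame.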
Committed lockfiles pinned gcc as a host-path external (from spack compiler
find), which is not portable across runners and broke CI. Cache the gcc
compiler itself as a buildcache node instead, so CI pulls it (~1 min) rather
than building it from source (~1 h).
- push the freshly-built gcc node in setup-deps BEFORE spack compiler find
(which marks it external and excludes it from buildcache push), gated behind
a push-gcc input used only by the buildcache workflow
- drop the committed-lockfile approach: remove test/ci/locks, the lockfile
install path in setup-deps, and the lockfile export in the buildcache workflow
- drop the ignored ref input from setup-spack (v3 renamed it to spack_ref)
Reusing concretization between the weekly buildcache (fresh) and weekday CI
(reuse) can drift if runner externals change, causing avoidable cache misses.
- setup-deps installs from test/ci/locks/<env>-gcc<N>.lock when it exists,
skipping concretization for byte-identical hashes; falls back to the spec
yaml otherwise
- buildcache exports each env's spack.lock as a downloadable artifact so the
lockfiles can be regenerated on the ubuntu-24.04 runner and committed
- document the manual regeneration flow in test/ci/locks/README.md
FairMQ's own sources (library, examples, tests) were recompiled from scratch
in every matrix job on every push.
- add hendrikmuhs/ccache-action to build, sanitizers and static-analysis jobs
- set CMAKE_C/CXX_COMPILER_LAUNCHER=ccache so cmake routes through it
- key the cache per (job, env, gcc) since ccache hashes the compiler
The push trigger had a path filter but no branch filter, so any PR-branch push
touching those paths (e.g. a dependabot rebase pulling in setup-deps changes)
launched the full fresh buildcache matrix concurrently with CI.
- restrict the push trigger to branches [dev, master]
- frees runners for CI; the cache still refreshes via cron and on dev/master
gcc was built from source (~58 min/job) because the FairMQ buildcache mirror
is only configured inside the env yaml, while gcc is installed before the env
is created.
- register the mirror globally after spack setup so "Install GCC" pulls the
compiler as a binary
- pin runners to ubuntu-24.04 so the weekly buildcache and weekday CI share an
image and concretize to matching hashes
- bump setup-spack to v3 to match the update-index job
Boost 1.88 switched Boost.Process to v2, breaking code written against the v1 API.
Boost 1.89 restores v1 compatibility via <boost/process/v1.hpp>.
- Fail configuration if Boost 1.88 is detected
- Define FAIRMQ_BOOST_PROCESS_V1_HEADER for Boost >= 1.89
- Use conditional includes to select v1.hpp or process.hpp
- Add namespace aliases (bp, bp_this) for portable API access
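
The version logic above, sketched as a compile-time function so it can be
checked without Boost installed. ProcessHeader and SelectProcessHeader are
hypothetical names; the actual selection is done at configure time and via
FAIRMQ_BOOST_PROCESS_V1_HEADER. The only fact relied on is the
BOOST_VERSION encoding (major * 100000 + minor * 100 + patch).

```cpp
#include <cassert>

// Hypothetical classification of which Boost.Process header to use.
enum class ProcessHeader
{
    Unsupported,   // Boost 1.88.x: configuration should fail
    V1Header,      // Boost >= 1.89: <boost/process/v1.hpp>
    LegacyHeader   // older Boost: <boost/process.hpp>
};

// boostVersion uses the BOOST_VERSION encoding, e.g. 108900 for 1.89.0.
constexpr ProcessHeader SelectProcessHeader(int boostVersion)
{
    const int majorMinor = boostVersion / 100;   // drop the patch level
    if (majorMinor == 1088) {
        return ProcessHeader::Unsupported;
    }
    if (majorMinor >= 1089) {
        return ProcessHeader::V1Header;
    }
    return ProcessHeader::LegacyHeader;
}

static_assert(SelectProcessHeader(108800) == ProcessHeader::Unsupported,
              "1.88.0 is rejected");
```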
Use JSON output and jq to select the newest installed gcc version
when multiple versions match, avoiding conflicts between system and
spack-installed compilers.
- rename mirror to ghcr-buildcache
- find system compiler before building gcc
- separate update-index job to avoid race condition
- always attempt push even on partial failure