Weird Bug: Force Computation Fails Without Print Statement

Aug 23, 2025 by Marco 59 views

[BUG] Force Computation Function (Self Written) Behaves Incorrectly Without Print Statement – Potential Mac ARM JIT/Compiler Bug

Introduction

Hey guys! Today, we're diving into a tricky bug encountered while implementing a variant of the Discrete Element Method (DEM) using NVIDIA's Warp. This involves simulating a particle colliding with a wall and bouncing back, using physically accurate coefficients. The issue arises in the compute_force_sw function, where the collision logic behaves strangely depending on the presence of a print statement. Let's break down the problem and explore potential solutions. This article aims to provide an in-depth look at the bug, its context, and how it can be reproduced, offering valuable insights for developers and researchers in the field of computational physics and GPU computing.

Bug Description

The core of the issue lies within the compute_force_sw function. Specifically, the force calculation seems to fail when a particle contacts a wall if a certain print statement (line 241 in the provided code) is disabled. When the print statement is enabled, the code miraculously works perfectly, reproducing the expected force curve. This inconsistent behavior points towards a potential bug in the Mac ARM JIT/compiler, especially since the problem disappears with the seemingly innocuous addition of a print statement.

The provided Python code is self-contained and doesn't rely on external files, making it easier to reproduce the bug. The code simulates particle-wall collisions using Warp, a framework designed for high-performance computing on GPUs. The inconsistency suggests a potential issue in how the code is being optimized or compiled for the ARM architecture on Mac systems. This type of behavior is often linked to JIT (Just-In-Time) compilation or compiler optimizations that might not be handling certain code patterns correctly. The bug highlights the importance of thorough testing and debugging, especially when dealing with complex numerical simulations and hardware-specific optimizations.

Detailed Explanation of the Code

To understand the bug, it's crucial to look at the code structure. The code defines two main classes: ParticlesDEM and WallsDEM. The ParticlesDEM class manages particle properties like position, velocity, force, and material characteristics. The WallsDEM class represents the properties of the walls involved in the simulation. The compute_force_sw function, which is at the heart of the problem, calculates the contact forces between particles and walls. This function considers factors like overlap, relative velocities, material properties, and friction to determine the resulting forces and torques. The simulation advances in discrete time steps, with the stage1, stage2, and stage3 kernels updating particle states based on the computed forces. The use of Warp's HashGrid for neighbor searching further optimizes the computation by reducing the number of particle-particle and particle-wall interaction checks. Understanding these core components is essential for pinpointing the exact cause of the bug and devising effective solutions.

Reproducing the Bug

To reproduce this bug, you'll need a Mac ARM system and the provided Python code. The code requires the h5py package for writing data. The key step is to run the simulation with and without the print statement on line 241 in the compute_force_sw function. When the print statement is disabled, the force computation fails when the particle is in contact with the wall. However, enabling the print statement makes the code work as expected, reproducing the correct force curve. This behavior indicates a timing or optimization-related issue, where the presence of the print statement alters the execution flow enough to mask the underlying problem. The ability to consistently reproduce the bug is crucial for further investigation and debugging.

Code Snippet

Here's the relevant code snippet where the bug manifests:

@wp.kernel
def compute_force_sw(
        grid: wp.uint64,
        x: wp.array(dtype=wp.vec3),
        u: wp.array(dtype=wp.vec3),
        au: wp.array(dtype=wp.vec3),
        force: wp.array(dtype=wp.vec3),
        torque: wp.array(dtype=wp.vec3),
        omega: wp.array(dtype=wp.vec3),
        m: wp.array(dtype=wp.float32),
        rho: wp.array(dtype=wp.float32),
        rad: wp.array(dtype=wp.float32),
        E: wp.array(dtype=wp.float32),
        nu: wp.array(dtype=wp.float32),
        G: wp.array(dtype=wp.float32),
        I: wp.array(dtype=wp.float32),
        tangential_disp_sw_x: wp.array(dtype=wp.float32),
        tangential_disp_sw_y: wp.array(dtype=wp.float32),
        tangential_disp_sw_z: wp.array(dtype=wp.float32),

        x_w: wp.array(dtype=wp.vec3),
        u_w: wp.array(dtype=wp.vec3),
        omega_w: wp.array(dtype=wp.vec3),
        normal_w: wp.array(dtype=wp.vec3),
        E_w: wp.array(dtype=wp.float32),
        nu_w: wp.array(dtype=wp.float32),
        G_w: wp.array(dtype=wp.float32),
        total_no_walls: wp.array(dtype=wp.int32),

        dt: wp.float32,
        en: wp.float32,
        fric_coeff: wp.float32
):
    i = wp.tid()

    pos_i = x[i]
    for j in range(len(x_w)):
        print(j)
        pos_j = x_w[j]
        pos_ij = pos_i - pos_j

        nij = normal_w[j]

        tmp = wp.dot(pos_ij, nij)
        overlap = rad[i] - tmp

        # Index where we save current walls contact info
        found_at = i * total_no_walls[0] + j
        # ---------- force computation starts ------------
        # if particles are overlapping
        if overlap > 1e-6:
            a_i = rad[i] - overlap / (2.)

            vel_i = u[i] + wp.cross(omega[i], nij) * a_i
            vel_j = wp.vec3(0., 0., 0.)

            # Now the relative velocity of particle i w.r.t j at the contact point is
            vel_ij = vel_i - vel_j

            # normal velocity magnitude
            vij_dot_nij = wp.dot(vel_ij, nij)
            vn = vij_dot_nij * nij

            # tangential velocity
            vt = vel_ij - vn

            ############################
            # normal force computation #
            ############################
            # Compute stiffness
            # effective Young's modulus
            tmp_1 = (1. - nu[i]**2.) / E[i]
            tmp_2 = (1. - nu_w[j]**2.) / E_w[j]
            E_eff = 1. / (tmp_1 + tmp_2)
            tmp_3 = 1. / rad[i]
            tmp_4 = 0.
            R_eff = 1. / (tmp_3 + tmp_4)
            # Eq 4 [1]
            kn = (4. / 3.) * E_eff * R_eff**0.5

            # compute damping coefficient
            tmp_1 = wp.log(en)
            tmp_2 = tmp_1**2. + wp.pi**2.
            beta = tmp_1 / (tmp_2)**0.5
            S_n = 2. * E_eff * (R_eff * overlap)**0.5
            tmp_1 = 1. / m[i]
            tmp_2 = 0.
            m_eff = 1. / (tmp_1 + tmp_2)
            eta_n = -2. * (5./6.)**0.5 * beta * (S_n * m_eff)**0.5

            fn = kn * overlap**1.5
            fn_x = fn * nij[0] - eta_n * vn[0]
            fn_y = fn * nij[1] - eta_n * vn[1]
            fn_z = fn * nij[2] - eta_n * vn[2]

            #################################
            # tangential force computation  #
            #################################
            ft_x = 0.
            ft_y = 0.
            ft_z = 0.

            vij_magn = wp.length(vel_ij)

            if vij_magn < 1e-6:
                tangential_disp_sw_x[found_at] = 0.
                tangential_disp_sw_y[found_at] = 0.
                tangential_disp_sw_z[found_at] = 0.
            else:
                tangential_disp_sw_x[found_at] = tangential_disp_sw_x[found_at] + vt[0] * dt
                tangential_disp_sw_y[found_at] = tangential_disp_sw_y[found_at] + vt[1] * dt
                tangential_disp_sw_z[found_at] = tangential_disp_sw_z[found_at] + vt[2] * dt

                # Compute the tangential stiffness
                tmp_1 = (2. - nu[i]) / G[i]
                tmp_2 = (2. - nu_w[j]) / G_w[j]
                G_eff = 1. / (tmp_1 + tmp_2)
                # Eq 12 [1]
                kt = 8. * G_eff * (R_eff * overlap)**0.5
                S_t = kt
                eta_t = -2. * (5./6.)**0.5 * beta * (S_t * m_eff)**0.5

                ft_x_star = -kt * tangential_disp_sw_x[found_at] - eta_t * vt[0]
                ft_y_star = -kt * tangential_disp_sw_y[found_at] - eta_t * vt[1]
                ft_z_star = -kt * tangential_disp_sw_z[found_at] - eta_t * vt[2]

                ft_magn = (ft_x_star**2. + ft_y_star**2. + ft_z_star**2.)**0.5

                ti = wp.vec3(0., 0., 0.)

                if ft_magn > 1e-6:
                    ti[0] = ft_x_star / ft_magn
                    ti[1] = ft_y_star / ft_magn
                    ti[2] = ft_z_star / ft_magn

                fn_magn = (fn_x**2. + fn_y**2. + fn_z**2.)**0.5

                ft_magn_star = min(fric_coeff * fn_magn, ft_magn)

                # compute the tangential force, by equation 17 (Lethe)
                ft_x = ft_magn_star * ti[0]
                ft_y = ft_magn_star * ti[1]
                ft_z = ft_magn_star * ti[2]

                # Add damping to the limited force
                ft_x = ft_x + eta_t * vt[0]
                ft_y = ft_y + eta_t * vt[1]
                ft_z = ft_z + eta_t * vt[2]

                # reset the spring length
                tangential_disp_sw_x[found_at] = -ft_x / kt
                tangential_disp_sw_y[found_at] = -ft_y / kt
                tangential_disp_sw_z[found_at] = -ft_z / kt

            force[i][0] = force[i][0] + fn_x + ft_x
            force[i][1] = force[i][1] + fn_y + ft_y
            force[i][2] = force[i][2] + fn_z + ft_z

            # torque = n cross F
            torque[i][0] = torque[i][0] + (nij[1] * ft_z - nij[2] * ft_y) * a_i
            torque[i][1] = torque[i][1] + (nij[2] * ft_x - nij[0] * ft_z) * a_i
            torque[i][2] = torque[i][2] + (nij[0] * ft_y - nij[1] * ft_x) * a_i
        else:
            tangential_disp_sw_x[found_at] = -ft_x / kt
            tangential_disp_sw_y[found_at] = -ft_y / kt
            tangential_disp_sw_z[found_at] = -ft_z / kt

The critical line is:

print(j) # Line 241

This seemingly innocuous print statement is the key to triggering or suppressing the bug. When enabled, the simulation works correctly; when disabled, the force computation fails when the particle is in contact with the wall. This points towards a JIT/compiler optimization issue specific to the Mac ARM architecture. This Heisenbug, as it's sometimes called, can be incredibly challenging to debug because the very act of observing the system changes its behavior.

Potential Causes and Debugging Strategies

Several factors might be contributing to this peculiar behavior:

JIT Compiler Optimization: The Mac ARM's JIT compiler might be optimizing the code in a way that causes incorrect computations when the print statement is absent. The print statement could be acting as a barrier, preventing certain optimizations from occurring and thus masking the bug.
Memory Alignment: Incorrect memory alignment can sometimes lead to unexpected behavior. The print statement might be altering memory access patterns, inadvertently correcting alignment issues.
Race Conditions: Although less likely in this single-threaded context, race conditions can sometimes manifest in similar ways. The print statement could be introducing a slight delay, changing the timing and avoiding the race condition.
Floating-Point Precision: Subtle differences in floating-point computations due to compiler optimizations could be accumulating and leading to divergence. The print statement might be forcing a flush to memory, effectively resetting the floating-point environment.

To debug this issue, consider the following strategies:

Disable Compiler Optimizations: Try compiling the code with optimizations disabled to see if the bug persists. This can help determine if the JIT compiler is the culprit.
Memory Inspection: Use debugging tools to inspect memory layout and access patterns, looking for potential alignment issues.
Logging: Implement more detailed logging of intermediate variables to pinpoint exactly where the computation diverges.
Minimal Reproducible Example: Try to create a smaller, simpler version of the code that still exhibits the bug. This can help isolate the problematic code section.
Consult Warp Documentation and Community: Check NVIDIA Warp's documentation and community forums for similar issues or insights.

System Information

The bug was observed on a Mac OS Arm system, using Warp 1.8.1 and Python 3.12.11. This specific environment might be crucial for reproducing the bug, as different hardware and software configurations could yield different results. Reporting the system information is essential when filing bug reports or seeking help from the community.

Reporting the Bug

If you encounter a similar issue, reporting the bug is crucial. Here’s how to effectively report it:

Provide a Minimal Reproducible Example: Include a simplified version of the code that demonstrates the bug.
Describe the Expected Behavior: Clearly state what the code should do.
Describe the Actual Behavior: Explain what actually happens, including any error messages or unexpected results.
Include System Information: Provide details about your operating system, hardware, and software versions.
Detail Reproduction Steps: Outline the steps needed to reproduce the bug.

Conclusion

This bug highlights the complexities of high-performance computing and the challenges of debugging JIT compiler-related issues. The seemingly random influence of a print statement underscores the need for careful testing and a deep understanding of the underlying hardware and software architecture. By systematically investigating potential causes and employing effective debugging strategies, we can identify and resolve these elusive bugs, ensuring the reliability and accuracy of our simulations. If you've faced similar challenges or have insights into this particular issue, feel free to share your experiences and solutions in the comments below! Let's work together to make our code more robust and our simulations more accurate.

NVIDIA Warp Documentation: Check the official NVIDIA Warp documentation for updates, bug fixes, and community discussions.
Stack Overflow: Search for similar issues on Stack Overflow and engage with the community.
GitHub Issues: Report the bug on the NVIDIA Warp GitHub repository to get direct support from the developers.

I hope this detailed analysis helps in understanding and addressing this tricky bug. Happy debugging, guys!