Candle Backend: Fixing 33 Failing ONNX Tests

by Marco

Recently, we enabled the Candle backend for the ONNX tests and a significant number of them failed – 33 out of 259, to be exact. This article discusses the challenges encountered and the steps taken to address these issues, focusing on the tracel-ai/burn project.

Initial Findings and Challenges

Hey guys! So, we fired up the Candle backend for ONNX tests and, woah, a bunch of tests went south. Initially, I tackled the broadcasting issues with less/greater/equal operations, but it became clear there's a whole lotta more to fix. Let's dive into what we found.

Of the 259 ONNX tests executed, 33 failed. These failures point to gaps in the Candle backend implementation that need to be addressed to ensure compatibility and correct behavior with ONNX models. In other words, we've got some work to do to bring Candle up to par. The variety of failures suggests the fixes will span different areas of the backend, from basic operations to more complex layer implementations.

Here's a quick rundown of the failed tests:

failures:
    avg_pool::tests::avg_pool1d
    avg_pool::tests::avg_pool2d
    bitshift::tests::bitshift_left_scalar_tensor
    bitshift::tests::bitshift_left_tensors
    bitshift::tests::bitshift_right_scalar_tensor
    bitshift::tests::bitshift_right_tensors
    bitshift::tests::scalar_bitshift_left_tensor
    bitshift::tests::scalar_bitshift_right_tensor
    bitwise_and::tests::bitwise_and_scalar_tensor
    bitwise_and::tests::bitwise_and_tensors
    bitwise_and::tests::scalar_bitwise_and_tensor
    bitwise_not::tests::bitwise_not_tensors
    bitwise_or::tests::bitwise_or_scalar_tensor
    bitwise_or::tests::bitwise_or_tensors
    bitwise_or::tests::scalar_bitwise_or_tensor
    bitwise_xor::tests::bitwise_xor_scalar_tensor
    bitwise_xor::tests::bitwise_xor_tensors
    bitwise_xor::tests::scalar_bitwise_xor_tensor
    conv::tests::conv2d
    conv::tests::conv3d
    conv_transpose::tests::conv_transpose2d
    conv_transpose::tests::conv_transpose3d
    global_avr_pool::tests::globalavrpool_1d_2d
    maxpool::tests::maxpool1d
    maxpool::tests::maxpool2d
    pow::tests::pow_with_tensor_and_scalar
    reduce::tests::reduce_prod
    resize::tests::resize_with_scales_1d_linear
    resize::tests::resize_with_scales_2d_bicubic
    resize::tests::resize_with_scales_2d_bilinear
    resize::tests::resize_with_shape
    resize::tests::resize_with_sizes
    where_op::tests::where_op_broadcast

These failures span a range of operations, highlighting areas where the Candle backend needs further development and refinement. Analyzing them is crucial for prioritizing development efforts and ensuring that the most common and critical ONNX operations are correctly supported. Getting to the root cause of each failure means examining the error messages, the test cases themselves, and the corresponding Candle backend implementation.

Deep Dive into Failure Categories

Let's break down these failures into categories to get a clearer picture of what's going on:

1. Pooling and Convolution Operations

Several tests related to pooling and convolution operations failed, indicating potential issues in the implementation of these core neural network layers. Specifically, failures in avg_pool, maxpool, conv, and conv_transpose tests suggest that the Candle backend may have limitations in handling padding, multi-dimensional convolutions, or transposed convolutions. These operations are fundamental in many neural network architectures, so addressing these issues is crucial for broader ONNX model compatibility.

  • avg_pool::tests::avg_pool1d and avg_pool::tests::avg_pool2d: These tests failed because Candle doesn't yet support padding in pooling operations. Padding is a crucial technique for controlling the output size of pooling layers and preventing information loss at the edges of the input tensor. The error message clearly states, “Candle does not support padding in pooling,” indicating a missing feature in the current implementation. Adding padding support will likely involve modifying the pooling operation's logic to handle the extra border elements during computation. One possible workaround is sketched after this list.

  • conv::tests::conv2d and conv::tests::conv3d: The conv2d test panicked because Candle doesn't support per-dimension options in convolutions, while conv3d failed because 3D convolutions are not supported at all. These limitations restrict the flexibility of the Candle backend in handling various convolution configurations. Per-dimension options allow for different kernel sizes, strides, and dilation factors along each spatial dimension, while 3D convolutions are essential for processing volumetric data. Implementing these features will involve extending the convolution operation's implementation to handle these additional parameters and data dimensions.

  • conv_transpose::tests::conv_transpose2d and conv_transpose::tests::conv_transpose3d: Similar to regular convolutions, transposed convolutions also suffer from a lack of support for per-dimension options and 3D operations. The error messages indicate that Candle does not support these features, highlighting a gap in the implementation of transposed convolution operations. Transposed convolutions are commonly used in upsampling layers and generative models, so adding support for these features is important for expanding the backend's capabilities.
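
To make the padding limitation concrete, here is a minimal sketch of one possible workaround at the candle-core level: zero-pad the spatial dimensions yourself, then call the unpadded pooling kernel that Candle already provides. This is an illustration under my own assumptions (NCHW layout, a square kernel, and the pad_with_zeros/avg_pool2d calls from candle-core; avg_pool2d_with_padding is a hypothetical helper name), not the actual fix planned for the backend. Note also that averaging over the padded zeros matches ONNX's count_include_pad=1 behaviour rather than the default, so a real implementation would need to correct the divisor near the borders.

    use candle_core::{Device, Result, Tensor};

    // Hypothetical helper: emulate padded average pooling by zero-padding the
    // input first, then calling Candle's unpadded avg_pool2d.
    fn avg_pool2d_with_padding(x: &Tensor, kernel: usize, pad: usize) -> Result<Tensor> {
        // Assumes NCHW layout: dims 2 and 3 are the spatial dimensions.
        let x = x.pad_with_zeros(2, pad, pad)?;
        let x = x.pad_with_zeros(3, pad, pad)?;
        // Caveat: the padded zeros are included in each average, i.e. this
        // reproduces count_include_pad=1 rather than the ONNX default.
        x.avg_pool2d(kernel)
    }

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        let x = Tensor::arange(0f32, 16f32, &dev)?.reshape((1, 1, 4, 4))?;
        let y = avg_pool2d_with_padding(&x, 2, 1)?;
        // Prints [1, 1, 3, 3]: the 4x4 input is padded to 6x6, then pooled 2x2.
        println!("{:?}", y.dims());
        Ok(())
    }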

2. Bitwise Operations

A significant number of failures are related to bitwise operations (bitshift, bitwise_and, bitwise_or, bitwise_xor, bitwise_not). These failures indicate that the Candle backend's implementation of integer tensor operations is incomplete. These operations are often used in tasks such as data manipulation, masking, and low-level algorithm implementations. Implementing these operations efficiently and correctly is crucial for supporting a wider range of ONNX models.

  • bitshift::tests::bitshift_left_tensors, bitshift::tests::bitshift_right_tensors, bitshift::tests::bitshift_left_scalar_tensor, bitshift::tests::bitshift_right_scalar_tensor, bitshift::tests::scalar_bitshift_left_tensor, and bitshift::tests::scalar_bitshift_right_tensor: These tests failed because bitwise left and right shift operations are not implemented for Candle's IntTensor. The error messages, such as “not implemented: bitwise_left_shift is not implemented for Candle IntTensor,” clearly indicate the missing functionality. Implementing bit shift operations involves shifting the bits of an integer tensor by a specified amount, which requires careful handling of data types and potential overflow conditions.

  • bitwise_and::tests::bitwise_and_scalar_tensor, bitwise_and::tests::bitwise_and_tensors, bitwise_and::tests::scalar_bitwise_and_tensor: These failures stem from the lack of implementation for bitwise AND operations between tensors and scalars in Candle's IntTensor. The error messages highlight that the bitwise_and and bitwise_and_scalar functions are not yet implemented. Implementing bitwise AND involves performing a logical AND operation on corresponding bits of the input tensors or scalars.

  • bitwise_not::tests::bitwise_not_tensors: The failure of this test indicates that the bitwise NOT operation is not implemented for Candle's IntTensor. Bitwise NOT inverts the bits of an integer tensor, which is a fundamental operation in many bit manipulation tasks.

  • bitwise_or::tests::bitwise_or_scalar_tensor, bitwise_or::tests::bitwise_or_tensors, bitwise_or::tests::scalar_bitwise_or_tensor: Similar to bitwise AND, these tests failed due to the missing implementation of bitwise OR operations between tensors and scalars in Candle's IntTensor. Bitwise OR performs a logical OR operation on corresponding bits of the inputs.

  • bitwise_xor::tests::bitwise_xor_scalar_tensor, bitwise_xor::tests::bitwise_xor_tensors, bitwise_xor::tests::scalar_bitwise_xor_tensor: These tests failed because bitwise XOR operations are not implemented for Candle's IntTensor. Bitwise XOR performs a logical exclusive OR operation on corresponding bits of the inputs.
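
To show what these missing IntTensor kernels need to do, here is a rough sketch of a host-side fallback: pull the integer data out of the tensor, apply the bitwise operator elementwise in plain Rust, and rebuild a tensor of the same shape. The tensor values and the specific candle-core calls (flatten_all, to_vec1, from_vec) are my assumptions, and elementwise_i64 is a hypothetical helper; a proper backend implementation would keep the computation on-device, but the semantics are the same.

    use candle_core::{Device, Result, Tensor};

    // Hypothetical host-side fallback for elementwise i64 operations.
    fn elementwise_i64(t: &Tensor, f: impl Fn(i64) -> i64) -> Result<Tensor> {
        let shape = t.shape().clone();
        let data: Vec<i64> = t.flatten_all()?.to_vec1()?;
        let out: Vec<i64> = data.into_iter().map(f).collect();
        Tensor::from_vec(out, shape, t.device())
    }

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        let a = Tensor::new(&[1i64, 2, 4, 8], &dev)?;
        let shifted = elementwise_i64(&a, |x| x << 1)?;    // left shift by a scalar
        let masked = elementwise_i64(&a, |x| x & 0b0110)?; // AND with a scalar mask
        println!("{:?}", shifted.to_vec1::<i64>()?); // [2, 4, 8, 16]
        println!("{:?}", masked.to_vec1::<i64>()?);  // [0, 2, 4, 0]
        Ok(())
    }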

3. Resize Operations

The failures in resize tests point to limitations in the interpolation methods supported by the Candle backend. The errors suggest that bilinear and bicubic interpolation, commonly used for resizing images and feature maps, are not yet implemented. Supporting a variety of interpolation methods is essential for handling different resizing requirements in ONNX models.

  • resize::tests::resize_with_scales_1d_linear, resize::tests::resize_with_scales_2d_bicubic, resize::tests::resize_with_scales_2d_bilinear, resize::tests::resize_with_shape, and resize::tests::resize_with_sizes: These tests failed because Candle does not support bilinear or bicubic interpolation. The error messages consistently state that “bilinear interpolation is not supported by Candle,” indicating a missing feature in the resize operation's implementation. Bilinear and bicubic interpolation are more sophisticated methods that consider the values of neighboring pixels to produce smoother results, which are often preferred in image processing tasks. Implementing these interpolation methods will require adding new algorithms to the resize operation that can calculate the output pixel values based on the input pixel grid.
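
For reference, the core of bilinear interpolation is small. The sketch below resizes a single-channel, row-major f32 buffer using half-pixel centres, which is the ONNX Resize default (coordinate_transformation_mode = "half_pixel"); the helper name and the clamping at the borders are my own choices, so treat it as an outline of the algorithm rather than the code that will land in the backend.

    // Minimal bilinear resize of a single-channel, row-major f32 buffer.
    fn resize_bilinear(src: &[f32], h: usize, w: usize, out_h: usize, out_w: usize) -> Vec<f32> {
        let mut out = vec![0.0f32; out_h * out_w];
        let scale_y = h as f32 / out_h as f32;
        let scale_x = w as f32 / out_w as f32;
        for oy in 0..out_h {
            // Map the output pixel centre back into source coordinates (half-pixel mode).
            let sy = ((oy as f32 + 0.5) * scale_y - 0.5).clamp(0.0, (h - 1) as f32);
            let (y0, dy) = (sy.floor() as usize, sy - sy.floor());
            let y1 = (y0 + 1).min(h - 1);
            for ox in 0..out_w {
                let sx = ((ox as f32 + 0.5) * scale_x - 0.5).clamp(0.0, (w - 1) as f32);
                let (x0, dx) = (sx.floor() as usize, sx - sx.floor());
                let x1 = (x0 + 1).min(w - 1);
                // Blend the four neighbouring pixels by their distances.
                let top = src[y0 * w + x0] * (1.0 - dx) + src[y0 * w + x1] * dx;
                let bot = src[y1 * w + x0] * (1.0 - dx) + src[y1 * w + x1] * dx;
                out[oy * out_w + ox] = top * (1.0 - dy) + bot * dy;
            }
        }
        out
    }

    fn main() {
        // Upscale a 2x2 image to 4x4.
        let dst = resize_bilinear(&[0.0, 1.0, 2.0, 3.0], 2, 2, 4, 4);
        println!("{dst:?}");
    }

Bicubic interpolation follows the same structure but blends a 4x4 neighbourhood with cubic weights, which is why both modes tend to be implemented together.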

4. Miscellaneous Failures

Other failures, such as those in pow, reduce_prod, and where_op, indicate potential numerical precision issues or incorrect handling of specific operations. These failures may require more in-depth debugging and analysis to identify the root cause.

  • pow::tests::pow_with_tensor_and_scalar: This test panicked because of a numerical discrepancy in the power operation. The error message, “Tensors are not eq: => Position 2: 729.0001 != 729,” suggests a potential issue with floating-point precision or the implementation of the power function itself. Debugging this failure may involve examining the input values and the intermediate calculations to pinpoint the source of the discrepancy.

  • reduce::tests::reduce_prod: The failure in the reduce_prod test also points to a numerical issue. The error message, “Expected 2340000, got 2340003, diff: 3,” indicates a mismatch in the expected and actual results of the product reduction operation. This could be due to an overflow, underflow, or an accumulation error in the multiplication process. Investigating this failure will likely involve examining the input tensor and the reduction logic to identify the source of the numerical error.

  • where_op::tests::where_op_broadcast: This test failed due to a shape mismatch in the where_op operation. The error message, “shape mismatch in where_cond, lhs: [2, 2], rhs: [2, 1],” indicates that the condition tensor's shape is incompatible with the input tensors. The where_op operation selects elements from two input tensors based on a condition tensor. If the shapes of the tensors are not correctly handled, it can lead to this type of error. Fixing this issue will require ensuring that the shapes of the input tensors and the condition tensor are properly aligned before performing the element selection. A sketch of one way to do that, by pre-broadcasting the operands, follows this list.

  • global_avr_pool::tests::globalavrpool_1d_2d: This test failed because the adaptive average pooling operation (adaptive_avg_pool2d) is not supported by Candle. Adaptive average pooling is a pooling technique that automatically adjusts the pooling region size to produce a fixed-size output, regardless of the input size. Implementing this operation would enhance the flexibility of the Candle backend in handling variable-sized inputs.
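
For the where_op broadcast failure specifically, the likely shape of the fix is to broadcast the condition and value tensors to a common shape before handing them to Candle's where_cond, which expects matching shapes. The snippet below is a sketch of that idea using what I believe are the relevant candle-core calls (broadcast_as and where_cond); the shapes mirror the failing test, but broadcasting straight to the condition's shape is a simplification, since a backend-level fix would derive the target shape from the broadcasting rules.

    use candle_core::{Device, Result, Tensor};

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        // Condition is [2, 2]; the "false" branch is only [2, 1], matching the
        // shape combination reported by the failing test.
        let cond = Tensor::new(&[[1u8, 0u8], [0u8, 1u8]], &dev)?;
        let on_true = Tensor::new(&[[1f32, 2.0], [3.0, 4.0]], &dev)?;
        let on_false = Tensor::new(&[[10f32], [20.0]], &dev)?;

        // Broadcast the smaller operand up to the condition's shape first,
        // then select elementwise.
        let on_false = on_false.broadcast_as(cond.shape())?;
        let result = cond.where_cond(&on_true, &on_false)?;
        println!("{:?}", result.to_vec2::<f32>()?); // [[1.0, 10.0], [20.0, 4.0]]
        Ok(())
    }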

Steps Taken and Solutions Implemented

So far, I've managed to fix the broadcasting issues related to less/greater/equal operations. Broadcasting is a crucial feature that allows operations on tensors with different shapes, and getting this right is a big step forward. This involved carefully reviewing the broadcasting logic in the Candle backend and ensuring that it correctly handles different shape combinations.

Addressing Broadcasting Issues

Broadcasting is a fundamental concept in tensor operations that allows operations to be performed on tensors with different shapes. When the shapes of the input tensors do not match, broadcasting automatically expands the smaller tensor to match the shape of the larger tensor, enabling element-wise operations. Fixing broadcasting issues in the Candle backend is crucial for ensuring compatibility with a wide range of ONNX models.

The initial focus was on the less, greater, and equal operations, as these are commonly used in ONNX models and their correct behavior is essential for many downstream tasks. The fixes involved a careful review of the broadcasting logic within the Candle backend implementation. This included:

  1. Identifying the broadcasting rules: The ONNX specification defines a clear set of rules for broadcasting tensor shapes. These rules were carefully analyzed to ensure that the Candle backend implementation adheres to them correctly.

  2. Implementing the broadcasting logic: The broadcasting logic was implemented in a way that efficiently expands the smaller tensor to match the shape of the larger tensor. This involved creating new tensor views with adjusted strides and shapes, without physically copying the data.

  3. Testing the implementation: Thorough testing was performed to ensure that the broadcasting logic works correctly for a wide range of shape combinations. This included unit tests that specifically target different broadcasting scenarios.
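
As a concrete illustration of step 1, ONNX follows the same multidirectional broadcasting rule as NumPy: align the shapes from the trailing dimension and require each pair of dimensions to be equal, or for one of them to be 1. A small, hypothetical helper that computes the broadcast output shape looks like this:

    /// Compute the broadcast output shape of two shapes under ONNX/NumPy
    /// multidirectional broadcasting, or return None if they are incompatible.
    fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
        let rank = a.len().max(b.len());
        let mut out = vec![0usize; rank];
        for i in 0..rank {
            // Align from the trailing dimension; missing leading dims count as 1.
            let da = if i < rank - a.len() { 1 } else { a[i - (rank - a.len())] };
            let db = if i < rank - b.len() { 1 } else { b[i - (rank - b.len())] };
            out[i] = match (da, db) {
                (x, y) if x == y => x,
                (1, y) => y,
                (x, 1) => x,
                _ => return None, // e.g. [2, 3] vs [4, 3] cannot broadcast
            };
        }
        Some(out)
    }

    fn main() {
        assert_eq!(broadcast_shape(&[4, 1, 3], &[2, 3]), Some(vec![4, 2, 3]));
        assert_eq!(broadcast_shape(&[2, 2], &[2, 1]), Some(vec![2, 2]));
        assert_eq!(broadcast_shape(&[2, 3], &[4, 3]), None);
    }

The same rule drives the fix for the less/greater/equal comparisons: compute the common shape, expand both operands to it via views rather than copies, and only then run the elementwise kernel.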

Next Steps and Future Work

Okay, so we've made some headway, but there's still a mountain to climb. The next steps involve tackling the remaining failures, likely starting with the most common or critical operations. This will involve a combination of code contributions, debugging, and potentially some architectural tweaks. Collaboration and community input will be key to making Candle a robust ONNX backend.

  1. Prioritize the remaining failures: The remaining failures need to be prioritized based on their impact and frequency of occurrence in ONNX models. Operations that are commonly used or critical for specific tasks should be addressed first.

  2. Implement missing operations: Many of the failures are due to missing implementations of specific ONNX operations, such as bitwise operations and advanced pooling techniques. These operations need to be implemented in the Candle backend, following the ONNX specification and best practices for performance and accuracy.

  3. Debug numerical issues: Some failures point to numerical precision issues or incorrect handling of specific cases. These issues require careful debugging and analysis to identify the root cause and implement appropriate fixes. This may involve using numerical debugging tools, examining intermediate calculations, and comparing the results with other backends. For small mismatches like the ones seen in pow and reduce_prod, a relative-tolerance comparison (sketched after this list) can be the pragmatic answer once the underlying math is confirmed to be correct.

  4. Improve test coverage: The test suite needs to be expanded to cover a wider range of ONNX models and operations. This will help to identify and prevent future regressions and ensure the long-term stability of the Candle backend.

  5. Contribute to the community: Collaboration and community input are crucial for the success of the Candle backend. Contributions from other developers, including bug reports, feature requests, and code contributions, are highly valued and encouraged.
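
As a footnote to item 3, the pow and reduce_prod mismatches are tiny relative to the magnitudes involved, so one pragmatic option, assuming the underlying math turns out to be correct, is to compare results with a relative tolerance instead of exact equality. A minimal sketch of such a check (approx_eq is a hypothetical helper; most test harnesses already provide an equivalent):

    /// Compare two f32 slices with a relative tolerance instead of exact equality.
    fn approx_eq(actual: &[f32], expected: &[f32], rel_tol: f32) -> bool {
        actual.len() == expected.len()
            && actual.iter().zip(expected).all(|(a, e)| {
                let scale = e.abs().max(1.0);
                (a - e).abs() <= rel_tol * scale
            })
    }

    fn main() {
        // Both discrepancies reported by the failing tests are well within a
        // modest relative tolerance for f32 arithmetic.
        assert!(approx_eq(&[729.0001], &[729.0], 1e-5));
        assert!(approx_eq(&[2_340_003.0], &[2_340_000.0], 1e-5));
    }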

Conclusion

While 33 failing tests might seem daunting, it's a fantastic opportunity to strengthen the Candle backend and make it a top-notch choice for ONNX compatibility. Addressing these failures will not only improve the backend's functionality but also give the broader tracel-ai/burn project a more robust and versatile tensor processing engine. The journey to complete ONNX support is a collaborative effort, and community contributions are essential to getting there. Stay tuned for more updates as we continue to chip away at these issues. Together, we can make Candle shine!