
I have a project using Python 3.11.4. Until now I was able to build my Docker image without issues with the following libraries (mentioning only the ones relevant to the issue):

transformers 4.8.2

tokenizers 0.10.3

In the Dockerfile, the Rust compiler is installed by running the following:

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain 1.66.1
# Add .cargo/bin to PATH
ENV PATH="/root/.cargo/bin:${PATH}"

But since yesterday it no longer works. Installing the tokenizers library fails, even though the build worked fine a week ago and no changes were made to the code until yesterday. This is what the error looks like:


215.3      Compiling tokenizers v0.10.1 (/tmp/pip-req-build-nxa8r_ow/tokenizers-lib)
215.3       Running `/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' -C metadata=dde566ece3782b43 -C extra-filename=-dde566ece3782b43 --out-dir /tmp/pip-req-build-nxa8r_ow/target/release/deps -L dependency=/tmp/pip-req-build-nxa8r_ow/target/release/deps --extern clap=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libclap-e97ec3243ee04998.rmeta --extern derive_builder=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libderive_builder-41ec09770f2959ba.so --extern esaxx_rs=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libesaxx_rs-f2759c190d1e60f6.rmeta --extern indicatif=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libindicatif-a13f9aa303c115b1.rmeta --extern itertools=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libitertools-2781adac909e0e8d.rmeta --extern lazy_static=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblazy_static-b3eac7b1efe0daf0.rmeta --extern log=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblog-e9c072abf79b5d2b.rmeta --extern onig=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libonig-7322b1e79f302581.rmeta --extern rand=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librand-eb8967ca2ff2f601.rmeta --extern rayon=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon-06bbb925cd5ab1af.rmeta --extern rayon_cond=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon_cond-d5db76508c986330.rmeta --extern regex=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex-42ac3f9a5fee7536.rmeta --extern regex_syntax=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex_syntax-6d6a76aa7e489183.rmeta --extern serde=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde-699614b478bcb51c.rmeta --extern 
serde_json=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde_json-931598b7299b1c2d.rmeta --extern spm_precompiled=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libspm_precompiled-71a4dce0d8e7a388.rmeta --extern unicode_normalization_alignments=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_normalization_alignments-a9d3428c3ac7b5af.rmeta --extern unicode_segmentation=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_segmentation-83c854e18f560ee0.rmeta --extern unicode_categories=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_categories-efa8a5c4f5aee929.rmeta -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/esaxx-rs-a7fec0442126d010/out -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/onig_sys-03c260a5c54a327a/out`
215.3     warning: `#[macro_use]` only has an effect on `extern crate` and modules
215.3      --> tokenizers-lib/src/utils/mod.rs:24:1
215.3      |
215.3     24 | #[macro_use]
215.3      | ^^^^^^^^^^^^
215.3      |
215.3      = note: `#[warn(unused_attributes)]` on by default
215.3     
215.3     warning: `#[macro_use]` only has an effect on `extern crate` and modules
215.3      --> tokenizers-lib/src/utils/mod.rs:35:1
215.3      |
215.3     35 | #[macro_use]
215.3      | ^^^^^^^^^^^^
215.3     
215.3     warning: variable does not need to be mutable
215.3      --> tokenizers-lib/src/models/unigram/model.rs:280:21
215.3       |
215.3     280 |         let mut target_node = &mut best_path_ends_at[key_pos];
215.3       |           ----^^^^^^^^^^^
215.3       |           |
215.3       |           help: remove this `mut`
215.3       |
215.3       = note: `#[warn(unused_mut)]` on by default
215.3     
215.3     warning: variable does not need to be mutable
215.3      --> tokenizers-lib/src/models/unigram/model.rs:297:21
215.3       |
215.3     297 |         let mut target_node = &mut best_path_ends_at[starts_at + mblen];
215.3       |           ----^^^^^^^^^^^
215.3       |           |
215.3       |           help: remove this `mut`
215.3     
215.3     warning: variable does not need to be mutable
215.3      --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:175:59
215.3       |
215.3     175 |   encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
215.3       |                              ----^^^^^^^
215.3       |                              |
215.3       |                              help: remove this `mut`
215.3     
215.3     warning: fields `bos_id` and `eos_id` are never read
215.3      --> tokenizers-lib/src/models/unigram/lattice.rs:59:5
215.3      |
215.3     53 | pub struct Lattice {
215.3      |      ------- fields in this struct
215.3     ...
215.3     59 |   bos_id: usize,
215.3      |   ^^^^^^
215.3     60 |   eos_id: usize,
215.3      |   ^^^^^^
215.3      |
215.3      = note: `Lattice` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
215.3      = note: `#[warn(dead_code)]` on by default
215.3     
215.3     error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
215.3      --> tokenizers-lib/src/models/bpe/trainer.rs:517:47
215.3       |
215.3     513 |           let w = &words[*i] as *const _ as *mut _;
215.3       |               -------------------------------- casting happend here
215.3     ...
215.3     517 |             let word: &mut Word = &mut (*w);
215.3       |                        ^^^^^^^^^
215.3       |
215.3       = note: `#[deny(invalid_reference_casting)]` on by default
215.3     
215.3     warning: `tokenizers` (lib) generated 6 warnings
215.3     error: could not compile `tokenizers` (lib) due to previous error; 6 warnings emitted
215.3     
215.3     Caused by:
215.3      process didn't exit successfully: `/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' -C metadata=dde566ece3782b43 -C extra-filename=-dde566ece3782b43 --out-dir /tmp/pip-req-build-nxa8r_ow/target/release/deps -L dependency=/tmp/pip-req-build-nxa8r_ow/target/release/deps --extern clap=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libclap-e97ec3243ee04998.rmeta --extern derive_builder=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libderive_builder-41ec09770f2959ba.so --extern esaxx_rs=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libesaxx_rs-f2759c190d1e60f6.rmeta --extern indicatif=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libindicatif-a13f9aa303c115b1.rmeta --extern itertools=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libitertools-2781adac909e0e8d.rmeta --extern lazy_static=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblazy_static-b3eac7b1efe0daf0.rmeta --extern log=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblog-e9c072abf79b5d2b.rmeta --extern onig=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libonig-7322b1e79f302581.rmeta --extern rand=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librand-eb8967ca2ff2f601.rmeta --extern rayon=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon-06bbb925cd5ab1af.rmeta --extern rayon_cond=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon_cond-d5db76508c986330.rmeta --extern regex=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex-42ac3f9a5fee7536.rmeta --extern regex_syntax=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex_syntax-6d6a76aa7e489183.rmeta --extern serde=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde-699614b478bcb51c.rmeta 
--extern serde_json=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde_json-931598b7299b1c2d.rmeta --extern spm_precompiled=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libspm_precompiled-71a4dce0d8e7a388.rmeta --extern unicode_normalization_alignments=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_normalization_alignments-a9d3428c3ac7b5af.rmeta --extern unicode_segmentation=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_segmentation-83c854e18f560ee0.rmeta --extern unicode_categories=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_categories-efa8a5c4f5aee929.rmeta -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/esaxx-rs-a7fec0442126d010/out -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/onig_sys-03c260a5c54a327a/out` (exit status: 1)
215.3     warning: build failed, waiting for other jobs to finish...
215.3     error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code 101
215.3     [end of output]
215.3   
215.3   note: This error originates from a subprocess, and is likely not a problem with pip.
215.3   ERROR: Failed building wheel for tokenizers
215.3  Failed to build tokenizers
215.3  ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
215.3  
215.3 
215.3  at /opt/poetry/venv/lib/python3.11/site-packages/poetry/utils/env.py:1540 in _run
215.4    1536│         output = subprocess.check_output(
215.4    1537│           command, stderr=subprocess.STDOUT, env=env, **kwargs
215.4    1538│         )
215.4    1539│     except CalledProcessError as e:
215.4   → 1540│       raise EnvCommandError(e, input=input_)
215.4    1541│ 
215.4    1542│     return decode(output)
215.4    1543│ 
215.4    1544│   def execute(self, bin: str, *args: str, **kwargs: Any) -> int:
215.4 
215.4 The following error occurred when trying to handle this error:
215.4 
215.4 
215.4  PoetryException
215.4 
215.4  Failed to install /root/.cache/pypoetry/artifacts/11/c1/65/6a1ee2c3ed75cdc8840c15fb385ec739aedba8424fd6b250657ff16342/tokenizers-0.10.3.tar.gz
215.4 
215.4  at /opt/poetry/venv/lib/python3.11/site-packages/poetry/utils/pip.py:58 in pip_install
215.4    54│ 
215.4    55│   try:
215.4    56│     return environment.run_pip(*args)
215.4    57│   except EnvCommandError as e:
215.4   → 58│     raise PoetryException(f"Failed to install {path.as_posix()}") from e
215.4    59│ 
215.4 
------
Dockerfile:57
--------------------
 55 |   FROM builder-image AS dev-image
 56 |   
 57 | >>> RUN poetry install --with main,dev
 58 |   
 59 |   FROM builder-image AS runtime-image
--------------------
ERROR: failed to solve: process "/bin/sh -c poetry install --with main,dev" did not complete successfully: exit code: 1

2 Answers


  1. This seems to be an issue in the tokenizers Rust code that Rust 1.73.0 is more sensitive to than earlier versions of the compiler were. I observed a very similar error when building tokenizers-0.12.1 with Rust 1.73.0, but the build went through after I downgraded to Rust 1.72.1. I suspect the relevant difference between these versions is the fix for Rust issue #112431 (the last item in the Language section of the Rust 1.73.0 release notes).

    I have not found the recommended way to install a specific version of the Rust compiler using their rustup (or rustup-init) tool, but perhaps there is an easy way to do it with the package manager of the OS installed in the container. (Based on the information you have posted, I cannot tell which OS that is.)
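
    For what it's worth, rustup itself can pin an exact toolchain version without help from the system package manager. A minimal Dockerfile sketch (the `1.72.1` here is the downgrade suggested above; both forms use standard rustup commands):

    ```dockerfile
    # Install rustup non-interactively with a pinned default toolchain
    RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain 1.72.1
    ENV PATH="/root/.cargo/bin:${PATH}"

    # Or, with rustup already installed, pin the version explicitly:
    # RUN rustup toolchain install 1.72.1 && rustup default 1.72.1
    ```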

  2. tl;dr

    Set the environment variable RUSTUP_TOOLCHAIN to the version you want to use before building tokenizers.

    Details

    When building the wheel for tokenizers one of the build steps you’ll see in the logs is

    125.9       running build_rust
    

    which sets the version of the Rust toolchain to match what is specified in the build spec; in the tokenizers repo it is set to stable. In the build logs you can see that it is using the stable version of rustc instead of the version set via --default-toolchain when Rust was installed:

    215.3     error: could not compile `tokenizers` (lib) due to previous error; 6 warnings emitted
    215.3     
    215.3     Caused by:
    215.3      process didn't exit successfully: `/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc
    

    Note the path stable-x86_64-unknown-linux-gnu/bin/rustc in the above. As described in the rustup book, you can supply an override with a higher priority; in this case, setting RUSTUP_TOOLCHAIN is sufficient.
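
    A quick way to check which toolchain will actually be used, and to verify the override, is the following sketch (assuming rustup is installed and on PATH; both commands are standard rustup behavior):

    ```shell
    # Which toolchain resolves in the current directory?
    rustup show active-toolchain

    # The RUSTUP_TOOLCHAIN environment variable outranks the default
    # toolchain and any directory override, so this runs 1.72.1
    # regardless of what the build spec requests:
    RUSTUP_TOOLCHAIN=1.72.1 rustc --version
    ```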

    Dockerfile

    Use the following to set up Rust and the toolchain before building the tokenizers wheel:

    RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain=1.72.1 -y
    ENV PATH="/root/.cargo/bin:${PATH}"
    ENV RUSTUP_TOOLCHAIN=1.72.1
    