I have a project with Python 3.11.4. Until now I was able to build my Docker image without issues with the following libraries (mentioning only the ones relevant to the issue):
```
transformers 4.8.2
tokenizers 0.10.3
```
In the Dockerfile, the Rust compiler is installed by executing the following:
```dockerfile
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain 1.66.1
# Add .cargo/bin to PATH
ENV PATH="/root/.cargo/bin:${PATH}"
```
But since yesterday it no longer works. The build fails with an error while installing the tokenizers library, even though it built fine a week ago and no changes were made to the code in the meantime. This is how the error looks:
```
Compiling tokenizers v0.10.1 (/tmp/pip-req-build-nxa8r_ow/tokenizers-lib)
  Running `/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' -C metadata=dde566ece3782b43 -C extra-filename=-dde566ece3782b43 --out-dir /tmp/pip-req-build-nxa8r_ow/target/release/deps -L dependency=/tmp/pip-req-build-nxa8r_ow/target/release/deps --extern clap=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libclap-e97ec3243ee04998.rmeta --extern derive_builder=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libderive_builder-41ec09770f2959ba.so --extern esaxx_rs=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libesaxx_rs-f2759c190d1e60f6.rmeta --extern indicatif=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libindicatif-a13f9aa303c115b1.rmeta --extern itertools=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libitertools-2781adac909e0e8d.rmeta --extern lazy_static=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblazy_static-b3eac7b1efe0daf0.rmeta --extern log=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblog-e9c072abf79b5d2b.rmeta --extern onig=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libonig-7322b1e79f302581.rmeta --extern rand=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librand-eb8967ca2ff2f601.rmeta --extern rayon=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon-06bbb925cd5ab1af.rmeta --extern rayon_cond=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon_cond-d5db76508c986330.rmeta --extern regex=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex-42ac3f9a5fee7536.rmeta --extern regex_syntax=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex_syntax-6d6a76aa7e489183.rmeta --extern serde=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde-699614b478bcb51c.rmeta --extern serde_json=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde_json-931598b7299b1c2d.rmeta --extern spm_precompiled=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libspm_precompiled-71a4dce0d8e7a388.rmeta --extern unicode_normalization_alignments=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_normalization_alignments-a9d3428c3ac7b5af.rmeta --extern unicode_segmentation=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_segmentation-83c854e18f560ee0.rmeta --extern unicode_categories=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_categories-efa8a5c4f5aee929.rmeta -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/esaxx-rs-a7fec0442126d010/out -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/onig_sys-03c260a5c54a327a/out`

warning: `#[macro_use]` only has an effect on `extern crate` and modules
  --> tokenizers-lib/src/utils/mod.rs:24:1
   |
24 | #[macro_use]
   | ^^^^^^^^^^^^
   |
   = note: `#[warn(unused_attributes)]` on by default

warning: `#[macro_use]` only has an effect on `extern crate` and modules
  --> tokenizers-lib/src/utils/mod.rs:35:1
   |
35 | #[macro_use]
   | ^^^^^^^^^^^^

warning: variable does not need to be mutable
   --> tokenizers-lib/src/models/unigram/model.rs:280:21
    |
280 |     let mut target_node = &mut best_path_ends_at[key_pos];
    |         ----^^^^^^^^^^^
    |         |
    |         help: remove this `mut`
    |
    = note: `#[warn(unused_mut)]` on by default

warning: variable does not need to be mutable
   --> tokenizers-lib/src/models/unigram/model.rs:297:21
    |
297 |     let mut target_node = &mut best_path_ends_at[starts_at + mblen];
    |         ----^^^^^^^^^^^
    |         |
    |         help: remove this `mut`

warning: variable does not need to be mutable
   --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:175:59
    |
175 |     encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
    |                                                            ----^^^^^^^
    |                                                            |
    |                                                            help: remove this `mut`

warning: fields `bos_id` and `eos_id` are never read
  --> tokenizers-lib/src/models/unigram/lattice.rs:59:5
   |
53 | pub struct Lattice {
   |            ------- fields in this struct
...
59 |     bos_id: usize,
   |     ^^^^^^
60 |     eos_id: usize,
   |     ^^^^^^
   |
   = note: `Lattice` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
   --> tokenizers-lib/src/models/bpe/trainer.rs:517:47
    |
513 |     let w = &words[*i] as *const _ as *mut _;
    |             -------------------------------- casting happend here
...
517 |     let word: &mut Word = &mut (*w);
    |                           ^^^^^^^^^
    |
    = note: `#[deny(invalid_reference_casting)]` on by default

warning: `tokenizers` (lib) generated 6 warnings
error: could not compile `tokenizers` (lib) due to previous error; 6 warnings emitted

Caused by:
  process didn't exit successfully: `/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' -C metadata=dde566ece3782b43 -C extra-filename=-dde566ece3782b43 --out-dir /tmp/pip-req-build-nxa8r_ow/target/release/deps -L dependency=/tmp/pip-req-build-nxa8r_ow/target/release/deps --extern clap=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libclap-e97ec3243ee04998.rmeta --extern derive_builder=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libderive_builder-41ec09770f2959ba.so --extern esaxx_rs=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libesaxx_rs-f2759c190d1e60f6.rmeta --extern indicatif=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libindicatif-a13f9aa303c115b1.rmeta --extern itertools=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libitertools-2781adac909e0e8d.rmeta --extern lazy_static=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblazy_static-b3eac7b1efe0daf0.rmeta --extern log=/tmp/pip-req-build-nxa8r_ow/target/release/deps/liblog-e9c072abf79b5d2b.rmeta --extern onig=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libonig-7322b1e79f302581.rmeta --extern rand=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librand-eb8967ca2ff2f601.rmeta --extern rayon=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon-06bbb925cd5ab1af.rmeta --extern rayon_cond=/tmp/pip-req-build-nxa8r_ow/target/release/deps/librayon_cond-d5db76508c986330.rmeta --extern regex=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex-42ac3f9a5fee7536.rmeta --extern regex_syntax=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libregex_syntax-6d6a76aa7e489183.rmeta --extern serde=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde-699614b478bcb51c.rmeta --extern serde_json=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libserde_json-931598b7299b1c2d.rmeta --extern spm_precompiled=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libspm_precompiled-71a4dce0d8e7a388.rmeta --extern unicode_normalization_alignments=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_normalization_alignments-a9d3428c3ac7b5af.rmeta --extern unicode_segmentation=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_segmentation-83c854e18f560ee0.rmeta --extern unicode_categories=/tmp/pip-req-build-nxa8r_ow/target/release/deps/libunicode_categories-efa8a5c4f5aee929.rmeta -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/esaxx-rs-a7fec0442126d010/out -L native=/tmp/pip-req-build-nxa8r_ow/target/release/build/onig_sys-03c260a5c54a327a/out` (exit status: 1)
warning: build failed, waiting for other jobs to finish...
error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code 101
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects


at /opt/poetry/venv/lib/python3.11/site-packages/poetry/utils/env.py:1540 in _run
      1536│             output = subprocess.check_output(
      1537│                 command, stderr=subprocess.STDOUT, env=env, **kwargs
      1538│             )
      1539│         except CalledProcessError as e:
    → 1540│             raise EnvCommandError(e, input=input_)
      1541│
      1542│         return decode(output)
      1543│
      1544│     def execute(self, bin: str, *args: str, **kwargs: Any) -> int:

The following error occurred when trying to handle this error:


PoetryException

Failed to install /root/.cache/pypoetry/artifacts/11/c1/65/6a1ee2c3ed75cdc8840c15fb385ec739aedba8424fd6b250657ff16342/tokenizers-0.10.3.tar.gz

at /opt/poetry/venv/lib/python3.11/site-packages/poetry/utils/pip.py:58 in pip_install
        54│
        55│     try:
        56│         return environment.run_pip(*args)
        57│     except EnvCommandError as e:
    →   58│         raise PoetryException(f"Failed to install {path.as_posix()}") from e
        59│

------
Dockerfile:57
--------------------
  55 |     FROM builder-image AS dev-image
  56 |
  57 | >>> RUN poetry install --with main,dev
  58 |
  59 |     FROM builder-image AS runtime-image
--------------------
ERROR: failed to solve: process "/bin/sh -c poetry install --with main,dev" did not complete successfully: exit code: 1
```
---

**2 Answers**
This seems to be an issue in the `tokenizers` Rust code that Rust 1.73.0 is more sensitive to than earlier versions of the Rust compiler were. I observed a very similar error when building `tokenizers-0.12.1` with `rust-1.73.0`, but the build went through after I downgraded to Rust 1.72.1. I suspect the difference between these Rust versions that is relevant here is the fix to rust #112431 (the last item of the Rust 1.73.0 release notes, section Language).
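For illustration, here is a minimal standalone sketch of the pattern that the new deny-by-default `invalid_reference_casting` lint rejects, modeled loosely on the `trainer.rs` code flagged in the log above (the names and types are simplified for the example, not the actual tokenizers code):

```rust
fn main() {
    let words = vec![String::from("token")];

    // Cast a shared reference to a mutable raw pointer...
    let w = &words[0] as *const String as *mut String;

    // ...and reborrow it as `&mut`. Rust 1.73.0 rejects this via
    // `#[deny(invalid_reference_casting)]`; older toolchains accepted
    // it, even though mutating through `word` is undefined behavior.
    let word: &mut String = unsafe { &mut *w };
    word.push('!');
}
```

Compiling this with a 1.73 toolchain should reproduce the same "casting `&T` to `&mut T` is undefined behavior" error shown in the build log.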
I have not found the recommended way to install a concrete version of the Rust compiler using the Rustup (or `rustup-init`) tool, but perhaps there is an easy way to do that using the package manager of the system you have installed in the container. (Based on the information you have posted, I am not able to determine what OS that is.)

---

**tldr;** set the environment variable `RUSTUP_TOOLCHAIN` to the version you want to use before building tokenizers.
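For instance (a sketch, assuming a 1.72.1 toolchain works for your build as it did in the answer above; adjust the versions to your case):

```sh
# Install the older toolchain, then force rustup to use it for this build
rustup toolchain install 1.72.1
RUSTUP_TOOLCHAIN=1.72.1 pip install tokenizers==0.10.3
```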
**Details**

When building the wheel for tokenizers, one of the build steps you'll see in the logs sets the version of the rust-toolchain to match what is set in the build spec; in the tokenizers repo it is set to `stable`. In the build logs you can see that it is using the `stable` version of rustc instead of the version set via `--default-toolchain` when Rust was installed. Note the path `stable-x86_64-unknown-linux-gnu/bin/rustc` in the log above. From the rustup book, you can supply an override with a higher priority; in this case, setting `RUSTUP_TOOLCHAIN` is sufficient.

**Dockerfile**

Use the below to set up Rust and the rust-toolchain before building the tokenizers wheel.
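A minimal sketch (pinning 1.72.1 is an assumption carried over from the test above; pin whichever toolchain builds tokenizers for you):

```dockerfile
# Install rustup with a pinned default toolchain
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | \
    sh -s -- -y --default-toolchain 1.72.1
ENV PATH="/root/.cargo/bin:${PATH}"

# RUSTUP_TOOLCHAIN overrides the `stable` toolchain requested by the
# tokenizers build spec, so the pinned compiler is actually used when
# pip/poetry builds the wheel
ENV RUSTUP_TOOLCHAIN=1.72.1

# ...then install the Python dependencies as before, e.g.:
# RUN poetry install --with main,dev
```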