Interest in preprocessing utilities for multifile model uploads
Oh. My previous comment about Modular Transformers came partly from a somewhat fuzzy memory, and I’m not a Transformers maintainer, so please take this with the appropriate amount of salt. But based on things I had happened to see before, plus what I found when using your reply as a starting point, one thing seems fairly clear: regardless of the exact implementation strategy, **there does seem to be real demand for solving this broader custom-code packaging problem**.
After looking into this a bit more, I would answer the original “is there demand?” question with a fairly strong yes, but with one important nuance:
The recurring demand seems to be less specifically for “an inliner” as such, and more for complete, reproducible, inspectable custom-code artifacts for `trust_remote_code=True` models.
An inliner may be one good strategy for that broader problem, especially if the goal is to let authors develop in a normal multifile layout while still publishing something closer to the traditional Transformers single-file / standalone-artifact style.
But I would probably avoid presenting inline as the only correct implementation. It may be more maintainable to frame it as one possible custom-code packaging strategy.
Direct reaction to your points
My direct reaction to your reply is roughly this:
| Your point | My current read |
|---|---|
| `dynamic_utils` / dynamic module loading has serious deficiencies around multifile or recursive imports | This seems plausible. The surrounding issue history suggests that custom-code saving/loading has been fragile for a long time. I would still separate “current loader/save bug with an MRE” from “new opt-in packaging feature”. |
| An opt-in `save_pretrained` flag would be useful | I agree that this seems like a reasonable user-facing shape. I would maybe phrase it as a possible `custom_code_packaging` strategy rather than committing too early to one exact flag name. |
| `push_to_hub` should receive the same behavior | That also seems reasonable if the packaging transform is part of the saved artifact contract. Ideally, `push_to_hub` would upload the same complete custom-code artifact that `save_pretrained` creates. |
| Rebuilding the dynamic import resolver directly may be risky | I agree. An opt-in packaging transform may be safer than changing dynamic-module resolution globally. It avoids making the loader responsible for arbitrary Python project layouts. |
| The long-term fix might still involve better multi-level import handling | Possibly, but I would treat that as a separate track. One track is “fix current loader/save behavior with concrete MREs”; another is “provide an explicit packaging transform”. |
| The generated output is human-readable and preserves comments/docstrings | That is one of the strongest arguments for the approach. Reviewability matters a lot for trust_remote_code=True, because the artifact is executable code. |
| CI testing is unclear | I meant something narrower than proving all possible semantic equivalence: regenerate the artifact, compare it with the committed/generated output, then run local/remote/offline load smoke tests. |
| A DAG gives a clean boundary for inlining | I agree that acyclicity is a very useful support boundary. I would only be cautious about treating DAG-ness alone as a full semantic-equivalence proof in Python. |
| This is about uploading/saving for the Hub, not necessarily changing the Hub itself | That distinction makes sense. I would frame the problem as producing a complete custom-code artifact for save_pretrained / push_to_hub, rather than asking the Hub or dynamic loader to support arbitrary Python package layouts. |
So my current read is:
Yes, the need is real.
But the strongest framing may be:
custom-code artifact completeness for trust_remote_code=True models
rather than:
a multifile inliner, specifically
Why I think the demand is real
There seems to be a long-running pattern of related issues and fixes around this area:
| Year | Issue / PR | What it suggests |
|---|---|---|
| 2022 | #15224: Copy of the custom modeling file when saving a model | Users already needed save_pretrained() to copy custom modeling files for dynamic-code models. |
| 2022 | #20884: Santacoder saved checkpoints missing required .py files | Fine-tuned checkpoints for trust_remote_code=True models could be unusable because required code files were missing. |
| 2023 | #21008: Make sure dynamic objects can be saved and reloaded | Core Transformers already added fixes so dynamic/custom objects could be saved and reloaded with their code. |
| 2023 | #24737: Falcon models saved with save_pretrained no longer get saved with Python files / #24785 | Another concrete regression/fix around copying custom Python files during saving. |
| 2023 | #27688: Remote code improvements | Broader concerns around trust_remote_code, auto_map, downstream libraries, and documentation. |
| 2024 | #29714: push_to_hub for a trust_remote_code=True model | Users wanted push_to_hub() to push all files needed by a custom model, not only weights/config/tokenizer files. |
| 2024 | #32923 / #33100 | Local-vs-remote custom code behavior affected pipeline registration and AutoClass behavior. |
| 2024 | #34855: Offline mode does not work with models requiring trust_remote_code=True | save_pretrained() artifacts were not always self-contained enough for offline / fresh-machine loading. |
| 2024 | sentence-transformers #2613 | Downstream users also need hermetic/offline Docker-style deployments for models requiring remote code. |
| 2025 | #36808: Support loading custom code objects in offline mode from local | Ongoing work around fully saving/loading trust_remote_code=True custom objects in offline/local settings. |
| 2025 | #37716: Fix custom code saving | A major merged PR explicitly aimed at making save_pretrained() and push_to_hub() correctly save relevant custom modeling files. |
| 2025 | #37751: Stop autoconverting custom code checkpoints | Custom-code checkpoints may need special handling in adjacent infrastructure. |
| 2026 | #45684: save_pretrained custom model files copied with readonly permissions | Saved custom-code files are touched by post-save tooling, so generated/copied artifacts are a real workflow. |
| 2026 | #45698: from_pretrained loads wrong custom module after save_pretrained | Custom module identity/cache/local-source behavior can still be subtle after saving. |
So, to me, this looks like a real problem family. It has appeared as:
- missing custom `.py` files after `save_pretrained()`;
- missing custom `.py` files after `push_to_hub()`;
- local-vs-remote custom-code inconsistencies;
- offline/hermetic deployment failures;
- `auto_map` / `_auto_class` fragility;
- pipeline registration differences;
- dynamic-module cache/module identity issues;
- relative-import limitations and under-documentation.
That is a fairly strong signal that there is real demand.
How I would frame the core problem
I would probably frame the problem less as:
How do we support arbitrary multifile Python projects on the Hub?
and more as:
How do we produce a complete, reproducible, inspectable custom-code artifact
for `trust_remote_code=True` models, across `save_pretrained()`,
`push_to_hub()`, local loading, remote loading, and offline loading?
That framing seems to connect better with the existing Transformers work.
It also avoids forcing the loader to become a general Python package resolver. Instead, the save/push step could produce an artifact that the dynamic loader already knows how to consume.
Why your inliner idea still seems relevant
The current custom-code machinery already appears to be somewhat artifact-oriented.
The relevant area seems to be dynamic_module_utils.py, especially functions such as:
- `custom_object_save`
- `get_relative_imports`
- `get_relative_import_files`
- `get_cached_module_file`
- `get_class_in_module`
From the current code, custom_object_save() looks like it already saves custom object source files and discovered relative imports into the target folder. It also appears to copy files by basename, which makes the current save path feel closer to a flat artifact than to preserving an arbitrary nested Python package layout.
So I think your proposal can be framed as a natural extension of an existing direction:
Current-ish direction:
collect custom code files
copy them into the save/push artifact
Possible inline strategy:
collect custom code files
generate one deterministic, inspectable file
update metadata so AutoClass loading points to that generated file
That does not necessarily fight the single-file philosophy. It may actually align with it:
Authoring:
modular source tree
Published artifact:
generated standalone/flat/inspectable custom-code artifact
This is similar in spirit to the broader compromise behind Modular Transformers, though the target layer is different:
| Area | Source authoring | Published / consumed artifact |
|---|---|---|
| Transformers repo models | `modular_<model>.py` with imports/inheritance | generated standalone `modeling_*.py`, `configuration_*.py`, etc. |
| Hub custom code proposal | multifile custom source tree | generated flat or inline artifact for trust_remote_code=True loading |
I would still be cautious about saying it is “the same thing” as Modular Transformers. It is not. But the design pattern is similar: modular authoring, standalone artifact.
I would present inline as one packaging strategy, not the whole proposal
One useful way to make the implementation discussion less binary may be to define a small strategy space:
| Strategy | Output artifact | Advantages | Risks / open questions |
|---|---|---|---|
| `current` | Whatever current `save_pretrained()` / `push_to_hub()` produces | Maximum backward compatibility | Existing edge cases remain. |
| `flat_copy` | Copy discovered `.py` files into the save directory | Close to current `custom_object_save()` behavior | Basename collisions, lost package structure, relative import quirks. |
| `preserve_package` | Preserve nested package directories | Most Pythonic for authors | More work for dynamic module loading/cache; may conflict with current same-directory assumptions. |
| `inline` | Generate one standalone `.py` file | Inspectable, single-file-compatible, loader-simple | Semantic equivalence, deterministic generation, source-of-truth questions. |
| external CLI | Pre-publish generated artifact | Easy to experiment with outside core Transformers | Not standardized; users must wire it into their own publishing flow. |
Then your proposal becomes:
Add or experiment with an `inline` custom-code packaging strategy.
rather than:
Replace the current custom-code loader with an inliner.
That seems easier to evaluate.
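The basename-collision risk listed for `flat_copy` is easy to detect before anything is written. The following is only a sketch under the assumption that the packager already has a list of source paths relative to the source root; the function name `basename_collisions` is made up for illustration and is not part of Transformers.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def basename_collisions(relative_paths: list[str]) -> dict[str, list[str]]:
    """Group source paths that would land on the same filename after a flat copy."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for path in relative_paths:
        by_name[PurePosixPath(path).name].append(path)
    # Only filenames claimed by more than one source path are a problem.
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    print(basename_collisions(["layers/utils.py", "heads/utils.py", "modeling_toy.py"]))
```

A packager could refuse to use `flat_copy` when this returns a non-empty result and suggest `inline` or `preserve_package` instead.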
Possible API shape, very tentatively
I do not know where maintainers would want this to live, so I would treat this as illustrative rather than prescriptive.
Maybe something like:
model.save_pretrained(
    save_directory,
    custom_code_packaging="inline",
)
and eventually:
model.push_to_hub(
    repo_id,
    custom_code_packaging="inline",
)
or perhaps a lower-level utility first:
from transformers.utils import package_custom_code
package_custom_code(
    entry_file="modeling_my_model.py",
    output_file="modeling_my_model_generated.py",
    strategy="inline",
)
I am not saying these are the right API names. The important part is the contract:
Given a custom-code entrypoint and a supported subset of relative imports,
produce a deterministic artifact that can be saved, pushed, inspected,
cached, and loaded.
Possible responsibility boundary
I would be careful here. From the outside, it is tempting to say:
Just add a flag to `save_pretrained()`.
But the recent custom-code saving work appears to touch more than one function. For example, #37716 touched custom-code saving, _auto_map, AutoClass behavior, multiple save/load paths, tests, and docs/docstrings.
So I would phrase the implementation boundary cautiously:
`dynamic_module_utils.custom_object_save()` looks like one plausible hook,
because it already saves custom object source files and updates config-side
metadata for Hub loading.
But I would not claim it is definitely the correct hook. The right abstraction
may need to account for AutoClass behavior, `auto_map`, local-vs-remote loading,
processors/tokenizers/configs, and push-to-hub behavior.
That keeps the proposal helpful without over-prescribing internals.
What I meant by CI / checks
When I mentioned CI, I did not mean:
Prove all possible model behavior is equivalent for all inputs.
I meant a much narrower generated-artifact consistency check:
1. Run the packager/inliner.
2. Compare the generated file with the checked-in generated file.
3. Fail if they differ.
4. Run AutoModel.from_pretrained(<local_saved_dir>, trust_remote_code=True).
5. If practical, also test a Hub-like or remote load path.
6. Optionally compare a tiny forward pass or at least state_dict keys
between the source and packaged forms.
So the CI input would not need to be arbitrary user inputs. It could start from a tiny toy custom model fixture.
For example:
toy_model/
configuration_toy.py
modeling_toy.py
backbone.py
modules.py
with:
# modeling_toy.py
from .backbone import ToyBackbone
and:
# backbone.py
from .modules import ToyModule
Then the check could be:
model.save_pretrained(tmpdir)
AutoModel.from_pretrained(tmpdir, trust_remote_code=True)
plus, for the packaging tool specifically:
generate artifact
compare generated artifact with expected artifact
load from generated artifact
That is much narrower than full semantic verification, but still useful.
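Steps 1 to 3 of the check above can be sketched with the standard library alone. The `generate` callable is a stand-in for whatever packager ends up being used, and `check_generated_artifact` is a hypothetical name; neither is an existing Transformers API.

```python
import difflib
from pathlib import Path
from typing import Callable


def check_generated_artifact(generate: Callable[[], str], committed_path: Path) -> list[str]:
    """Regenerate the artifact and return a unified diff against the committed file.

    An empty list means the committed artifact is up to date.
    `generate` is a placeholder for the (hypothetical) packager entry point.
    """
    fresh = generate().splitlines(keepends=True)
    committed = committed_path.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            committed, fresh, fromfile=str(committed_path), tofile="regenerated"
        )
    )
```

In CI, a non-empty diff would fail the job and be printed, so the author can see exactly which part of the generated file is stale. The load smoke test (step 4) would run afterwards, on the regenerated artifact.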
About DAGs and semantic equivalence
I agree that acyclicity is probably a very good support boundary. If the relative-import graph is cyclic, the packager can clearly reject it.
I would only be cautious about saying that DAG-ness alone proves semantic equivalence in Python.
A DAG means a topological inline order can exist. But Python import behavior can also depend on:
- module identity;
- import order;
- `sys.modules`;
- `__name__`;
- `__package__`;
- `__file__`;
- `__all__`;
- module-level side effects;
- optional imports;
- `try/except` import;
- `TYPE_CHECKING`;
- wildcard imports;
- duplicate names after flattening;
- monkey-patching;
- `importlib`;
- local-vs-remote cache behavior.
So I would phrase it as:
Acyclic import graph:
necessary / practical condition for supported inlining
Full semantic equivalence:
still worth checking with load tests and possibly a tiny forward pass
This does not make the inliner idea weaker. It just makes the support contract more precise.
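To make the acyclicity boundary concrete, here is a minimal sketch of how a packager might derive an inline order and reject cycles, using `ast` and `graphlib` from the standard library. It assumes a flat layout where every module sits in one directory and is imported as `from .name import ...`; the function names and the in-memory `sources` mapping are illustrative assumptions, not an existing API.

```python
import ast
from graphlib import CycleError, TopologicalSorter


def relative_imports(source: str) -> set[str]:
    """Return module names imported via single-dot `from .name import ...` statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        # `from . import x` has no module name and is ignored in this sketch.
        if isinstance(node, ast.ImportFrom) and node.level == 1 and node.module:
            names.add(node.module)
    return names


def inline_order(sources: dict[str, str]) -> list[str]:
    """Order modules so that each one appears after the modules it imports.

    Raises ValueError if the relative-import graph contains a cycle.
    """
    graph = {
        name: {dep for dep in relative_imports(src) if dep in sources}
        for name, src in sources.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"circular relative import: {err.args[1]}") from err
```

This only establishes that an order exists. It says nothing about the other factors in the list above, which is why load tests remain useful.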
Why inline might be attractive
An inline artifact could have several practical advantages:
| Advantage | Why it matters |
|---|---|
| Fewer dynamic relative imports | The loader has less dependency graph to reconstruct. |
| More inspectable artifact | Reviewers/users can inspect one generated file. |
| Closer to single-file philosophy | The final artifact resembles the traditional Transformers model file style. |
| Better offline/hermetic behavior | The saved directory can contain executable custom code without needing to fetch remote code again. |
| Easier upload completeness | push_to_hub() has fewer files to miss. |
| Potentially simpler cache invalidation | One deterministic file may be easier to hash than a graph of relative imports. |
But these advantages depend on the generated file being deterministic and honest about its origin.
For example, I would expect generated files to include something like:
# This file was automatically generated from a multifile custom-code source tree.
# Do not edit this file manually; edit the source files and regenerate.
# Source root: <source_root>
# Entry point: <entry_file>
# Packaging strategy: inline
and source boundary markers such as:
# ---------------------------------------------------------------------
# BEGIN inlined file: layers/attention.py
# ---------------------------------------------------------------------
...
# ---------------------------------------------------------------------
# END inlined file: layers/attention.py
# ---------------------------------------------------------------------
That would make the artifact more reviewable.
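As a rough illustration of how such markers could be emitted and later read back by review tooling, here is a small sketch. The marker wording follows the example above; the helper names `wrap_inlined_source` and `inlined_files` are hypothetical.

```python
import re

RULE = "# " + "-" * 69


def wrap_inlined_source(relative_path: str, source: str) -> str:
    """Surround one file's source with BEGIN/END markers naming its origin."""
    body = source if source.endswith("\n") else source + "\n"
    return (
        f"{RULE}\n# BEGIN inlined file: {relative_path}\n{RULE}\n"
        f"{body}"
        f"{RULE}\n# END inlined file: {relative_path}\n{RULE}\n"
    )


def inlined_files(artifact: str) -> list[str]:
    """List the origin paths recorded by BEGIN markers, in order of appearance."""
    return re.findall(r"^# BEGIN inlined file: (.+)$", artifact, flags=re.MULTILINE)
```

Because the markers are plain comments, they do not change what the generated file executes, and a reviewer can map any line back to its source file.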
Possible initial supported subset
Something like this may be easier to maintain:
Supported:
- one custom-code entry file
- same-repository relative imports
- acyclic dependency graph
- normal `from .foo import Bar` imports
- normal class/function/constant definitions
- external imports preserved at the top
- comments/docstrings preserved
- deterministic output
- generated source boundary markers
- clear error messages for unsupported patterns
Unsupported at first:
- circular imports
- wildcard relative imports
- dynamic imports via `importlib`
- imports outside the source root
- namespace packages
- complex module-level side effects
- ambiguous duplicate symbols
- package layouts that require runtime package identity
I would not present this as the final design, only as a possible starting point.
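Two of the "unsupported at first" patterns, wildcard relative imports and use of `importlib`, can be detected statically. The sketch below shows one way to produce the clear error messages mentioned in the supported list; it is deliberately incomplete, and `unsupported_patterns` is a made-up name rather than anything in Transformers.

```python
import ast


def unsupported_patterns(source: str, filename: str = "<source>") -> list[str]:
    """Return human-readable reasons why `source` falls outside the supported subset."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom):
            if node.level > 0 and any(alias.name == "*" for alias in node.names):
                found.append((node.lineno, "wildcard relative import"))
            elif node.level == 0 and (node.module or "").split(".")[0] == "importlib":
                found.append((node.lineno, "uses importlib (possible dynamic import)"))
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "importlib" for alias in node.names):
                found.append((node.lineno, "uses importlib (possible dynamic import)"))
    # Sort by line number so the report is deterministic.
    return [f"{filename}:{lineno}: {message}" for lineno, message in sorted(found)]
```

The source is only parsed, never executed, so this kind of check is safe to run on untrusted custom code.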
Possible tests / MREs
If this becomes a GitHub issue or PR, I think the most useful thing would be to split examples into small reproducible cases.
1. Save artifact completeness
Goal:
`save_pretrained()` should produce a directory that can be loaded
without manually copying custom `.py` files.
Minimal layout:
toy_model/
config.json
configuration_toy.py
modeling_toy.py
helper.py
Import chain:
# modeling_toy.py
from .helper import ToyBlock
Test:
model.save_pretrained(tmpdir)
AutoModel.from_pretrained(tmpdir, trust_remote_code=True)
2. Recursive relative imports
Goal:
transitive relative imports are either supported, clearly rejected,
or transformed into a generated artifact.
Minimal layout:
toy_model/
configuration_toy.py
modeling_toy.py
backbone.py
modules.py
Import chain:
# modeling_toy.py
from .backbone import ToyBackbone
# backbone.py
from .modules import ToyModule
This is close to the kind of issue described in #36653.
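For an MRE along these lines, it can help to have an independent check of which files a saved directory is missing. The sketch below follows same-directory `from .x import ...` chains transitively, similar in spirit to what `get_relative_import_files` is for, but written from scratch on the standard library. It assumes a flat layout, and `missing_relative_imports` is a hypothetical helper name.

```python
import ast
from pathlib import Path


def missing_relative_imports(directory: Path, entry_file: str) -> set[str]:
    """Follow same-directory relative imports from `entry_file` and report absent files."""
    missing: set[str] = set()
    seen: set[str] = set()
    todo = [entry_file]
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        path = directory / name
        if not path.is_file():
            missing.add(name)
            continue
        for node in ast.walk(ast.parse(path.read_text(encoding="utf-8"))):
            # Flat-layout assumption: `.backbone` maps to `backbone.py` next to the importer.
            if isinstance(node, ast.ImportFrom) and node.level == 1 and node.module:
                todo.append(f"{node.module}.py")
    return missing
```

Running this on the output of `save_pretrained()` for the layout above would show directly whether `modules.py`, the second-level dependency, made it into the artifact.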
3. Nested package layout
Goal:
decide whether nested subpackages are unsupported, preserved,
flat-copied, or inlined.
Minimal layout:
toy_model/
configuration_toy.py
modeling_toy.py
layers/
__init__.py
attention.py
rope.py
Import chain:
# modeling_toy.py
from .layers.attention import ToyAttention
# layers/attention.py
from .rope import apply_rope
This would clarify whether the desired behavior is:
preserve package layout
or:
generate a flat/inline artifact
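Whichever behavior is chosen, the packager first has to know which file a nested relative import refers to. A minimal sketch using `importlib.util.resolve_name` follows. It assumes the import targets a module file rather than a package `__init__.py`, uses a placeholder root package name `_root`, and the helper name `resolve_relative_import` is hypothetical.

```python
from importlib.util import resolve_name
from pathlib import PurePosixPath


def resolve_relative_import(importing_file: str, module: str, level: int) -> str:
    """Map a relative import inside `importing_file` to a path under the source root.

    `importing_file` is a POSIX-style path relative to the source root.
    Imports that climb above the source root make `resolve_name` raise.
    """
    parts = PurePosixPath(importing_file).with_suffix("").parts
    # Treat the source root as a package called `_root` (placeholder name).
    package = ".".join(("_root",) + parts[:-1])
    absolute = resolve_name("." * level + module, package)
    return "/".join(absolute.split(".")[1:]) + ".py"


if __name__ == "__main__":
    print(resolve_relative_import("modeling_toy.py", "layers.attention", 1))
    print(resolve_relative_import("layers/attention.py", "rope", 1))
```

For the layout above, the two imports resolve to `layers/attention.py` and `layers/rope.py`, which is the information needed for either preserving the package or inlining it.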
4. Push artifact completeness
Goal:
`push_to_hub()` should push the same complete custom-code artifact
that `save_pretrained()` would produce locally.
This is close to #29714, where the issue was that a custom model needed additional files to function properly after push.
5. Offline/hermetic loading
Goal:
A saved model directory should be usable on a fresh machine in offline mode
if all required custom code was saved.
This connects to:
- #34855
- sentence-transformers #2613
- #36808
6. Module identity / cache behavior
Goal:
A saved model should not accidentally load a different local custom module
with the same filename/class name.
This connects to #45698.
Possible issue split
If this is taken to GitHub, I would probably avoid one giant issue.
Maybe split it like this:
| Issue type | Possible title | Purpose |
|---|---|---|
| Bug / MRE | Recursive relative imports are not reliably included for `trust_remote_code` custom models | Show current behavior with a minimal failing repo. |
| Feature request | Add an opt-in custom-code packaging strategy for `save_pretrained` / `push_to_hub` | Discuss `inline`, `flat_copy`, `preserve_package`, etc. |
| Docs clarification | Clarify supported relative-import layouts for Hub custom code | Explain same-directory imports, nested packages, generated artifacts, and reload tests. |
| Experimental package | External custom-code inliner / packager | Prove the idea before proposing core integration. |
That separation may make the discussion easier for maintainers to act on.
Possible venue
I am less certain about the best venue, so I would treat this only as practical guidance, not official routing.
My understanding is:
| Place | Probably good for |
|---|---|
| This Forum thread | Initial context, demand check, design sketch. |
| transformers-community/support Discussions | Cross-linking a broader Transformers design/API question. It appears to be used for some semi-official community discussions, but I would not call it guaranteed/canonical. |
| GitHub Issue | Focused bug report or feature request with MRE/API sketch. |
| GitHub PR | Tests, docs, or implementation once the target behavior is clear. |
The transformers-community/support Space seems relevant because there are already broader discussions there, such as:
- Custom generate methods discussion
- The Transformers Library: standardizing model definitions
- With the new multi-backend modular system…
But I would not rely on that as the only path. For concrete bugs and feature requests, GitHub issues are probably still the most actionable place.
My tentative summary
I would summarize the situation like this:
There is real demand, but I would name the demand carefully.
The demand is for complete, reproducible, inspectable custom-code artifacts
for `trust_remote_code=True` models.
Inlining is one possible packaging strategy.
It may be especially attractive because it aligns with the single-file /
standalone-artifact style, reduces relative-import complexity, and can make
the saved/pushed artifact easier to inspect.
But it should probably be presented as an opt-in strategy, not as the only
right design.
The exact implementation hook should be left open for maintainers, though
`dynamic_module_utils.custom_object_save()` looks like a plausible place to
start reading because it already handles saving custom code files and metadata.
So I think your idea is useful, but I would pitch it less as:
Here is a preprocessing script for multifile uploads.
and more as:
Here is a possible opt-in packaging strategy for the broader custom-code
artifact completeness problem that Transformers has already been working on
for several years.
That framing seems both stronger and safer.