External Publication

Interest in preprocessing utilities for multifile model uploads

Hugging Face Forums [Unofficial] June 3, 2026

Oh. My previous comment about Modular Transformers came partly from a somewhat fuzzy memory, and I’m not a Transformers maintainer, so please take this with the appropriate amount of salt. But based on things I had happened to see before, plus what I looked into further using your reply as a starting point, one thing has become fairly clear: regardless of the exact implementation strategy, there does seem to be real demand for solving this broader custom-code packaging problem.

After looking into this a bit more, I would answer the original “is there demand?” question with a fairly strong yes, but with one important nuance:

The recurring demand seems to be less specifically for “an inliner” as such, and more for complete, reproducible, inspectable custom-code artifacts for trust_remote_code=True models.

An inliner may be one good strategy for that broader problem, especially if the goal is to let authors develop in a normal multifile layout while still publishing something closer to the traditional Transformers single-file / standalone-artifact style.

But I would probably avoid presenting inline as the only correct implementation. It may be more maintainable to frame it as one possible custom-code packaging strategy.

Direct reaction to your points

My direct reaction to your reply is roughly this:

Your point	My current read
`dynamic_utils` / dynamic module loading has serious deficiencies around multifile or recursive imports	This seems plausible. The surrounding issue history suggests that custom-code saving/loading has been fragile for a long time. I would still separate “current loader/save bug with an MRE” from “new opt-in packaging feature”.
An opt-in `save_pretrained` flag would be useful	I agree that this seems like a reasonable user-facing shape. I would maybe phrase it as a possible `custom_code_packaging` strategy rather than committing too early to one exact flag name.
`push_to_hub` should receive the same behavior	That also seems reasonable if the packaging transform is part of the saved artifact contract. Ideally, `push_to_hub` would upload the same complete custom-code artifact that `save_pretrained` creates.
Rebuilding the dynamic import resolver directly may be risky	I agree. An opt-in packaging transform may be safer than changing dynamic-module resolution globally. It avoids making the loader responsible for arbitrary Python project layouts.
The long-term fix might still involve better multi-level import handling	Possibly, but I would treat that as a separate track. One track is “fix current loader/save behavior with concrete MREs”; another is “provide an explicit packaging transform”.
The generated output is human-readable and preserves comments/docstrings	That is one of the strongest arguments for the approach. Reviewability matters a lot for `trust_remote_code=True`, because the artifact is executable code.
CI testing is unclear	I meant something narrower than proving all possible semantic equivalence: regenerate the artifact, compare it with the committed/generated output, then run local/remote/offline load smoke tests.
A DAG gives a clean boundary for inlining	I agree that acyclicity is a very useful support boundary. I would only be cautious about treating DAG-ness alone as a full semantic-equivalence proof in Python.
This is about uploading/saving for the Hub, not necessarily changing the Hub itself	That distinction makes sense. I would frame the problem as producing a complete custom-code artifact for `save_pretrained` / `push_to_hub`, rather than asking the Hub or dynamic loader to support arbitrary Python package layouts.

So my current read is:

Yes, the need is real.

But the strongest framing may be:
  custom-code artifact completeness for trust_remote_code=True models

rather than:
  a multifile inliner, specifically

Why I think the demand is real

There seems to be a long-running pattern of related issues and fixes around this area:

Year	Issue / PR	What it suggests
2022	#15224: Copy of the custom modeling file when saving a model	Users already needed `save_pretrained()` to copy custom modeling files for dynamic-code models.
2022	#20884: Santacoder saved checkpoints missing required .py files	Fine-tuned checkpoints for `trust_remote_code=True` models could be unusable because required code files were missing.
2023	#21008: Make sure dynamic objects can be saved and reloaded	Core Transformers already added fixes so dynamic/custom objects could be saved and reloaded with their code.
2023	#24737: Falcon models saved with save_pretrained no longer get saved with Python files / #24785	Another concrete regression/fix around copying custom Python files during saving.
2023	#27688: Remote code improvements	Broader concerns around `trust_remote_code`, `auto_map`, downstream libraries, and documentation.
2024	#29714: push_to_hub for a trust_remote_code=True model	Users wanted `push_to_hub()` to push all files needed by a custom model, not only weights/config/tokenizer files.
2024	#32923 / #33100	Local-vs-remote custom code behavior affected pipeline registration and AutoClass behavior.
2024	#34855: Offline mode does not work with models requiring trust_remote_code=True	`save_pretrained()` artifacts were not always self-contained enough for offline / fresh-machine loading.
2024	sentence-transformers #2613	Downstream users also need hermetic/offline Docker-style deployments for models requiring remote code.
2025	#36808: Support loading custom code objects in offline mode from local	Ongoing work around fully saving/loading `trust_remote_code=True` custom objects in offline/local settings.
2025	#37716: Fix custom code saving	A major merged PR explicitly aimed at making `save_pretrained()` and `push_to_hub()` correctly save relevant custom modeling files.
2025	#37751: Stop autoconverting custom code checkpoints	Custom-code checkpoints may need special handling in adjacent infrastructure.
2026	#45684: save_pretrained custom model files copied with readonly permissions	Saved custom-code files are touched by post-save tooling, so generated/copied artifacts are a real workflow.
2026	#45698: from_pretrained loads wrong custom module after save_pretrained	Custom module identity/cache/local-source behavior can still be subtle after saving.

So, to me, this looks like a real problem family. It has appeared as:

missing custom .py files after save_pretrained();
missing custom .py files after push_to_hub();
local-vs-remote custom-code inconsistencies;
offline/hermetic deployment failures;
auto_map / _auto_class fragility;
pipeline registration differences;
dynamic-module cache/module identity issues;
relative-import limitations and under-documentation.

That is a fairly strong signal that there is real demand.

How I would frame the core problem

I would probably frame the problem less as:

How do we support arbitrary multifile Python projects on the Hub?

and more as:

How do we produce a complete, reproducible, inspectable custom-code artifact
for `trust_remote_code=True` models, across `save_pretrained()`,
`push_to_hub()`, local loading, remote loading, and offline loading?

That framing seems to connect better with the existing Transformers work.

It also avoids forcing the loader to become a general Python package resolver. Instead, the save/push step could produce an artifact that the dynamic loader already knows how to consume.

Why your inliner idea still seems relevant

The current custom-code machinery already appears to be somewhat artifact-oriented.

The relevant area seems to be dynamic_module_utils.py, especially functions such as:

custom_object_save
get_relative_imports
get_relative_import_files
get_cached_module_file
get_class_in_module

From the current code, custom_object_save() looks like it already saves custom object source files and discovered relative imports into the target folder. It also appears to copy files by basename, which makes the current save path feel closer to a flat artifact than to preserving an arbitrary nested Python package layout.
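To make "discovered relative imports" concrete, here is a minimal sketch of transitive discovery using only the standard library `ast` module. It is not the Transformers implementation, and the names `find_relative_imports` and `collect_relative_files` are hypothetical. It assumes single-dot imports of the form `from .foo import Bar` and ignores `from . import foo`.

```python
import ast
from pathlib import Path


def find_relative_imports(source: str) -> list[str]:
    """Return module names from single-dot `from .x import y` statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        # level == 1 means exactly one leading dot; `from . import x` has module=None.
        if isinstance(node, ast.ImportFrom) and node.level == 1 and node.module:
            names.add(node.module)
    return sorted(names)


def collect_relative_files(entry: Path) -> list[Path]:
    """Follow relative imports transitively, starting from `entry`."""
    seen: set[Path] = set()
    stack = [entry.resolve()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for name in find_relative_imports(current.read_text(encoding="utf-8")):
            # `.layers.attention` maps to `layers/attention.py` next to the importer.
            candidate = current.parent / (name.replace(".", "/") + ".py")
            if candidate.exists():
                stack.append(candidate.resolve())
    return sorted(seen)
```

Resolving each import relative to the importing file, rather than relative to the entry file, is what lets nested layouts work in this sketch. A flat copy by basename would lose exactly that information.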

So I think your proposal can be framed as a natural extension of an existing direction:

Current-ish direction:
  collect custom code files
  copy them into the save/push artifact

Possible inline strategy:
  collect custom code files
  generate one deterministic, inspectable file
  update metadata so AutoClass loading points to that generated file

That does not necessarily fight the single-file philosophy. It may actually align with it:

Authoring:
  modular source tree

Published artifact:
  generated standalone/flat/inspectable custom-code artifact

This is similar in spirit to the broader compromise behind Modular Transformers, though the target layer is different:

Area	Source authoring	Published / consumed artifact
Transformers repo models	`modular_<model>.py` with imports/inheritance	generated standalone `modeling_<model>.py`, `configuration_<model>.py`, etc.
Hub custom code proposal	multifile custom source tree	generated flat or inline artifact for `trust_remote_code=True` loading

I would still be cautious about saying it is “the same thing” as Modular Transformers. It is not. But the design pattern is similar: modular authoring, standalone artifact.

I would present `inline` as one packaging strategy, not the whole proposal

One useful way to make the implementation discussion less binary may be to define a small strategy space:

Strategy	Output artifact	Advantages	Risks / open questions
`current`	Whatever current `save_pretrained()` / `push_to_hub()` produces	Maximum backward compatibility	Existing edge cases remain.
`flat_copy`	Copy discovered `.py` files into the save directory	Close to current `custom_object_save()` behavior	Basename collisions, lost package structure, relative import quirks.
`preserve_package`	Preserve nested package directories	Most Pythonic for authors	More work for dynamic module loading/cache; may conflict with current same-directory assumptions.
`inline`	Generate one standalone `.py` file	Inspectable, single-file-compatible, loader-simple	Semantic equivalence, deterministic generation, source-of-truth questions.
external CLI	Pre-publish generated artifact	Easy to experiment with outside core Transformers	Not standardized; users must wire it into their own publishing flow.

Then your proposal becomes:

Add or experiment with an `inline` custom-code packaging strategy.

rather than:

Replace the current custom-code loader with an inliner.

That seems easier to evaluate.
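As a side note on the `flat_copy` row: the basename-collision risk is cheap to detect before anything is copied. A minimal sketch, where `find_basename_collisions` is a hypothetical helper and the paths are assumed to be POSIX-style paths relative to the source root:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_basename_collisions(relative_paths: list[str]) -> dict[str, list[str]]:
    """Group source paths by basename and keep only names used more than once."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for path in relative_paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


collisions = find_basename_collisions(
    ["modeling_toy.py", "layers/utils.py", "heads/utils.py"]
)
# collisions == {"utils.py": ["heads/utils.py", "layers/utils.py"]}
```

A packager could turn a non-empty result into a clear error instead of silently overwriting one file with another.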

Possible API shape, very tentatively

I do not know where maintainers would want this to live, so I would treat this as illustrative rather than prescriptive.

Maybe something like:

model.save_pretrained(
    save_directory,
    custom_code_packaging="inline",
)

and eventually:

model.push_to_hub(
    repo_id,
    custom_code_packaging="inline",
)

or perhaps a lower-level utility first:

from transformers.utils import package_custom_code

package_custom_code(
    entry_file="modeling_my_model.py",
    output_file="modeling_my_model_generated.py",
    strategy="inline",
)

I am not saying these are the right API names. The important part is the contract:

Given a custom-code entrypoint and a supported subset of relative imports,
produce a deterministic artifact that can be saved, pushed, inspected,
cached, and loaded.
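To give a rough idea of what the smallest possible `inline` transform could look like, here is a sketch that drops top-level relative imports and concatenates the modules in dependency order. Everything here is hypothetical (`strip_relative_imports`, `inline_modules`), and it assumes the caller already knows a valid order, that no names clash after flattening, and that no import shares a line with another statement.

```python
import ast


def strip_relative_imports(source: str) -> str:
    """Drop top-level relative `from .x import y` statements; keep other lines verbatim."""
    drop: set[int] = set()
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            drop.update(range(node.lineno, node.end_lineno + 1))
    kept = [
        line
        for number, line in enumerate(source.splitlines(), start=1)
        if number not in drop
    ]
    return "\n".join(kept).strip("\n") + "\n"


def inline_modules(sources: dict[str, str], order: list[str]) -> str:
    """Concatenate modules so that dependencies appear before their importers."""
    return "\n\n".join(strip_relative_imports(sources[name]).rstrip("\n") for name in order) + "\n"


sources = {
    "modules": "class ToyModule:\n    scale = 2\n",
    "backbone": "from .modules import ToyModule\n\n\nclass ToyBackbone:\n    block = ToyModule\n",
}
artifact = inline_modules(sources, order=["modules", "backbone"])
```

Because lines are filtered rather than regenerated from the AST, comments and docstrings survive untouched, which is the property that makes the output reviewable.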

Possible responsibility boundary

I would be careful here. From the outside, it is tempting to say:

Just add a flag to `save_pretrained()`.

But the recent custom-code saving work appears to touch more than one function. For example, #37716 touched custom-code saving, _auto_map, AutoClass behavior, multiple save/load paths, tests, and docs/docstrings.

So I would phrase the implementation boundary cautiously:

`dynamic_module_utils.custom_object_save()` looks like one plausible hook,
because it already saves custom object source files and updates config-side
metadata for Hub loading.

But I would not claim it is definitely the correct hook. The right abstraction
may need to account for AutoClass behavior, `auto_map`, local-vs-remote loading,
processors/tokenizers/configs, and push-to-hub behavior.

That keeps the proposal helpful without over-prescribing internals.

What I meant by CI / checks

When I mentioned CI, I did not mean:

Prove all possible model behavior is equivalent for all inputs.

I meant a much narrower generated-artifact consistency check:

1. Run the packager/inliner.
2. Compare the generated file with the checked-in generated file.
3. Fail if they differ.
4. Run AutoModel.from_pretrained(<local_saved_dir>, trust_remote_code=True).
5. If practical, also test a Hub-like or remote load path.
6. Optionally compare a tiny forward pass or at least state_dict keys
   between the source and packaged forms.

So the CI input would not need to be arbitrary user inputs. It could start from a tiny toy custom model fixture.

For example:

toy_model/
  configuration_toy.py
  modeling_toy.py
  backbone.py
  modules.py

with:

# modeling_toy.py
from .backbone import ToyBackbone

and:

# backbone.py
from .modules import ToyModule

Then the check could be:

model.save_pretrained(tmpdir)
AutoModel.from_pretrained(tmpdir, trust_remote_code=True)

plus, for the packaging tool specifically:

generate artifact
compare generated artifact with expected artifact
load from generated artifact

That is much narrower than full semantic verification, but still useful.
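Steps 1 to 3 of the list above could be sketched roughly like this. `check_generated_artifact` is a hypothetical helper, and `generate` stands in for whatever packager or inliner is being tested:

```python
import difflib
from pathlib import Path
from typing import Callable


def check_generated_artifact(generate: Callable[[], str], committed_path: Path) -> str:
    """Return "" if the committed file matches regenerated output, else a unified diff."""
    regenerated = generate()
    committed = committed_path.read_text(encoding="utf-8")
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"{committed_path} (committed)",
        tofile=f"{committed_path} (regenerated)",
    )
    return "".join(diff)
```

A CI job would fail when the returned string is non-empty and print it, so the author can see exactly which lines went stale.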

About DAGs and semantic equivalence

I agree that acyclicity is probably a very good support boundary. If the relative-import graph is cyclic, the packager can clearly reject it.

I would only be cautious about saying that DAG-ness alone proves semantic equivalence in Python.

A DAG means a topological inline order can exist. But Python import behavior can also depend on:

module identity;
import order;
sys.modules;
__name__;
__package__;
__file__;
__all__;
module-level side effects;
optional imports;
try/except import;
TYPE_CHECKING;
wildcard imports;
duplicate names after flattening;
monkey-patching;
importlib;
local-vs-remote cache behavior.

So I would phrase it as:

Acyclic import graph:
  necessary / practical condition for supported inlining

Full semantic equivalence:
  still worth checking with load tests and possibly a tiny forward pass

This does not make the inliner idea weaker. It just makes the support contract more precise.
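The "reject cycles, otherwise pick an inline order" part is small enough to sketch with the standard library `graphlib` module (Python 3.9+). `inline_order` is a hypothetical name, and the input is assumed to be a mapping from each module to the set of modules it imports:

```python
from graphlib import CycleError, TopologicalSorter


def inline_order(imports: dict[str, set[str]]) -> list[str]:
    """Return modules with dependencies first, or raise if the graph is cyclic."""
    sorter = TopologicalSorter(imports)  # values are treated as predecessors
    try:
        return list(sorter.static_order())
    except CycleError as err:
        cycle = " -> ".join(err.args[1])  # second arg holds the detected cycle
        raise ValueError(f"circular relative import, cannot inline: {cycle}") from err


order = inline_order(
    {"modeling_toy": {"backbone"}, "backbone": {"modules"}, "modules": set()}
)
# order == ["modules", "backbone", "modeling_toy"]
```

This only answers whether an order exists. It says nothing about the items in the list above, such as module-level side effects or `__name__` checks, which is why load tests still matter. For graphs with several valid orders, a real packager would also want an explicit tie-breaking rule to keep the output deterministic.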

Why `inline` might be attractive

An inline artifact could have several practical advantages:

Advantage	Why it matters
Fewer dynamic relative imports	The loader has less dependency graph to reconstruct.
More inspectable artifact	Reviewers/users can inspect one generated file.
Closer to single-file philosophy	The final artifact resembles the traditional Transformers model file style.
Better offline/hermetic behavior	The saved directory can contain executable custom code without needing to fetch remote code again.
Easier upload completeness	`push_to_hub()` has fewer files to miss.
Potentially simpler cache invalidation	One deterministic file may be easier to hash than a graph of relative imports.

But these advantages depend on the generated file being deterministic and honest about its origin.

For example, I would expect generated files to include something like:

# This file was automatically generated from a multifile custom-code source tree.
# Do not edit this file manually; edit the source files and regenerate.
# Source root: <source_root>
# Entry point: <entry_file>
# Packaging strategy: inline

and source boundary markers such as:

# ---------------------------------------------------------------------
# BEGIN inlined file: layers/attention.py
# ---------------------------------------------------------------------

...

# ---------------------------------------------------------------------
# END inlined file: layers/attention.py
# ---------------------------------------------------------------------

That would make the artifact more reviewable.
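Emitting the header and boundary markers is mechanical. A minimal sketch, with hypothetical helper names `render_header` and `wrap_inlined_source`, and a rule width chosen arbitrarily:

```python
RULE = "# " + "-" * 69


def render_header(source_root: str, entry_file: str, strategy: str = "inline") -> str:
    """Provenance header placed at the top of the generated file."""
    return "\n".join(
        [
            "# This file was automatically generated from a multifile custom-code source tree.",
            "# Do not edit this file manually; edit the source files and regenerate.",
            f"# Source root: {source_root}",
            f"# Entry point: {entry_file}",
            f"# Packaging strategy: {strategy}",
            "",
        ]
    )


def wrap_inlined_source(relative_path: str, source: str) -> str:
    """Surround one file's source with BEGIN/END markers naming its origin."""
    return "\n".join(
        [
            RULE,
            f"# BEGIN inlined file: {relative_path}",
            RULE,
            "",
            source.rstrip("\n"),
            "",
            RULE,
            f"# END inlined file: {relative_path}",
            RULE,
            "",
        ]
    )
```

Keeping timestamps, absolute paths, and hostnames out of the header matters here: any of them would make two runs over the same source produce different files.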

Possible initial supported subset

Something like this may be easier to maintain:

Supported:
  - one custom-code entry file
  - same-repository relative imports
  - acyclic dependency graph
  - normal `from .foo import Bar` imports
  - normal class/function/constant definitions
  - external imports preserved at the top
  - comments/docstrings preserved
  - deterministic output
  - generated source boundary markers
  - clear error messages for unsupported patterns

Unsupported at first:
  - circular imports
  - wildcard relative imports
  - dynamic imports via `importlib`
  - imports outside the source root
  - namespace packages
  - complex module-level side effects
  - ambiguous duplicate symbols
  - package layouts that require runtime package identity

I would not present this as the final design, only as a possible starting point.
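Two of the "unsupported at first" items can be detected statically, which is what makes clear error messages feasible. A sketch using `ast`, where `find_unsupported_patterns` is a hypothetical name and the `importlib` rule is deliberately conservative (it flags any import of `importlib`, including harmless ones):

```python
import ast


def find_unsupported_patterns(source: str, filename: str = "<source>") -> list[str]:
    """Report wildcard relative imports and importlib usage, ordered by line."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom):
            if node.level > 0 and any(alias.name == "*" for alias in node.names):
                found.append((node.lineno, "wildcard relative import"))
            elif node.level == 0 and (node.module or "").split(".")[0] == "importlib":
                found.append((node.lineno, "dynamic import via importlib"))
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "importlib" for alias in node.names):
                found.append((node.lineno, "dynamic import via importlib"))
    return [f"{filename}:{lineno}: {message}" for lineno, message in sorted(found)]


problems = find_unsupported_patterns(
    "from .foo import *\nimport importlib\nfrom .bar import Baz\n"
)
# problems == ["<source>:1: wildcard relative import",
#              "<source>:2: dynamic import via importlib"]
```

Items such as "complex module-level side effects" are much harder to check this way and would probably stay a documented limitation rather than an automated rejection.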

Possible tests / MREs

If this becomes a GitHub issue or PR, I think the most useful thing would be to split examples into small reproducible cases.

1. Save artifact completeness

Goal:
  `save_pretrained()` should produce a directory that can be loaded
  without manually copying custom `.py` files.

Minimal layout:

toy_model/
  config.json
  configuration_toy.py
  modeling_toy.py
  helper.py

Import chain:

# modeling_toy.py
from .helper import ToyBlock

Test:

model.save_pretrained(tmpdir)
AutoModel.from_pretrained(tmpdir, trust_remote_code=True)

2. Recursive relative imports

Goal:
  transitive relative imports are either supported, clearly rejected,
  or transformed into a generated artifact.

Minimal layout:

toy_model/
  configuration_toy.py
  modeling_toy.py
  backbone.py
  modules.py

Import chain:

# modeling_toy.py
from .backbone import ToyBackbone

and:

# backbone.py
from .modules import ToyModule

This is close to the kind of issue described in #36653.

3. Nested package layout

Goal:
  decide whether nested subpackages are unsupported, preserved,
  flat-copied, or inlined.

Minimal layout:

toy_model/
  configuration_toy.py
  modeling_toy.py
  layers/
    __init__.py
    attention.py
    rope.py

Import chain:

# modeling_toy.py
from .layers.attention import ToyAttention

and:

# layers/attention.py
from .rope import apply_rope

This would clarify whether the desired behavior is:

preserve package layout

or:

generate a flat/inline artifact

4. Push artifact completeness

Goal:
  `push_to_hub()` should push the same complete custom-code artifact
  that `save_pretrained()` would produce locally.

This is close to #29714, where the issue was that a custom model needed additional files to function properly after push.

5. Offline/hermetic loading

Goal:
  A saved model directory should be usable on a fresh machine in offline mode
  if all required custom code was saved.

This connects to:

#34855
sentence-transformers #2613
#36808

6. Module identity / cache behavior

Goal:
  A saved model should not accidentally load a different local custom module
  with the same filename/class name.

This connects to #45698.
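To illustrate the failure mode in general Python terms (this is not how Transformers caches dynamic modules): two files with the same filename stay distinct only if each one is registered under a module name derived from its full path. A sketch with a hypothetical `load_module_from_path` helper:

```python
import hashlib
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: Path) -> ModuleType:
    """Load a .py file under a module name that is unique per resolved path."""
    path = path.resolve()
    digest = hashlib.sha256(str(path).encode("utf-8")).hexdigest()[:12]
    name = f"_custom_{path.stem}_{digest}"
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as the import system does
    spec.loader.exec_module(module)
    return module
```

A test for this case could save two toy models that both ship a `modeling_toy.py`, load both in one process, and assert that each one resolves to its own class.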

Possible issue split

If this is taken to GitHub, I would probably avoid one giant issue.

Maybe split it like this:

Issue type	Possible title	Purpose
Bug / MRE	`Recursive relative imports are not reliably included for trust_remote_code custom models`	Show current behavior with a minimal failing repo.
Feature request	`Add an opt-in custom-code packaging strategy for save_pretrained / push_to_hub`	Discuss `inline`, `flat_copy`, `preserve_package`, etc.
Docs clarification	`Clarify supported relative-import layouts for Hub custom code`	Explain same-directory imports, nested packages, generated artifacts, and reload tests.
Experimental package	`External custom-code inliner / packager`	Prove the idea before proposing core integration.

That separation may make the discussion easier for maintainers to act on.

Possible venue

I am less certain about the best venue, so I would treat this only as practical guidance, not official routing.

My understanding is:

Place	Probably good for
This Forum thread	Initial context, demand check, design sketch.
transformers-community/support Discussions	Cross-linking a broader Transformers design/API question. It appears to be used for some semi-official community discussions, but I would not call it guaranteed/canonical.
GitHub Issue	Focused bug report or feature request with MRE/API sketch.
GitHub PR	Tests, docs, or implementation once the target behavior is clear.

The transformers-community/support Space seems relevant because there are already broader discussions there, such as:

Custom generate methods discussion
The Transformers Library: standardizing model definitions
With the new multi-backend modular system…

But I would not rely on that as the only path. For concrete bugs and feature requests, GitHub issues are probably still the most actionable place.

My tentative summary

I would summarize the situation like this:

There is real demand, but I would name the demand carefully.

The demand is for complete, reproducible, inspectable custom-code artifacts
for `trust_remote_code=True` models.

Inlining is one possible packaging strategy.

It may be especially attractive because it aligns with the single-file /
standalone-artifact style, reduces relative-import complexity, and can make
the saved/pushed artifact easier to inspect.

But it should probably be presented as an opt-in strategy, not as the only
right design.

The exact implementation hook should be left open for maintainers, though
`dynamic_module_utils.custom_object_save()` looks like a plausible place to
start reading because it already handles saving custom code files and metadata.

So I think your idea is useful, but I would pitch it less as:

Here is a preprocessing script for multifile uploads.

and more as:

Here is a possible opt-in packaging strategy for the broader custom-code
artifact completeness problem that Transformers has already been working on
for several years.

That framing seems both stronger and safer.
