[SANA-Video] Adding 5s pre-trained 480p SANA-Video inference #12584
base: main
2. add `SanaVideoPipeline` in `pipeline_sana_video.py`;
3. add all the code needed to import `SanaVideoPipeline`;
2. add a reshape function in the sana-video processor;
3. fix bugs in the .pth-to-safetensors conversion;
```python
        return int(default_hw[0]), int(default_hw[1])

    @staticmethod
    def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:
```
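`resize_and_crop_tensor` takes a 4D image batch. Assuming it follows the scale-to-cover then center-crop approach of diffusers' `PixArtImageProcessor.resize_and_crop_tensor`, one way to apply the same idea to video, in the spirit of the "reshape function" from the commit message, is to fold the frame axis into the batch axis. This is a minimal sketch assuming a `(B, C, F, H, W)` layout; `resize_and_crop_video` is a hypothetical name, not the function added in this PR.

```python
import torch
import torch.nn.functional as F


def resize_and_crop_video(video: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:
    """Hypothetical sketch: cover-resize then center-crop every frame of a video."""
    # Assumption: layout is (batch, channels, frames, height, width).
    b, c, f, h, w = video.shape
    if (h, w) == (new_height, new_width):
        return video
    # Fold frames into the batch dim: (B, C, F, H, W) -> (B*F, C, H, W).
    frames = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    # Scale so the frame fully covers the target size, keeping the aspect ratio.
    ratio = max(new_height / h, new_width / w)
    # max() guards against float round-off leaving a side one pixel short.
    resized_h = max(new_height, round(h * ratio))
    resized_w = max(new_width, round(w * ratio))
    frames = F.interpolate(frames, size=(resized_h, resized_w), mode="bilinear", align_corners=False)
    # Center-crop the overflow on whichever side exceeds the target.
    top = (resized_h - new_height) // 2
    left = (resized_w - new_width) // 2
    frames = frames[:, :, top : top + new_height, left : left + new_width]
    # Unfold back to (B, C, F, new_height, new_width).
    return frames.reshape(b, f, c, new_height, new_width).permute(0, 2, 1, 3, 4)
```

Because each frame goes through the same 2D path, frame order and per-frame content are preserved; only the spatial dimensions change.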
I think exposing an interface like `VaeImageProcessor.resize` (diffusers/src/diffusers/image_processor.py, lines 468 to 474 in dcfb18a) would be more robust, since different video preprocessing pipelines will probably make different choices here:

```python
def resize(
    self,
    image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
    height: int,
    width: int,
    resize_mode: str = "default",  # "default", "fill", "crop"
) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:
```

Not blocking; on the diffusers side we can follow up to support more video pipelines here.
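To make the trade-off concrete, here is a minimal, framework-free sketch of the geometry a `resize_mode`-style video interface would have to compute. `plan_resize` is hypothetical; it mirrors the `"default"` (stretch to the exact size) and `"crop"` (scale to cover, then center-crop) choices listed in the `resize` signature, and deliberately leaves `"fill"` out.

```python
from typing import Optional, Tuple

CropBox = Tuple[int, int, int, int]  # (top, left, height, width)


def plan_resize(
    orig_h: int, orig_w: int, height: int, width: int, resize_mode: str = "default"
) -> Tuple[int, int, Optional[CropBox]]:
    """Hypothetical helper: return (resized_h, resized_w, crop_box) for one frame."""
    if resize_mode == "default":
        # Stretch straight to the target; aspect ratio may change, nothing is cropped.
        return height, width, None
    if resize_mode == "crop":
        # Scale so both sides cover the target, then center-crop the overflow.
        ratio = max(height / orig_h, width / orig_w)
        resized_h = max(height, round(orig_h * ratio))
        resized_w = max(width, round(orig_w * ratio))
        top = (resized_h - height) // 2
        left = (resized_w - width) // 2
        return resized_h, resized_w, (top, left, height, width)
    # "fill" (fit inside the target, then fill the border) is not covered here.
    raise ValueError(f"unsupported resize_mode: {resize_mode!r}")
```

For a 720×1280 clip resized to 480×832, `plan_resize(720, 1280, 480, 832, "crop")` scales to 480×853 and center-crops 21 columns (10 on the left, 11 on the right), whereas `"default"` stretches the frames to 480×832 and distorts the aspect ratio.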
OK, I'll leave this part to you. Thanks!!
Thanks for the PR! Would you be able to add tests and docs? We can help with both, especially the tests, but for the docs it may be harder for us as we are not as familiar with the intricacies of the model.
- Documentation example (Wan): https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/wan.md
- Model tests example (`WanTransformer3DModel`): https://github.com/huggingface/diffusers/blob/main/tests/models/transformers/test_models_transformer_wan.py
- Pipeline tests example (`WanPipeline`): https://github.com/huggingface/diffusers/blob/main/tests/pipelines/wan/test_wan.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2. fix typos;
I also added two test cases for your reference. Please feel free to modify them.
Thanks for the follow up changes! I have made some suggestions that should help the Sana Video pipeline tests pass.
Sorry for all the small change requests, but could you also do the following?
- Can you run the following to make sure that the CI code quality check is green?
  ```shell
  make style
  make quality
  make fix-copies
  ```
- Can you add the new Sana Video markdown docs to `docs/source/en/_toctree.yml`? This change will help the docs build correctly. For reference, here is how the Sana pipeline docs were added (diffusers/docs/source/en/_toctree.yml, lines 562 to 563 in dcfb18a):
  ```yaml
  - local: api/pipelines/sana
    title: Sana
  ```
```shell
make quality
make fix-copies
```

Done! Let's test it.
```yaml
- local: api/models/sana_video_transformer3d
  title: SanaVideoTransformer3DModel
```
This will cause an error when building the docs since the api/models/sana_video_transformer3d file doesn't currently exist. Could you add a markdown doc for the transformer as well? For reference, here is the documentation for SanaTransformer2DModel: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/models/sana_transformer2d.md
Done.
What does this PR do?
This PR adds SANA-Video, a new text/image-to-video model from NVIDIA.
Paper
Project
HF weight
Cc: @yiyixuxu @asomoza @sayakpaul
Results:
sana_v2.mp4