Deep Learning Course CSE641 & ECE555
- Subject Code: CSE641-ECE555

Assignment 3 (5 Marks)
Generative Adversarial Text-to-Image Synthesis
In this assignment, you will learn about text-to-image synthesis using conditional GANs. A typical GAN has a Generator (G) that takes random noise as input to generate realistic data samples (e.g., images, audio, or text) and a Discriminator (D) that acts as a binary classifier, distinguishing between real and generated data. In conditional GANs, the input to G is conditioned on additional information.
In this assignment, you have to train a conditional GAN to generate images where the input to the Target Generator is conditioned on textual descriptions. In addition, you have to train a Source Encoder, which will provide learned representations as input to G instead of random noise. You may train the whole setup in an end-to-end manner or in parts. For instance, one approach could be knowledge distillation from the source encoder to the generator.
Overall Setup:
- Source Encoder: Takes an input image and outputs a representation. Any model size or type.
- Target Generator: Takes representations from the source model and a text encoding to generate new samples. The number of parameters should be half of that of the Source Encoder. Any model type.
- Discriminator: Distinguishes between real and generated data. Any model size or type.
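To make the setup concrete, here is a minimal PyTorch sketch of the three components. All layer sizes are placeholder choices of ours, not a required architecture; the point is the data flow (image → representation, representation + text → image, image → real/fake logit) and checking the rule that the generator has at most half the encoder's parameters.

```python
import torch
import torch.nn as nn

class SourceEncoder(nn.Module):
    """Image -> learned representation. Sizes are illustrative only."""
    def __init__(self, rep_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 32x32 -> 16x16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, rep_dim),
        )

    def forward(self, x):
        return self.net(x)

class TargetGenerator(nn.Module):
    """(representation, text encoding) -> 64x64 image."""
    def __init__(self, rep_dim=128, text_dim=64):
        super().__init__()
        self.fc = nn.Linear(rep_dim + text_dim, 16 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 4, 2, 1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(8, 8, 4, 2, 1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(8, 4, 4, 2, 1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(4, 3, 4, 2, 1), nn.Tanh(),   # 32 -> 64
        )

    def forward(self, rep, text):
        h = self.fc(torch.cat([rep, text], dim=1)).view(-1, 16, 4, 4)
        return self.net(h)

class Discriminator(nn.Module):
    """Image -> single real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),   # 64 -> 32
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

def n_params(m):
    return sum(p.numel() for p in m.parameters())
```

With these placeholder sizes the generator comes out at roughly a third of the encoder's parameter count, comfortably within the half-size rule; `n_params` lets you verify the constraint for your own architectures.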
Rules:
- You can use any library to design your models.
- You can use any loss function, coding style, batch size, optimizer, or learning rate.
- You can use any model architecture except modern ones, such as transformer- or diffusion-based models. (If you are unsure, please ask and clarify first.)
- You can use the following as the base repo for data: https://github.com/aelnouby/Text-to-Image-Synthesis?tab=readme-ov-file
- You cannot use any pretrained model/checkpoint, i.e., all parameters in your setup should be trained from scratch (from some random seed).
- You have to demonstrate your setup by randomly selecting 20 classes (for train) and 5 classes (for test) from the Oxford-102 dataset. Text descriptions are available in the GitHub repo mentioned above.
- The Source Encoder cannot use class labels during training. You may use any loss function to make it as discriminative as possible for the real images of all 25 classes.
- We will only run & test your code on Google Colab. You have a maximum of 200 epochs for training using Colab resources. Time per epoch doesn't matter, but it is advisable that training and testing can be finished within 1 hr (though not mandatory). Hence, choose a reasonable model size.
- We encourage you to save .ipynb cell outputs such as plots, visualizations, loss/accuracy logs, etc., to aid the subjective evaluation component.
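The 20/5 class split above can be done reproducibly with a fixed seed. This is a hypothetical sketch (function name and seed are our choices); it assumes Oxford-102's classes are numbered 1 to 102 and guarantees the train and test classes are disjoint.

```python
import random

def split_classes(num_classes=102, n_train=20, n_test=5, seed=42):
    """Sample disjoint train/test class ids with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    chosen = rng.sample(range(1, num_classes + 1), n_train + n_test)
    return sorted(chosen[:n_train]), sorted(chosen[n_train:])

train_classes, test_classes = split_classes()
```

Using a seeded `random.Random` instance (rather than the global RNG) keeps the split stable even if other code in the notebook also draws random numbers.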
[Input] --> [Source Encoder] --> [Representation] --> [Target Generator] --> [Generated Image]
                                                              ^
[Text Input] --> [Text Encoder] --> [Text Encoding] ----------+

[Generated Image] --> [Discriminator] <-- [Real Images]
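The diagram above corresponds to one adversarial update per batch. A hedged sketch of that step follows, assuming modules `encoder(img) -> rep`, `generator(rep, text) -> img`, and `discriminator(img) -> logit` plus optimizers `opt_g`, `opt_d` already exist (all names are ours). Standard BCE losses are used here, but the assignment allows any loss; the encoder output is detached, i.e., the encoder is assumed to be trained separately (e.g., with a discriminative loss on real images).

```python
import torch
import torch.nn.functional as F

def gan_step(encoder, generator, discriminator, opt_g, opt_d, real_imgs, text_emb):
    rep = encoder(real_imgs).detach()     # learned representation replaces noise
    fake_imgs = generator(rep, text_emb)

    # Discriminator update: push real logits toward 1, fake logits toward 0.
    opt_d.zero_grad()
    d_real = discriminator(real_imgs)
    d_fake = discriminator(fake_imgs.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # Generator update: make the discriminator output 1 on fakes.
    opt_g.zero_grad()
    loss_g = F.binary_cross_entropy_with_logits(
        discriminator(fake_imgs), torch.ones_like(d_fake))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```

Note the `fake_imgs.detach()` in the discriminator pass: it prevents the discriminator loss from updating the generator, which is the standard way to keep the two updates separate.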
Deliverables:
- We don't need your trained model but a robust code that can replicate your best results.
- Submit a single .ipynb file for this assignment with clean, documented code. Structure your notebook beautifully, as if you are giving a demo tutorial to a first-year B.Tech student who can easily follow the steps.
- Highlight the innovations (new things), if any, that you have used and that you believe make your submission stand out from the rest of the class.
- There should be two separate sections, one for Training and one for Testing.
- In Training/Testing, you may use the data loader from the above-mentioned GitHub repo.
- In Testing, using the best model checkpoint, you have to:
- Generate and plot 5 random images from each test class as a grid. (Hint: use diverse unseen text.)
- Plot the 3D t-SNE embedding of the Source Encoder on all images from both train and test sets.
- Print, in the form of a table: the total number of parameters, the number of trainable parameters, and the model size on disk for the encoder, generator, and discriminator.
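The required summary table can be produced along these lines. This is a sketch: the `encoder`/`generator`/`discriminator` entries below are tiny stand-in models to keep the example self-contained, and disk size is measured by serializing the `state_dict` to an in-memory buffer (the same bytes `torch.save` would write to a file).

```python
import io
import torch
import torch.nn as nn

def summarize(model):
    """Return (total params, trainable params, serialized size in MB)."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return total, trainable, buf.getbuffer().nbytes / 1e6

# Stand-in models; substitute your actual encoder/generator/discriminator.
models = {
    "encoder": nn.Linear(10, 10),
    "generator": nn.Linear(10, 5),
    "discriminator": nn.Linear(10, 1),
}
print(f"{'model':<15}{'total':>10}{'trainable':>12}{'size (MB)':>12}")
for name, m in models.items():
    total, trainable, mb = summarize(m)
    print(f"{name:<15}{total:>10}{trainable:>12}{mb:>12.3f}")
```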
Marking:
This assignment will not be fully auto-graded. Marking will be manual, with subjective evaluation using the following components:
- Overall structure & cleanliness of submitted code notebook [1 mark]
- Successful training of the full GAN model [1 mark]
- Discriminative ability of the embeddings from Source encoder [1 mark]
- Subjective diversity and quality of generation [1 mark]
- Subjective evaluation of innovation in model architecture (including its size and memory footprint) and training paradigm [1 mark]