
Deep Learning Course CSE641 & ECE555


Deep Learning (CSE641/ECE555)

Assignment 3 (5 Marks)

Generative Adversarial Text-to-Image Synthesis

In this assignment, you will learn about text-to-image synthesis using conditional GANs. A typical GAN has a Generator (G) that takes random noise as input to generate realistic data samples (e.g., images, audio, or text) and a Discriminator (D) that acts as a binary classifier, distinguishing between real and generated data. In conditional GANs, the input to G is conditioned on additional information.
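To make the conditioning concrete, here is a minimal PyTorch sketch (illustrative only, not part of the brief; all layer sizes and names are assumptions): G consumes a condition vector c alongside its noise input, and D scores (image, condition) pairs.

    import torch
    import torch.nn as nn

    class ConditionalGenerator(nn.Module):
        """Toy conditional G: concatenates noise z with a condition vector c."""
        def __init__(self, z_dim=100, c_dim=128, img_pixels=64 * 64 * 3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(z_dim + c_dim, 512), nn.ReLU(),
                nn.Linear(512, img_pixels), nn.Tanh(),  # pixel values in [-1, 1]
            )

        def forward(self, z, c):
            return self.net(torch.cat([z, c], dim=1))

    class ConditionalDiscriminator(nn.Module):
        """Toy conditional D: classifies (image, condition) pairs as real/fake."""
        def __init__(self, c_dim=128, img_pixels=64 * 64 * 3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(img_pixels + c_dim, 512), nn.LeakyReLU(0.2),
                nn.Linear(512, 1),  # single real/fake logit
            )

        def forward(self, x_flat, c):
            return self.net(torch.cat([x_flat, c], dim=1))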

In this assignment, you have to train a conditional GAN to generate images, where the input to the Target Generator is conditioned on textual descriptions. In addition, you have to train a Source Encoder, which will provide learned representations as input to G instead of random noise. You may train the whole setup in an end-to-end manner or in parts. For instance, one approach could be knowledge distillation from the source encoder to the generator.
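As one hedged reading of the distillation hint above (names hypothetical, assuming the generator exposes intermediate features with the same dimensionality as the encoder's representation), the distillation term could be as simple as:

    import torch.nn.functional as F

    def distillation_loss(gen_features, enc_features):
        # Pull generator-side features toward the (detached) source-encoder
        # representations; MSE is one simple choice, cosine or KL also work.
        return F.mse_loss(gen_features, enc_features.detach())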

Overall Setup:

  1. Source Encoder: Takes an input image and outputs a representation. Any model size or type.
  2. Target Generator: Takes representations from the source model and a text encoding to generate new samples. Its number of parameters should be half of that of the Source Encoder (a quick parameter-count check is sketched after this list). Any model type.
  3. Discriminator: Distinguishes between real and generated data. Any model size or type.
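The half-size constraint on the Target Generator is easy to violate by accident, so a check like the following can guard a training run (a sketch; pass in your own encoder and generator modules):

    import torch.nn as nn

    def n_params(model: nn.Module) -> int:
        """Total parameter count of a module."""
        return sum(p.numel() for p in model.parameters())

    def check_half_size(encoder: nn.Module, generator: nn.Module) -> None:
        """Assert that G has at most half the parameters of the Source Encoder."""
        e, g = n_params(encoder), n_params(generator)
        assert g <= e // 2, f"G has {g:,} params; limit is half of encoder's {e:,}"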

Rules:

  1. You can use any library to design your models.
  2. You can use any loss function, coding style, batch size, optimizer, or learning rate.
  3. You can use any model architecture except modern ones, such as transformer- or diffusion-based models. (If you are unsure, please ask and clarify first.)
  4. You can use the following as a base repo for the data: https://github.com/aelnouby/Text-to-Image-Synthesis?tab=readme-ov-file
  5. You cannot use any pretrained model/checkpoint, i.e., all parameters in your setup should be trained from scratch (from some random seed; a seeding preamble is sketched after these rules).
  6. You have to demonstrate your setup by randomly selecting 20 classes (for training) and 5 classes (for testing) from the Oxford-102 dataset. Text descriptions are available in the GitHub repo mentioned above.
  7. The Source Encoder cannot use class labels during training. You may use any loss function to make it as discriminative as possible for the real images of all 25 classes.
  8. We will only run & test your code on Google Colab. You have a maximum of 200 epochs for training using Colab resources. Time per epoch doesn't matter, but it is advisable that training and testing finish within 1 hour (though this is not mandatory). Hence, choose a reasonable model size.
  9. We encourage you to save .ipynb cell outputs such as plots, visualizations, loss/accuracy logs, etc., to aid the subjective evaluation component.
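For rule 5, a common reproducibility preamble (a sketch, not mandated by the brief) fixes every RNG once at the top of the notebook, so "trained from scratch (from some random seed)" is repeatable across Colab restarts:

    import random
    import numpy as np
    import torch

    def set_seed(seed: int = 42):
        """Fix all RNGs so a from-scratch training run is repeatable."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)           # CPU RNG
        torch.cuda.manual_seed_all(seed)  # all GPU RNGs (no-op without CUDA)

    set_seed(42)  # call once, before building models and data loaders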

[Input Image] --> [Source Encoder] --> [Representation] --> [Target Generator] --> [Generated Image]
                                                                    ^
                                                                    |
[Text Input] --> [Text Encoder] --> [Text Encoding] ----------------+

[Generated Image] --> [Discriminator] <-- [Real Images]

Deliverables:

  1. We don't need your trained model, but rather robust code that can replicate your best results.
  2. Submit a single .ipynb file for this assignment with clean, documented code. Structure your notebook beautifully, as if you were giving a demo tutorial to a first-year B.Tech student who can easily follow the steps.
  3. Highlight the innovations (new things), if any, that you have used and that you believe make your submission stand out from the rest of the class.
  4. There should be two separate sections, one for Training and one for Testing.
  5. In Training/Testing, you may use the data loader from the above-mentioned GitHub repo.
  6. In Testing, using the best model checkpoint, you have to (sketches for the last two items follow this list):
    • Generate and plot 5 random images from each test class as a grid. (Hint: use diverse unseen text.)
    • Plot the 3D t-SNE embedding of the Source Encoder representations for all images from both the train and test classes.
    • Print, in the form of a table: the total number of parameters, the number of trainable parameters, and the model size on disk for the encoder, generator, and discriminator.
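The last two testing items could be implemented roughly as below (a sketch assuming a PyTorch encoder and a standard (images, labels) data loader; TSNE and the 3D scatter are real scikit-learn/matplotlib APIs, everything else is a placeholder):

    import numpy as np
    import torch
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    @torch.no_grad()
    def plot_tsne_3d(encoder, loader, device="cuda"):
        """Embed all images with the Source Encoder and plot a 3D t-SNE."""
        feats, labels = [], []
        for imgs, ys in loader:  # loader should cover train + test images
            feats.append(encoder(imgs.to(device)).cpu().numpy())
            labels.append(ys.numpy())
        emb = TSNE(n_components=3).fit_transform(np.concatenate(feats))
        ax = plt.figure(figsize=(8, 8)).add_subplot(projection="3d")
        ax.scatter(emb[:, 0], emb[:, 1], emb[:, 2],
                   c=np.concatenate(labels), cmap="tab20", s=5)
        plt.show()

    def param_table(models: dict):
        """Print total/trainable parameter counts and on-disk size per model."""
        print(f"{'model':<16}{'total':>12}{'trainable':>12}{'size (MB)':>12}")
        for name, m in models.items():
            total = sum(p.numel() for p in m.parameters())
            train = sum(p.numel() for p in m.parameters() if p.requires_grad)
            mb = sum(p.numel() * p.element_size() for p in m.parameters()) / 2**20
            print(f"{name:<16}{total:>12,}{train:>12,}{mb:>12.2f}")

Usage would be something like param_table({"encoder": encoder, "generator": G, "discriminator": D}) after training.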

Marking:

This assignment will not be fully auto-graded. Marking will be manual, with subjective evaluation using the following components:

  1. Overall structure & cleanliness of submitted code notebook [1 mark]
  2. Successful training of the full GAN model [1 mark]
  3. Discriminative ability of the embeddings from Source encoder [1 mark]
  4. Subjective diversity and quality of generation [1 mark]
  5. Subjective evaluation of innovation in model architecture (including its size and memory footprint) and training paradigm [1 mark]
