

In today’s era of computer vision, imagination can become reality. Generating images from text commands has a wide range of commercial applications: you tell the computer what to draw, and it draws it. This challenging problem can be addressed with a two-stage Generative Adversarial Network (GAN) model. In this paper we propose a two-stage GAN that generates photo-realistic images. The text command is the input to the first-stage GAN, which outputs a very basic image of very low resolution. This low-resolution image undergoes sketch refinement. The output of the first-stage GAN and the text description are then fed as input to the second-stage GAN, which generates a high-resolution, photo-realistic image. The second-stage GAN recursively rectifies defects in the output image of the first-stage GAN. We use an augmentation technique to ensure the smoothness of the image. Extensive experiments show that the two-stage GAN achieves a marked improvement in generating photo-realistic images from text descriptions.
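
The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the generators are untrained placeholders, the discriminators are omitted, and the names `embed_text`, `stage1_generator`, `stage2_generator`, and `generate`, as well as the 64 and 256 pixel sizes, are assumptions made only for illustration. The sketch shows just the wiring: text and noise go into the first stage, and the coarse image plus the same text go into the second stage.

```python
import random

LOW_RES = 64    # assumed Stage-I output size (not stated in the abstract)
HIGH_RES = 256  # assumed Stage-II output size (not stated in the abstract)


def embed_text(text, dim=8):
    """Hypothetical stand-in for a learned text encoder (deterministic per text)."""
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def stage1_generator(text_emb, noise):
    """Placeholder first stage: text embedding + noise -> coarse LOW_RES x LOW_RES image."""
    bias = sum(text_emb) / len(text_emb)
    return [[bias + noise[(r + c) % len(noise)] for c in range(LOW_RES)]
            for r in range(LOW_RES)]


def stage2_generator(low_res_img, text_emb):
    """Placeholder second stage: coarse image + the same text embedding -> HIGH_RES image."""
    scale = HIGH_RES // len(low_res_img)
    bias = sum(text_emb) / len(text_emb)
    # Nearest-neighbour upsampling stands in for the learned refinement layers.
    return [[0.5 * (low_res_img[r // scale][c // scale] + bias)
             for c in range(HIGH_RES)]
            for r in range(HIGH_RES)]


def generate(text, seed=0):
    """Run both stages in sequence and return (coarse image, refined image)."""
    rng = random.Random(seed)
    text_emb = embed_text(text)
    noise = [rng.gauss(0.0, 1.0) for _ in range(16)]
    coarse = stage1_generator(text_emb, noise)
    refined = stage2_generator(coarse, text_emb)  # text is reused in stage two
    return coarse, refined


coarse, refined = generate("a small red bird with a short beak")
print(len(coarse), len(coarse[0]), len(refined), len(refined[0]))
```

In a trained model each placeholder would be a neural network paired with its own discriminator; the point here is only that the second stage is conditioned on both the first-stage image and the text description.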


Generative Adversarial Network (GAN), Generator, Discriminator, Photo-realistic images, Augmentation, StackGAN

Article Details

Author Biographies

Chithra Apoorva D.A, GITAM School of Technology, Bengaluru, India

Working as Assistant Professor in the Department of Computer Science & Engineering, GITAM School of Technology, Bengaluru, India

Neetha KS, GITAM School of Technology, Bengaluru, India

Working as Assistant Professor in the Department of Computer Science & Engineering, GITAM School of Technology, Bengaluru, India

Dr Brahmananda S H, GITAM School of Technology, Bengaluru, India

Working as Professor in the Department of Computer Science & Engineering, GITAM School of Technology, Bengaluru, India


How to Cite
C. Apoorva D.A, N. KS, D. B. S H, and M. Kumar, “A GAN model to produce Photo realistic Images via text command”, Ausjournal, vol. 1, no. 1, pp. 56-61, Oct. 2019.

