Projects

- Google Gemini Cookbook
A simple tool that gives frontend developers boilerplate code for their UI designs: upload an image of the UI and it generates the HTML and CSS for it. I launched it as a free tool in December '23 and grew it to 2k+ free users, then relaunched it as a paid product and grew it to $200 MRR within 6 months. All traffic was organic, via SEO.

An attempt at helping students improve their problem-solving skills using AI. We made an online IDE where users can write and execute code. Whenever they are stuck, they can use Hints & Explanations to understand what the problem is asking, what errors they are making, and how they should approach solving it. This way we avoid handing them AI-generated code solutions directly and instead help them think through the problem and build their own logic.
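
A minimal sketch of how a hint request could be turned into a model instruction that asks for guidance rather than a solution. The names `HintRequest`, `HINT_LEVELS`, and `build_hint_prompt` are hypothetical and only illustrate the idea; the actual prompts and hint levels used in the IDE are not shown here.

```python
from dataclasses import dataclass

# Hypothetical hint levels, from least to most revealing. None of them asks for code.
HINT_LEVELS = {
    1: "Restate what the problem is asking in plain words.",
    2: "Point out the kind of mistake in the student's code, without fixing it.",
    3: "Describe the next step of an approach in words, without writing code.",
}


@dataclass
class HintRequest:
    problem: str
    student_code: str
    error_output: str = ""
    level: int = 1


def build_hint_prompt(req: HintRequest) -> str:
    """Build an instruction that asks a language model for a hint, never a full solution."""
    if req.level not in HINT_LEVELS:
        raise ValueError(f"level must be one of {sorted(HINT_LEVELS)}")
    parts = [
        "You are a tutor. Do NOT provide a complete or corrected solution.",
        f"Task for this hint: {HINT_LEVELS[req.level]}",
        "Problem statement:",
        req.problem.strip(),
        "Student's code:",
        req.student_code.strip(),
    ]
    # Only include the error section when the run actually produced one.
    if req.error_output.strip():
        parts += ["Error output:", req.error_output.strip()]
    return "\n\n".join(parts)
```

The resulting string would be sent to whichever model backs the Hints & Explanations feature; keeping the "no solution" rule in one place makes it harder to bypass by accident.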

An application that lets users visualize multiple spectral bands of data from the INSAT-3D satellite (although the tool can visualize any satellite data stored as HDF5 or GeoTIFF). I worked on a pipeline that converts the HDF5 data to GeoTIFFs, which are then optimized for the cloud as Cloud Optimized GeoTIFFs (COGs). This lets users conveniently visualize the different spectral bands in their browser. The data is streamed selectively, so only the parts that are needed get downloaded, which makes data usage more efficient. I made this for a hackathon, so the solution is a little scrappy: you need to run the pipeline script manually once, after which all the COGs are available to view.
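
A rough sketch of the HDF5 to GeoTIFF to COG steps described above, assuming `h5py` and `rasterio` (built against GDAL 3.1 or newer, which provides the `COG` driver) are installed. This is not the hackathon script: the function names, the output naming scheme, and the way georeferencing is passed in (`bounds` and `crs` supplied by the caller) are assumptions made for illustration.

```python
from pathlib import Path


def output_paths(hdf5_path: str, dataset: str, out_dir: str) -> tuple[Path, Path]:
    """Derive the intermediate GeoTIFF path and the final COG path for one band."""
    stem = f"{Path(hdf5_path).stem}_{dataset.strip('/').replace('/', '_')}"
    out = Path(out_dir)
    return out / f"{stem}.tif", out / f"{stem}.cog.tif"


def hdf5_band_to_cog(hdf5_path, dataset, out_dir, bounds, crs="EPSG:4326"):
    """Convert one 2-D HDF5 dataset to a GeoTIFF, then rewrite it as a COG.

    bounds is (west, south, east, north) in the units of crs. Both are
    assumptions of this sketch and must come from the product's metadata.
    """
    # Third-party imports are local so the module loads without the geospatial stack.
    import h5py
    import numpy as np
    import rasterio
    from rasterio.shutil import copy as rio_copy
    from rasterio.transform import from_bounds

    tif_path, cog_path = output_paths(hdf5_path, dataset, out_dir)
    tif_path.parent.mkdir(parents=True, exist_ok=True)

    with h5py.File(hdf5_path, "r") as f:
        band = np.squeeze(f[dataset][...])  # drop length-1 axes, e.g. (1, H, W)
    if band.ndim != 2:
        raise ValueError(f"expected a 2-D band, got shape {band.shape}")

    height, width = band.shape
    transform = from_bounds(*bounds, width, height)

    # Step 1: write a plain single-band GeoTIFF.
    with rasterio.open(
        str(tif_path), "w", driver="GTiff", height=height, width=width,
        count=1, dtype=band.dtype, crs=crs, transform=transform,
    ) as dst:
        dst.write(band, 1)

    # Step 2: copy it through GDAL's COG driver, which adds internal tiles and overviews.
    rio_copy(str(tif_path), str(cog_path), driver="COG", compress="DEFLATE")
    return cog_path
```

The internal tiling and overviews written in step 2 are what allow a browser client to request only the byte ranges for the area and zoom level on screen, instead of downloading the whole file.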

I built this project to generate transcripts from audio. It was meant to help podcasters repurpose their content using AI, from creating blog posts to using the transcripts to generate snippets from the podcast. I didn’t go through with implementing the content repurposing. Still, thanks to this project I learned more about WebGPU and how we can use the user’s own hardware for compute-intensive tasks, reducing the cost of our backend. This way the transcription service can be offered at zero cost, since models like OpenAI’s Whisper can run in the user’s browser.

Built an image generator using the Stable Diffusion 1.5 model, fine-tuned with DreamBooth and LoRA on custom styles such as vector art, storybook, and low-poly, and then upscaled the generated results using Real-ESRGAN. After fine-tuning, I got good results with LoRA but not as good with DreamBooth. I also built a simple prompt suggester that suggests keywords based on the chosen style to help improve the quality of the generated images. The project used the Replicate API to access these models, and Replicate's serverless architecture helped keep costs low.
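
A minimal sketch of what a style-based prompt suggester could look like. The keyword table and the function names `suggest_keywords` and `enrich_prompt` are hypothetical; the real keyword lists for each style are not part of this write-up.

```python
# Hypothetical style-to-keyword table; the real lists may differ.
STYLE_KEYWORDS: dict[str, list[str]] = {
    "vector art": ["flat colors", "clean outlines", "minimal shading"],
    "storybook": ["soft watercolor", "warm lighting", "hand-drawn"],
    "lowpoly": ["faceted geometry", "sharp edges", "simple gradients"],
}


def suggest_keywords(style: str, prompt: str, limit: int = 3) -> list[str]:
    """Return up to `limit` style keywords that the prompt does not already contain."""
    key = style.strip().lower()
    if key not in STYLE_KEYWORDS:
        raise KeyError(f"unknown style: {style!r}")
    lowered = prompt.lower()
    fresh = [kw for kw in STYLE_KEYWORDS[key] if kw not in lowered]
    return fresh[:limit]


def enrich_prompt(style: str, prompt: str, limit: int = 3) -> str:
    """Append the suggested keywords to the user's prompt, comma-separated."""
    extra = suggest_keywords(style, prompt, limit)
    return ", ".join([prompt.strip(), *extra])
```

For example, `enrich_prompt("lowpoly", "a mountain", limit=2)` returns `"a mountain, faceted geometry, sharp edges"`, and the enriched prompt would then be passed to the image model.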
