What's Behind Incognito?

Image to Audio Converter

21/08/2024, Borneo - UFTHaq.

INCOGNITO

Incognito is one of my projects: an image-to-audio converter written in C++ with raylib. The name comes from the idea that the app hides an image by converting it into a different form of data, audio.

So how does it work? In this post, I'll try my best to explain the algorithm.

# Basic Theory

It's best to start from a basic understanding of digital image and audio.

Digital Image is a numerical representation of a two-dimensional image. Simply put, the images we see on a computer or cell phone screen are actually made up of millions of tiny dots called pixels. Each pixel has a certain color value which is arranged sequentially to form the image we see.
Example:

This image is actually a collection of dots/pixels like this.

Digital Audio is stored as a sequence of numbers, each representing the amplitude of the sound wave at a particular point in time. The sampling rate determines how often these samples are taken, and the bit depth determines the precision of the amplitude values.
Example:
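In code, a short tone can be built sample by sample. This is only a sketch to illustrate sampling rate and bit depth: the function name `sineTone` is hypothetical, and the 16-bit depth is an assumption, not something taken from Incognito.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sample a sine tone: `sampleRate` samples per second, each stored as a
// signed 16-bit number (16-bit depth is an assumption for this sketch).
std::vector<std::int16_t> sineTone(double frequencyHz, double seconds,
                                   std::uint32_t sampleRate) {
    const double twoPi = 2.0 * std::acos(-1.0);
    const std::size_t count = static_cast<std::size_t>(seconds * sampleRate);
    std::vector<std::int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        const double t = static_cast<double>(n) / sampleRate;        // time of sample n
        const double amplitude = std::sin(twoPi * frequencyHz * t);  // -1.0 .. 1.0
        samples[n] = static_cast<std::int16_t>(amplitude * 32767.0); // quantize to 16 bits
    }
    return samples;
}
```

Calling `sineTone(440.0, 1.0, 44100)` gives 44100 numbers, one second of a 440 Hz tone. A higher sampling rate means more numbers per second, and a higher bit depth means finer steps for each number.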

# Algorithm

Now that we understand images and audio, we can start the process.

1 - Grayscale

The original data in an image generally consists of 3 values per pixel, RGB (Red, Green, Blue), each 8 bits in size (8 x 3 = 24 bits). PNG images can have 4 values, RGBA (Red, Green, Blue, Alpha), where alpha is the transparency value, again 8 bits each (8 x 4 = 32 bits). Because a regular image has 3 or 4 values per pixel, we need to reduce this to a single value to make the calculation easier. So the image is converted to grayscale, where each pixel is one 8-bit value (0-255) that says how dark or light it is.
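As a minimal sketch of this step, the code below reduces an RGBA pixel to a single 8-bit value. The `Pixel` struct and the function names are hypothetical, and the channel weights (0.299, 0.587, 0.114, the BT.601 luma coefficients) are an assumption; Incognito may weight the channels differently. raylib also ships an `ImageColorGrayscale` function, though whether Incognito uses it is not stated here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pixel layout: 8 bits per channel, as in a 32-bit RGBA image.
struct Pixel {
    std::uint8_t r, g, b, a;
};

// Reduce one pixel to a single 8-bit brightness value (0 = black, 255 = white).
// The weights are the BT.601 luma coefficients, chosen here as an assumption.
std::uint8_t toGray(const Pixel& p) {
    const double y = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
    return static_cast<std::uint8_t>(y + 0.5); // round to nearest
}

// Convert a whole image stored row by row; the alpha channel is ignored.
std::vector<std::uint8_t> toGrayscale(const std::vector<Pixel>& pixels) {
    std::vector<std::uint8_t> gray;
    gray.reserve(pixels.size());
    for (const Pixel& p : pixels) {
        gray.push_back(toGray(p));
    }
    return gray;
}
```

Green gets the largest weight because the eye is most sensitive to it. A plain average of the three channels would also give one value per pixel.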

2 - Take Data per Column

For example, say one column of the grayscale image gives us an array A:
- A = [20, 70, 30, 25, 19, 68, 150, 160, 110, 222]

This can be normalized to a smaller value between 0 and 1 by dividing each element by 255:
- A = [0.08, 0.27, 0.12, 0.10, 0.07, 0.27, 0.59, 0.63, 0.43, 0.87]

Each value can then be interpreted as the amplitude of one frequency.

3 - Inverse DFT

When we have signal data in the frequency domain, we can turn it into a time-domain signal using the Inverse Discrete Fourier Transform (IDFT). The IDFT can be computed with a Fourier transform library; in C/C++, for example, there is FFTW.
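To show what the IDFT actually computes, the sketch below evaluates its defining sum directly using only the standard library. It is a naive O(N^2) illustration, not FFTW and not necessarily what Incognito does, and the function name `inverseDft` is hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive inverse DFT: x[n] = (1/N) * sum over k of X[k] * exp(+i*2*pi*k*n/N).
std::vector<std::complex<double>> inverseDft(
        const std::vector<std::complex<double>>& spectrum) {
    const std::size_t N = spectrum.size();
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> signal(N);
    for (std::size_t n = 0; n < N; ++n) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t k = 0; k < N; ++k) {
            const double angle =
                twoPi * static_cast<double>(k * n) / static_cast<double>(N);
            sum += spectrum[k] * std::polar(1.0, angle); // X[k] * e^(i*angle)
        }
        signal[n] = sum / static_cast<double>(N);
    }
    return signal;
}
```

Each frequency bin `k` contributes one rotating wave, scaled by its amplitude. For audio we need real-valued samples, so the spectrum either has to be conjugate-symmetric or only the real part of the output is kept. A library such as FFTW gets the same result much faster.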

The IDFT outputs an array of time-domain samples, which can be appended to the audio signal array. Steps 2 and 3 are then repeated for the next column until every column is processed. Once that is finished, the audio signal array can be exported to WAV or encoded to FLAC, AAC, or MP3, whichever you prefer.

So how do you see the result of an audio file that was converted by Incognito?

You can see your image again by using the Spectrogram feature on my other project, Tirakat : Musializer++.