Dhvani Documentation

This page describes the dhvani C/C++ APIs and gives sample uses.

Getting the dhvani version information

Let us start with a very simple use of the API: getting the dhvani version information from C/C++ code.


#include <stdio.h>
#include <dhvani/dhvani_lib.h>
int
main (int argc, char *argv[])
{
	printf ("%s\n", dhvani_Info());
	return 0;
}

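The code is simple and needs no explanation. dhvani/dhvani_lib.h must be included in source files that use the dhvani APIs. If the above file is named version.c, it can be compiled like this:

gcc -o version version.c -ldhvani

Here we used -ldhvani, since dhvani is the shared library name. The option is placed after the source file because some linkers resolve libraries in command-line order and can otherwise report undefined references.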

const char *dhvani_Info();

is the function declaration.

Printing the list of supported languages

#include <stdio.h>
#include <dhvani/dhvani_lib.h>
int
main (int argc, char *argv[])
{
  printf ("%s", dhvani_ListLanguage ());
  return 0;
}

Now we will move on to something that actually speaks.

Reading a sentence

A sample code snippet that reads a Hindi sentence is given below:

#include <dhvani/dhvani_lib.h>
int
main (int argc, char *argv[])
{
  dhvani_options *dhvani;
  dhvani = dhvani_init ();
  dhvani_say ("??????", dhvani);
  dhvani_close ();
  return 0;
}

Here we first initialize the synthesizer with the dhvani_init() function. It returns a handle for use with the other dhvani APIs; the handle is a pointer to a dhvani_options structure. Here is the definition of dhvani_options:

typedef struct {
    struct dhvani_VOICE *voice; /* not used now.. for future use.*/
    float pitch;
    float tempo;
    int rate;
    dhvani_Languages language;
    int output_file_format;
    int isPhonetic;
    int speech_to_file;
    char* output_file_name;
    t_dhvani_synth_callback* synth_callback_fn;
    t_dhvani_audio_callback* audio_callback_fn;
} dhvani_options;

The declaration of dhvani_say is as follows. It reads aloud the string given as the first argument:

dhvani_ERROR dhvani_say(char *, dhvani_options*);

The return type is the enumeration dhvani_ERROR, which is defined like this:

typedef enum {
    DHVANI_OK = 0,
    DHVANI_INTERNAL_ERROR = -1
} dhvani_ERROR;
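
Since dhvani_say returns a dhvani_ERROR, the calling program can check the result instead of ignoring it. The following is a minimal sketch, assuming the enumeration has only the two values shown above. The helper names describe_dhvani_error and check_dhvani_status are hypothetical and not part of dhvani. The enum is repeated only so that the sketch compiles on its own; in a real program, include dhvani/dhvani_lib.h and drop the copy.

```c
#include <stdio.h>

/* Copy of the enum shown above, repeated only so this sketch compiles
   without the library. With the real header included, delete this copy. */
typedef enum {
    DHVANI_OK = 0,
    DHVANI_INTERNAL_ERROR = -1
} dhvani_ERROR;

/* Hypothetical helper (not part of dhvani): map a status code to text. */
const char *
describe_dhvani_error (dhvani_ERROR status)
{
  switch (status)
    {
    case DHVANI_OK:
      return "ok";
    case DHVANI_INTERNAL_ERROR:
      return "internal error";
    default:
      return "unknown status";
    }
}

/* Hypothetical helper: report a failed call on stderr.
   Returns 1 if the status signals success, 0 otherwise. */
int
check_dhvani_status (const char *what, dhvani_ERROR status)
{
  if (status == DHVANI_OK)
    return 1;
  fprintf (stderr, "%s failed: %s\n", what, describe_dhvani_error (status));
  return 0;
}
```

A call could then be written as check_dhvani_status ("dhvani_say", dhvani_say (text, dhvani)), with the program deciding what to do when the helper returns 0.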

Reading a File

The following example shows how to read a file aloud using the dhvani_speak_file API. It takes two arguments: the first is the input file, passed as an open FILE pointer, and the second is the handle returned by dhvani_init.

#include <stdio.h>
#include <dhvani/dhvani_lib.h>
int
main (int argc, char *argv[])
{
  FILE *fd = fopen ("test.txt", "r"); /*open the file*/
  dhvani_options *dhvani;
  if (fd == NULL) /*stop if the file could not be opened*/
    return 1;
  dhvani = dhvani_init (); /*Initialize the synthesizer*/
  dhvani_speak_file (fd, dhvani); /*read the file*/
  dhvani_close (); /*close the synthesizer*/
  return 0;
}

Saving the speech to a file

The following example shows how to save the generated speech to an output file using the dhvani_speak_file API. Before calling dhvani_speak_file, set the speech_to_file flag, the output file format and the output file name in the dhvani_options structure.

#include <stdio.h>
#include <dhvani/dhvani_lib.h>
int
main (int argc, char *argv[])
{
  FILE *fd = fopen ("test.txt", "r"); /*open the file*/
  dhvani_options *dhvani;
  dhvani = dhvani_init (); /*Initialize the synthesizer*/
  dhvani->speech_to_file = 1;
  dhvani->output_file_format = DHVANI_OGG_FORMAT; /*set the output file format*/
  dhvani->output_file_name = "test.ogg"; /*The output file name*/
  dhvani_speak_file (fd, dhvani); /*read the file*/
  dhvani_close (); /*close the synthesizer*/
  return 0;
}

Note: The dhvani_say and dhvani_speak_file functions are synchronous. They return only after the input file or sentence has been read completely, or after the generated speech has been saved to the file. The functions themselves offer no way to stop midway; however, callbacks can be used to stop or pause the speech. See the Callbacks section on this page.

Setting the pitch and tempo/speed of the speech

Before calling any speech synthesis function, set the pitch and/or tempo fields in the dhvani_options structure. An example is given below:

#include <stdio.h>
#include <dhvani/dhvani_lib.h>
int
main (int argc, char *argv[])
{
  FILE *fd = fopen ("test.txt", "r"); /*open the file*/
  dhvani_options *dhvani;
  dhvani = dhvani_init (); /*Initialize the synthesizer*/
  dhvani->speech_to_file = 1;
  dhvani->output_file_format = DHVANI_OGG_FORMAT; /*set the output file format*/
  dhvani->output_file_name = "test.ogg"; /*The output file name*/
  dhvani->tempo = -10.0; /*set the tempo - the speed of the speech*/
  dhvani->pitch = 5.0; /*set the pitch*/
  dhvani->rate = 16000; /*set the rate; 16000 Hz is the default*/
  dhvani_speak_file (fd, dhvani); /*read the file*/
  dhvani_close (); /*close the synthesizer*/
  return 0;
}

It is possible to use different pitch or tempo values in the same program. The speech synthesizer uses whatever values are present in the options structure at the time of the call. For example, the following code

  dhvani->pitch = 3.0; /*set the pitch to a female voice*/
  dhvani_say("നമസ്കാരം, സുഖം തന്നെയല്ലേ",  dhvani);
  dhvani->pitch = -1.0; /*set the pitch to a male voice*/
  dhvani_say("സുഖം തന്നെ",  dhvani);

will read the sentences in different pitches, one after the other.

Setting the language to be used

Dhvani detects the language automatically using its language detection algorithm. It is also possible to set the language explicitly to override this logic.

  dhvani->language = MALAYALAM;
  dhvani_say("സുഖം തന്നെ",  dhvani);

The languages are defined in an enum in dhvani_lib.h, shown below. When support for a new language is added, one entry should be added here.

typedef enum  {
    HINDI = 1,
    MALAYALAM = 2,
    TAMIL = 3,
    KANNADA = 4,
    ORIYA = 5,
    PANJABI = 6,
    GUJARATI = 7,
    TELUGU = 8,
    BENGALI = 9,
    MARATHI = 10,
    PASHTHO = 11
} dhvani_Languages;
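
A program that lets the user choose the language, for example from a command-line argument, needs to translate between names and enum values. The following is a minimal sketch of such a lookup, assuming the enum values shown above. The table and the functions language_name and language_from_name are hypothetical and not part of dhvani, and the display spellings are chosen only for illustration. The enum is repeated only so that the sketch compiles on its own; in a real program, include dhvani/dhvani_lib.h and drop the copy.

```c
#include <stddef.h>
#include <string.h>

/* Copy of the enum shown above, repeated only so this sketch compiles
   without the library. With the real header included, delete this copy. */
typedef enum {
    HINDI = 1, MALAYALAM = 2, TAMIL = 3, KANNADA = 4, ORIYA = 5,
    PANJABI = 6, GUJARATI = 7, TELUGU = 8, BENGALI = 9, MARATHI = 10,
    PASHTHO = 11
} dhvani_Languages;

/* Hypothetical table: display names indexed by enum value.
   Index 0 is unused because the enum starts at 1. */
static const char *const language_names[] = {
  NULL, "Hindi", "Malayalam", "Tamil", "Kannada", "Oriya", "Panjabi",
  "Gujarati", "Telugu", "Bengali", "Marathi", "Pashto"
};

/* Return the display name, or NULL if the value is outside the enum. */
const char *
language_name (dhvani_Languages lang)
{
  int value = (int) lang;
  if (value < HINDI || value > PASHTHO)
    return NULL;
  return language_names[value];
}

/* Return the enum value for a display name, or 0 if the name is unknown.
   0 is not a valid dhvani_Languages value in the enum above. */
int
language_from_name (const char *name)
{
  int i;
  if (name == NULL)
    return 0;
  for (i = HINDI; i <= PASHTHO; i++)
    if (strcmp (language_names[i], name) == 0)
      return i;
  return 0;
}
```

A table like this has to be extended whenever a new entry is added to the enum, so it is best kept next to the code that sets dhvani->language.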

Callbacks

Dhvani provides two callback APIs to be used in the calling program to synchronize with the speech synthesizer.

SynthCallback: Must be set before any synthesis functions are called. This specifies a function in the calling program which is called when a word (after tokenizing a sentence) has finished processing. The position of the synthesizer in the input is passed to the callback. The callback function is of the form:
```
int SynthCallback(int text_position);
```
text_position: the number of characters from the start of the text that have been processed.
Callback returns: 0=continue synthesis, 1=abort synthesis.

AudioCallback: Must be set before any synthesis functions are called. This specifies a function in the calling program which is called when a buffer of speech sound data has been produced. The callback function is of the form:
```
int AudioCallback(short *wav, int numsamples);
```
wav: is the speech sound data which has been produced. NULL indicates that the synthesis has been completed.
numsamples: is the number of entries in wav. This number may vary from call to call and may sometimes be zero (which does NOT indicate end of synthesis).
Callback returns: 0=continue synthesis, 1=abort synthesis.

Examples:

int
synthcallback ( int text_position)
{
  printf ("Reached %d\n", text_position);
  return 0;
}

And this function is set by:
 dhvani->synth_callback_fn = synthcallback;
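
Because a synth callback can return 1 to abort, it can also be used to stop a synchronous dhvani_say or dhvani_speak_file call from outside. The following is a minimal sketch, assuming the callback behaves as described above. The names stop_requested, request_stop and stoppable_synth_callback are hypothetical and not part of dhvani; the flag is set here from a signal handler, but any other part of the program could set it.

```c
#include <signal.h>
#include <stdio.h>

/* Set to 1 to ask the synthesizer to stop at the next word boundary. */
static volatile sig_atomic_t stop_requested = 0;

/* Signal handler, e.g. for SIGINT (Ctrl-C): it only sets the flag. */
void
request_stop (int signum)
{
  (void) signum;
  stop_requested = 1;
}

/* Synth callback: 0 = continue synthesis, 1 = abort synthesis. */
int
stoppable_synth_callback (int text_position)
{
  if (stop_requested)
    {
      fprintf (stderr, "Stopping after %d characters\n", text_position);
      return 1;
    }
  return 0;
}
```

To use it, install the handler with signal (SIGINT, request_stop) and set dhvani->synth_callback_fn = stoppable_synth_callback before calling any synthesis function.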

Another example shows an audio callback that uses the generated speech:

int
audio_callback (short *data, int length)
{
  printf ("got the call %d \n", length);
  if (length > 0)
    {
      fwrite (data, sizeof (short), length, out); /*write to a file; out is a FILE * opened elsewhere*/
    }
  return 0;
}

This callback function can be set as follows:
  dhvani->audio_callback_fn = audio_callback;
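
An audio callback does not have to write to a file; it can also keep the samples in memory for later processing. The following is a minimal sketch, assuming the callback behaves as described above (a NULL wav pointer marks the end of synthesis, and a zero numsamples does not). The buffer variables and the name collecting_audio_callback are hypothetical and not part of dhvani.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory sink for the samples delivered to the callback. */
static short *samples = NULL;
static size_t sample_count = 0;
static size_t sample_capacity = 0;
static int synthesis_finished = 0;

/* Audio callback: 0 = continue synthesis, 1 = abort synthesis. */
int
collecting_audio_callback (short *wav, int numsamples)
{
  if (wav == NULL)            /* end of synthesis */
    {
      synthesis_finished = 1;
      return 0;
    }
  if (numsamples <= 0)        /* empty buffer: not the end, nothing to store */
    return 0;

  size_t needed = sample_count + (size_t) numsamples;
  if (needed > sample_capacity)
    {
      /* Grow geometrically so repeated small buffers stay cheap. */
      size_t new_capacity = sample_capacity ? sample_capacity * 2 : 4096;
      while (new_capacity < needed)
        new_capacity *= 2;
      short *grown = realloc (samples, new_capacity * sizeof (short));
      if (grown == NULL)
        return 1;             /* out of memory: abort synthesis */
      samples = grown;
      sample_capacity = new_capacity;
    }
  memcpy (samples + sample_count, wav, (size_t) numsamples * sizeof (short));
  sample_count += (size_t) numsamples;
  return 0;
}
```

After the synthesis call returns, samples holds sample_count entries, and the program should release the buffer with free (samples) when it is done.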

It is also possible to set the callback functions using either of the following functions:

void dhvani_set_audio_callback(t_dhvani_audio_callback*,   dhvani_options *);
or
void dhvani_set_synth_callback(t_dhvani_synth_callback*,   dhvani_options *);