Mimasa Features
Below is a list of features that need to be implemented for Mimasa.
1. User Sign-up & Login
Description
This feature will allow users to create an account on Mimasa and sign in to access the application’s services.
DoD
The acceptance criteria for this feature include creating an account using an email address and password,
signing in with the created account, and securely storing the user’s credentials.
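A minimal sketch of how the sign-up and login flow could store credentials securely, assuming salted PBKDF2 hashing from Python's standard library; the in-memory `users` dict and the `sign_up`/`login` function names are illustrative assumptions, not Mimasa's actual implementation.
```python
# Illustrative sketch of secure credential handling for sign-up/login.
# The in-memory store and function names are assumptions for illustration.
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple

_ITERATIONS = 200_000

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 key so plaintext passwords are never stored."""
    salt = salt or os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)
    return salt, key

# Maps email -> (salt, derived_key). A real deployment would use a database.
users: Dict[str, Tuple[bytes, bytes]] = {}

def sign_up(email: str, password: str) -> None:
    if email in users:
        raise ValueError("An account with this email already exists")
    users[email] = hash_password(password)

def login(email: str, password: str) -> bool:
    record = users.get(email)
    if record is None:
        return False
    salt, stored_key = record
    _, candidate_key = hash_password(password, salt)
    return hmac.compare_digest(stored_key, candidate_key)
```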
2. Upload Details
Description
- This feature will allow users to upload a video to the application: the user selects a video file from their device, the application supports different video file formats, and the video file is stored securely.
- The user should be able to select the language they want the video translated to.
DoD
The user should be able to select a video file from their device, provide the necessary details, and upload it to the application.
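A rough sketch of how the upload step could validate the file format and capture the target language; the supported format list, storage directory, and returned metadata are assumptions for illustration only.
```python
# Illustrative sketch of the upload step: validate the file format, capture
# the target language, and store the file. The format list and storage
# directory are assumptions, not Mimasa's actual configuration.
import shutil
import uuid
from pathlib import Path

SUPPORTED_FORMATS = {".mp4", ".mov", ".mkv", ".avi"}
UPLOAD_DIR = Path("uploads")

def upload_video(file_path: str, target_language: str) -> dict:
    source = Path(file_path)
    if source.suffix.lower() not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported video format: {source.suffix}")

    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
    upload_id = uuid.uuid4().hex
    destination = UPLOAD_DIR / f"{upload_id}{source.suffix.lower()}"
    shutil.copy2(source, destination)  # "stored securely" is simplified to a local copy here

    return {
        "upload_id": upload_id,
        "stored_path": str(destination),
        "target_language": target_language,
    }
```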
3. Translation Request Submission
Description
- The user can submit a request to translate the video they uploaded.
- The video uploaded by the user will be processed by Mimasa and the result will be provided later.
- The user should receive a notification when the translation is ready to download.
DoD
The user should be able to submit a request to translate the video they uploaded, specifying the language they want it translated to.
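One way the request lifecycle could be modelled is sketched below; the `TranslationRequest` dataclass, status values, queue, and notification callback are hypothetical, not Mimasa's actual API.
```python
# Sketch of a translation request record and its lifecycle. The queue,
# status values, and notification callback are assumptions for illustration.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from queue import Queue

@dataclass
class TranslationRequest:
    upload_id: str
    target_language: str
    status: str = "submitted"  # submitted -> processing -> ready
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Simple in-process work queue; a real deployment might use a task broker instead.
pending_requests: Queue = Queue()

def submit_request(upload_id: str, target_language: str) -> TranslationRequest:
    """Queue a translation request for a previously uploaded video."""
    request = TranslationRequest(upload_id, target_language)
    pending_requests.put(request)
    return request

def mark_ready(request: TranslationRequest, notify) -> None:
    """Called by the processing pipeline once the translated video is available."""
    request.status = "ready"
    notify(f"Your {request.target_language} translation is ready to download.")
```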
4. Video Translation Processing
Description
- The application processes the video uploaded by the user to extract the audio. The extracted audio should be passed to Audio Translation Processing.
- The video should undergo processing to perform facial tracking.
- The component should also take care of pre/post-processing.
DoD
The application should accurately extract the audio from the video and track facial movements.
Datasets Reference
- Facial landmarks dataset: A large dataset of face annotations that can be used for facial tracking and pre/post processing. Example: 300-W dataset, AFLW dataset.
- Audio classification dataset: A dataset that can be used to separate the music and speech from the video. Example: AudioSet, UrbanSound8K.
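A possible sketch of this stage, assuming ffmpeg is available on the system for audio extraction and MediaPipe FaceMesh plus OpenCV for facial landmark tracking; these tool choices are assumptions, not a statement of what Mimasa actually uses.
```python
# Sketch of the video-processing stage: extract the audio track with ffmpeg
# and track facial landmarks per frame with MediaPipe FaceMesh. Both tool
# choices are assumptions; other libraries could be used instead.
import subprocess

import cv2
import mediapipe as mp

def extract_audio(video_path: str, audio_path: str) -> None:
    """Strip the audio track to a WAV file for Audio Translation Processing."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le", audio_path],
        check=True,
    )

def track_faces(video_path: str) -> list:
    """Return per-frame facial landmarks (empty entry when no face is found)."""
    landmarks_per_frame = []
    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1)
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        landmarks_per_frame.append(result.multi_face_landmarks or [])
    capture.release()
    face_mesh.close()
    return landmarks_per_frame
```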
5. Audio Translation Processing
Description
- This feature will separate the music and speech from the audio extracted from the video input.
- The positions of speech segments in the audio should be determined.
- The extracted speech is translated to the user-selected language.
- The translated text is converted to speech.
- The translated speech is overlaid with the music to prepare the final audio.
- The component should also take care of pre/post-processing.
DoD
- The acceptance criteria for this feature include using machine learning techniques to separate the music and speech tracks, and providing the user with translated audio synced with the background music.
Datasets Reference
- Text-to-speech (TTS) dataset: A dataset that can be used to convert translated text to speech. Example: LJSpeech dataset, VCTK corpus.
- Speech recognition dataset: A dataset that can be used to extract speech positions in the audio. Example: CommonVoice dataset, VoxCeleb dataset.
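The end-to-end flow could look roughly like the sketch below. `separate_sources`, `transcribe_segments`, `translate_text`, and `synthesize_speech` are hypothetical stubs standing in for the separation, speech-recognition, translation, and TTS models; only the pydub overlay/export calls reflect a real library API.
```python
# End-to-end sketch of the audio translation stage. The stub functions are
# hypothetical stand-ins for ML components; real implementations would wrap
# a source-separation model, an ASR model, an MT model, and a TTS model.
from pydub import AudioSegment

def separate_sources(audio_path: str) -> tuple:
    raise NotImplementedError("return (music_path, speech_path)")

def transcribe_segments(speech_path: str) -> list:
    raise NotImplementedError("return [{'start_ms': int, 'end_ms': int, 'text': str}, ...]")

def translate_text(text: str, target_language: str) -> str:
    raise NotImplementedError("return the translated transcript")

def synthesize_speech(text: str, target_language: str) -> str:
    raise NotImplementedError("return path to a synthesized audio file")

def translate_audio(audio_path: str, target_language: str, output_path: str) -> str:
    music_path, speech_path = separate_sources(audio_path)   # 1. split music / speech
    segments = transcribe_segments(speech_path)              # 2. locate and transcribe speech

    final_mix = AudioSegment.from_file(music_path)
    for segment in segments:
        translated = translate_text(segment["text"], target_language)          # 3. translate
        clip = AudioSegment.from_file(synthesize_speech(translated, target_language))  # 4. TTS
        # 5. Overlay the synthesized speech at its original position over the music
        #    so it stays roughly in sync with the background track.
        final_mix = final_mix.overlay(clip, position=segment["start_ms"])

    final_mix.export(output_path, format="wav")
    return output_path
```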
6. Emotion Detection & Lip Syncing
Description
- The Emotion Detection & Lip Syncing feature will detect emotions in the video and sync the lip movements with the given audio.
- The implementation of this feature will require the use of computer vision, machine learning, and speech processing technologies.
DoD
- The acceptance criteria for this feature include detecting emotions from facial expressions in the video, syncing the lip movements with the audio translation, and providing a more natural and realistic translation experience.
Datasets Reference
- Facial expression recognition dataset: A dataset that can be used to detect emotions from facial expressions. Example: EmoReact dataset, AffectNet dataset.
- Lip synchronization dataset: A dataset that can be used to sync the lip movements with the audio translation. Example: LRW dataset, GRID dataset.
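A high-level sketch of how this stage could be wired together; `EmotionClassifier` and `LipSyncModel` are hypothetical interfaces standing in for real models (for example, a facial-expression classifier and a lip-sync network), and only the OpenCV frame reading reflects a real library API.
```python
# High-level sketch of the emotion detection & lip syncing stage.
# EmotionClassifier and LipSyncModel are hypothetical interfaces, not real APIs.
import cv2

class EmotionClassifier:
    """Hypothetical frame-level emotion classifier."""
    def predict(self, frame) -> str:
        raise NotImplementedError("return e.g. 'happy', 'sad', 'neutral'")

class LipSyncModel:
    """Hypothetical model that re-renders mouth movements to match new audio."""
    def sync(self, video_path: str, audio_path: str, output_path: str) -> None:
        raise NotImplementedError

def detect_emotions(video_path: str, classifier: EmotionClassifier, every_n_frames: int = 10) -> list:
    """Sample frames and return a timeline of emotion labels."""
    emotions = []
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            emotions.append(classifier.predict(frame))
        index += 1
    capture.release()
    return emotions

def lip_sync(video_path: str, translated_audio_path: str, output_path: str, model: LipSyncModel) -> None:
    """Drive the lip-sync model with the translated audio to produce the final video."""
    model.sync(video_path, translated_audio_path, output_path)
```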