Speech Analytics

Marsview Speech Analytics is a cloud-hosted or containerized API service that helps you accurately transcribe conversations and discover insights. It includes models for automatic speech recognition (ASR), tone analysis, and natural language classification that uncover topics, keywords, entities, and sentiment.



  • What can Speech Analytics do?

    Speech analytics software helps mine and analyze audio data, detecting things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned; and more. Speech analytics tools can also identify if a customer is getting upset or frustrated.

  • How does Speech Analytics help improve customer experience?

    Detect things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned
    Adapt to customers' sentiment in real time or improve after the fact
    Identify customers at risk of churning and retain them
    Gather insights to improve NPS, CSAT and CES scores
    Use call transcripts for compliance and documentation
    Listen to your customers - it pays!

  • What is the Marsview API platform?

    The Marsview conversation self-service API platform offers a comprehensive suite of proprietary APIs and developer tools for automatic speech recognition, speaker separation, multi-modal emotion and sentiment recognition, intent recognition, time-sequenced visual recognition, and more. It is designed for demanding call center (CCAI) environments that handle millions of outbound and inbound sales and support calls.
    Marsview APIs provide end-to-end workflows that cover call listening, recording, insights generation, and Voice of Customer insights. The conversation APIs are also used in one-on-one and many-to-many conversations and meetings to automatically generate rich contextual feedback, key topics, moments, actions, Q&A, and summaries.



Asynchronous files or recordings

We support all audio and video file types without any transcoding.

Synchronous live streams

We support live streaming for select APIs and models. Please review the documentation for details.


JSON format

API output data is delivered in JavaScript Object Notation (JSON) format.

Export transcription text as captions

Easily export your transcription as SRT or VTT format to be directly plugged into video players for subtitles and captions.
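
As an illustration of what an SRT caption export looks like, here is a minimal Python sketch that turns timed transcript segments into SRT text. The segment fields (`start`, `end`, `text`) and the function names are assumptions made for this example and are not taken from the Marsview API; check the API documentation for the actual output schema and export options.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text: a numbered cue, a time range line, then the caption text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical transcript segments (times in seconds), for illustration only.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Thanks for calling, how can I help?"},
    {"start": 2.5, "end": 6.0, "text": "I have a question about my invoice."},
]

srt_text = segments_to_srt(segments)
print(srt_text)
```

VTT uses the same cue structure with a `WEBVTT` header line and a period instead of a comma before the milliseconds, so the same approach adapts with small changes.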

Conversation Insights

Accurate transcription

Convert speech into readable text from a live stream or from audio and/or video recordings in minutes.

Punctuation and casing

Automatically add punctuation and casing in the transcription text.

Multi-speaker separation

Automatically recognize and separate speakers in a group conversation. Attribute speaker names within a group of enrolled speakers.

Type of speech

Identify the type of speech based on context and tone such as a statement, question, command and so on.

Action items

Detect and list action items and tasks in the transcription text.

Questions and responses

Detect questions and related responses in the transcription text.


Automatically determine the topics, entities, and concepts discussed in the transcription text.


Extract keywords and key phrases from the transcription text.

Abstractive summary

Generate a concise paraphrased summary from the transcription text.

Extractive summary

Reduce the transcription text into a short summary preserving the keywords and phrases.

Tone of voice

Use the acoustic characteristics of the speaker's voice to determine the tone, such as neutral, calm, happy, sad, angry, fearful, disgusted, or surprised.

Speaker emotion

Detect emotions such as anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, and trust from the speaker's language.

Sentiment analysis

Determine the sensitivity of the topic in the transcription text and classify the sentiment as positive, negative, or neutral.

Visual Insights

Screen activity

Detect the type of on-screen activity, such as interaction, slide share, or motion graphics.

Screen grab

Capture key frames and slides to chapterize a visual presentation.

Optical Character Recognition (OCR)

Automatically convert typed, handwritten, or displayed text in a visual presentation into machine-encoded text.

Faces and labels

Automatically detect people and objects in a visual presentation.

Conversation Metrics

Call quality ratio
  • Talk-to-listen ratio
  • Average speech speed
  • Bitrates, jitter, latency
  • Connection and bandwidth
Speech insights
  • Phrase cloud
  • Dead air
  • Longest monologue
  • Filler words
  • Speech clarity
Time based data
  • Sentiment by time
  • Topics by time
  • Speaker emotion by time
  • Engagement score
  • Sentiment score
  • Call quality score
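
To make a few of these metrics concrete, the following Python sketch computes a talk-to-listen ratio, an average speech speed in words per minute, and the longest monologue from speaker-labeled transcript segments. The segment fields (`speaker`, `start`, `end`, `text`), the speaker labels, and the metric definitions used here are assumptions for illustration; Marsview's own definitions and output format may differ.

```python
from collections import defaultdict
from typing import Dict, List


def conversation_metrics(segments: List[Dict], agent: str = "agent") -> Dict:
    """Compute simple conversation metrics from speaker-labeled segments."""
    talk_time = defaultdict(float)  # seconds spoken per speaker
    word_count = 0
    longest_monologue = 0.0
    run_speaker, run_duration = None, 0.0

    for seg in segments:
        duration = seg["end"] - seg["start"]
        talk_time[seg["speaker"]] += duration
        word_count += len(seg["text"].split())

        # A monologue here is a run of consecutive segments by one speaker.
        if seg["speaker"] == run_speaker:
            run_duration += duration
        else:
            run_speaker, run_duration = seg["speaker"], duration
        longest_monologue = max(longest_monologue, run_duration)

    agent_time = talk_time[agent]
    other_time = sum(t for s, t in talk_time.items() if s != agent)
    total_time = agent_time + other_time

    return {
        # Agent speaking time divided by everyone else's speaking time.
        "talk_to_listen_ratio": agent_time / other_time if other_time else None,
        # Words spoken per minute of speech, across all speakers.
        "words_per_minute": word_count * 60 / total_time if total_time else None,
        "longest_monologue_seconds": longest_monologue,
    }


# Hypothetical diarized segments (times in seconds), for illustration only.
segments = [
    {"speaker": "agent", "start": 0.0, "end": 4.0,
     "text": "Thanks for calling how can I help"},
    {"speaker": "customer", "start": 4.0, "end": 10.0,
     "text": "I was charged twice on my last invoice"},
    {"speaker": "customer", "start": 10.0, "end": 12.0,
     "text": "Can you fix that"},
    {"speaker": "agent", "start": 12.0, "end": 16.0,
     "text": "Yes I will refund it today"},
]

metrics = conversation_metrics(segments)
print(metrics)
```

This sketch ignores silence between segments; a dead-air metric could be derived in a similar way by summing the gaps between one segment's `end` and the next segment's `start`.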


Custom vocabulary

Add new words to the base vocabulary or train your own language model to generate more accurate transcriptions for domain-specific words and phrases like product names, technical terminology, or names of individuals.

Security and Support Declaration

High uptime and 24x7 support

99.9% uptime with always-on support via email, chat and web call.

Robust security

Marsview encrypts all data via 256-bit SSL encryption and complies with GDPR and CCPA standards.

Deployment options

Optimized for public cloud and private on-premises deployments.

Use cases
  • AI-powered sales and customer support services
  • IVA and IVR Automation
  • Call Transcription and Analytics
  • Messaging and Social Media Applications
  • Assist and coach sales and support agents
  • Improve customer experience
  • Optimize and close deals faster
  • Gain actionable insights and intelligence