Screen Activity (Visual)

This API captures the visual activity in a conversation.

Overview

Identify and analyze the visual aspects of a meeting, along with the corresponding timestamps, using the Screen Activity API.

Types

Speaker/Interaction: Detected when the meeting/conversation has no other visual elements or when it is a face-to-face (video is ON) conversation.

Sketching/Whiteboarding: Detected when there is whiteboarding or sketching in the meeting.

Presentation: Detected when there is a full-screen presentation during the meeting.

Screen Share: Detected during general screen sharing, such as browsing through the computer, webpages, etc., in a screen share session.

Input Types Supported: Video, Image

Model Dependency: OCR

post
Compute Metadata

https://api.marsview.ai/v1/conversation/compute
This method is used to upload an audio or video file on which metadata has to be computed. The settings object can be used to enable or disable metadata from different models. Check the Overview section for a list of the available models.
Request
Headers
appSecret
required
string
<sample-app-secret>
appId
required
string
<sample-app-Id>
Content-Type
optional
string
application/json
Body Parameters
screengrabs.enable
required
boolean
Screen activity will be computed when the screengrabs.enable key is set to true under the settings object
Response
200: OK
A Transaction ID is returned in the JSON body once the processing job is launched successfully. This Transaction ID can be used to check the status of the job or fetch the results of the job once the metadata is computed
{
    "status": true,
    "transaction_id": "32dcef1a-5724-4df8-a4a5-fb43c047716b",
    "message": "Compute job for file-id: 32dcef1a-5724-4df8-a4a5-fb43c047716b launched successfully"
}
400: Bad Request
This usually happens when the settings for computing the metadata are not configured correctly. Check the request object and also the dependencies required to compute certain metadata objects (for example, Speech-to-Text has to be enabled for Action Items to be computed). This can also happen when the input file_id is not of a supported format, as shown in the example below.
{
    "status": false,
    "error": {
        "code": "VDNF01",
        "message": "FileTypeError: Require file to be of type Video"
    }
}

Example API Call

Request

CURL
curl --request POST 'https://api.marsview.ai/v1/conversation/compute' \
--header 'appSecret: 32dcef1a-5724-4df8-a4a5-fb43c047716b' \
--header 'appId: 1ZrKT0tTv7rVWX-qNAKLc' \
--header 'Content-Type: application/json' \
--data-raw '{
    "settings": {
        "speech_to_text": {
            "enable": true,
            "pii_detection": false,
            "custom_vocabulary": ["Marsview", "Pikachu"]
        },
        "speaker_separation": {
            "enable": true,
            "num_speakers": 4
        },
        "screen_activity": {
            "enable": true
        }
    }
}'

Response

Given below is a sample response JSON when the status code is 200.

{
    "status": true,
    "transaction_id": "32dcef1a-5724-4df8-a4a5-fb43c047716b",
    "message": "Compute job for file-id: 32dcef1a-5724-4df8-a4a5-fb43c047716b launched successfully"
}
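
The transaction_id in this response is what the Request Metadata method below expects when checking progress or fetching results. Given below is a minimal sketch of capturing it in a shell variable; the trimmed settings object (only screen_activity enabled) and the use of jq are assumptions made for brevity, and the credentials are the same placeholders used above.

BASH
# Launch the compute job and capture the transaction_id for later status checks.
# appId/appSecret are placeholder values; jq is assumed to be installed.
TRANSACTION_ID=$(curl --silent --request POST 'https://api.marsview.ai/v1/conversation/compute' \
--header 'appSecret: 32dcef1a-5724-4df8-a4a5-fb43c047716b' \
--header 'appId: 1ZrKT0tTv7rVWX-qNAKLc' \
--header 'Content-Type: application/json' \
--data-raw '{
    "settings": {
        "screen_activity": {
            "enable": true
        }
    }
}' | jq -r '.transaction_id')

echo "Launched compute job: ${TRANSACTION_ID}"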

post
Request Metadata

https://api.marsview.ai/v1/conversation/fetch
This method is used to fetch specific metadata for a particular file_id. It can also be used for long polling to track the progress of the compute job under the status object.
Request
Headers
Content-Type
optional
string
application/json
appId
optional
string
<sample-app-id>
appSecret
optional
string
<sample-app-secret>
Body Parameters
fileID
optional
string
fileId of the audio/video file
data.screen_activity
optional
boolean
Set to true to return the screen activity for the file_id once it has been computed
Response
200: OK
The output consists of two objects. The data object returns the requested metadata if it has been computed. The status object shows the current state of the requested metadata; the status for each metadata field can take the values "Queued", "Processing", or "Completed". Shown below is a case where the screen activity job is in the "Queued" state and a case where it is in the "Completed" state.
QUEUED STATE
{
    "status": {
        "screen_activity": "Queued"
    },
    "data": {
        "screen_activity": {}
    }
}
COMPLETED STATE
{
    "status": {
        "screengrabs": "Completed"
    },
    "data": {
        "screengrabs": {
            "chunks": [
                ...
                {
                    "frame_time": "174100.0",
                    "frame_id": 1235,
                    "confidence": "0.33",
                    "type": "Presentation"
                },
                {
                    "frame_time": "174100.0",
                    "frame_id": 1521,
                    "confidence": "0.95",
                    "type": "Screen Share"
                },
                ...
            ]
        }
    }
}

Response Object Fields

frame_time: Offset time of the frame in milliseconds from the start of the video.

frame_id: Offset frame ID (frame number) from the start of the video.

type: Type of screen activity detected. Refer to Types for more details.

confidence: Confidence of the screen activity label (ranges from 0 to 1). Higher is better.
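
Example API Call

Given below is a sketch of requesting screen activity for a processed file and long polling until the job completes. It is illustrative rather than definitive: the request body keys (fileId and the data object) follow the parameters listed above, the file ID and credentials are placeholders, and jq is assumed to be installed. Because the responses above key the result under both screen_activity and screengrabs, the status check below accepts either name.

BASH
# Long-poll the fetch endpoint until the screen activity metadata is computed.
# <sample-file-id>, appId and appSecret are placeholders; jq is assumed to be installed.
while true; do
  RESPONSE=$(curl --silent --request POST 'https://api.marsview.ai/v1/conversation/fetch' \
  --header 'appSecret: <sample-app-secret>' \
  --header 'appId: <sample-app-id>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
      "fileId": "<sample-file-id>",
      "data": {
          "screen_activity": true
      }
  }')

  # The examples above show the status under either "screen_activity" or "screengrabs"; accept both.
  STATUS=$(echo "${RESPONSE}" | jq -r '.status.screen_activity // .status.screengrabs')
  echo "screen activity status: ${STATUS}"

  if [ "${STATUS}" = "Completed" ]; then
    echo "${RESPONSE}" | jq '.data.screen_activity // .data.screengrabs'
    break
  fi

  sleep 10
done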