Turning an AWS Transcribe JSON file into a more useful doc

A colleague recently needed to transcribe some recorded interviews. They used AWS Transcribe, which outputs a JSON file that is not easy to use directly. Luckily, a few open source tools have popped up to make that output more readable. I helped turn the results into a docx and wanted to document the process for my colleague and anyone else who is interested.

I found kibaffo33's tool, tscribe, so this is basically a guide to getting it running and using it in its most basic form to process a few files.

I already had Homebrew installed. If you don't have it, go get it now.

Get python3 and pip set up

I was on a freshly installed macOS Catalina machine. It ships with python2, but pip complains about that (as it should), so I needed to get python3 installed. Following an Opensource.com article, I did this with pyenv, starting with:


$ brew install pyenv

The Opensource.com article suggests using pyenv to install 3.7, but I wanted the latest, so first I checked which versions were available:


$ pyenv install -l | grep 3

The list showed that 3.8.2 was available, so I ran:


$ pyenv install 3.8.2
$ pyenv global 3.8.2

I'm still on bash, so I did this to get bash set up:


$ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bash_profile

If you use a different shell, such as fish, running pyenv init - will show you what to add to that shell's config instead.

Then I closed the terminal window and reopened it so that the profile change would take effect.
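Before installing anything with pip, it can be worth confirming that the new terminal really picks up python3. The following is a minimal sketch rather than part of the original setup; the helper name is_python3 is made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or newer."""
    return version_info[0] >= 3


if __name__ == "__main__":
    # Show the full version string of the interpreter that is actually running.
    print(sys.version)
    print("python3 active:", is_python3())
```

Saved under a name of your choosing (check_python.py, say) and run with python check_python.py, it should report a 3.x version if the pyenv setup took effect.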

Finally, to install the package:


$ pip install tscribe

Using tscribe to transform AWS JSON to a docx

I had several JSON files with names like KK.json. To turn one of these into a docx file:

  1. Run python to start the Python interactive prompt.
  2. Run import tscribe to load the tscribe library.
  3. Run tscribe.write("KK.json") to process the file.
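With more than a couple of files, the three steps above can be wrapped in a small script. This is a sketch under assumptions: the function names find_transcripts and convert_all are invented here, the folder is assumed to contain only AWS Transcribe JSON files, and the only tscribe call used is the tscribe.write(filename) form from step 3.

```python
from pathlib import Path


def find_transcripts(folder="."):
    """Return the .json files in folder, sorted by name."""
    return sorted(Path(folder).glob("*.json"))


def convert_all(folder=".", convert=None):
    """Call convert on every .json file in folder and return the paths handled."""
    if convert is None:
        # Imported lazily so the helper above works even without tscribe installed.
        import tscribe

        convert = tscribe.write
    handled = []
    for path in find_transcripts(folder):
        # Same call as step 3, repeated once per file.
        convert(str(path))
        handled.append(path)
    return handled
```

Calling convert_all() from a Python prompt started in the folder that holds the JSON files runs tscribe over each of them in turn. Passing a different convert function is only there to make the loop easy to try out without tscribe.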

By default, tscribe.write produces a docx file named after the Job Name that was given to AWS, so in my case the output was KK_CDS.docx.
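To know in advance what the docx will be called, you can read the job name out of the JSON yourself. This sketch assumes the file stores it in a top-level "jobName" field, as standard AWS Transcribe output does; the function name expected_docx_name is hypothetical and not part of tscribe.

```python
import json


def expected_docx_name(json_path):
    """Return the docx file name suggested by the job name stored in the JSON."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: the job name lives in a top-level "jobName" field.
    return data["jobName"] + ".docx"
```

For a file whose job name is KK_CDS, expected_docx_name("KK.json") returns "KK_CDS.docx".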

That's it!
