Submitted by greggles on
A colleague recently needed to transcribe some recorded interviews. They used AWS Transcribe, which outputs a JSON file that is not super easy to use directly. Luckily, a few open source tools have popped up to turn that JSON into something more legible. I helped convert the results into a docx and wanted to document the steps for my colleague and anyone else who needs to do the same.
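To give a sense of what "not super easy to use" means, here is a minimal Python sketch that pulls a few fields out of a Transcribe JSON file. It assumes the standard AWS Transcribe output layout (a top-level jobName plus a results object holding transcripts and items); peek is a hypothetical helper name, not part of any tool mentioned here. It only needs the standard library, so it runs once Python 3 is set up as described below.

```python
import json


def peek(path):
    """Return (job name, plain transcript, item count) from an AWS Transcribe JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The Job Name chosen when the transcription job was started.
    job_name = data["jobName"]
    # The whole transcript is one long string with no speaker turns or timestamps.
    text = data["results"]["transcripts"][0]["transcript"]
    # Per-word timings and confidences live in a separate list of items.
    n_items = len(data["results"]["items"])
    return job_name, text, n_items
```

Calling peek("KK.json") should hand back the job name, the whole interview as one run-on string, and the number of word and punctuation items that carry the timing details separately (speaker labels, if speaker identification was turned on, sit in yet another section). Stitching those pieces back into a readable document is roughly what tscribe does for you.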
I found kibaffo33's tscribe tool, so this is basically a guide to getting it running and using it in its most basic form to process a few files.
I already had Homebrew installed. If you don't have it, go get it now.
Get python3 and pip set up
First, I was on a freshly installed macOS Catalina machine. It ships with Python 2, but pip complains about that (as it should), so I needed to install Python 3. Following an OpenSource.com article, I installed it with these commands:
$ brew install pyenv
The OpenSource.com article suggests using pyenv to install 3.7, but I wanted the latest, so first I checked which versions were available:
$ pyenv install -l | grep 3
I found that 3.8.2 was available, so I ran:
$ pyenv install 3.8.2
$ pyenv global 3.8.2
I'm still on bash, so I ran this to have pyenv set up in every new bash session:
$ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bash_profile
If you use fish instead, running the following will tell you what to add to your fish config:
$ pyenv init -
Then I closed the terminal window and reopened it so that the profile change would take effect.
Finally, to install the package:
$ pip install tscribe
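Because the system Python 2, the pyenv Python, and pip can end up pointing at different interpreters, a quick check before moving on can save some confusion. This is a minimal sketch using only the standard library (check_setup is just a name I made up for illustration): it reports the version of whichever python runs it and whether tscribe can be imported from that same interpreter.

```python
import importlib.util
import sys


def check_setup():
    """Report the running Python version and whether tscribe is importable."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "is_python3": sys.version_info.major == 3,
        # find_spec returns None when the package is not installed for this interpreter
        "tscribe_installed": importlib.util.find_spec("tscribe") is not None,
    }


print(check_setup())
```

Saved as something like check_setup.py and run with python check_setup.py, the version should start with 3.8 if pyenv global 3.8.2 took effect, and tscribe_installed should be True after the pip install.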
Using tscribe to transform AWS json to a docx
I had several JSON files with names like KK.json. To turn one of them into a docx file:
- Start the Python command line:
$ python
- Load the tscribe library:
>>> import tscribe
- Process the file:
>>> tscribe.write("KK.json")
By default, tscribe.write saves a docx file named after the Job Name that was given to AWS, so in my screenshot the output was KK_CDS.docx.
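With several JSON files, typing tscribe.write once per file gets tedious. Here is a minimal sketch of a loop that converts every .json file in a folder, assuming only the tscribe.write("KK.json") call shown above; convert_all and its writer parameter are hypothetical names I added so the loop can be tried out with a stand-in function when tscribe isn't installed.

```python
from pathlib import Path


def convert_all(folder=".", writer=None):
    """Run tscribe.write on every .json file in folder; return the names processed."""
    if writer is None:
        # Imported here so the loop can be exercised with a stand-in writer.
        import tscribe
        writer = tscribe.write
    converted = []
    # sorted() gives a predictable order, e.g. KK.json before LL.json
    for path in sorted(Path(folder).glob("*.json")):
        writer(str(path))  # same call as tscribe.write("KK.json"), one file at a time
        converted.append(path.name)
    return converted
```

From the python prompt, started in the folder that holds the JSON files, paste the function in and call convert_all(); each docx should be named after that file's Job Name, just as with a single tscribe.write call.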
That's it!