Joshua decoder

Joshua is an open-source statistical machine translation decoder for hierarchical and syntax-based machine translation, written in Java. It is developed by Chris Callison-Burch's research group at the Center for Language and Speech Processing and the Human Language Technology Center of Excellence at Johns Hopkins University. For a high-level description, see the publication describing Joshua 3.0, "Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor" (PDF).

Download

Joshua's source code is available at https://github.com/joshua-decoder/joshua. To clone the repository (which downloads the complete source code) and build Joshua with Ant, you can type:

  git clone git@github.com:joshua-decoder/joshua.git
  cd joshua
  ant jar

Cloning the repository lets you easily pull in bug fixes as they are published. If you prefer, you can instead download a ZIP archive by following the code link above and clicking the "ZIP" button.

Usage

The easiest way to use Joshua is through the pipeline script, which is included with the source code. This script supports multiple use cases; see the documentation for more information. For more detail, Chris Callison-Burch describes how to run the steps of the machine translation pipeline manually. (This information is somewhat outdated, and much of it is now automated by the pipeline, but the broad steps still apply.)

If you run into difficulty, feel free to email the Joshua Technical Support Group, or search its archives.

Mailing lists

Contributors

Joshua was originally ported from David Chiang's Python implementation of Hiero by Zhifei Li. Since then, there have been a number of contributors to the project (listed here in alphabetical order).

Please email if you know of anyone who has been left off this list.

Acknowledgements


Human Language Technology Center of Excellence (logo)