Joshua is an open-source decoder for hierarchical and syntax-based statistical machine translation, written in Java. It is developed by Chris Callison-Burch's research group at the Center for Language and Speech Processing and the Human Language Technology Center of Excellence at Johns Hopkins University. For a high-level description, see the publication describing Joshua 3.0: "Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor" [PDF link].
Download
Joshua's source code is available at https://github.com/joshua-decoder/joshua. To clone the repository (which downloads the complete source code) and build the decoder, you can type:
git clone git@github.com:joshua-decoder/joshua.git
cd joshua
ant jar
Downloading this way lets you easily pull in bug fixes as they are published. If you prefer, you can instead download a ZIP archive by following the code link above and clicking the "ZIP" button.
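As a rough sketch of what pulling in later fixes might look like, the snippet below wraps the update and rebuild in a small shell function. The function name update_joshua is hypothetical, and the sketch assumes the clone lives in a directory named joshua (the default created by git clone) and that ant jar is still the build target shown above.

```shell
# Hypothetical helper: update an existing Joshua clone and rebuild the jar.
# Usage: update_joshua [path-to-clone]   (defaults to ./joshua)
update_joshua() {
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "${1:-joshua}" || exit 1   # stop if the clone directory is missing
    git pull && ant jar           # fetch published changes, then rebuild
  )
}
```

Calling update_joshua from the directory that contains the clone fetches the latest commits and rebuilds; it returns a non-zero status if the directory does not exist or either step fails.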
Usage
The easiest way to use Joshua is through the pipeline script, which is included with the source code. This script supports multiple use cases; see the documentation for more information. For more detail, Chris Callison-Burch describes how to run the steps of the machine translation pipeline manually. (This information is somewhat outdated, and the pipeline now automates much of it, but the broad steps still apply.)
If you run into difficulty, feel free to email the Joshua Technical Support Group, or search its archives.
Mailing lists
- Technical support (for end users with questions and technical difficulties): joshua_support@googlegroups.com
- Developers: joshua_developers@googlegroups.com
Contributors
Joshua was originally ported by Zhifei Li from David Chiang's Python implementation of Hiero. Since then, a number of people have contributed to the project (listed here in alphabetical order by first name).
- Aaron Phillips
- Adam Pauls
- Chris Callison-Burch
- Chris Dyer
- Colin Bannard
- Jonathan Weese
- Josh Schroeder
- Juri Ganitkevitch
- Kenneth Heafield
- Lane Schwartz
- Matt Post
- Omar F. Zaidan
- Wren Ng Thornton
- Yuan Cao
Acknowledgements