Joshua is an open-source statistical machine translation decoder for hierarchical and syntax-based machine translation, written in Java. It is developed by Chris Callison-Burch's research group at the Center for Language and Speech Processing and the Human Language Technology Center of Excellence at Johns Hopkins University.
Click here for a list of changes from the previous version (4.0). Some of these features are described in more detail in the paper:
Joshua 4.0: Packing, PRO, and Paraphrases
Juri Ganitkevitch, Yuan Cao, Jonathan Weese, Matt Post, and Chris Callison-Burch
WMT 2012
You might also be interested in older papers describing Joshua 1.0, the Joshua demo paper, 2.0 and 3.0.
Download
To download, click the big green download button to the left. This gives you a tarball of our latest release, version 5.0.
Usage
Check out our large collection of user documentation for information on downloading and setup, a quick start guide, detailed information on different Joshua components, common problems, and more.
The easiest way to use Joshua is through the pipeline script, which is included with the source code. This script supports multiple use cases; see the documentation for more information. For more detail, Chris Callison-Burch describes how to manually run the steps of the machine translation pipeline. (This information is a bit outdated and much of it is automated by the pipeline, but the broad steps still apply.)
If you run into difficulty, feel free to email the Joshua Technical Support Group, or search its archives.
Mailing lists
- Technical support (for end users with questions and technical difficulties): joshua_support@googlegroups.com
- Developers: joshua_developers@googlegroups.com
Contributors
Joshua was originally ported by Zhifei Li from David Chiang's Python implementation of Hiero. Since then, a number of people have contributed to the project (listed here in alphabetical order).
- Aaron Phillips
- Adam Pauls
- Chris Callison-Burch
- Chris Dyer
- Colin Bannard
- Jonathan Weese
- Josh Schroeder
- Juri Ganitkevitch
- Kenneth Heafield
- Lane Schwartz
- Luke Orland
- Matt Post
- Omar F. Zaidan
- Wren Ng Thornton
- Yuan Cao
Other decoders
Many other open-source decoders are publicly available. Some of them are:
- cdec, an SCFG decoder
- Moses, the de facto phrase-based translation system, which also supports hierarchical decoding
- Jane, a hierarchical phrase-based system