- Description:
Web-Scale Parallel Corpora for Official European Languages.
Additional Documentation: Explore on Papers With Code
Homepage: https://paracrawl.eu/releases.html
Source code: tfds.datasets.para_crawl.Builder
Versions:
1.2.0 (default): No release notes.
Figure (tfds.show_examples): Not supported.
Citation:
@misc{paracrawl,
title = "ParaCrawl",
year = "2018",
url = "http://paracrawl.eu/download.html"
}
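Every config listed below is named `para_crawl/en<xx>`, where `<xx>` is the target-language code, and 1.2.0 is the only listed version. `tfds.load` accepts a fully qualified `name/config:version` string. The sketch below is a hypothetical helper (`para_crawl_name` and `PARA_CRAWL_TARGETS` are illustrative names, not part of TFDS) that builds such a string and rejects codes that have no config on this page.

```python
from typing import Optional

# Target-language codes of the para_crawl configs documented on this page.
PARA_CRAWL_TARGETS = frozenset({
    "bg", "cs", "da", "de", "el", "es", "et", "fi", "fr", "ga", "hr", "hu",
    "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
})


def para_crawl_name(target: str, version: Optional[str] = "1.2.0") -> str:
    """Build a 'para_crawl/en<xx>[:version]' name (hypothetical helper)."""
    code = target.strip().lower()
    if code not in PARA_CRAWL_TARGETS:
        raise ValueError(f"no para_crawl config for target language {target!r}")
    name = f"para_crawl/en{code}"
    # Pinning the version guards against silently picking up another release.
    return f"{name}:{version}" if version else name
```

For example, `para_crawl_name("de")` gives `"para_crawl/ende:1.2.0"`, which could be passed to `tfds.load(..., split="train")`. With `version=None` the default version is used; plain `para_crawl` without a config selects the default config, `enbg`.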
para_crawl/enbg (default config)
Config description: Translation dataset from English to bg.
Download size: 98.94 MiB
Dataset size: 362.46 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 1,039,885 |
- Feature structure:
Translation({
'bg': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| bg | Text |  | string |  |
| en | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'bg')
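With `as_supervised=True`, `tfds.load` returns each example as a 2-tuple ordered by the supervised keys, here `('en', 'bg')`, instead of the `Translation` dict; iterating with `tfds.as_numpy` yields `Text` values as UTF-8 `bytes`. The plain-Python sketch below imitates that projection so it runs without TensorFlow. `to_supervised` and its `reverse` flag are hypothetical, and the sample sentence pair is invented.

```python
from typing import Mapping, Tuple, Union


def _as_text(value: Union[bytes, str]) -> str:
    # tfds.as_numpy yields tf.string values as bytes; decode them as UTF-8.
    return value.decode("utf-8") if isinstance(value, bytes) else value


def to_supervised(
    example: Mapping[str, Union[bytes, str]],
    keys: Tuple[str, str] = ("en", "bg"),
    reverse: bool = False,
) -> Tuple[str, str]:
    """Project a Translation example dict onto its supervised keys.

    Returns (source, target) in the order given by `keys`, mirroring the
    tuple produced by as_supervised=True. With reverse=True the pair is
    swapped, e.g. for training bg -> en instead of en -> bg.
    """
    source, target = (_as_text(example[k]) for k in keys)
    return (target, source) if reverse else (source, target)


# Invented sample in the shape tfds.as_numpy would yield for para_crawl/enbg.
example = {"bg": "Добро утро.".encode("utf-8"), "en": b"Good morning."}
pair = to_supervised(example)  # ("Good morning.", "Добро утро.")
```

Selecting by key rather than by position matters because the feature dicts on this page list their keys alphabetically (`bg` before `en`, but `en` before `es`), while the supervised keys always put `en` first.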
para_crawl/encs
Config description: Translation dataset from English to cs.
Download size: 187.31 MiB
Dataset size: 666.34 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 2,981,949 |
- Feature structure:
Translation({
'cs': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| cs | Text |  | string |  |
| en | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'cs')
para_crawl/enda
Config description: Translation dataset from English to da.
Download size: 174.34 MiB
Dataset size: 619.77 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 2,414,895 |
- Feature structure:
Translation({
'da': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| da | Text |  | string |  |
| en | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'da')
para_crawl/ende
Config description: Translation dataset from English to de.
Download size: 1.22 GiB
Dataset size: 4.04 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 16,264,448 |
- Feature structure:
Translation({
'de': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| de | Text |  | string |  |
| en | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'de')
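Sizes on this page are in binary units (1 MiB = 2^20 bytes, 1 GiB = 2^30 bytes), and some configs mix units (`enit` downloads 1017.30 MiB but prepares to 3.36 GiB). Assuming the downloaded archive is kept next to the prepared dataset, a rough disk estimate is the sum of the two figures; for `para_crawl/ende` that is 1.22 GiB + 4.04 GiB = 5.26 GiB. The helpers below are a hypothetical sketch of that arithmetic.

```python
import re

# Binary units as used on this page: 1 MiB = 2**20 bytes, 1 GiB = 2**30 bytes.
_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_size(text: str) -> float:
    """Convert a string such as '1.22 GiB' or '98.94 MiB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGT]iB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def disk_estimate_gib(download: str, dataset: str) -> float:
    """Rough footprint in GiB if both the archive and prepared data are kept."""
    return (parse_size(download) + parse_size(dataset)) / 2**30
```

Decimal units such as `GB` are rejected on purpose, since mixing them with the binary figures here would skew the estimate by roughly 7%.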
para_crawl/enel
Config description: Translation dataset from English to el.
Download size: 184.59 MiB
Dataset size: 698.75 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 1,985,233 |
- Feature structure:
Translation({
'el': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| el | Text |  | string |  |
| en | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'el')
para_crawl/enes
Config description: Translation dataset from English to es.
Download size: 1.82 GiB
Dataset size: 6.23 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 21,987,267 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'es': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| es | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'es')
para_crawl/enet
Config description: Translation dataset from English to et.
Download size: 66.91 MiB
Dataset size: 209.16 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
| Split | Examples |
|---|---|
| 'train' | 853,422 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'et': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| et | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'et')
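The Auto-cached entries reflect TFDS's auto-caching heuristic. As described in the TFDS performance guide, a dataset is cached automatically only when its total size is known and below 250 MiB, and either `shuffle_files` is disabled or only a single shard is read. That matches this page: `enhr` (256.37 MiB) is never cached, `enet` (209.16 MiB) is cached only without file shuffling, and the small `enga` and `enmt` configs are always cached, presumably because each is stored in one shard. The sketch below encodes that rule; it is not the library code, and since shard counts are not listed here, `num_shards` is an assumed input.

```python
from typing import Optional

# 250 MiB threshold from the TFDS performance guide (binary units).
AUTOCACHE_LIMIT_BYTES = 250 * 2**20


def would_autocache(
    dataset_size_bytes: Optional[float], shuffle_files: bool, num_shards: int
) -> bool:
    """Approximate TFDS's auto-caching decision (a sketch, not the library code)."""
    if dataset_size_bytes is None:
        return False  # size unknown: never auto-cached
    if dataset_size_bytes >= AUTOCACHE_LIMIT_BYTES:
        return False  # too large, e.g. enhr at 256.37 MiB
    # With several shards, caching would freeze the shuffled file order across
    # epochs, so file shuffling disables the cache.
    return not shuffle_files or num_shards == 1
```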
para_crawl/enfi
Config description: Translation dataset from English to fi.
Download size: 151.83 MiB
Dataset size: 543.85 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 2,156,069 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'fi': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| fi | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'fi')
para_crawl/enfr
Config description: Translation dataset from English to fr.
Download size: 2.63 GiB
Dataset size: 9.04 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 31,374,161 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'fr': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| fr | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'fr')
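Every config on this page has only a `train` split, so any validation data must be carved out of it. TFDS split slicing (for example `split="train[:99%]"`) is one option. The sketch below shows a different, library-independent technique: hashing each sentence pair so that a given pair always lands on the same side regardless of read order or `shuffle_files`. The function name `in_holdout` and the 1% default are assumptions for illustration.

```python
import hashlib


def in_holdout(source: str, target: str, holdout_percent: int = 1) -> bool:
    """Deterministically assign a sentence pair to a held-out set.

    The pair is hashed with SHA-256 and mapped to one of 100 buckets; pairs
    in the lowest `holdout_percent` buckets are held out. Identical pairs
    always get the same assignment, independent of iteration order.
    """
    if not 0 <= holdout_percent <= 100:
        raise ValueError("holdout_percent must be between 0 and 100")
    # The separator byte makes ("ab", "c") and ("a", "bc") hash differently.
    key = source.encode("utf-8") + b"\x00" + target.encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
    return bucket < holdout_percent
```

Because exact duplicate pairs hash to the same bucket, they cannot end up on both sides of the split, which limits train/validation leakage in web-crawled data.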
para_crawl/enga
Config description: Translation dataset from English to ga.
Download size: 28.03 MiB
Dataset size: 107.09 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'train' | 357,399 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'ga': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| ga | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'ga')
para_crawl/enhr
Config description: Translation dataset from English to hr.
Download size: 80.97 MiB
Dataset size: 256.37 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 1,002,053 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'hr': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| hr | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'hr')
para_crawl/enhu
Config description: Translation dataset from English to hu.
Download size: 114.24 MiB
Dataset size: 421.40 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 1,901,342 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'hu': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| hu | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'hu')
para_crawl/enit
Config description: Translation dataset from English to it.
Download size: 1017.30 MiB
Dataset size: 3.36 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 12,162,239 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'it': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| it | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'it')
para_crawl/enlt
Config description: Translation dataset from English to lt.
Download size: 63.28 MiB
Dataset size: 204.70 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
| Split | Examples |
|---|---|
| 'train' | 844,643 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'lt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| lt | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'lt')
para_crawl/enlv
Config description: Translation dataset from English to lv.
Download size: 45.17 MiB
Dataset size: 147.09 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
| Split | Examples |
|---|---|
| 'train' | 553,060 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'lv': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| lv | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'lv')
para_crawl/enmt
Config description: Translation dataset from English to mt.
Download size: 18.15 MiB
Dataset size: 54.36 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'train' | 195,502 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'mt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| mt | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'mt')
para_crawl/ennl
Config description: Translation dataset from English to nl.
Download size: 400.63 MiB
Dataset size: 1.40 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 5,659,268 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'nl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| nl | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'nl')
para_crawl/enpl
Config description: Translation dataset from English to pl.
Download size: 257.90 MiB
Dataset size: 885.63 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 3,503,276 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'pl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| pl | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'pl')
para_crawl/enpt
Config description: Translation dataset from English to pt.
Download size: 608.62 MiB
Dataset size: 2.05 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 8,141,940 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| pt | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'pt')
para_crawl/enro
Config description: Translation dataset from English to ro.
Download size: 153.24 MiB
Dataset size: 534.34 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 1,952,043 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'ro': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| ro | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'ro')
para_crawl/ensk
Config description: Translation dataset from English to sk.
Download size: 96.61 MiB
Dataset size: 352.91 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 1,591,831 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'sk': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| sk | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'sk')
para_crawl/ensl
Config description: Translation dataset from English to sl.
Download size: 62.02 MiB
Dataset size: 187.66 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
| Split | Examples |
|---|---|
| 'train' | 660,161 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'sl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| sl | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'sl')
para_crawl/ensv
Config description: Translation dataset from English to sv.
Download size: 262.76 MiB
Dataset size: 905.72 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 3,476,729 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'sv': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
|  | Translation |  |  |  |
| en | Text |  | string |  |
| sv | Text |  | string |  |
Supervised keys (See as_supervised doc): ('en', 'sv')
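To choose a language pair by data volume, the `train` example counts from the configs above can be collected into one table. The sketch below is a hypothetical helper (`TRAIN_EXAMPLES` and `configs_with_at_least` are illustrative names) that lists the configs with at least a given number of sentence pairs, largest first; the counts are transcribed from this page.

```python
from typing import Dict, List

# 'train' example counts for each para_crawl config, as listed on this page.
TRAIN_EXAMPLES: Dict[str, int] = {
    "enbg": 1_039_885, "encs": 2_981_949, "enda": 2_414_895,
    "ende": 16_264_448, "enel": 1_985_233, "enes": 21_987_267,
    "enet": 853_422, "enfi": 2_156_069, "enfr": 31_374_161,
    "enga": 357_399, "enhr": 1_002_053, "enhu": 1_901_342,
    "enit": 12_162_239, "enlt": 844_643, "enlv": 553_060,
    "enmt": 195_502, "ennl": 5_659_268, "enpl": 3_503_276,
    "enpt": 8_141_940, "enro": 1_952_043, "ensk": 1_591_831,
    "ensl": 660_161, "ensv": 3_476_729,
}


def configs_with_at_least(min_examples: int) -> List[str]:
    """Return config names with >= min_examples train pairs, largest first."""
    selected = [name for name, n in TRAIN_EXAMPLES.items() if n >= min_examples]
    return sorted(selected, key=TRAIN_EXAMPLES.__getitem__, reverse=True)
```

For instance, `configs_with_at_least(10_000_000)` returns `["enfr", "enes", "ende", "enit"]`, the four pairs with more than ten million examples.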