Referanslar:
unshuffled_deduplicated_af
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_af')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişim kurulabilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 130640 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_als
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_als')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4518 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_arz
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_arz')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişim kurulabilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 79928 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_an
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_an')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2025 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ast
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ast')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 5343 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ba
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ba')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 27050 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_am
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_am')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 43102 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_as
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_as')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 9212 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_azb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_azb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 9985 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_be
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_be')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 307405 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 15762 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bxr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bxr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 36 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ceb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ceb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişim kurulabilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 26145 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_az
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_az')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 626796 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bcl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bcl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_cy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 98225 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_dsb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_dsb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 37 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1114481 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bs
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bs')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 702 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ce
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ce')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2984 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_cv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 10130 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_diq
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_diq')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_eml
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eml')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 80 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_et
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_et')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1172041 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bg
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bg')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 3398679 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bpy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bpy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1770 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ca
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ca')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2458067 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ckb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ckb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 68210 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ar
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ar')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- İhlalde bulunduğu iddia edilen materyali ve materyali bulmamıza olanak sağlayacak makul ölçüde yeterli bilgiyi açıkça tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 9006977 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_av
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_av')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler bu lisans şeması kapsamında yayınlanmaktadır. Bu verilerin alındığı hiçbir metnin sahibi değiliz. Bu verilerin fiili paketlenmesine Creative Commons CC0 lisansı ("hiçbir hak saklı değildir") kapsamında lisans veriyoruz http://creativecommons.org/publicdomain/zero/1.0/ Inria, kanunlar kapsamında mümkün olduğu ölçüde, tüm telif haklarından ve ilgili veya OSCAR'a komşu haklar Bu çalışma yayınlandığı kaynak: Fransa.
Verilerimizin size ait materyal içerdiğini ve bu nedenle burada çoğaltılmaması gerektiğini düşünüyorsanız lütfen:
- Sizinle iletişime geçilebilecek adres, telefon numarası veya e-posta adresi gibi ayrıntılı iletişim bilgileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça tanımlayın.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 360 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_ddeduplicated_bar
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bar')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_bh
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bh')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 82 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_br
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_br')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 14724 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_cbk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cbk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_da
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_da')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4771098 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_dv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_dv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 17024 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfffled_deduplicated_eo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 84752 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_fa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 8203495 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_fy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 20661 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_gn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 68 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_cs
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cs')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 12308039 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_hi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1909387 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_hu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 6582908 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_ie
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ie')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 11 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_fr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 59448891 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_gd
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gd')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 3883 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_gu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 169834 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_hsb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hsb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 3084 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_ia
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ia')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 529 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_io
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_io')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 617 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unffuffled_deduplicated_jbo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_jbo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 617 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unffuffled_deduplicated_km
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_km')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 108346 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_ku
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ku')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 29054 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Shuffled_deduplicated_la
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_la')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 18808 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_lmo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lmo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1374 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_lv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 843195 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_min
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_min')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 166 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_mr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 212556 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_mwl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mwl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Adres, telefon numarası veya sizinle iletişime geçilebilecek e -posta adresi gibi ayrıntılı iletişim verileriyle kendinizi açıkça tanımlayın.
- İhlal edildiği iddia edilen telif hakkıyla korunan çalışmayı açıkça belirleyin.
- Materyali bulmamıza izin vermek için ihlal ettiği iddia edilen materyali ve bilgileri makul bir şekilde yeterli olarak tanımlayın.
Etkilenen kaynakları korpusun bir sonraki sürümünden kaldırarak meşru taleplere uyacağız.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 7 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
Unwfflled_deduplicated_nah
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nah')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Lisans : Bu veriler, bu verilerin çıkarıldığı metnin hiçbirine sahip değiliz. Bu verilerin gerçek ambalajını Creative Commons CC0 lisansı ("Hakları Saklıdır") altında lisanslıyoruz http://creativecommons.org/publicdomain/zero/1.0/ yasalar altında mümkün olan ölçüde, Inria tüm telif hakkı ve ilgili veya ilgili feragat etti. Oscar'a Komşu Haklar Bu çalışma: Fransa'dan yayınlandı.
Verilerimizin size ait olduğu ve bu nedenle burada çoğaltılmaması gerektiğini düşünürseniz, lütfen:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 58 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_new
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_new')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2126 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_oc
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_oc')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 6485 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pam
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pam')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ps
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ps')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 67921 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_it
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_it')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 28522082 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ka
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ka')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 372158 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ro
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ro')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 5044757 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_scn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_scn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 17 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ko
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ko')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 3675420 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_kw
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kw')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 68 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lez
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lez')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1381 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lrc
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lrc')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 72 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mg
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mg')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 13343 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ml
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ml')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 453904 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ms
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ms')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 183443 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_myv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_myv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 5 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nds
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nds')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 8714 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 109118 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_os
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_os')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2559 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pms
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pms')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2859 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_qu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_qu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 411 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 7121 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2820821 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sh
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sh')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 17610 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_so
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_so')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 42 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 645747 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ta
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ta')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 833101 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4694 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tyv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tyv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 24 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_uz
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_uz')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 15074 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_wa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_wa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 677 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_xmf
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_xmf')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2418 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 11014487 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tg
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tg')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 56259 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_de
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_de')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 62398034 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 11596446 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_el
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_el')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 6521169 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_uk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_uk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 7782375 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_vi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 9897709 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_wuu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_wuu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 64 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_yo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 49 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_als
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_als')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 7324 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_arz
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_arz')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 158113 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_az
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_az')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 912330 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bcl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bcl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1675515 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bs
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bs')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2143 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ce
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ce')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4042 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_cv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 20281 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_diq
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_diq')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_eml
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_eml')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 84 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_et
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_et')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2093621 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_zh
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_zh')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 41708901 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_an
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_an')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2449 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ast
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ast')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 6999 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ba
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ba')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 42551 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bg
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bg')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 5869686 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bpy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bpy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 6046 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ca
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ca')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4390754 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ckb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ckb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 103639 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_es
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_es')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 56326016 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_da
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_da')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 7664010 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_dv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_dv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 21018 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_eo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_eo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 121168 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_fi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 5326443 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ga
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ga')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 46493 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_gom
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gom')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 484 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_hr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 321484 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_hy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 396093 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ilo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ilo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1578 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_fa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 13704702 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_fy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 33053 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_gn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 106 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_hi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 3264660 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_hu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 11197780 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ie
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ie')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 101 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ja
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ja')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 39496439 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_kk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 338073 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_krc
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_krc')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1377 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ky
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ky')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 86561 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_li
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_li')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 118 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1737411 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mhr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mhr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 2515 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 197878 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 16383 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mzn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mzn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 917 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ne
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ne')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 219334 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_no
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_no')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3229940 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 87235 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pnb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pnb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3463 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_rm
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_rm')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 34 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sah
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sah')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 8555 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_si
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_si')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 120684 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sq
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sq')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 461598 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sw
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sw')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 24803 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_th
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_th')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3749826 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 82738 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ur
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ur')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 428674 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_vo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3317 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_xal
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_xal')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 36 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_yue
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yue')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 7 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_am
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_am')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 83663 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_as
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_as')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 14985 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_azb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_azb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 15446 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_be
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_be')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 586031 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 26795 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bxr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bxr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 42 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ceb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ceb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 56248 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_cy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 157698 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_dsb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_dsb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 65 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_fr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 96742378 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gd
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_gd')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 5799 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_gu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 240691 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hsb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_hsb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 7959 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ia
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ia')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1040 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_io
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_io')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 694 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_jbo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_jbo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 832 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_km
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_km')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 159363 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ku
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ku')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 46535 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_la
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_la')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 94588 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lmo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lmo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1401 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1593820 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_min
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_min')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 220 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 326804 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mwl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mwl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 8 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nah
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_nah')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 61 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_new
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_new')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4696 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_oc
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_oc')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 10709 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pam
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_pam')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ps
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ps')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 98216 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ro
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ro')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 9387265 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_scn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_scn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 21 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 5492194 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1013619 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ta
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ta')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 1263280 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_tk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 6456 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tyv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_tyv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 34 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_uz
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_uz')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 27537 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_wa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_wa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1001 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_xmf
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_xmf')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3783 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_it
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_it')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 46981781 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ka
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ka')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 563916 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ko
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ko')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 7345075 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kw
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_kw')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 203 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lez
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lez')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1485 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lrc
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lrc')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 88 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mg
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mg')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 17957 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ml
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ml')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 603937 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ms
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ms')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 534016 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_myv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_myv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 6 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nds
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_nds')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 18174 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_nn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 185884 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_os
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_os')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 5213 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pms
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_pms')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3225 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_qu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_qu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 452 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 14291 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sh
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sh')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 36700 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_so
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_so')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 156 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 17395625 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tg
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_tg')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 89002 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_tr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 18535253 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_uk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_uk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 12973467 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_vi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_vi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 14898250 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_wuu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_wuu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 214 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_yo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_yo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 214 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_zh
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_zh')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 60137667 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_en
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_en')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 304230423 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_eu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 256513 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_frr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_frr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 7 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_gl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 284320 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_he
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_he')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2375030 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ht
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ht')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 9 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_id
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_id')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 9948521 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_is
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_is')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 389515 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_jv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_jv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1163 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_kn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 251064 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_kv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 924 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 21735 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 32652 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mai
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mai')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 25 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 299457 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mrj
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mrj')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 669 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_my
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_my')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 136639 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nap
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nap')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 55 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 20812149 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_or
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_or')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 44230 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 20682611 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 26920397 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ru
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ru')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 115954598 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sd
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sd')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 33925 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 886223 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_su
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_su')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 511 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_te
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_te')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 312644 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 294132 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ug
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ug')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 15503 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_vec
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vec')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 64 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_war
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_war')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 9161 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_yi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 32919 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_af
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_af')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 201117 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ar
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ar')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 16365602 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_av
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_av')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 456 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bar
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bar')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 4 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bh
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_bh')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 336 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_br
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_br')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 37085 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cbk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_cbk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cs
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_cs')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 21001388 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_de
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_de')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 104913504 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_el
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_el')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 10425596 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_es
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_es')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 88199221 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_fi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 8557453 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ga
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ga')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 83223 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gom
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_gom')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 640 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_hr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 582219 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hy
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_hy')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 659430 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ilo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ilo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 2638 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ja
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ja')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 62721527 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_kk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 524591 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_krc
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_krc')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1581 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ky
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ky')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 146993 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_li
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_li')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 137 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 2977757 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mhr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mhr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 3212 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 395605 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 26598 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mzn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mzn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1055 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ne
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ne')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 299938 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_no
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_no')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 5546211 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pa
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_pa')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 127467 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pnb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_pnb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 4599 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_rm
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_rm')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 41 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sah
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sah')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 22301 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_si
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_si')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 203082 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sq
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sq')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 672077 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sw
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sw')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 41986 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_th
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_th')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 6064129 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_tt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 135923 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ur
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ur')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 638596 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_vo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_vo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3366 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_xal
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_xal')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 39 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_yue
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_yue')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 11 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_en
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_en')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Bölünmeler :
Bölmek | Örnekler |
---|---|
'train' | 455994980 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_eu
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_eu')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 506883 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_frr
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_frr')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 7 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_gl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 544388 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_he
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_he')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 3808397 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ht
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ht')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 13 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_id
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_id')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 16236463 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_is
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_is')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 625673 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_jv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_jv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1445 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kn
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_kn')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 350363 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kv
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_kv')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1549 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lb
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lb')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 34807 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lo
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_lo')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 52910 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mai
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mai')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 123 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mk
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mk')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 437871 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mrj
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_mrj')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 757 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_my
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_my')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 232329 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nap
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_nap')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 73 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_nl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 34682142 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_or
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_or')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 59463 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_pl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 35440972 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pt
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_pt')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 42114520 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ru
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ru')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 161836003 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sd
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sd')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 44280 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_sl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 1746604 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_su
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_su')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Sürüm : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 805 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_te
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_te')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 475703 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tl
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_tl')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 458206 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ug
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_ug')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 22255 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_vec
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_vec')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 73 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_war
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_war')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 9760 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_yi
Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:
ds = tfds.load('huggingface:oscar/unshuffled_original_yi')
- Tanım :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Bölmek | Örnekler |
---|---|
'train' | 59364 |
- Özellikler :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}