References:
unshuffled_deduplicated_af
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_af')
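The load command above assumes `tensorflow_datasets` has already been imported as `tfds`. Below is a minimal sketch that adds the import and derives the builder name from a language code. It assumes the `huggingface:oscar/unshuffled_deduplicated_<lang>` pattern shown on this page holds for every config. The helper names `oscar_name` and `load_oscar` are hypothetical. The import is deferred so that the name helper works even without TFDS installed.

```python
def oscar_name(lang: str) -> str:
    """Build the TFDS name of a deduplicated OSCAR config (hypothetical helper)."""
    # Language codes on this page are short lowercase ASCII strings such as "af" or "bxr".
    if not (lang.isascii() and lang.isalpha() and lang.islower()):
        raise ValueError(f"unexpected language code: {lang!r}")
    return f"huggingface:oscar/unshuffled_deduplicated_{lang}"


def load_oscar(lang: str, split: str = "train"):
    """Load one config through TFDS; 'train' is the only split listed here."""
    # Imported lazily so that oscar_name() can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(oscar_name(lang), split=split)
```

With TFDS installed, `load_oscar("af")` returns a `tf.data.Dataset` for the same config as the one-line command above.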
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 130640 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
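The feature spec above is serialized Hugging Face `datasets` metadata: each field is a `Value` with a `dtype`. Below is a minimal sketch that parses the spec with the standard `json` module and checks a record against it. It assumes only the `int64` and `string` dtypes that appear on this page. The sample record and the helper name `validate` are hypothetical.

```python
import json

FEATURES_JSON = """
{
  "id": {"dtype": "int64", "id": null, "_type": "Value"},
  "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Python types assumed for the two dtypes used in this spec.
DTYPE_TO_PY = {"int64": int, "string": str}


def validate(record: dict, spec: dict) -> list:
    """Return a list of problems; an empty list means the record matches the spec."""
    problems = []
    for name, field in spec.items():
        if field.get("_type") != "Value":
            problems.append(f"{name}: unsupported feature type {field.get('_type')}")
            continue
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        expected = DTYPE_TO_PY[field["dtype"]]
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{name}: expected {field['dtype']}, got {type(value).__name__}")
        elif expected is int and not (-2**63 <= value < 2**63):
            problems.append(f"{name}: out of int64 range")
    # Fields present in the record but absent from the spec.
    problems.extend(f"{name}: not in spec" for name in sorted(set(record) - set(spec)))
    return problems


spec = json.loads(FEATURES_JSON)
print(validate({"id": 0, "text": "Hallo wêreld"}, spec))  # []
```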
unshuffled_deduplicated_als
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_als')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 4518 |
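Each config lists only a `'train'` split, so a validation set has to be carved out of it. Below is a minimal sketch using TFDS split-slicing strings such as `'train[:95%]'`. The 95/5 ratio and the helper names `holdout_splits` and `load_with_holdout` are illustrative assumptions, not part of the dataset.

```python
def holdout_splits(percent_valid: int = 5) -> list:
    """Return [train_slice, valid_slice] split strings for tfds.load (hypothetical helper)."""
    if not 0 < percent_valid < 100:
        raise ValueError("percent_valid must be between 1 and 99")
    cut = 100 - percent_valid
    # Two disjoint percentage slices of the single 'train' split.
    return [f"train[:{cut}%]", f"train[{cut}%:]"]


def load_with_holdout(name: str, percent_valid: int = 5):
    """Load train/validation slices; passing a list of splits returns a list of datasets."""
    import tensorflow_datasets as tfds  # deferred so holdout_splits() runs without TFDS

    train_ds, valid_ds = tfds.load(name, split=holdout_splits(percent_valid))
    return train_ds, valid_ds
```

For a config as small as this one (4518 examples), a 5% slice holds roughly 226 examples; the exact count depends on how TFDS rounds slice boundaries.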
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_arz
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_arz')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 79928 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
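When a config is converted to NumPy with `tfds.as_numpy`, `string` features typically arrive as UTF-8 `bytes` and `int64` features as NumPy integers. Below is a small sketch that normalizes one example back to a plain Python `(int, str)` pair, assuming that representation. The helper name `to_python` is hypothetical.

```python
def to_python(example: dict) -> tuple:
    """Convert one {'id', 'text'} example into a plain (int, str) pair."""
    # NumPy integer scalars support int(); plain ints pass through unchanged.
    example_id = int(example["id"])
    raw_text = example["text"]
    # tf.string values come back as bytes; decode them as UTF-8.
    text = raw_text.decode("utf-8") if isinstance(raw_text, bytes) else str(raw_text)
    return example_id, text


# Usage with TFDS (not executed here):
#   import tensorflow_datasets as tfds
#   ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_arz', split='train')
#   for ex in tfds.as_numpy(ds.take(3)):
#       print(to_python(ex))
```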
unshuffled_deduplicated_an
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_an')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2025 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
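Because every config shares the same two-field `id`/`text` schema, one helper can summarize any of them. Below is a sketch that assumes the examples have already been decoded into plain `(id, text)` pairs. Its whitespace tokenization is a deliberately crude stand-in, not how OSCAR itself counts words.

```python
from typing import Iterable, Tuple


def corpus_stats(examples: Iterable[Tuple[int, str]]) -> dict:
    """Count documents, characters, and whitespace-separated tokens."""
    documents = characters = tokens = 0
    for _example_id, text in examples:
        documents += 1
        characters += len(text)  # counts Unicode code points, not bytes
        tokens += len(text.split())  # crude: splits on any run of whitespace
    return {
        "documents": documents,
        "characters": characters,
        "tokens": tokens,
        "mean_chars": characters / documents if documents else 0.0,
    }
```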
unshuffled_deduplicated_ast
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ast')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 5343 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ba
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ba')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 27050 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_am
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_am')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 43102 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_as
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_as')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 9212 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_azb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_azb')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 9985 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_be
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_be')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 307405 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bo')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 15762 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bxr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bxr')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 36 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ceb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ceb')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 26145 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_az
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_az')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 626796 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bcl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bcl')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_cy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cy')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 98225 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_dsb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_dsb')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 37 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bn')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1114481 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bs')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 702 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ce
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ce')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2984 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_cv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 10130 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_diq
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_diq')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_eml
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eml')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"); to the extent possible under law, all copyright and related or neighboring rights to OSCAR have been waived. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 80 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
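Every config on this page is loaded with the same `tfds.load` pattern and differs only in the language code, so hand-typed names are easy to get wrong. The sketch below builds the name from a code instead. The helper names `oscar_name` and `load_oscar` are hypothetical, and restricting codes to lowercase ASCII letters is an assumption based on the codes listed here.

```python
import re

# Name prefix shared by every tfds.load() call on this page.
_PREFIX = "huggingface:oscar/unshuffled_deduplicated_"


def oscar_name(lang: str) -> str:
    """Return the TFDS name of the OSCAR deduplicated config for `lang`."""
    # Assumption: codes are lowercase ASCII letters, as in "eml", "et" or "bpy".
    if not re.fullmatch(r"[a-z]+", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return _PREFIX + lang


def load_oscar(lang: str, split: str = "train"):
    """Load one config's split; TFDS is imported only when this is called."""
    import tensorflow_datasets as tfds

    return tfds.load(oscar_name(lang), split=split)
```

For example, `load_oscar("eml")` would request the same dataset as the command above, restricted to the `train` split. Whether `huggingface:` community datasets need extra packages (such as Hugging Face `datasets`) may depend on your TFDS version.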
unshuffled_deduplicated_et
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_et')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1172041 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
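The Features block describes each example as a flat record with an `int64` field `id` and a `string` field `text`. The following sketch checks a plain Python dict against that spec. The `_PY_TYPES` mapping and the `check_record` helper are assumptions for illustration; records read through TFDS may carry `text` as bytes rather than `str`, so decode them before checking.

```python
import json

# Feature spec copied from the "Features" section above (Hugging Face `Value` types).
FEATURES_JSON = """
{
  "id": {"dtype": "int64", "id": null, "_type": "Value"},
  "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from the dtype strings used on this page to Python types.
_PY_TYPES = {"int64": int, "string": str}


def check_record(record: dict, features: dict) -> None:
    """Raise ValueError if `record` does not match a flat, Value-only feature spec."""
    if set(record) != set(features):
        raise ValueError(f"fields {sorted(record)} != {sorted(features)}")
    for name, spec in features.items():
        expected = _PY_TYPES[spec["dtype"]]
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"{name}: expected {spec['dtype']}, got {type(value).__name__}")


features = json.loads(FEATURES_JSON)
```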
unshuffled_deduplicated_bg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bg')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 3398679 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bpy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bpy')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1770 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ca
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ca')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2458067 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ckb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ckb')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 68210 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ar')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 9006977 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_av
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_av')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 360 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bar')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 4 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_bh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_bh')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 82 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_br
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_br')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 14724 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_cbk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cbk')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
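Split sizes vary widely between configs: `cbk` has a single training example and `bar` only four, while `da` has several million. A minimal sketch for filtering configs by training-set size before loading any of them, using a hand-copied subset of the counts from this page; the `TRAIN_SIZES` dict and the `configs_with_at_least` helper are hypothetical.

```python
from typing import Dict, List

# Subset of train-split sizes copied from the Splits tables on this page.
TRAIN_SIZES: Dict[str, int] = {
    "av": 360,
    "bar": 4,
    "bh": 82,
    "br": 14724,
    "cbk": 1,
    "da": 4771098,
}


def configs_with_at_least(sizes: Dict[str, int], min_examples: int) -> List[str]:
    """Return the language codes whose train split has >= min_examples examples, sorted."""
    return sorted(lang for lang, n in sizes.items() if n >= min_examples)
```

For instance, `configs_with_at_least(TRAIN_SIZES, 100)` keeps `av`, `br` and `da`.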
unshuffled_deduplicated_da
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_da')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 4771098 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_dv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_dv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 17024 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_eo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eo')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 84752 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_fa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fa')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 8203495 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_fy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fy')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 20661 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_gn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gn')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 68 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_cs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_cs')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 12308039 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_hi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hi')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1909387 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_hu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hu')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 6582908 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
uncuffled_deduplicated_ie
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ie')
- বর্ণনা :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 11 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
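The name passed to `tfds.load` has to match the config name exactly; a misspelled config (for example a slip in the `unshuffled` prefix) will most likely fail to load. Here is a small sketch, assuming only the naming pattern visible on this page (`huggingface:oscar/unshuffled_deduplicated_<code>`), that builds and sanity-checks the identifier before it is passed to TFDS. The helper `oscar_tfds_name` is hypothetical, and the 2-3 letter rule reflects only the language codes listed here.

```python
import re

PREFIX = "huggingface:oscar/unshuffled_deduplicated_"
# The language codes on this page are 2-3 lowercase letters (e.g. "ie", "fr", "hsb").
_LANG_CODE = re.compile(r"[a-z]{2,3}")


def oscar_tfds_name(lang_code: str) -> str:
    """Build the TFDS name for a deduplicated OSCAR config (hypothetical helper)."""
    code = lang_code.strip().lower()
    if not _LANG_CODE.fullmatch(code):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return PREFIX + code


print(oscar_tfds_name("ie"))  # → huggingface:oscar/unshuffled_deduplicated_ie
```

With TensorFlow Datasets installed, the result can be passed straight through, e.g. `ds = tfds.load(oscar_tfds_name("ie"))`.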
unshuffled_deduplicated_fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fr')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 59448891 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
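The French config lists 59,448,891 training examples, which is often far more than a quick experiment needs. TFDS accepts split-slicing strings such as `'train[:1%]'` or `'train[:10000]'` in the `split` argument of `tfds.load`. The sketch below builds such a string and passes it through; the helper names are hypothetical, and the `tensorflow_datasets` import is deferred so the string helper works without TensorFlow installed. As far as we know, slicing limits what is read into the pipeline, not what TFDS downloads and prepares, so the first call may still fetch the whole config.

```python
from typing import Optional


def train_slice(percent: Optional[int] = None, count: Optional[int] = None) -> str:
    """Return a TFDS split string covering the first `percent`% or first `count` examples."""
    if (percent is None) == (count is None):
        raise ValueError("pass exactly one of percent or count")
    if percent is not None:
        if not 0 < percent <= 100:
            raise ValueError("percent must be in (0, 100]")
        return f"train[:{percent}%]"
    if count <= 0:
        raise ValueError("count must be positive")
    return f"train[:{count}]"


def load_oscar_slice(config: str, **slice_kwargs):
    """Load part of an OSCAR config's 'train' split (requires tensorflow-datasets)."""
    import tensorflow_datasets as tfds  # deferred so train_slice works without TF

    return tfds.load(f"huggingface:oscar/{config}", split=train_slice(**slice_kwargs))


print(train_slice(percent=1))    # → train[:1%]
print(train_slice(count=10000))  # → train[:10000]
# ds = load_oscar_slice("unshuffled_deduplicated_fr", percent=1)  # downloads data
```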
unshuffled_deduplicated_gd
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gd')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 3883 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_gu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gu')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 169834 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_hsb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hsb')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 3084 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ia
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ia')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 529 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
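When a config is iterated with `tfds.as_numpy`, the `string` feature `text` typically arrives as Python `bytes`, and the `int64` feature `id` arrives as a NumPy integer. The following is a sketch, assuming that record shape, that converts one record into plain Python values. `to_python` is a hypothetical helper, and the sample record is made up for illustration, not taken from the corpus.

```python
def to_python(record: dict) -> tuple:
    """Convert an OSCAR record {'id': int64, 'text': bytes or str} to (int, str)."""
    raw_text = record["text"]
    # tf.string values surface as bytes after tfds.as_numpy; decode them as UTF-8.
    text = raw_text.decode("utf-8") if isinstance(raw_text, bytes) else raw_text
    # int() also accepts NumPy integer scalars such as numpy.int64.
    return int(record["id"]), text


# Illustrative record (not from the corpus).
print(to_python({"id": 3, "text": b"example text"}))  # → (3, 'example text')
```

With TFDS, this could be applied as `for rec in tfds.as_numpy(ds.take(5)): print(to_python(rec))`.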
unshuffled_deduplicated_io
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_io')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 617 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_jbo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_jbo')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 617 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_km
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_km')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 108346 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ku
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ku')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 29054 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_la
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_la')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 18808 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lmo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lmo')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1374 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
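Every config on this page provides only a `'train'` split, so any validation data has to be carved out by the user. One option, sketched here under the assumption that each record's `id` is stable across loads of the same version, is to route records by hashing their `id`. This keeps the assignment deterministic without shuffling or materializing the corpus. The function name and the 5% default are illustrative choices, not part of OSCAR or TFDS.

```python
import hashlib


def is_validation(example_id: int, val_fraction: float = 0.05) -> bool:
    """Deterministically route an example to validation by hashing its id."""
    if not 0.0 <= val_fraction <= 1.0:
        raise ValueError("val_fraction must be in [0, 1]")
    digest = hashlib.sha256(str(example_id).encode("utf-8")).digest()
    # Read the first 8 hash bytes as an unsigned integer in [0, 2**64) and
    # compare it with the fraction scaled to the same range.
    return int.from_bytes(digest[:8], "big") < val_fraction * 2**64


ids = range(1374)  # e.g. the size of the 'train' split listed above
val_ids = [i for i in ids if is_validation(i)]
print(len(val_ids))
```

Inside a `tf.data` pipeline, the same idea would be applied as a filter on `id`. Because the split depends only on the hash, every run assigns the same records to validation.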
unshuffled_deduplicated_lv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 843195 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
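The example counts in the split tables are useful for sizing a training run. For instance, with an assumed batch size of 1024, the 843,195 Latvian examples give ceil(843195 / 1024) = 824 batches per epoch, or 823 if the final partial batch is dropped. Here is a small sketch of that arithmetic; the function name and batch size are illustrative.

```python
def steps_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches needed to pass over `num_examples` once."""
    if num_examples < 0 or batch_size <= 0:
        raise ValueError("num_examples must be >= 0 and batch_size > 0")
    if drop_remainder:
        # Only full batches are kept.
        return num_examples // batch_size
    # Integer ceiling division, exact even for very large counts.
    return (num_examples + batch_size - 1) // batch_size


print(steps_per_epoch(843195, 1024))                       # → 824
print(steps_per_epoch(843195, 1024, drop_remainder=True))  # → 823
```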
unshuffled_deduplicated_min
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_min')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 166 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mr')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 212556 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mwl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mwl')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 7 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nah
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nah')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 58 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_new
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_new')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2126 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_oc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_oc')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 6485 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
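After a config has been prepared, comparing the number of examples actually read with the split table is a quick integrity check. Here is a sketch, assuming the records are available as any Python iterable (with TFDS, for example, `tfds.as_numpy(ds)`). `check_split_size` is hypothetical, and the expected value 6485 is the `oc` count listed above. The stand-in generator only simulates the records.

```python
from typing import Iterable


def check_split_size(records: Iterable, expected: int) -> int:
    """Count records and raise if the total differs from the documented size."""
    count = sum(1 for _ in records)
    if count != expected:
        raise ValueError(f"expected {expected} examples, found {count}")
    return count


# Stand-in data for illustration; with TFDS this would be tfds.as_numpy(ds).
print(check_split_size(({"id": i} for i in range(6485)), expected=6485))  # → 6485
```

Counting requires one full pass over the split, so for the very large configs this is best run once after preparation rather than in every training job.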
unshuffled_deduplicated_pam
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pam')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ps
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ps')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme We do not own any of the text from which these data has been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved") http://creativecommons.org/publicdomain/zero/1.0/ To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 67921 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
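All configs in this section follow the same `huggingface:oscar/unshuffled_deduplicated_<code>` naming pattern. The following is a minimal sketch that builds those names from a language code. `oscar_tfds_name` is a hypothetical helper, and it covers only the pattern shown on this page.

```python
# Hypothetical helper: only the `unshuffled_deduplicated_<code>` naming pattern
# used on this page is assumed; other OSCAR config families are not covered.
PREFIX = "huggingface:oscar/unshuffled_deduplicated_"


def oscar_tfds_name(lang_code: str) -> str:
    """Return the TFDS dataset name for a language code such as 'ps' or 'it'."""
    code = lang_code.strip().lower()
    # Codes on this page are short ASCII alphabetic tags (e.g. 'ps', 'lrc', 'scn').
    if not code.isascii() or not code.isalpha():
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return PREFIX + code


print(oscar_tfds_name("ps"))  # huggingface:oscar/unshuffled_deduplicated_ps
```

You can pass the returned string to `tfds.load(...)` in the same way as the command listed for each config.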
unshuffled_deduplicated_it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_it')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 28522082 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
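The `it` train split lists 28,522,082 examples, so it is usually processed as a stream rather than loaded into memory at once. Inside TFDS this is normally done with `ds.batch(...)`. The following is a framework-free sketch that assumes an iterable of dicts whose `text` is already a `str`. `batched_texts` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_texts(examples: Iterable[dict], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` texts without materialising the corpus."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(examples)
    while True:
        # Pull the next `batch_size` examples lazily from the underlying iterator.
        batch = [ex["text"] for ex in islice(it, batch_size)]
        if not batch:
            return
        yield batch


fake = ({"id": i, "text": f"doc {i}"} for i in range(5))
print([len(b) for b in batched_texts(fake, 2)])  # [2, 2, 1]
```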
unshuffled_deduplicated_ka
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ka')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 372158 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ro
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ro')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 5044757 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_scn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_scn')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 17 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ko
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ko')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 3675420 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
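When you iterate a TFDS dataset with `tfds.as_numpy`, string features are typically returned as `bytes`. This matters for non-Latin scripts such as the `ko` config. The following is a minimal sketch that normalises one example. It assumes the text is UTF-8 encoded, which this page does not state. `decode_example` is a hypothetical helper.

```python
def decode_example(example: dict) -> dict:
    """Return {'id': int, 'text': str}, decoding `text` from UTF-8 bytes if needed."""
    text = example["text"]
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # int() also accepts NumPy integer scalars such as numpy.int64.
    return {"id": int(example["id"]), "text": text}


raw = {"id": 7, "text": "안녕하세요".encode("utf-8")}
print(decode_example(raw))  # {'id': 7, 'text': '안녕하세요'}
```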
unshuffled_deduplicated_kw
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kw')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 68 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lez
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lez')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1381 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lrc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lrc')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 72 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
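Each config lists only a `train` split. If you need a held-out set, TFDS split slicing such as `split='train[:95%]'` is one option. Another option is a deterministic rule on the `id` field. The following is a minimal sketch of the latter. It assumes ids are non-negative integers spread roughly evenly modulo 100, which this page does not guarantee. `in_validation` is a hypothetical helper.

```python
def in_validation(example_id: int, percent: int = 5) -> bool:
    """Deterministically assign roughly `percent`% of ids to a validation set."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")
    # The same id always lands in the same bucket, so the split is reproducible.
    return example_id % 100 < percent


val = [i for i in range(1000) if in_validation(i)]
print(len(val))  # 50
```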
unshuffled_deduplicated_mg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mg')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 13343 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ml
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ml')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 453904 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
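The config names are marked `unshuffled`, but this page does not define the term. If you assume the stored order is not randomised and your training needs shuffled input, a bounded-buffer shuffle is a common approach. In `tf.data` that is `ds.shuffle(buffer_size)`. The following is a framework-free sketch of the same idea; it is not TensorFlow's implementation. `buffered_shuffle` is a hypothetical helper.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most `buffer_size` items."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it, keeping the buffer bounded.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer


out = list(buffered_shuffle(range(10), buffer_size=4))
print(sorted(out) == list(range(10)))  # True
```

A larger buffer gives a closer approximation to a full shuffle at the cost of memory.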
unshuffled_deduplicated_ms
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ms')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 183443 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_myv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_myv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 5 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nds
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nds')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 8714 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
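The `deduplicated` in the config names indicates that duplicates were removed upstream, but this page does not say at what granularity or how. If you merge several configs or add your own text, you may need to deduplicate again. The following is a minimal sketch of exact line-level deduplication by hashing. It is one common approach, not necessarily the one OSCAR used. `dedup_lines` is a hypothetical helper.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time its stripped content is seen, preserving order."""
    seen = set()
    for line in lines:
        # Store a fixed-size digest of the whitespace-stripped line instead of the line.
        key = hashlib.sha1(line.strip().encode("utf-8")).digest()
        if key in seen:
            continue
        seen.add(key)
        yield line


print(list(dedup_lines(["a", "b", "a ", "c", "b"])))  # ['a', 'b', 'c']
```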
unshuffled_deduplicated_nn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nn')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 109118 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_os
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_os')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2559 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pms
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pms')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2859 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
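The example counts in the splits tables say little about how much text a config actually holds, because documents vary in length. The following is a minimal sketch for summarising document lengths in characters. It assumes `text` has already been decoded to `str`. `length_summary` is a hypothetical helper.

```python
import statistics
from typing import Iterable


def length_summary(texts: Iterable[str]) -> dict:
    """Count, min, median and max document length in characters."""
    lengths = [len(t) for t in texts]
    if not lengths:
        raise ValueError("no texts given")
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


print(length_summary(["ab", "abcd", "abcdef"]))  # {'count': 3, 'min': 2, 'median': 4, 'max': 6}
```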
unshuffled_deduplicated_qu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_qu')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 411 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sa')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 7121 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sk')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2820821 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sh')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 17610 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_so
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_so')
- বর্ণনা :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 42 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
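The feature spec is identical for every config: an `int64` field `id` and a `string` field `text`. The sketch below parses that JSON with the standard library and checks a record against it. It assumes the record has already been converted to plain Python values; TFDS typically yields `text` as a byte string, so it would need decoding first. The names `field_dtypes`, `check_record`, and the dtype-to-type mapping are illustrative assumptions, not part of TFDS.

```python
import json

# The feature spec as printed for each OSCAR config on this page.
FEATURES_JSON = """
{
  "id": {"dtype": "int64", "id": null, "_type": "Value"},
  "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from spec dtypes to plain Python types.
PY_TYPES = {"int64": int, "string": str}


def field_dtypes(features_json: str) -> dict:
    """Map each feature name to its declared dtype."""
    spec = json.loads(features_json)
    return {name: field["dtype"] for name, field in spec.items()}


def check_record(record: dict, dtypes: dict) -> bool:
    """True if the record has exactly the declared fields with matching Python types."""
    if set(record) != set(dtypes):
        return False
    return all(isinstance(record[name], PY_TYPES[dtype]) for name, dtype in dtypes.items())


if __name__ == "__main__":
    dtypes = field_dtypes(FEATURES_JSON)
    print(dtypes)  # {'id': 'int64', 'text': 'string'}
    print(check_record({"id": 0, "text": "example"}, dtypes))  # True
```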
unshuffled_deduplicated_sr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sr')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 645747 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ta
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ta')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 833101 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tk')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 4694 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tyv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tyv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 24 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_uz
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_uz')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 15074 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_wa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_wa')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 677 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_xmf
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_xmf')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2418 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 11014487 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tg')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 56259 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_de')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 62398034 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
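Large configs such as `unshuffled_deduplicated_de` (62,398,034 training examples) are more practical to read lazily than to materialize in memory. The sketch below takes the first `n` records from any iterable and decodes `text` from UTF-8 bytes when needed. The helper name `first_texts` is hypothetical, and the commented usage assumes `tensorflow-datasets` is installed and that `tfds.as_numpy` yields dictionaries whose `text` values are byte strings.

```python
import itertools
from typing import Iterable, Iterator, Mapping, Tuple


def first_texts(records: Iterable[Mapping[str, object]], n: int) -> Iterator[Tuple[int, str]]:
    """Lazily yield (id, text) pairs for the first n records."""
    for record in itertools.islice(records, n):
        text = record["text"]
        if isinstance(text, bytes):
            text = text.decode("utf-8")  # string features often arrive as UTF-8 bytes
        yield int(record["id"]), text


if __name__ == "__main__":
    # Stand-in data; islice stops after n items, so an endless generator is fine.
    fake = ({"id": i, "text": f"Satz {i}".encode("utf-8")} for i in itertools.count())
    print(list(first_texts(fake, 2)))  # [(0, 'Satz 0'), (1, 'Satz 1')]
    # With TFDS (not run here):
    # import tensorflow_datasets as tfds
    # ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_de', split='train')
    # for doc_id, text in first_texts(tfds.as_numpy(ds), 3):
    #     print(doc_id, text[:80])
```

Because `itertools.islice` pulls items one at a time, only the first `n` records are ever requested from the underlying iterator.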
unshuffled_deduplicated_tr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tr')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 11596446 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_el
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_el')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 6521169 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_uk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_uk')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 7782375 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_vi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vi')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 9897709 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_wuu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_wuu')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 64 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_yo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yo')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 49 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_als
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_als')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 7324 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_arz
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_arz')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 158113 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_az
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_az')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 912330 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bcl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bcl')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
সংস্করণ : 1.0.0
বিভাজন :
বিভক্ত | উদাহরণ |
---|---|
'train' | 1 |
- বৈশিষ্ট্য :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
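Every config on this page follows the pattern `unshuffled_<variant>_<lang>`, where `<variant>` is `original` or `deduplicated`, and TFDS reaches it through the `huggingface:oscar/` prefix used in the load commands. A small helper can build and split these names instead of hand-typing them. This is a sketch: `build_tfds_name` and `parse_tfds_name` are hypothetical (not part of TFDS), they only check the naming pattern, not whether a language actually exists in the catalog, and they assume language codes are lowercase ASCII letters, as in every name listed here.

```python
import re

PREFIX = "huggingface:oscar/"
VARIANTS = ("original", "deduplicated")
# Assumption: language codes are lowercase ASCII letters (bcl, diq, gom, ...).
_NAME_RE = re.compile(r"^unshuffled_(original|deduplicated)_([a-z]+)$")


def build_tfds_name(lang: str, variant: str = "original") -> str:
    """Build the string passed to tfds.load(), e.g. for lang='bcl'."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    if not re.fullmatch(r"[a-z]+", lang):
        raise ValueError(f"unexpected language code {lang!r}")
    return f"{PREFIX}unshuffled_{variant}_{lang}"


def parse_tfds_name(name: str) -> tuple:
    """Split a TFDS name (with or without the prefix) back into (variant, lang)."""
    config = name[len(PREFIX):] if name.startswith(PREFIX) else name
    match = _NAME_RE.match(config)
    if match is None:
        raise ValueError(f"not an OSCAR config name: {name!r}")
    return match.group(1), match.group(2)


print(build_tfds_name("bcl"))  # huggingface:oscar/unshuffled_original_bcl
print(parse_tfds_name("huggingface:oscar/unshuffled_deduplicated_zh"))  # ('deduplicated', 'zh')
```

The string returned by `build_tfds_name` is the same first argument passed to `tfds.load` in the commands on this page.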
unshuffled_original_bn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bn')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1675515 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bs')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2143 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ce
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ce')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 4042 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_cv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 20281 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_diq
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_diq')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 1 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
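The `'train'` sizes listed so far span six orders of magnitude: `unshuffled_original_bcl` and `unshuffled_original_diq` each hold a single example, while `unshuffled_original_bn` holds 1,675,515. For a quick pipeline smoke test it can help to pick small configs programmatically. The sketch below hard-codes the counts from the tables above (nothing is downloaded); `pick_smoke_test_configs` is a hypothetical helper, and the 5,000-example ceiling is an arbitrary default.

```python
# 'train' split sizes copied from the tables above.
TRAIN_EXAMPLES = {
    "unshuffled_original_bcl": 1,
    "unshuffled_original_bn": 1675515,
    "unshuffled_original_bs": 2143,
    "unshuffled_original_ce": 4042,
    "unshuffled_original_cv": 20281,
    "unshuffled_original_diq": 1,
}


def pick_smoke_test_configs(counts: dict, min_examples: int = 1, max_examples: int = 5000) -> list:
    """Configs whose train size lies in [min_examples, max_examples], smallest first."""
    chosen = [name for name, n in counts.items() if min_examples <= n <= max_examples]
    # Sort by size, then by name so ties (e.g. the two 1-example configs) are stable.
    return sorted(chosen, key=lambda name: (counts[name], name))


print(pick_smoke_test_configs(TRAIN_EXAMPLES, min_examples=100))
# ['unshuffled_original_bs', 'unshuffled_original_ce']
```

A one-example config is enough to check that loading works end to end, but too small to exercise batching or shuffling, which is why the usage above raises `min_examples`.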
unshuffled_original_eml
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_eml')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 84 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_et
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_et')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2093621 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_zh')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 41708901 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
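Configs prefixed `unshuffled_deduplicated_` are the deduplicated counterparts of the `unshuffled_original_` ones. This page does not describe how deduplication was done. Purely as an illustration, the sketch below assumes exact line-level deduplication: keep the first occurrence of each line across the whole stream, drop later repeats, and drop documents left empty. `dedup_lines` is a hypothetical helper and is not OSCAR's actual pipeline.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(documents: Iterable[str]) -> Iterator[str]:
    """Yield documents with lines already seen earlier in the stream removed.

    Illustrative assumptions: exact, case-sensitive line matching; blank lines
    are dropped outright; lines are stored as SHA-1 digests so memory grows with
    the number of distinct lines rather than their total length.
    """
    seen = set()
    for doc in documents:
        kept = []
        for line in doc.split("\n"):
            if not line.strip():
                continue  # drop blank lines
            digest = hashlib.sha1(line.encode("utf-8")).digest()
            if digest in seen:
                continue  # exact repeat of an earlier line
            seen.add(digest)
            kept.append(line)
        if kept:  # documents that lose every line are dropped
            yield "\n".join(kept)


docs = ["Hello\nWorld", "World\nNew line", "Hello"]
print(list(dedup_lines(docs)))  # ['Hello\nWorld', 'New line']
```

Because the `seen` set spans the whole stream, memory still grows with corpus size; at the scale of `unshuffled_deduplicated_zh` (41,708,901 examples) a real pipeline would need sharding or an on-disk index.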
unshuffled_original_an
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_an')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 2449 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ast
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ast')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 6999 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ba
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ba')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 42551 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bg')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 5869686 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
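The example counts above say how many documents a config has, not how much text. A single streaming pass over the `text` field can give a rough size estimate without holding a large config such as `unshuffled_original_bg` (5,869,686 examples) in memory. This is a minimal sketch: `TextStats` and `summarize` are hypothetical names, and the input is assumed to be any iterable of records with a Python `str` under `"text"`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TextStats:
    documents: int = 0
    characters: int = 0  # Unicode code points, not bytes
    whitespace_tokens: int = 0  # crude proxy; see the note below
    longest: int = 0

    def update(self, text: str) -> None:
        self.documents += 1
        self.characters += len(text)
        self.whitespace_tokens += len(text.split())
        self.longest = max(self.longest, len(text))

    @property
    def mean_characters(self) -> float:
        return self.characters / self.documents if self.documents else 0.0


def summarize(records: Iterable[dict]) -> TextStats:
    """Accumulate length statistics in one pass over the records."""
    stats = TextStats()
    for record in records:
        stats.update(record["text"])
    return stats


stats = summarize([{"id": 0, "text": "Здравей свят"}, {"id": 1, "text": "abc"}])
print(stats.documents, stats.characters, stats.whitespace_tokens, stats.mean_characters)
# 2 15 3 7.5
```

Whitespace splitting badly undercounts words in scripts written without spaces, such as Chinese (`unshuffled_deduplicated_zh`), so for those configs the character count is the more meaningful figure.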
unshuffled_original_bpy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bpy')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 6046 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ca
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ca')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 4390754 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ckb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ckb')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 103639 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_es')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 56326016 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
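With 56,326,016 training examples, `unshuffled_deduplicated_es` is far too large to collect into a Python list. A generic pattern is to consume records lazily and group them into fixed-size batches. The sketch below uses only the standard library and accepts any iterable of `{"id", "text"}` records; how you obtain that iterable from TFDS is assumed and not shown. Within TFDS itself, `tf.data.Dataset.batch` is the usual route. `batched` and `fake_records` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group records into lists of batch_size; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(records)
    # islice pulls at most batch_size items, so only one batch is in memory at a time.
    while batch := list(islice(it, batch_size)):
        yield batch


def fake_records(n: int) -> Iterator[dict]:
    """Stand-in for a real OSCAR stream: records shaped like the features above."""
    for i in range(n):
        yield {"id": i, "text": f"example {i}"}


print([len(b) for b in batched(fake_records(10), batch_size=4)])  # [4, 4, 2]
```

On Python 3.12 and later, `itertools.batched` does the same job (yielding tuples instead of lists).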
unshuffled_original_da
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_da')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 7664010 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_dv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_dv')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 21018 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_eo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_eo')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 121168 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_fi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_fi')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License: These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate it.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version: 1.0.0
Splits:
Split | Examples |
---|---|
'train' | 5326443 |
- Features:
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ga
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ga')
- Description:
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 46493 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
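The features block is a JSON description of each record: an `id` stored as `int64` and a `text` stored as `string`, both with `_type` `Value`. A minimal sketch that reads that JSON with the standard library and checks a plain Python record against it; the dtype-to-Python-type mapping and the `validate_record` name are assumptions for illustration, not part of TFDS or the Hugging Face `datasets` library:

```python
import json

# Features block copied from the catalog entry above.
FEATURES_JSON = """
{
  "id": {"dtype": "int64", "id": null, "_type": "Value"},
  "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from feature dtypes to plain Python types.
DTYPE_TO_PY = {"int64": int, "string": str}

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def validate_record(record: dict, features: dict) -> bool:
    """Return True if record has exactly the declared fields with matching types."""
    if set(record) != set(features):
        return False
    for name, spec in features.items():
        expected = DTYPE_TO_PY[spec["dtype"]]
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
        if spec["dtype"] == "int64" and not INT64_MIN <= value <= INT64_MAX:
            return False
    return True


features = json.loads(FEATURES_JSON)
print(validate_record({"id": 0, "text": "Dia duit"}, features))    # True
print(validate_record({"id": "0", "text": "Dia duit"}, features))  # False
```

The explicit range check mirrors the `int64` dtype, which Python's unbounded `int` would otherwise accept silently.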
unshuffled_deduplicated_gom
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gom')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 484 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_hr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 321484 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
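Every config name here starts with `unshuffled`, which suggests the records are not randomly mixed. If a training pipeline needs randomness, a bounded-memory shuffle buffer (the idea behind `tf.data.Dataset.shuffle(buffer_size)`) can be sketched in plain Python. This is an illustrative sketch with a hypothetical function name, not the TFDS or tf.data implementation:

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in a pseudo-random order using a fixed-size buffer.

    Memory use is bounded by buffer_size; larger buffers mix more thoroughly.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: list = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer first
            continue
        # Emit a random buffered element and put the new item in its slot.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Drain whatever remains, in random order.
    rng.shuffle(buffer)
    yield from buffer
```

With `buffer_size=1` the order is unchanged; with a buffer at least as large as the input, the result is a full random permutation.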
unshuffled_deduplicated_hy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_hy')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 396093 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ilo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ilo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1578 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_fa')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 13704702 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
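This is the first `unshuffled_original` config in this part of the catalog; the entries before it are `unshuffled_deduplicated`. The catalog does not say how the deduplicated variant is produced. Assuming it amounts to dropping exactly repeated lines, a minimal order-preserving sketch of that kind of line-level deduplication looks like this (the function name is hypothetical and this is not OSCAR's published procedure):

```python
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, keeping the first occurrence's position.

    Lines are compared exactly after stripping the trailing newline; this is an
    assumed approximation of deduplication, not OSCAR's exact method.
    """
    seen: set = set()
    for line in lines:
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            yield key
```

For a corpus with millions of records, keeping every line in memory can be expensive; storing a fixed-size digest such as `hashlib.sha1(key.encode()).digest()` instead of the line itself is a common trade-off.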
unshuffled_original_fy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_fy')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 33053 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_gn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 106 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
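Every config in this catalog lists only a `train` split, so evaluation data has to be carved out by the user. One common approach is a deterministic hash of each record's `id`, so the same record always lands on the same side across runs. The sketch below uses a hypothetical helper name and stand-in ids `0..105` for the 106 examples of this config; neither OSCAR nor TFDS prescribes this scheme:

```python
import hashlib


def is_validation(example_id: int, val_percent: int = 10) -> bool:
    """Deterministically assign roughly val_percent% of ids to validation."""
    digest = hashlib.sha256(str(example_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < val_percent


# Stand-in ids for the 106 training examples listed above.
ids = range(106)
val = [i for i in ids if is_validation(i)]
train = [i for i in ids if not is_validation(i)]
```

With so few examples the validation share will only be approximately 10%, since each id is bucketed independently.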
unshuffled_original_hi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_hi')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3264660 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
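The split tables in this catalog all use the same `'name' | count |` row layout. If you collect these entries, for example to compare corpus sizes across languages, a small parser for that layout might look like this (a sketch that assumes every data row has exactly that shape; the function name is hypothetical):

```python
import re

# Matches rows like "'train' | 3264660 |"; header and separator rows do not match.
ROW = re.compile(r"^'(?P<split>[^']+)'\s*\|\s*(?P<count>\d+)\s*\|\s*$")


def parse_splits(table: str) -> dict:
    """Return {split_name: example_count} from a catalog split table."""
    splits = {}
    for line in table.splitlines():
        match = ROW.match(line.strip())
        if match:
            splits[match["split"]] = int(match["count"])
    return splits


table = """Split | Examples |
---|---|
'train' | 3264660 |"""
print(parse_splits(table))  # {'train': 3264660}
```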
unshuffled_original_hu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_hu')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 11197780 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ie
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ie')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 101 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ja')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 39496439 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
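With close to 40 million training examples, this config is usually inspected a few records at a time rather than in full. When a TFDS dataset is iterated with `tfds.as_numpy`, string features come back as `bytes`, so a preview helper has to decode them. A minimal sketch, assuming each record is a dict with the `id` and `text` fields listed above (the helper name and character limit are illustrative):

```python
from itertools import islice
from typing import Any, Iterable, List


def preview(records: Iterable[Any], n: int = 3, max_chars: int = 80) -> List[str]:
    """Return short 'id: text' strings for the first n records.

    Works on any iterable of dicts with 'id' and 'text' keys; only the first
    n records are pulled from the iterator.
    """
    lines = []
    for record in islice(records, n):
        text = record["text"]
        if isinstance(text, bytes):  # TFDS string features arrive as bytes
            text = text.decode("utf-8", errors="replace")
        lines.append(f"{int(record['id'])}: {text[:max_chars]}")
    return lines
```

Because `tfds.load` without a `split` argument returns a dictionary of splits, a call would look like `preview(tfds.as_numpy(ds['train']))`; `islice` stops reading once it has `n` records.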
unshuffled_deduplicated_kk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 338073 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_krc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_krc')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1377 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ky
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ky')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 86561 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_li
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_li')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 118 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1737411 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
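For larger configs such as this one (1,737,411 training examples), TFDS split slicing lets you load only part of the `train` split, for example `split='train[:10%]'` or `split='train[:1000]'`. A small sketch that builds such strings and gives a rough size estimate; the helper names are hypothetical, and TFDS's exact rounding for percentage slices may differ from the estimate:

```python
from typing import Optional


def train_slice(start: Optional[int] = None, end: Optional[int] = None,
                percent: bool = False) -> str:
    """Build a TFDS split-slicing string such as 'train[:10%]' or 'train[1000:2000]'."""
    unit = "%" if percent else ""
    lo = "" if start is None else f"{start}{unit}"
    hi = "" if end is None else f"{end}{unit}"
    return f"train[{lo}:{hi}]"


def approx_slice_size(total: int, start_pct: int = 0, end_pct: int = 100) -> int:
    """Rough number of examples in a percentage slice (rounding may differ from TFDS)."""
    return total * (end_pct - start_pct) // 100


print(train_slice(end=10, percent=True))        # train[:10%]
print(approx_slice_size(1737411, end_pct=10))   # 173741
```

The resulting string is passed as the `split` argument, e.g. `tfds.load('huggingface:oscar/unshuffled_deduplicated_lt', split=train_slice(end=10, percent=True))`, which returns a single dataset instead of a dictionary of splits.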
unshuffled_deduplicated_mhr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mhr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 2515 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 197878 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 16383 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mzn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mzn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 917 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ne
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ne')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 219334 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_no
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_no')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3229940 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pa')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 87235 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pnb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pnb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3463 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_rm
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_rm')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 34 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sah
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sah')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 8555 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_si
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_si')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 120684 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sq
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sq')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 461598 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sw
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sw')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 24803 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_th
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_th')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3749826 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 82738 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ur
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ur')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 428674 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_vo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3317 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_xal
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_xal')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 36 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_yue
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yue')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 7 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_am
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_am')
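The configurations in this catalog follow one naming pattern: `unshuffled_deduplicated_<lang>` for the entries above and `unshuffled_original_<lang>` from this entry on, both under the `huggingface:oscar/` prefix. The sketch below builds a TFDS name from a language code following that pattern. The helper name `oscar_tfds_name` is hypothetical, and it does not check that a given language is actually published in the requested variant, so confirm the name appears in this catalog before loading it.

```python
def oscar_tfds_name(lang: str, deduplicated: bool = True) -> str:
    """Build a name such as 'huggingface:oscar/unshuffled_original_am'.

    Follows the pattern seen in this catalog:
    unshuffled_deduplicated_<lang> or unshuffled_original_<lang>.
    Only lowercase ASCII language codes (e.g. 'am', 'bxr') are accepted.
    """
    if not (lang.isascii() and lang.isalpha() and lang.islower()):
        raise ValueError(f"unexpected language code: {lang!r}")
    variant = "deduplicated" if deduplicated else "original"
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


# Possible usage (requires tensorflow_datasets):
# import tensorflow_datasets as tfds
# ds = tfds.load(oscar_tfds_name("am", deduplicated=False))
```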
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 83663 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_as
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_as')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 14985 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_azb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_azb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 15446 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_be
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_be')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 586031 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 26795 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bxr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bxr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 42 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ceb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ceb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 56248 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_cy')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 157698 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_dsb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_dsb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 65 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_fr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 96742378 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
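The split tables in this catalog are simple pipe-delimited rows, so they can be read programmatically, for example to decide whether a config such as `unshuffled_original_fr` (96,742,378 training examples) is small enough to load in full. A minimal sketch using only the standard `re` module; `parse_split_table` is a hypothetical helper, and it assumes rows look exactly like `'train' | 96742378 |`.

```python
import re

# Matches data rows such as "'train' | 96742378 |" in the catalog's split tables.
ROW = re.compile(r"^'(?P<split>[^']+)'\s*\|\s*(?P<count>\d+)\s*\|\s*$")


def parse_split_table(text: str) -> dict:
    """Return {split_name: example_count}, ignoring header and separator rows."""
    splits = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            splits[m.group("split")] = int(m.group("count"))
    return splits


table = """Split | Examples |
---|---|
'train' | 96742378 |"""
print(parse_split_table(table))  # {'train': 96742378}
```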
unshuffled_original_gd
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_gd')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 5799 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_gu')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 240691 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hsb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_hsb')
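The configs in this part of the catalog are the `unshuffled_original_*` variants; the catalog also lists `unshuffled_deduplicated_*` variants which, as the name suggests, have duplicates removed. The exact deduplication procedure is not described here, so the following is only a minimal sketch assuming exact, line-level deduplication that keeps the first occurrence. `dedup_lines` is a hypothetical helper.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first occurrence.

    Assumption: lines are compared after stripping surrounding whitespace.
    A SHA-1 digest is stored instead of the line itself to keep each
    entry in the `seen` set small.
    """
    seen = set()
    for line in lines:
        key = hashlib.sha1(line.strip().encode("utf-8")).digest()
        if key in seen:
            continue
        seen.add(key)
        yield line


print(list(dedup_lines(["a", "b", "a ", "c", "b"])))  # ['a', 'b', 'c']
```

Because `dedup_lines` is a generator, it can be applied to a stream of lines without loading the whole corpus into memory; only the set of digests grows with the number of distinct lines.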
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 7959 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ia
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ia')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1040 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_io
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_io')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 694 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_jbo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_jbo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 832 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_km
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_km')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 159363 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ku
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ku')
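When a loaded split is iterated with `tfds.as_numpy`, string features are typically returned as UTF-8 encoded `bytes` and `int64` features as NumPy integers. A minimal sketch of a helper that converts one example to plain Python values; `to_python` is a hypothetical name, and the TFDS calls in the comments are not executed here because they need `tensorflow_datasets` and a download.

```python
from typing import Any, Dict


def to_python(example: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OSCAR example ({'id', 'text'}) to plain Python values.

    Assumes 'id' may be a NumPy integer and 'text' may be UTF-8 bytes,
    as is typical when iterating with tfds.as_numpy.
    """
    text = example["text"]
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return {"id": int(example["id"]), "text": text}


# Usage with TFDS (not run here):
# import tensorflow_datasets as tfds
# ds = tfds.load('huggingface:oscar/unshuffled_original_ku', split='train')
# for ex in tfds.as_numpy(ds.take(3)):
#     print(to_python(ex))

print(to_python({"id": 7, "text": "Rojbaş".encode("utf-8")}))  # {'id': 7, 'text': 'Rojbaş'}
```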
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 46535 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_la
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_la')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 94588 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lmo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lmo')
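The `unshuffled` in these config names presumably means the examples keep their original crawl order, so simply taking the first N examples of a split such as `unshuffled_original_lmo` (1,401 examples) may give a biased sample. One standard fix that works on a stream of unknown length is reservoir sampling (Algorithm R); the sketch below is a generic implementation, and `reservoir_sample` is a hypothetical helper, not part of TFDS.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R).

    Every item ends up in the result with probability k / n, where n is the
    stream length; if n < k, all items are returned in stream order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


print(len(reservoir_sample(range(1401), k=10)))  # 10
```

The stream could be, for example, the `text` field of each example yielded by `tfds.as_numpy(ds)`; memory use stays proportional to `k`, not to the split size.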
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1401 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1593820 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_min
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_min')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 220 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 326804 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mwl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mwl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 8 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
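Several configs in this section are very small (`unshuffled_original_mwl` has only 8 training examples and `unshuffled_original_bxr` has 42), and each has only a `train` split, which matters if you plan to hold out a validation set. A minimal sketch that flags such configs, using the split sizes listed in this section; the 1,000-example threshold is an arbitrary assumption, not a recommendation from the catalog, and `small_configs` is a hypothetical helper.

```python
# Training-split sizes copied from the catalog entries in this section.
TRAIN_SIZES = {
    "unshuffled_original_bo": 26795,
    "unshuffled_original_bxr": 42,
    "unshuffled_original_ceb": 56248,
    "unshuffled_original_cy": 157698,
    "unshuffled_original_dsb": 65,
    "unshuffled_original_fr": 96742378,
    "unshuffled_original_gd": 5799,
    "unshuffled_original_gu": 240691,
    "unshuffled_original_hsb": 7959,
    "unshuffled_original_ia": 1040,
    "unshuffled_original_io": 694,
    "unshuffled_original_jbo": 832,
    "unshuffled_original_km": 159363,
    "unshuffled_original_ku": 46535,
    "unshuffled_original_la": 94588,
    "unshuffled_original_lmo": 1401,
    "unshuffled_original_lv": 1593820,
    "unshuffled_original_min": 220,
    "unshuffled_original_mr": 326804,
    "unshuffled_original_mwl": 8,
}


def small_configs(sizes: dict, threshold: int = 1000) -> list:
    """Return config names with fewer than `threshold` examples, smallest first."""
    return sorted((name for name, n in sizes.items() if n < threshold), key=sizes.get)


print(small_configs(TRAIN_SIZES))
# ['unshuffled_original_mwl', 'unshuffled_original_bxr', 'unshuffled_original_dsb',
#  'unshuffled_original_min', 'unshuffled_original_io', 'unshuffled_original_jbo']
```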
unshuffled_original_nah
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_nah')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 61 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_new
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_new')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 4696 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
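The load commands in this catalog all follow the pattern `huggingface:oscar/unshuffled_<variant>_<lang>`, where the variant is `original` or `deduplicated` (as in the `unshuffled_deduplicated_af` entry earlier in the catalog). A minimal sketch of a hypothetical helper that builds these names; it does not check that OSCAR actually provides the requested language:

```python
def oscar_tfds_name(lang: str, deduplicated: bool = False) -> str:
    """Return the TFDS dataset name for an OSCAR language code.

    Hypothetical helper: the code is not validated against the catalog, so a
    name built for an unsupported language fails only when tfds.load runs.
    """
    if not lang or not lang.isalpha() or not lang.islower():
        raise ValueError(f"expected a lowercase language code, got {lang!r}")
    variant = "deduplicated" if deduplicated else "original"
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


print(oscar_tfds_name("new"))  # huggingface:oscar/unshuffled_original_new
print(oscar_tfds_name("af", deduplicated=True))  # huggingface:oscar/unshuffled_deduplicated_af
```

With TensorFlow Datasets installed, the result can be passed straight to `tfds.load`, e.g. `tfds.load(oscar_tfds_name("oc"))`.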
unshuffled_original_oc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_oc')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 10709 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pam
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_pam')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ps
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ps')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 98216 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ro
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ro')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 9387265 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
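Every configuration listed here provides only a 'train' split. If a held-out set is needed, one option is the TFDS split-slicing syntax (for example `train[:99%]`). The helper `holdout_splits` below is hypothetical, a minimal sketch that only builds the split strings. TFDS rounds percent boundaries to whole examples, so the resulting sizes are approximate, and for very small configurations (such as `unshuffled_original_pam`, with 3 examples) a 1% slice may be empty.

```python
def holdout_splits(val_percent: int) -> tuple[str, str]:
    """Return (train, validation) split strings carved out of 'train'.

    Uses TFDS percent-slicing syntax, e.g. 'train[:99%]' and 'train[99%:]'.
    """
    if not 0 < val_percent < 100:
        raise ValueError("val_percent must be between 1 and 99")
    cut = 100 - val_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


train_split, val_split = holdout_splits(1)
print(train_split, val_split)  # train[:99%] train[99%:]

# With TensorFlow Datasets installed (not run here):
# import tensorflow_datasets as tfds
# train_ds, val_ds = tfds.load(
#     "huggingface:oscar/unshuffled_original_ro", split=[train_split, val_split]
# )
```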
unshuffled_original_scn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_scn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 21 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 5492194 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1013619 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ta
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ta')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1263280 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
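The Features schema declares `text` as a string. Assuming the text feature is exposed as a `tf.string` tensor, iterating with `tfds.as_numpy` yields it as Python `bytes`, which must be decoded before text processing. A minimal sketch with a hypothetical helper `decode_records`, demonstrated on made-up in-memory records rather than real OSCAR data:

```python
from typing import Iterable, Mapping


def decode_records(records: Iterable[Mapping]) -> list[dict]:
    """Convert raw examples into {'id': int, 'text': str} dicts.

    'text' values given as bytes are decoded as UTF-8; values that are
    already str are passed through unchanged.
    """
    out = []
    for rec in records:
        text = rec["text"]
        if isinstance(text, bytes):
            text = text.decode("utf-8")
        out.append({"id": int(rec["id"]), "text": text})
    return out


# Made-up stand-ins for two examples (one bytes, one str).
sample = [{"id": 0, "text": "வணக்கம்".encode("utf-8")}, {"id": 1, "text": "hello"}]
for rec in decode_records(sample):
    print(rec["id"], rec["text"])

# With TensorFlow Datasets installed (not run here):
# import tensorflow_datasets as tfds
# ds = tfds.load("huggingface:oscar/unshuffled_original_ta", split="train")
# first = decode_records(tfds.as_numpy(ds.take(3)))
```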
unshuffled_original_tk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_tk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 6456 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tyv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_tyv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 34 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
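The 'train' sizes listed so far range from 3 examples (`unshuffled_original_pam`) to over 9 million (`unshuffled_original_ro`). When combining configurations, it can help to drop those too small to be useful. A minimal sketch, with a hypothetical helper `configs_with_at_least` and a hand-copied subset of the counts above:

```python
# 'train' example counts copied from catalog entries above (a subset).
TRAIN_SIZES = {
    "unshuffled_original_new": 4696,
    "unshuffled_original_oc": 10709,
    "unshuffled_original_pam": 3,
    "unshuffled_original_ps": 98216,
    "unshuffled_original_ro": 9387265,
    "unshuffled_original_scn": 21,
    "unshuffled_original_tk": 6456,
    "unshuffled_original_tyv": 34,
}


def configs_with_at_least(sizes: dict[str, int], min_examples: int) -> list[str]:
    """Return config names whose 'train' split has >= min_examples, largest first."""
    kept = [name for name, n in sizes.items() if n >= min_examples]
    return sorted(kept, key=lambda name: sizes[name], reverse=True)


print(configs_with_at_least(TRAIN_SIZES, 100000))  # ['unshuffled_original_ro']
```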
unshuffled_original_uz
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_uz')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 27537 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_wa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_wa')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1001 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_xmf
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_xmf')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3783 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_it')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 46981781 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ka
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ka')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 563916 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ko
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ko')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 7345075 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kw
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_kw')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 203 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lez
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lez')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1485 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lrc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lrc')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 88 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mg')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 17957 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ml
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ml')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 603937 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
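Every config in this listing offers only a `train` split, so any held-out set has to be carved out of it. The sketch below assumes the naming pattern visible in this catalog (`unshuffled_original_<lang>` and `unshuffled_deduplicated_<lang>`) and uses TFDS split-slicing strings such as `'train[:99%]'`. The helper names are hypothetical, and only `load_with_holdout` imports TFDS, so the other helpers run without it.

```python
"""Hypothetical helpers for building OSCAR config names and TFDS splits."""


def oscar_config(lang: str, deduplicated: bool = False) -> str:
    """Build a config name such as 'unshuffled_original_ml'."""
    if not lang.isalpha():
        raise ValueError(f"expected a language code like 'ml', got {lang!r}")
    variant = "deduplicated" if deduplicated else "original"
    return f"unshuffled_{variant}_{lang}"


def tfds_name(lang: str, deduplicated: bool = False) -> str:
    """Full dataset name in the form used by the tfds.load calls in this catalog."""
    return f"huggingface:oscar/{oscar_config(lang, deduplicated)}"


def holdout_splits(percent: int) -> tuple:
    """Split-slicing strings that reserve the last `percent`% of 'train'."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    cut = 100 - percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_with_holdout(lang: str, deduplicated: bool = False, percent: int = 1):
    """Load one config as [train, held-out] datasets; needs network access."""
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    train_split, eval_split = holdout_splits(percent)
    # Passing a list of split strings makes tfds.load return a list of datasets.
    return tfds.load(
        tfds_name(lang, deduplicated), split=[train_split, eval_split]
    )
```

For very small configs such as `unshuffled_original_myv` (6 examples), percentage boundaries are rounded to whole examples and can leave the held-out slice empty. Absolute indices like `'train[:5]'` may be clearer in that case.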
unshuffled_original_ms
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ms')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 534016 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_myv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_myv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 6 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nds
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_nds')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 18174 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_nn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 185884 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_os
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_os')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 5213 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pms
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_pms')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3225 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_qu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_qu')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 452 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sa')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 14291 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sh')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 36700 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_so
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_so')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 156 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 17395625 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_tg')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 89002 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_tr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 18535253 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_uk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_uk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 12973467 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_vi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_vi')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 14898250 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_wuu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_wuu')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 214 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_yo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_yo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 214 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_zh')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 60137667 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_en')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 304230423 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_eu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_eu')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 256513 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_frr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_frr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 7 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_gl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_gl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 284320 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_he
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_he')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 2375030 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
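When the dataset is iterated with `tfds.as_numpy`, TFDS returns `string` features as raw `bytes` and `int64` features as NumPy integers. The `text` of non-Latin configs such as Hebrew therefore needs an explicit UTF-8 decode. The sketch below assumes that each example arrives as a dict with the `id` and `text` keys listed above. `to_python` is a hypothetical helper, and plain `int` and `bytes` values stand in for the NumPy ones.

```python
def to_python(example: dict) -> tuple:
    """Convert one OSCAR example to (id, text) with native Python types.

    `text` is assumed to be UTF-8 encoded bytes, as yielded by tfds.as_numpy.
    Strings that are already decoded are passed through unchanged.
    """
    raw_text = example["text"]
    text = raw_text.decode("utf-8") if isinstance(raw_text, bytes) else raw_text
    return int(example["id"]), text


# Usage with a TFDS dataset (requires tensorflow-datasets and datasets):
#   import tensorflow_datasets as tfds
#   ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_he', split='train')
#   for ex in tfds.as_numpy(ds.take(3)):
#       print(to_python(ex))
```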
unshuffled_deduplicated_ht
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ht')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 9 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_id
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_id')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 9948521 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_is
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_is')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 389515 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_jv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_jv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1163 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_kn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 251064 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_kv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_kv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 924 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 21735 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_lo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_lo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 32652 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mai
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mai')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 25 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 299457 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_mrj
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_mrj')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 669 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_my
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_my')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 136639 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nap
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nap')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 55 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_nl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 20812149 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_or
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_or')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 44230 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 20682611 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_pt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 26920397 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ru')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 115954598 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
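The `- Features :` schema above is identical for every OSCAR configuration in this catalog. It has an `int64` field `id` and a `string` field `text`. The sketch below checks a plain-Python record against that JSON schema. It assumes a few things. The helper name `conforms_to_features` and the dtype-to-type mapping are illustrative and are not part of TFDS or Hugging Face `datasets`. `string` values are also accepted as `bytes`, because TFDS pipelines such as `tfds.as_numpy` typically return text features as byte strings.

```python
import json

# Feature schema as listed in the catalog entry above.
FEATURES_JSON = """
{
  "id":   {"dtype": "int64",  "id": null, "_type": "Value"},
  "text": {"dtype": "string", "id": null, "_type": "Value"}
}
"""

# Assumed mapping from schema dtype names to acceptable Python types.
# "string" also accepts bytes, since TF pipelines often surface text that way.
DTYPE_TO_PYTHON = {"int64": (int,), "string": (str, bytes)}


def conforms_to_features(record: dict, features: dict) -> bool:
    """Hypothetical helper: True if `record` has exactly the schema's keys
    and every value has a Python type assumed for its dtype."""
    if set(record) != set(features):
        return False
    for name, spec in features.items():
        value = record[name]
        # bool is a subclass of int; reject it for integer fields.
        if spec["dtype"] == "int64" and isinstance(value, bool):
            return False
        if not isinstance(value, DTYPE_TO_PYTHON[spec["dtype"]]):
            return False
    return True


features = json.loads(FEATURES_JSON)
print(conforms_to_features({"id": 0, "text": "Привет, мир"}, features))  # True
print(conforms_to_features({"id": "0", "text": "Привет"}, features))     # False
```

This check covers only Python types. It does not cover NumPy scalars such as `np.int64`, which would need an extra branch.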
unshuffled_deduplicated_sd
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sd')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 33925 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_sl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_sl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 886223 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_su
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_su')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 511 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_te
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_te')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 312644 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_tl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_tl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 294132 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_ug
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_ug')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 15503 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_vec
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_vec')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 64 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_war
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_war')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 9161 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_deduplicated_yi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_deduplicated_yi')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 32919 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
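From this point the catalog switches from the `unshuffled_deduplicated_*` configurations to the `unshuffled_original_*` ones. Every entry is loaded with the same `tfds.load('huggingface:oscar/<config>')` pattern and differs only in the variant and the language code. The sketch below builds that name string from those two parts. The function `oscar_tfds_name` is hypothetical. It only formats the string and does not check that the configuration is actually published. The lowercase-letters check reflects the codes that appear in this catalog, which is an assumption rather than a documented rule.

```python
def oscar_tfds_name(lang: str, deduplicated: bool = True) -> str:
    """Build the TFDS name for an OSCAR configuration, following the
    'huggingface:oscar/unshuffled_<variant>_<lang>' pattern in this catalog.

    Hypothetical helper: it does not verify that `lang` is a published config.
    """
    # Assumption: every language code listed here is made of lowercase letters.
    if not lang or not lang.isalpha() or not lang.islower():
        raise ValueError(f"unexpected language code: {lang!r}")
    variant = "deduplicated" if deduplicated else "original"
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


print(oscar_tfds_name("yi"))                      # huggingface:oscar/unshuffled_deduplicated_yi
print(oscar_tfds_name("af", deduplicated=False))  # huggingface:oscar/unshuffled_original_af
```

The returned string is the argument you would pass to `tfds.load`, for example `ds = tfds.load(oscar_tfds_name("af", deduplicated=False))`.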
unshuffled_original_af
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_af')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 201117 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
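This entry can be compared with `unshuffled_deduplicated_af` at the start of the catalog, which lists 130640 `train` examples. The comparison shows how the two variants differ in size: 130640 / 201117 ≈ 0.65. In other words, the deduplicated Afrikaans configuration has about 65% as many examples as the original one. The sketch below reproduces that arithmetic. The function name is hypothetical, and the only inputs are split sizes listed in this catalog. These figures count examples, not bytes or tokens.

```python
def retained_fraction(deduplicated: int, original: int) -> float:
    """Hypothetical helper: number of 'train' examples in the deduplicated
    configuration relative to the original configuration."""
    if original <= 0:
        raise ValueError("original split size must be positive")
    return deduplicated / original


# Split sizes taken from this catalog (Afrikaans, 'train' split).
ratio = retained_fraction(130640, 201117)
print(f"{ratio:.1%}")  # 65.0%
```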
unshuffled_original_ar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ar')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 16365602 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_av
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_av')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 456 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bar')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 4 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_bh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_bh')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 336 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_br
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_br')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 37085 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cbk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_cbk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_cs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_cs')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 21001388 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_de')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 104913504 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_el
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_el')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 10425596 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_es')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under the following licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing, and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 88199221 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_fi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_fi')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 8557453 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ga
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ga')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 83223 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_gom
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_gom')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 640 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
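Every config in this section reports version 1.0.0. TFDS dataset names can carry a version after a colon (as in `mnist:3.*.*`), which lets a pipeline pin the version it was tested against. Below is a sketch that only builds the string. Whether the `huggingface:` namespace honors the suffix the same way is an assumption to verify against the installed TFDS release, and `pinned_name` is a hypothetical helper.

```python
import re

# Accept only a full MAJOR.MINOR.PATCH version such as "1.0.0".
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def pinned_name(config: str, version: str = "1.0.0") -> str:
    """Append a TFDS-style ':<version>' suffix to an OSCAR config name."""
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"huggingface:oscar/{config}:{version}"
```

A pinned call would then look like `tfds.load(pinned_name("unshuffled_original_gom"))`, assuming the suffix is accepted for this namespace.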
unshuffled_original_hr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_hr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 582219 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_hy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_hy')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 659430 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ilo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ilo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 2638 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ja')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 62721527 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_kk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 524591 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_krc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_krc')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1581 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ky
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ky')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 146993 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_li
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_li')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 137 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 2977757 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mhr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mhr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3212 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 395605 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 26598 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mzn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mzn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1055 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ne
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ne')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 299938 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_no
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_no')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 5546211 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_pa')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 127467 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pnb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_pnb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 4599 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_rm
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_rm')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: we do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", see http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.
We will comply to legitimate requests by removing the affected sources from the next release of the corpus.
সংস্করণ : 1.0.0
Splits :
বিভক্ত | উদাহরণ |
---|---|
'train' | 41 |
- বৈশিষ্ট্য :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
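Every entry in this catalog uses the same naming pattern. The TFDS name is `huggingface:oscar/` followed by a config such as `unshuffled_original_rm`. That config joins the `unshuffled` prefix, a variant (`original` here, `deduplicated` in other entries) and a language code. The sketch below splits a config into those parts and rebuilds the `tfds.load` string. It is a minimal sketch that assumes the pattern holds for every config. `OscarConfig`, `parse_config` and `tfds_name` are hypothetical helpers, not part of TFDS.

```python
from dataclasses import dataclass

PREFIX = "huggingface:oscar/"
VARIANTS = {"original", "deduplicated"}


@dataclass(frozen=True)
class OscarConfig:
    variant: str   # "original" or "deduplicated"
    language: str  # e.g. "rm", "sah", "yue"


def parse_config(name: str) -> OscarConfig:
    """Split a config such as 'unshuffled_original_rm' into its parts."""
    if name.startswith(PREFIX):
        name = name[len(PREFIX):]
    parts = name.split("_", 2)  # at most three pieces
    if (len(parts) != 3 or parts[0] != "unshuffled"
            or parts[1] not in VARIANTS or not parts[2]):
        raise ValueError(f"not an OSCAR config name: {name!r}")
    return OscarConfig(variant=parts[1], language=parts[2])


def tfds_name(cfg: OscarConfig) -> str:
    """Rebuild the string passed to tfds.load()."""
    return f"{PREFIX}unshuffled_{cfg.variant}_{cfg.language}"


cfg = parse_config("huggingface:oscar/unshuffled_original_rm")
print(cfg)             # OscarConfig(variant='original', language='rm')
print(tfds_name(cfg))  # huggingface:oscar/unshuffled_original_rm
```

You would pass the string from `tfds_name(cfg)` to `tfds.load(...)`, in the same way as the commands in each entry.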
unshuffled_original_sah
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sah')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 22301 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
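The license note lists three things that a takedown request must contain. The sketch below assembles such a request and refuses to render it until every required item is filled in. It is a minimal sketch under stated assumptions. The `TakedownRequest` class, its field names, the plain-text layout and the example values are all hypothetical, because the catalog specifies only the required content, not a format or a submission channel.

```python
from dataclasses import dataclass


@dataclass
class TakedownRequest:
    # 1. Who you are and how to reach you (address, phone or email).
    name: str
    contact: str
    # 2. The copyrighted work claimed to be infringed.
    copyrighted_work: str
    # 3. The infringing material, with enough detail to locate it
    #    (for example, the config name and record ids).
    infringing_material: str

    def missing_fields(self) -> list:
        """Names of required fields that are empty or whitespace-only."""
        return [k for k, v in vars(self).items() if not v.strip()]

    def render(self) -> str:
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"incomplete request, missing: {missing}")
        return (
            f"Requester: {self.name} ({self.contact})\n"
            f"Copyrighted work: {self.copyrighted_work}\n"
            f"Infringing material: {self.infringing_material}\n"
        )


# Hypothetical example values.
req = TakedownRequest(
    name="Jane Doe",
    contact="jane@example.org",
    copyrighted_work="Novel 'Example Title' (2015)",
    infringing_material="unshuffled_original_sah, train split, id 123",
)
print(req.render())
```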
unshuffled_original_si
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_si')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 203082 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sq
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sq')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 672077 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sw
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sw')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 41986 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_th
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_th')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 6064129 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_tt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 135923 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ur
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ur')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 638596 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_vo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_vo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3366 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_xal
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_xal')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 39 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_yue
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_yue')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 11 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_en')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 455994980 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
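With 455,994,980 `train` examples, `unshuffled_original_en` is far too large to load into memory at once. When a `split` is given, `tfds.load` returns a `tf.data.Dataset`, and you iterate over it lazily. You would normally cap it with `ds.take(n)` or with a split slice such as `split='train[:1%]'`. The sketch below shows the same lazy pattern using only the standard library. `fake_oscar_records` is a hypothetical generator that stands in for the real dataset and yields records shaped like the `id`/`text` features. `itertools.islice` plays the role of `take`.

```python
import itertools
from typing import Iterator


def fake_oscar_records() -> Iterator[dict]:
    """Hypothetical stand-in for the real dataset: an endless stream of
    records shaped like the catalog's features (int64 id, string text)."""
    for i in itertools.count():
        yield {"id": i, "text": f"document {i}"}


def take(records, n: int) -> list:
    """Materialize only the first n records, like tf.data's ds.take(n)."""
    return list(itertools.islice(records, n))


def total_chars(records, n: int) -> int:
    """Stream through the first n records without keeping them in memory."""
    return sum(len(r["text"]) for r in itertools.islice(records, n))


first = take(fake_oscar_records(), 3)
print([r["id"] for r in first])               # [0, 1, 2]
print(total_chars(fake_oscar_records(), 10))  # 100
```

Only `take` builds a list. `total_chars` keeps one record in memory at a time, which is the property that matters at this scale.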
unshuffled_original_eu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_eu')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 506883 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_frr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_frr')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 7 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
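Split sizes in the entries so far vary enormously. `unshuffled_original_frr` has only 7 `train` examples and `unshuffled_original_yue` has 11, while `unshuffled_original_en` has hundreds of millions. The sketch below skips configs that are too small to be useful before you call `tfds.load`. It is a minimal sketch. The counts are copied from a subset of the Splits tables above. The `MIN_EXAMPLES` threshold and the `usable_configs` helper are arbitrary illustrations.

```python
# train-split sizes copied from some of the Splits tables above
TRAIN_EXAMPLES = {
    "unshuffled_original_rm": 41,
    "unshuffled_original_sah": 22301,
    "unshuffled_original_si": 203082,
    "unshuffled_original_vo": 3366,
    "unshuffled_original_xal": 39,
    "unshuffled_original_yue": 11,
    "unshuffled_original_en": 455994980,
    "unshuffled_original_eu": 506883,
    "unshuffled_original_frr": 7,
}

MIN_EXAMPLES = 1000  # arbitrary cut-off for illustration


def usable_configs(sizes: dict, min_examples: int) -> list:
    """Configs with at least `min_examples` train examples, largest first."""
    keep = [c for c, n in sizes.items() if n >= min_examples]
    return sorted(keep, key=lambda c: sizes[c], reverse=True)


print(usable_configs(TRAIN_EXAMPLES, MIN_EXAMPLES))
```

With the threshold at 1000, the tiny `rm`, `xal`, `yue` and `frr` configs drop out, and the remaining ones come back largest first.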
unshuffled_original_gl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_gl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 544388 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_he
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_he')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 3808397 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ht
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ht')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 13 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_id
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_id')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 16236463 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
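The catalog says nothing about example order beyond the `unshuffled` prefix in the config name. If you want randomized order for training, a common approach with large streamed datasets is a bounded shuffle buffer, which is what `tf.data.Dataset.shuffle(buffer_size)` provides. The sketch below shows that technique using only the standard library. It is a minimal sketch, not the tf.data implementation. `buffered_shuffle` is a hypothetical helper. Its output is only approximately random, and it gets more random as `buffer_size` grows.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     rng: random.Random) -> Iterator[T]:
    """Approximate shuffle that holds at most buffer_size items in memory."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    rng.shuffle(buffer)  # drain whatever is left in random order
    yield from buffer


rng = random.Random(0)
print(list(buffered_shuffle(range(10), buffer_size=4, rng=rng)))
```

With `buffer_size=1` the input order is preserved exactly. With a buffer at least as large as the input, the result is a full shuffle.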
unshuffled_original_is
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_is')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 625673 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_jv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_jv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1445 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_kn')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 350363 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_kv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_kv')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme: We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved", http://creativecommons.org/publicdomain/zero/1.0/). To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1549 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
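Every config in this catalog is loaded with the same name pattern, `huggingface:oscar/unshuffled_<variant>_<lang>`. The helper below is a hypothetical convenience, not part of `tensorflow_datasets`, that builds that string. The set of variants and the lowercase-ASCII language-code check are assumptions based on the config names shown in this catalog.

```python
# Hypothetical helper; not part of the tensorflow_datasets API.
VARIANTS = {"original", "deduplicated"}  # assumed from the config names listed here


def oscar_tfds_name(lang: str, variant: str = "original") -> str:
    """Build the name string passed to tfds.load() for one OSCAR config."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {sorted(VARIANTS)}, got {variant!r}")
    # Assumption: language codes are lowercase ASCII letters (e.g. 'kv', 'mai').
    if not (lang.isascii() and lang.isalpha() and lang.islower()):
        raise ValueError(f"expected a lowercase language code, got {lang!r}")
    return f"huggingface:oscar/unshuffled_{variant}_{lang}"


print(oscar_tfds_name("kv"))  # huggingface:oscar/unshuffled_original_kv
print(oscar_tfds_name("af", "deduplicated"))  # huggingface:oscar/unshuffled_deduplicated_af
```

With this helper, `tfds.load(oscar_tfds_name("kv"))` passes exactly the same string as the command shown for `unshuffled_original_kv`.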
unshuffled_original_lb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lb')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 34807 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_lo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_lo')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 52910 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mai
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mai')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 123 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mk')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 437871 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_mrj
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_mrj')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 757 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_my
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_my')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 232329 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nap
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_nap')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 73 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_nl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 34682142 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
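When a loaded dataset is iterated with `tfds.as_numpy`, string features such as `text` are typically returned as UTF-8 `bytes` rather than `str`. The sketch below normalizes records to plain Python types and uses stand-in records in place of a real download. The names `decode_record` and `iter_decoded` are hypothetical.

```python
def decode_record(record: dict) -> dict:
    """Convert one record to plain Python types.

    Assumes 'text' may arrive as UTF-8 bytes (as string features usually do
    when iterating with tfds.as_numpy) and 'id' as an integer-like value.
    """
    text = record["text"]
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return {"id": int(record["id"]), "text": text}


def iter_decoded(records):
    """Lazily decode an iterable of records, e.g. the output of tfds.as_numpy(ds)."""
    for record in records:
        yield decode_record(record)


# Stand-in records; real ones would come from the loaded dataset.
sample = [
    {"id": 0, "text": "Hallo wereld".encode("utf-8")},
    {"id": 1, "text": "already a str"},
]
for rec in iter_decoded(sample):
    print(rec["id"], rec["text"])
# 0 Hallo wereld
# 1 already a str
```

Decoding lazily with a generator avoids materializing large configs such as `unshuffled_original_nl` in memory.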
unshuffled_original_or
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_or')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 59463 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_pl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 35440972 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_pt')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 42114520 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ru')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 161836003 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
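Each config lists only a `'train'` split. If a held-out set is needed, one possible approach is to bucket examples deterministically by a hash of their `int64` `id`. This is an assumption for illustration, not something OSCAR or TFDS prescribes. TFDS's own split-slicing syntax, for example `split='train[:99%]'`, is an alternative. The sketch below uses `hashlib.blake2b` so that membership is stable across runs and does not simply follow the unshuffled id order. The salt value and function names are hypothetical.

```python
import hashlib


def in_holdout(example_id: int, holdout_percent: int = 1, salt: str = "oscar") -> bool:
    """Deterministically assign an example to the held-out bucket by its id.

    The id is hashed together with an arbitrary (hypothetical) salt and
    mapped to one of 100 buckets; buckets below holdout_percent are held out.
    """
    if not 0 <= holdout_percent <= 100:
        raise ValueError("holdout_percent must be between 0 and 100")
    digest = hashlib.blake2b(f"{salt}:{example_id}".encode("ascii"), digest_size=8).digest()
    bucket = int.from_bytes(digest, "big") % 100
    return bucket < holdout_percent


def split_ids(ids, holdout_percent: int = 1):
    """Partition ids into (train_ids, holdout_ids) lists."""
    train, holdout = [], []
    for example_id in ids:
        if in_holdout(example_id, holdout_percent):
            holdout.append(example_id)
        else:
            train.append(example_id)
    return train, holdout


train_ids, holdout_ids = split_ids(range(10_000), holdout_percent=5)
# Roughly 9,500 / 500; the exact sizes depend on the hash values.
print(len(train_ids), len(holdout_ids))
```

Because the decision depends only on the id and the salt, the same example always lands in the same bucket, even when the data is streamed in a different order.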
unshuffled_original_sd
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sd')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 44280 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_sl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_sl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 1746604 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_su
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_su')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 805 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_te
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_te')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 475703 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_tl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_tl')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 458206 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_ug
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_ug')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 22255 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_vec
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_vec')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 73 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_war
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_war')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 9760 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
unshuffled_original_yi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:oscar/unshuffled_original_yi')
- Description :
The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
License : These data are released under this licensing scheme. We do not own any of the text from which these data have been extracted. We license the actual packaging of these data under the Creative Commons CC0 license ("no rights reserved"): http://creativecommons.org/publicdomain/zero/1.0/. To the extent possible under law, Inria has waived all copyright and related or neighboring rights to OSCAR. This work is published from: France.
Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
- Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
- Clearly identify the copyrighted work claimed to be infringed.
- Clearly identify the material that is claimed to be infringing and provide information reasonably sufficient to allow us to locate the material.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
Version : 1.0.0
Splits :
Split | Examples |
---|---|
'train' | 59364 |
- Features :
{
"id": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
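The `'train'` counts listed for the `unshuffled_original_*` configs in this section range from 73 examples (`nap`, `vec`) to 161,836,003 (`ru`). The small sketch below ranks them, for example to pick tiny configs for a quick smoke test before downloading a large one. The dictionary copies the counts from the tables above, and the helper names are hypothetical.

```python
# 'train' example counts for the unshuffled_original_* configs listed above.
TRAIN_COUNTS = {
    "kv": 1549, "lb": 34807, "lo": 52910, "mai": 123, "mk": 437871,
    "mrj": 757, "my": 232329, "nap": 73, "nl": 34682142, "or": 59463,
    "pl": 35440972, "pt": 42114520, "ru": 161836003, "sd": 44280,
    "sl": 1746604, "su": 805, "te": 475703, "tl": 458206, "ug": 22255,
    "vec": 73, "war": 9760, "yi": 59364,
}


def smallest_configs(counts: dict, k: int) -> list:
    """Return the k config codes with the fewest examples (ties broken by code)."""
    return sorted(counts, key=lambda code: (counts[code], code))[:k]


def configs_under(counts: dict, limit: int) -> list:
    """Return config codes with fewer than `limit` examples, smallest first."""
    return [code for code in smallest_configs(counts, len(counts)) if counts[code] < limit]


print(smallest_configs(TRAIN_COUNTS, 3))  # ['nap', 'vec', 'mai']
print(configs_under(TRAIN_COUNTS, 1000))  # ['nap', 'vec', 'mai', 'mrj', 'su']
```

Sorting on the `(count, code)` pair keeps the result deterministic when two configs, such as `nap` and `vec`, have the same count.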