dices

Description:

The Diversity in Conversational AI Evaluation for Safety (DICES) dataset

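Machine learning approaches are often trained and evaluated with datasets that require a clear separation between positive and negative examples. This approach oversimplifies the natural subjectivity present in many tasks and content items. It also obscures the inherent diversity in human perceptions and opinions. Tasks that attempt to preserve the variance in content and the diversity among humans are often expensive and laborious. To fill this gap and facilitate more in-depth model performance analyses, we propose the DICES dataset, a unique dataset with diverse perspectives on the safety of AI-generated conversations. We focus on the task of safety evaluation of conversational AI systems. The DICES dataset contains detailed demographic information about each rater and an extremely high replication of unique ratings per conversation to ensure the statistical significance of further analyses. It also encodes rater votes as distributions across different demographics to allow in-depth exploration of different rating aggregation strategies.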

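This dataset is well suited to observing and measuring variance, ambiguity, and diversity in the context of the safety of conversational AI. The dataset is accompanied by a paper describing a set of metrics that show how rater diversity influences the safety perception of raters from different geographic regions, ethnicity groups, age groups, and genders. The goal of the DICES dataset is to be used as a shared benchmark for the safety evaluation of conversational AI systems.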
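
The description mentions encoding rater votes as distributions across demographics. A minimal sketch of that idea in plain Python follows, using a handful of made-up rows. The field names `item_id`, `rater_gender`, and `Q_overall` come from the feature list on this page, but the string label values and the helper `vote_distributions` are assumptions for illustration; the dataset itself stores these fields as integer class labels.

```python
from collections import Counter, defaultdict

# Toy rows standing in for dataset examples (all values are invented).
rows = [
    {"item_id": 1, "rater_gender": "Woman", "Q_overall": "Yes"},
    {"item_id": 1, "rater_gender": "Woman", "Q_overall": "No"},
    {"item_id": 1, "rater_gender": "Man", "Q_overall": "No"},
    {"item_id": 1, "rater_gender": "Man", "Q_overall": "No"},
    {"item_id": 2, "rater_gender": "Woman", "Q_overall": "Unsure"},
    {"item_id": 2, "rater_gender": "Man", "Q_overall": "Yes"},
]


def vote_distributions(rows, group_key, label_key="Q_overall"):
    """Return {(item_id, group): {label: fraction of votes}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row["item_id"], row[group_key])][row[label_key]] += 1
    return {
        key: {label: n / sum(c.values()) for label, n in c.items()}
        for key, c in counts.items()
    }


dist = vote_distributions(rows, "rater_gender")
print(dist[(1, "Woman")])  # {'Yes': 0.5, 'No': 0.5}
print(dist[(1, "Man")])    # {'No': 1.0}
```

Keeping the full distribution per group, rather than collapsing to a majority vote, is what lets different aggregation strategies be compared afterwards.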

CONTENT WARNING: This dataset contains adversarial examples of conversations that may be offensive.

Homepage: https://github.com/google-research-datasets/dices-dataset
Source code: tfds.datasets.dices.Builder
Versions:
- 1.0.0 (default): Initial release.
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{aroyo2024dices,
  title={ {DICES} dataset: Diversity in conversational {AI} evaluation for safety},
  author={Aroyo, Lora and Taylor, Alex and Diaz, Mark and Homan, Christopher and Parrish, Alicia and Serapio-Garc{\'\i}a, Gregory and Prabhakaran, Vinodkumar and Wang, Ding},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

dices/350 (default config)

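Config description: Dataset 350 contains 350 conversations rated by a diverse rater pool of 123 unique raters. Each conversation is rated on five top-level safety categories and one overall conversation comprehension question. Raters were recruited to be balanced by gender (man or woman) and race/ethnicity (White, Black, Latine, Asian, Multiracial). Each rater rated all conversations, so each conversation has 123 unique ratings. The total number of rows in this dataset is 43050.
Download size: 29.70 MiB
Dataset size: 74.43 MiB
Auto-cached (documentation): Yes
Splits: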

Split	Examples
`'train'`	43,050
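
As a hedged sketch, this config can presumably be loaded through the standard `tfds.load` entry point. The helper name `load_dices` and the `EXPECTED_ROWS` table are illustrative and not part of the catalog. The import is deferred so that the row-count arithmetic can be checked without TensorFlow Datasets installed.

```python
# Row counts quoted in the config descriptions on this page.
EXPECTED_ROWS = {"dices/350": 43050, "dices/990": 72103}

# In dices/350 every rater rated every conversation,
# so rows = conversations x raters.
assert EXPECTED_ROWS["dices/350"] == 350 * 123


def load_dices(config="dices/350"):
    """Load the train split of a DICES config (downloads data on first use)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    ds, info = tfds.load(config, split="train", with_info=True)
    # Compare the split size reported by TFDS with the figure quoted above.
    n = info.splits["train"].num_examples
    if n != EXPECTED_ROWS[config]:
        raise ValueError(f"{config}: expected {EXPECTED_ROWS[config]} rows, got {n}")
    return ds, info


# Usage (needs tensorflow_datasets and network access on the first call):
#   ds, info = load_dices("dices/350")
#   for example in ds.take(1):
#       print(example["item_id"].numpy(), example["rater_id"].numpy())
```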

Feature structure:

FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_dangerous_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_derogation_of_boat': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_insensitive': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_medical_legal_financial_or_relationship_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscene_and_profane': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_sexually_suggestive_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_beliefs': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_gender_sexual_orientation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_inherited_attributes': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_status': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q5_political_affiliation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_making_endorsement': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_polarizing_topics': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'answer_time_ms': Scalar(shape=(), dtype=int64, description=Amount of time spent by each rater on each safety annotation question.),
    'answer_timestamp': Scalar(shape=(), dtype=int64, description=Time when each conversation was rated by each rater.),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each row, representing all ratings by a single rater to a single conversation.),
    'item_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each conversation.),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each rater.),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
    'safety_gold': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'safety_gold_reason': Text(shape=(), dtype=string),
})

Feature documentation:

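Feature	Class	Shape	Dtype	Description
	FeaturesDict
Q1_whole_conversation_evaluation	ClassLabel		int64	Rating of the understandability of a conversation.
Q2_harmful_content_dangerous_content	ClassLabel		int64
Q2_harmful_content_derogation_of_boat	ClassLabel		int64
Q2_harmful_content_insensitive	ClassLabel		int64
Q2_harmful_content_medical_legal_financial_or_relationship_advice	ClassLabel		int64
Q2_harmful_content_obscene_and_profane	ClassLabel		int64
Q2_harmful_content_other	ClassLabel		int64
Q2_harmful_content_overall	ClassLabel		int64
Q2_harmful_content_personal_information	ClassLabel		int64
Q2_harmful_content_regulated_goods	ClassLabel		int64
Q2_harmful_content_sexually_suggestive_content	ClassLabel		int64
Q3_bias_other	ClassLabel		int64
Q3_bias_overall	ClassLabel		int64
Q3_bias_targeting_beliefs	ClassLabel		int64
Q3_bias_targeting_gender_sexual_orientation	ClassLabel		int64
Q3_bias_targeting_inherited_attributes	ClassLabel		int64
Q3_bias_targeting_status	ClassLabel		int64
Q4_misinformation	ClassLabel		int64
Q5_political_affiliation	ClassLabel		int64
Q6_policy_guidelines_making_endorsement	ClassLabel		int64
Q6_policy_guidelines_other	ClassLabel		int64
Q6_policy_guidelines_overall	ClassLabel		int64
Q6_policy_guidelines_polarizing_topics	ClassLabel		int64
Q_overall	ClassLabel		int64
answer_time_ms	Scalar		int64	Amount of time spent by each rater on each safety annotation question.
answer_timestamp	Scalar		int64	Time when each conversation was rated by each rater.
context	Text		string	The conversation turns before the final chatbot response.
degree_of_harm	ClassLabel		int64	Hand-annotated rating of the severity of the safety risk.
harm_type	Sequence(ClassLabel)	(None,)	int64	Hand-annotated harm topic(s) of the conversation.
id	Scalar		int64	Numerical identifier for each row, representing all ratings by a single rater to a single conversation.
item_id	Scalar		int64	Numerical identifier for each conversation.
phase	ClassLabel		int64	One of three distinct time periods.
rater_age	ClassLabel		int64	The age group of the rater.
rater_education	ClassLabel		int64	The education of the rater.
rater_gender	ClassLabel		int64	The gender of the rater.
rater_id	Scalar		int64	Numerical identifier for each rater.
rater_race	ClassLabel		int64	The race/ethnicity of the rater.
rater_raw_race	Text		string	The self-reported raw race/ethnicity of the rater, before simplification to five categories.
response	Text		string	The final chatbot response in the conversation.
safety_gold	ClassLabel		int64	The gold-standard safety label provided by experts.
safety_gold_reason	Text		string	The reason(s), if given, for the gold safety label provided by experts.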

Examples (tfds.as_dataframe):

dices/990

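Config description: Dataset 990 contains 990 conversations rated by a diverse rater pool of 173 unique raters. Each conversation is rated on three top-level safety categories and one overall conversation comprehension question. Raters were recruited so that the number of raters for each conversation was balanced by gender (man, woman) and locale (US, India). Each rater rated only a sample of the conversations. Each conversation has 60-70 unique ratings. The total number of rows in this dataset is 72103.
Download size: 48.06 MiB
Dataset size: 150.38 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits: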

Split	Examples
`'train'`	72,103
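
Because raters in this config saw only a sample of the conversations, the number of ratings varies by conversation. The sketch below counts distinct raters per `item_id` on invented toy pairs. `ratings_per_conversation` is a hypothetical helper, and real pairs would come from the `item_id` and `rater_id` fields of the loaded dataset rather than a literal list.

```python
from collections import defaultdict

# Toy (item_id, rater_id) pairs; values are invented for illustration.
pairs = [(10, 1), (10, 2), (10, 3), (11, 1), (11, 3), (12, 2)]


def ratings_per_conversation(pairs):
    """Map each item_id to the number of distinct raters who rated it."""
    raters = defaultdict(set)
    for item_id, rater_id in pairs:
        raters[item_id].add(rater_id)
    return {item_id: len(ids) for item_id, ids in raters.items()}


counts = ratings_per_conversation(pairs)
print(counts)  # {10: 3, 11: 2, 12: 1}
print(min(counts.values()), max(counts.values()))  # 1 3
```

On the real data, the minimum and maximum of these counts give a quick check of the per-conversation replication quoted in the config description.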

Feature structure:

FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_explicitly_sexual': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_financial_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_harm_to_people_or_animals': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_health_and_safety_claims': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_how_tos_about_harmful_activities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscenities_or_profanities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_violence': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_violent_or_gory': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_incites_hatred': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_physically_or_mentally_inferior': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_subhuman_comparisons': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_superiority_or_subjugation_of_a_group': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_unfair_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_conspiracy_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_contradicts_expert_consensus': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_false_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_political': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each row, representing all ratings by a single rater to a single conversation.),
    'item_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each conversation.),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each rater.),
    'rater_locale': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
Q1_whole_conversation_evaluation	ClassLabel		int64	Rating of the understandability of a conversation.
Q2_harmful_content_explicitly_sexual	ClassLabel		int64
Q2_harmful_content_financial_advice	ClassLabel		int64
Q2_harmful_content_harm_to_people_or_animals	ClassLabel		int64
Q2_harmful_content_health_and_safety_claims	ClassLabel		int64
Q2_harmful_content_how_tos_about_harmful_activities	ClassLabel		int64
Q2_harmful_content_obscenities_or_profanities	ClassLabel		int64
Q2_harmful_content_other	ClassLabel		int64
Q2_harmful_content_overall	ClassLabel		int64
Q2_harmful_content_personal_information	ClassLabel		int64
Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods	ClassLabel		int64
Q2_harmful_content_promotes_or_condones_violence	ClassLabel		int64
Q2_harmful_content_violent_or_gory	ClassLabel		int64
Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above	ClassLabel		int64
Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious	ClassLabel		int64
Q3_bias_incites_hatred	ClassLabel		int64
Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction	ClassLabel		int64
Q3_bias_other	ClassLabel		int64
Q3_bias_physically_or_mentally_inferior	ClassLabel		int64
Q3_bias_subhuman_comparisons	ClassLabel		int64
Q3_bias_superiority_or_subjugation_of_a_group	ClassLabel		int64
Q3_unfair_bias_overall	ClassLabel		int64
Q4_misinformation_conspiracy_theories	ClassLabel		int64
Q4_misinformation_contradicts_expert_consensus	ClassLabel		int64
Q4_misinformation_false_theories	ClassLabel		int64
Q4_misinformation_other	ClassLabel		int64
Q4_misinformation_overall	ClassLabel		int64
Q4_misinformation_political	ClassLabel		int64
Q_overall	ClassLabel		int64
context	Text		string	The conversation turns before the final chatbot response.
degree_of_harm	ClassLabel		int64	Hand-annotated rating of the severity of the safety risk.
harm_type	Sequence(ClassLabel)	(None,)	int64	Hand-annotated harm topic(s) of the conversation.
id	Scalar		int64	Numerical identifier for each row, representing all ratings by a single rater to a single conversation.
item_id	Scalar		int64	Numerical identifier for each conversation.
phase	ClassLabel		int64	One of three distinct time periods.
rater_age	ClassLabel		int64	The age group of the rater.
rater_education	ClassLabel		int64	The education of the rater.
rater_gender	ClassLabel		int64	The gender of the rater.
rater_id	Scalar		int64	Numerical identifier for each rater.
rater_locale	ClassLabel		int64	The locale of the rater.
rater_race	ClassLabel		int64	The race/ethnicity of the rater.
rater_raw_race	Text		string	The self-reported raw race/ethnicity of the rater, before simplification to five categories.
response	Text		string	The final chatbot response in the conversation.

Examples (tfds.as_dataframe):
