
GCCP Data Overview

Comprehensive overview of the GCCP dataset structure, schemas, and available data categories for researchers.

Introduction

The GCCP (Gies Consumer Credit Panel) is a collection of datasets organized by theme and intended use. By borrower type, it splits into the Consumer Credit dataset and the Small Business (BIS) dataset. The Consumer Credit dataset is further subdivided into Attributes, Clarity, and Tradelines.

Dataset | Categories | Period | Available Datasets
Attributes | Consumer Records (Annual Spend, Trade Types, Inquiries) | 2004-2022 | 1pct, 100pct
BIS | Small Businesses | 2009-2020 | 100pct
Clarity | Alternative Financial Dataset | 2013-2019 | 10pct, 100pct
Tradelines | Credit Accounts (Account Balance, Credit Limit, Actual Payment) | 2004-2022 | 100pct

Important Notes

Data Availability Clarification

  • While some datasets may suggest availability up to 2022, this does not imply that all tables contain data through 2022. To confirm the years available for a specific table, please review the year column.
  • Please disregard the public schema during dataset navigation, as it is used exclusively for logging queries.
  • The data_dictionary schema contains descriptions for available columns. While not all column descriptions are currently provided, we are actively working to expand the data dictionary coverage.
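
The year check described in the first note can be scripted. The following is a minimal sketch, not an official GCCP utility: it assumes a Python DB-API connection and a column literally named year. The function name available_years is hypothetical, and the in-memory SQLite database with made-up rows is only a stand-in so the example runs without database access. Against the real PostgreSQL database you would pass a connection from your own driver.

```python
import re
import sqlite3

# Plain SQL identifiers only: letters, digits, underscores.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def available_years(conn, schema, table):
    """Return the sorted distinct values of the `year` column in schema.table."""
    # Identifiers cannot be bound as query parameters, so validate them instead.
    for name in (schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT DISTINCT year FROM {schema}.{table} ORDER BY year")
    return [row[0] for row in cur.fetchall()]

# Stand-in database with made-up demo rows (NOT real GCCP data).
demo = sqlite3.connect(":memory:")
demo.execute("ATTACH DATABASE ':memory:' AS tradelines")
demo.execute("CREATE TABLE tradelines.tl_2004_2010 (year INTEGER)")
demo.executemany(
    "INSERT INTO tradelines.tl_2004_2010 VALUES (?)",
    [(2004,), (2005,), (2005,)],
)
print(available_years(demo, "tradelines", "tl_2004_2010"))
```

On the full tables a DISTINCT scan may be slow, so it may be worth running the check on a 1pct or 10pct table first where one exists.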

Dataset Relationships

The other datasets are linked to the Attributes dataset. The table below shows how each dataset relates to Attributes and lists some of its limitations.

Dataset | Relationship with Attributes | Limitations
Attributes | Not Applicable | Demographic data starts from 2011, sparse dataset
BIS | 8% of borrowers | Sample selection & Survivorship bias
Clarity | 11% of borrowers | Duplicate records, representativeness, self-reported

Dataset Details

1. Attributes

Consists of loan amounts, inquiries, credits, trades, public records, and delinquencies of customers from 2004 to 2018. Demographic details such as age, gender, income, education, occupation, homeowner status, and number of children are available from 2011.

Sample Types:

  • 1 pct: 1% sample from the 100 pct dataset
  • 100 pct: Representative random sample generated from the Experian Consumer Attributes dataset for UIUC

The 100 pct sample is a representative sample of 2M to 3M US borrowers per year.

2. BIS

Experian's Small Business dataset (BIS) provides valuable insights into small and midsize businesses. It encompasses data concerning business credit scores spanning from 2009 to 2020.

3. Clarity

Clarity is dedicated to data reporting for under-banked, near prime, and subprime consumer segments during the period from 2013 to 2019. The data for Clarity is sourced from diverse financial service providers, encompassing online small dollar lenders, online installment lenders, single payment, line of credit, storefront small dollar lenders, auto title lenders, and rent-to-own establishments.

Available Subsets:

  • 10 pct: 10% sample
  • 100 pct: Full, comprehensive sample

4. Tradelines

In the Experian dataset, tradelines provide critical insights into a consumer's credit behavior, including information about the account type, outstanding balance, payment history, and status (e.g., open, closed, in good standing, or delinquent). This data plays a significant role in determining credit scores, assessing credit risk, and making informed lending decisions.

In addition to borrower type, the GCCP dataset can be categorized by loan quality (prime vs. subprime). Under this view, the GCCP consists of two main components: the Experian umbrella dataset (containing prime loans) and Clarity (comprising subprime loans). The Experian umbrella dataset can be subdivided further into Experian Consumers (Attributes, Tradelines & Inquiries) and Experian Small Businesses (BIS).

Database Schema Overview

[Figure: GCCP Database Schema Overview]

The above diagram succinctly presents an overview of the various schemas housed within the dsrsdb database, namely Attributes, BIS, Clarity, and Tradelines.

Note: While navigating the GCCP dataset in the PostgreSQL database, you may encounter the 'Public' schema, which is designated for database maintenance purposes and can be ignored.
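
One way to list the research tables while skipping the public schema is sketched below. It relies on information_schema.tables, a standard PostgreSQL catalog view. The helper group_tables is hypothetical, the lowercase schema names are an assumption, and the sample rows (including the query_log table name) are illustrative input rather than actual query output.

```python
from collections import defaultdict

# Schemas to skip: public (per the note above) plus PostgreSQL's own catalogs.
IGNORED_SCHEMAS = {"public", "information_schema", "pg_catalog"}

# Query to run with your own PostgreSQL client or driver.
LIST_TABLES_SQL = """
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('public', 'information_schema', 'pg_catalog')
ORDER BY table_schema, table_name;
"""

def group_tables(rows):
    """Group (schema, table) rows into {schema: [tables]}, dropping ignored schemas."""
    grouped = defaultdict(list)
    for schema, table in rows:
        if schema.lower() not in IGNORED_SCHEMAS:
            grouped[schema].append(table)
    return {schema: sorted(tables) for schema, tables in grouped.items()}

# Illustrative rows; "query_log" is a made-up table name.
rows = [
    ("clarity", "inquiries_10pct"),
    ("public", "query_log"),
    ("bis", "biz_aggs"),
    ("clarity", "tradelines_10pct"),
]
print(group_tables(rows))
```

Filtering in Python as well as in SQL is redundant here, but it keeps the helper safe to use on rows that come from an unfiltered listing.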

Schema Tables & Attributes

1. Attributes Schema

Tables in Attributes (8) | Significant Attributes
attributes_1pct | premier, eirc, taps
attributes_100pct | premier, eirc, taps
eirc_mort_2019_2022 | eirc_mtf, eirc_mts
eirc_rev_2019_2022 | eirc_rv
premiers_view_ab | premier_v1_2
premiers_view_cd | premier_v1_2
scoresdemos_2019_2022 | -
taps4_2019_2022 | taps

2. BIS Schema

Tables in BIS (4) | Significant Attributes
biz_aggs | -
input_seqnum | -
mar_seq | tot_trd, tot_bal, tot_limit
score_crd | bscorefactor

3. Clarity Schema

Tables in Clarity (4) | Significant Attributes
inquiries_10pct | -
inquiries_100pct | -
tradelines_10pct | -
tradelines_100pct | -
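
Because each Clarity table exists in a 10pct and a 100pct variant, a query can be developed against the smaller sample and then pointed at the full table. The sketch below shows one way to switch between them. The helper clarity_table is hypothetical, and the lowercase schema name clarity is an assumption; the table names come from the list above.

```python
# Sample sizes available per Clarity table family, as listed above.
CLARITY_SAMPLES = {
    "inquiries": ("10pct", "100pct"),
    "tradelines": ("10pct", "100pct"),
}

def clarity_table(family, sample="10pct"):
    """Return the schema-qualified Clarity table name for a family and sample."""
    if family not in CLARITY_SAMPLES:
        raise ValueError(f"unknown Clarity table family: {family!r}")
    if sample not in CLARITY_SAMPLES[family]:
        raise ValueError(f"sample {sample!r} not available for {family!r}")
    # Assumes the schema is named "clarity" in lowercase.
    return f"clarity.{family}_{sample}"

# Prototype on the 10% sample, then switch to the full table.
print(clarity_table("inquiries"))
print(clarity_table("inquiries", sample="100pct"))
```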

4. Tradelines Schema

Tables in Tradelines (6) | Significant Attributes
inquiry_2004_2018 | -
inquiry_2019_2022 | -
public_2004_2018 | -
public_2019_2022 | -
tl_2004_2010 | acct, delq
tl_2010_2018 | acct, delq, acct_balance_am, acct_credit_limit_am, actual_pymt_am, eirc, taps, last_payment_dt
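
Since the Tradelines tables are split by year range, it can help to work out which tables to query for a given year. The sketch below parses the range from each table name. The helper tables_for_year is hypothetical, and it assumes the two years in a table name are inclusive bounds on coverage, which should still be confirmed against each table's year column.

```python
import re

# Table names as listed in the Tradelines schema above.
TRADELINES_TABLES = [
    "inquiry_2004_2018", "inquiry_2019_2022",
    "public_2004_2018", "public_2019_2022",
    "tl_2004_2010", "tl_2010_2018",
]

# Matches names of the form <family>_<startyear>_<endyear>.
_RANGE = re.compile(r"(?P<family>[a-z]+)_(?P<start>\d{4})_(?P<end>\d{4})")

def tables_for_year(family, year, tables=TRADELINES_TABLES):
    """Return the tables of one family whose name range includes `year`."""
    matches = []
    for name in tables:
        m = _RANGE.fullmatch(name)
        if m is None or m.group("family") != family:
            continue
        # Assumption: both years in the name are inclusive bounds.
        if int(m.group("start")) <= year <= int(m.group("end")):
            matches.append(name)
    return matches

print(tables_for_year("tl", 2010))       # both tl tables name 2010
print(tables_for_year("inquiry", 2020))
```

Note that 2010 appears in the names of both tl tables, so a query for that year may need to check both and guard against double counting.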