GCCP Data Overview
Comprehensive overview of the GCCP dataset structure, schemas, and available data categories for researchers.
Introduction
The GCCP (Gies Consumer Credit Panel) is a collection of datasets organized by theme and intended use. By borrower type, it divides into the Consumer Credit dataset and the Small Business (BIS) dataset. The Consumer Credit dataset is further subdivided into Attributes, Clarity, and Tradelines.
Dataset | Contents | Period | Available Samples |
---|---|---|---|
Attributes | Consumer Records (Annual Spend, Trade Types, Inquiries) | 2004-2022 | 1pct, 100pct |
BIS | Small Businesses | 2009-2020 | 100pct |
Clarity | Alternative Financial Dataset | 2013-2019 | 10pct, 100pct |
Tradelines | Credit Accounts (Account Balance, Credit Limit, Actual Payment) | 2004-2022 | 100pct |
Important Notes
Data Availability Clarification
- Although some dataset periods extend to 2022, not every table contains data through 2022. To confirm the years available in a specific table, check its `year` column.
- Ignore the `public` schema when navigating the datasets; it is used only for logging queries.
- The `data_dictionary` schema contains descriptions of the available columns. Not every column is described yet, and coverage of the data dictionary is being expanded.
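Because dataset periods and table names do not guarantee coverage, querying the `year` column is the safest way to confirm which years a table holds. The sketch below only builds the SQL text in Python. It assumes lowercase schema names such as `tradelines` (this overview writes them as Attributes, BIS, Clarity, and Tradelines, so verify the exact spelling in your session), and the identifier check is an illustrative safeguard, not part of GCCP.

```python
import re

# Illustrative safeguard: accept only simple lowercase identifiers before
# interpolating them into SQL (an assumption, not a GCCP naming rule).
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def years_available_sql(schema: str, table: str) -> str:
    """Build a query that lists the distinct values of a table's `year` column."""
    for name in (schema, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    # Assumption: schema names are lowercase in the database (e.g. "tradelines").
    return f'SELECT DISTINCT "year" FROM "{schema}"."{table}" ORDER BY "year";'


print(years_available_sql("tradelines", "tl_2010_2018"))
# SELECT DISTINCT "year" FROM "tradelines"."tl_2010_2018" ORDER BY "year";
```

On the 100 pct tables a `SELECT DISTINCT` may scan the whole table, so expect it to take some time.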
Dataset Relationships
The other datasets link to the Attributes dataset. The table below shows each dataset's relationship with Attributes and highlights some of its limitations.
Dataset | Relationship with Attributes | Limitations |
---|---|---|
Attributes | Not applicable | Demographic data start in 2011; sparse dataset |
BIS | 8% of borrowers | Sample selection and survivorship bias |
Clarity | 11% of borrowers | Duplicate records, representativeness concerns, self-reported data |
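The percentages above describe how much of the borrower population a dataset shares with Attributes. Assuming they mean the share of distinct Attributes borrowers that also appear in the other dataset (this overview does not state the exact definition or the linking key), the calculation can be sketched as follows; the borrower identifiers are hypothetical.

```python
from collections.abc import Hashable, Iterable


def linked_share(attributes_ids: Iterable[Hashable], other_ids: Iterable[Hashable]) -> float:
    """Share of distinct Attributes borrowers that also appear in another dataset.

    Assumption: this is how "x% of borrowers" is defined; the overview does not say.
    """
    base = set(attributes_ids)
    if not base:
        raise ValueError("attributes_ids is empty")
    return len(base & set(other_ids)) / len(base)


# Hypothetical borrower identifiers, for illustration only.
attributes = [f"b{i}" for i in range(10)]
clarity = ["b3", "x99"]
print(linked_share(attributes, clarity))  # 0.1
```

Deduplicating with sets keeps repeated records (a known Clarity issue) from inflating the share.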
Dataset Details
1. Attributes
Contains loan amounts, inquiries, credits, trades, public records, and delinquencies of consumers from 2004 to 2018. Demographic details such as age, gender, income, education, occupation, homeownership, and number of children are available from 2011 onward.
Sample Types:
- 1 pct: a 1% sample drawn from the 100 pct dataset
- 100 pct: a representative random sample generated from the Experian Consumer Attributes dataset for UIUC
The 100 pct sample is a representative sample of 2M to 3M US borrowers per year.
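Since each sample size lives in a separately named table, a small helper can keep the choice of sample in one place. The mapping below copies the sampled table names listed in the schema tables later in this overview; the lowercase schema names and the `sample_table` helper itself are assumptions for illustration.

```python
# Sampled table variants named in the schema tables of this overview.
# Assumption: schema names are lowercase in the database.
SAMPLES = {
    ("attributes", "attributes"): (1, 100),
    ("clarity", "inquiries"): (10, 100),
    ("clarity", "tradelines"): (10, 100),
}


def sample_table(schema: str, base: str, pct: int) -> str:
    """Return the qualified name of a documented sample table, e.g. attributes.attributes_1pct."""
    allowed = SAMPLES.get((schema, base))
    if allowed is None:
        raise KeyError(f"no sampled variants documented for {schema}.{base}")
    if pct not in allowed:
        raise ValueError(f"{schema}.{base} comes in {allowed} pct samples, not {pct}")
    return f"{schema}.{base}_{pct}pct"


print(sample_table("attributes", "attributes", 1))  # attributes.attributes_1pct
print(sample_table("clarity", "tradelines", 100))   # clarity.tradelines_100pct
```

One common workflow is to develop queries against the 1 pct table and switch to the 100 pct table only for the final run.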
2. BIS
Experian's Small Business dataset (BIS) provides information on small and midsize businesses, including business credit scores, from 2009 to 2020.
3. Clarity
Clarity covers data reporting for under-banked, near-prime, and subprime consumers from 2013 to 2019. Its data are sourced from a variety of financial service providers, including online small-dollar lenders, online installment lenders, single-payment and line-of-credit lenders, storefront small-dollar lenders, auto title lenders, and rent-to-own establishments.
Available Subsets:
- 10 pct: 10% sample
- 100 pct: full sample
4. Tradelines
In the Experian data, tradelines provide critical insight into a consumer's credit behavior, including account type, outstanding balance, payment history, and status (e.g., open, closed, in good standing, or delinquent). This information plays a central role in determining credit scores, assessing credit risk, and making lending decisions.
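Balance and credit limit are often combined into a utilization ratio. The sketch below uses the `acct_balance_am` and `acct_credit_limit_am` column names listed for `tl_2010_2018` in the Tradelines schema table; the ratio is a common industry metric rather than a GCCP field, and treating missing or non-positive limits as undefined is an assumption, since this overview does not document how those columns encode missing values.

```python
from typing import Optional


def utilization(acct_balance_am: Optional[float],
                acct_credit_limit_am: Optional[float]) -> Optional[float]:
    """Balance divided by credit limit.

    Assumption: returns None when either value is missing or the limit is not
    positive, because the ratio is undefined in those cases.
    """
    if acct_balance_am is None or acct_credit_limit_am is None:
        return None
    if acct_credit_limit_am <= 0:
        return None
    return acct_balance_am / acct_credit_limit_am


print(utilization(1500.0, 5000.0))  # 0.3
print(utilization(200.0, 0.0))      # None
```

Utilization is most meaningful for revolving accounts such as credit cards, so in practice you may want to restrict to those accounts first.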
Besides borrower type, the GCCP can also be categorized by loan quality (prime vs. subprime). Under this view, it has two main components: the Experian umbrella dataset (prime loans) and Clarity (subprime loans). The Experian umbrella dataset further splits into Experian Consumers (Attributes, Tradelines, and Inquiries) and Experian Small Businesses (BIS).
Database Schema Overview

The diagram above summarizes the schemas in the dsrsdb database: Attributes, BIS, Clarity, and Tradelines.
Note: When navigating the GCCP data in the PostgreSQL database, you may encounter the `public` schema. It is designated for database maintenance and can be ignored.
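To list the schemas you can actually query, you can ask PostgreSQL's standard `information_schema.schemata` view and skip `public` along with the system schemas. That view shows only schemas the current user owns or has some privilege on, so the result may differ between accounts. The Python helper applies the same filter to names you have already fetched; its name and the sample input are illustrative.

```python
# SQL for PostgreSQL's standard information_schema: lists visible schemas,
# skipping `public` and the system schemas (information_schema, pg_*).
LIST_SCHEMAS_SQL = (
    "SELECT schema_name FROM information_schema.schemata "
    "WHERE schema_name NOT IN ('public', 'information_schema') "
    "AND schema_name !~ '^pg_' "
    "ORDER BY schema_name;"
)

SKIP = {"public", "information_schema"}


def research_schemas(schema_names):
    """Apply the same filter in Python to schema names fetched elsewhere."""
    return sorted(n for n in schema_names if n not in SKIP and not n.startswith("pg_"))


# Illustrative input; actual names depend on the database.
print(research_schemas(["public", "tradelines", "pg_catalog", "bis", "data_dictionary"]))
# ['bis', 'data_dictionary', 'tradelines']
```

In `psql`, the `\dn` meta-command lists schemas as well.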
Schema Tables & Attributes
1. Attributes Schema
Tables in Attributes (8) | Significant Attributes |
---|---|
attributes_1pct | premier, eirc, taps |
attributes_100pct | premier, eirc, taps |
eirc_mort_2019_2022 | eirc_mtf, eirc_mts |
eirc_rev_2019_2022 | eirc_rv |
premiers_view_ab | premier_v1_2 |
premiers_view_cd | premier_v1_2 |
scoresdemos_2019_2022 | - |
taps4_2019_2022 | taps |
2. BIS Schema
Tables in BIS (4) | Significant Attributes |
---|---|
biz_aggs | - |
input_seqnum | - |
mar_seq | tot_trd, tot_bal, tot_limit |
score_crdb | scorefactor |
3. Clarity Schema
Tables in Clarity (4) | Significant Attributes |
---|---|
inquiries_10pct | - |
inquiries_100pct | - |
tradelines_10pct | - |
tradelines_100pct | - |
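Because Clarity is known to contain duplicate records, it is worth deduplicating before counting accounts or inquiries. This overview does not say which columns identify a unique record, so the key columns below (`consumer_id`, `account_id`, `report_date`) are hypothetical placeholders; replace them with the identifiers described in the `data_dictionary` schema.

```python
from collections.abc import Iterable, Sequence
from typing import Any


def drop_duplicates(rows: Iterable[dict[str, Any]], key_cols: Sequence[str]) -> list[dict[str, Any]]:
    """Keep the first row seen for each combination of key_cols values."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical column names and values, for illustration only.
rows = [
    {"consumer_id": 1, "account_id": "A", "report_date": "2015-01", "balance": 300},
    {"consumer_id": 1, "account_id": "A", "report_date": "2015-01", "balance": 300},
    {"consumer_id": 1, "account_id": "A", "report_date": "2015-02", "balance": 250},
]
unique = drop_duplicates(rows, ["consumer_id", "account_id", "report_date"])
print(len(unique))  # 2
```

On the server, PostgreSQL's `SELECT DISTINCT ON (...) ... ORDER BY ...` gives a similar one-row-per-key result without pulling the full table into Python.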
4. Tradelines Schema
Tables in Tradelines (6) | Significant Attributes |
---|---|
inquiry_2004_2018 | - |
inquiry_2019_2022 | - |
public_2004_2018 | - |
public_2019_2022 | - |
tl_2004_2010 | acct, delq |
tl_2010_2018 | acct, delq, acct_balance_am, acct_credit_limit_am, actual_pymt_am, eirc, taps, last_payment_dt |
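Several tables encode their coverage in a `_YYYY_YYYY` name suffix. The sketch below picks the tables whose suffix covers a given year; the function name is illustrative, and the suffix is only a hint, so confirm actual coverage with the `year` column. Note that 2010 falls inside both `tl_2004_2010` and `tl_2010_2018`, so stacking those two tables may double-count that year unless you filter on `year`.

```python
import re
from typing import Iterable, List

# Matches a trailing _YYYY_YYYY suffix, as in tl_2010_2018 or inquiry_2019_2022.
_RANGE = re.compile(r"_(\d{4})_(\d{4})$")


def tables_covering(year: int, table_names: Iterable[str]) -> List[str]:
    """Return tables whose name suffix covers `year` (inclusive on both ends)."""
    hits = []
    for name in table_names:
        m = _RANGE.search(name)
        if m and int(m.group(1)) <= year <= int(m.group(2)):
            hits.append(name)
    return hits


TRADELINES_TABLES = [
    "inquiry_2004_2018", "inquiry_2019_2022",
    "public_2004_2018", "public_2019_2022",
    "tl_2004_2010", "tl_2010_2018",
]

print(tables_covering(2010, TRADELINES_TABLES))
# ['inquiry_2004_2018', 'public_2004_2018', 'tl_2004_2010', 'tl_2010_2018']
print(tables_covering(2020, TRADELINES_TABLES))
# ['inquiry_2019_2022', 'public_2019_2022']
```

The same function works for Attributes tables that follow this pattern, such as `eirc_mort_2019_2022`.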