GCCP Data Overview
Introduction
The GCCP (Gies Consumer Credit Panel) comprises a range of datasets categorized by theme or intended application.
Segmented by borrower type, the GCCP comprises the Consumer Credit dataset and the Small Business (BIS) dataset. The Consumer Credit dataset is further subdivided into Attributes, Clarity, and Tradelines.
Dataset | Categories | Period | Available Datasets |
---|---|---|---|
Attributes | Consumer Records (Annual Spend, Trade Types, Inquiries) | 2004-2022 | *1pct, 100pct |
BIS | Small Businesses | 2009-2020 | *100pct |
Clarity | Alternative Financial Dataset | 2013-2019 | *10pct, 100pct |
Tradelines | Credit Accounts (Account Balance, Credit Limit, Actual Payment) | 2004-2022 | 100pct |
Data Availability Clarification
While some datasets may suggest availability up to 2022, this does not imply that all tables contain data through 2022. To confirm the years available for a specific table, review its year column.
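The year check described above can be sketched as a small helper. This is a minimal sketch under stated assumptions: the schema name `attributes` (lowercase) and the function name `available_years` are illustrative, and the demo uses an in-memory SQLite database with made-up rows as a stand-in for the PostgreSQL database. Against the real database, any DB-API connection should work the same way.

```python
import sqlite3

def available_years(conn, schema, table, year_column="year"):
    """Return the sorted distinct values of the year column for one table."""
    # Identifiers cannot be bound as query parameters, so validate them
    # before interpolating them into the SQL text.
    for name in (schema, table, year_column):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    cur = conn.cursor()
    cur.execute(
        f"SELECT DISTINCT {year_column} FROM {schema}.{table} "
        f"ORDER BY {year_column}"
    )
    return [row[0] for row in cur.fetchall()]

# Stand-in demo: SQLite's ATTACH emulates a schema-qualified table name.
# The rows below are made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS attributes")
conn.execute("CREATE TABLE attributes.taps4_2019_2022 (year INTEGER)")
conn.executemany(
    "INSERT INTO attributes.taps4_2019_2022 VALUES (?)",
    [(2019,), (2020,), (2020,), (2021,)],
)
years = available_years(conn, "attributes", "taps4_2019_2022")
print(years)
```

In this made-up example the table name suggests coverage through 2022, but the year column shows no rows after 2021, which is the kind of gap the check is meant to surface.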
Please disregard the public schema during dataset navigation, as it is used exclusively for logging queries.
The data_dictionary schema contains descriptions of the available columns. Not all columns are described yet; we are actively working to expand the data dictionary coverage.
The other datasets are linked to the Attributes dataset. The table below shows each dataset's relationship with Attributes, along with some of its limitations.
Dataset | Relationship with Attributes | Limitations |
---|---|---|
Attributes | Not Applicable | Demographic data starts from 2011, sparse dataset |
BIS | 8% of borrowers | Sample selection & Survivorship bias |
Clarity | 11% of borrowers | Duplicate records, representativeness, self-reported |
Below is an introductory overview of the datasets aimed at researchers who are in the initial stages of exploration:
1. Attributes
The dataset includes loan amounts, inquiries, credits, trades, public records, and delinquencies of customers from 2004 to 2018. Demographic details such as age, gender, income, education, occupation, homeowner status, and number of children are available from 2011.
The terms "1 pct" and "100 pct" refer to specific subsets of the dataset:
"1 pct" is a 1% sample drawn from the 100 pct dataset.
"100 pct" is a representative random sample generated from the Experian Consumer Attributes dataset for UIUC, covering 2 million to 3 million US borrowers per year.
Furthermore, specific tables include years in their names. For example, the table eirc_rev_2019_2022 pertains to estimated interest rate calculations for revolving credit from 2019 to 2022. Similarly, taps4_2019_2022 relates to total annual plastic spending during the same period.
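The naming convention above can be read programmatically. The following is a minimal sketch, assuming that a trailing `_YYYY_YYYY` suffix always denotes a start and end year; the helper name `year_range_from_name` is hypothetical. The result is only the range the name suggests, so the year column remains the authority on what a table actually contains.

```python
import re

# Matches a trailing "_YYYY_YYYY" suffix such as "_2019_2022".
_YEAR_SUFFIX = re.compile(r"_(\d{4})_(\d{4})$")

def year_range_from_name(table_name):
    """Return (start_year, end_year) parsed from a table name, or None."""
    match = _YEAR_SUFFIX.search(table_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

for name in ("eirc_rev_2019_2022", "taps4_2019_2022", "attributes_1pct"):
    print(name, year_range_from_name(name))
```

Tables without a year suffix, such as attributes_1pct, return None.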
2. BIS
Experian's Small Business dataset (BIS) provides valuable insights into small and midsize businesses. It encompasses data concerning business credit scores spanning from 2009 to 2020.
3. Clarity
Clarity covers data reported for under-banked, near-prime, and subprime consumer segments from 2013 to 2019. The data is sourced from diverse financial service providers, encompassing online small-dollar lenders, online installment lenders, single payment, line of credit, storefront small-dollar lenders, auto title lenders, and rent-to-own establishments. The Clarity dataset comprises both inquiries and tradelines.
There are two specific subsets within the Clarity dataset:
"10 pct" denotes a 10% sample.
"100 pct" represents the full sample.
4. Tradelines
In the Experian dataset, tradelines provide critical insights into a consumer's credit behavior, including information about the account type, outstanding balance, payment history, and status (e.g., open, closed, in good standing, or delinquent). This data plays a significant role in determining credit scores, assessing credit risk, and making informed lending decisions.
The GCCP dataset can also be categorized by loan quality (prime vs. subprime) instead of borrower type. Under this view, the GCCP consists of two main components: the Experian umbrella dataset (containing prime loans) and Clarity (comprising subprime loans). The Experian umbrella dataset can be subdivided further into Experian Consumers (Attributes, Tradelines & Inquiries) and Experian Small Businesses (BIS).
Database Schema Overview
The above diagram succinctly presents an overview of the various schemas housed within the dsrsdb database, namely Attributes, BIS, Clarity, and Tradelines.
While navigating the GCCP dataset in the PostgreSQL database, you may encounter the public schema, which is designated for database maintenance purposes and can be ignored.
Below are the tables in each schema, along with their significant attribute groupings where applicable.
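One way to browse the schemas while skipping the maintenance schema is to query the standard `information_schema.tables` view and group the results. This is a minimal sketch: the helper `group_tables_by_schema` is hypothetical, the lowercase schema names are an assumption, and the sample rows (including the `query_log` table name) are made up so the grouping logic can run without a database connection.

```python
from collections import defaultdict

# Standard information_schema query. Run it with any PostgreSQL client
# and pass the resulting (schema, table) rows to the helper below.
LIST_TABLES_SQL = """
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_schema, table_name
"""

def group_tables_by_schema(rows, skip=("public",)):
    """Group (schema, table) rows by schema, dropping schemas listed in skip."""
    grouped = defaultdict(list)
    for schema, table in rows:
        if schema.lower() in skip:
            continue
        grouped[schema].append(table)
    return dict(grouped)

# Made-up rows standing in for a real query result.
sample_rows = [
    ("attributes", "attributes_1pct"),
    ("attributes", "taps4_2019_2022"),
    ("bis", "biz_aggs"),
    ("public", "query_log"),  # hypothetical table name
]
tables = group_tables_by_schema(sample_rows)
print(tables)
```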
1. Attributes
Tables in Attributes (8) | Significant Attributes |
---|---|
attributes_1pct | premier, eirc, taps |
attributes_100pct | premier, eirc, taps |
eirc_mort_2019_2022 | eirc_mtf, eirc_mts |
eirc_rev_2019_2022 | eirc_rv |
premiers_view_ab | premier_v1_2 |
premiers_view_cd | premier_v1_2 |
scoresdemos_2019_2022 | |
taps4_2019_2022 | taps |
2. BIS
Tables in BIS (4) | Significant Attributes |
---|---|
biz_aggs | |
input_seqnum | |
mar_seq | tot_trd, tot_bal, tot_limit |
score_crdb | scorefactor |
3. Clarity
Tables in Clarity (4) | Significant Attributes |
---|---|
inquiries_10pct | |
inquiries_100pct | |
tradelines_10pct | |
tradelines_100pct | |
4. Tradelines
Tables in Tradelines (6) | Significant Attributes |
---|---|
inquiry_2004_2018 | |
inquiry_2019_2022 | |
public_2004_2018 | |
public_2019_2022 | |
tl_2004_2010 | acct, delq |
tl_2010_2018 | acct, delq, acct_balance_am, acct_credit_limit_am, actual_pymt_am, eirc, taps, last_payment_dt |