
GCCP Data Overview

Introduction

The GCCP (Gies Consumer Credit Panel) comprises several datasets, grouped by theme and intended use.

Segmented by borrower type, the GCCP comprises the Consumer Credit dataset and the Small Business (BIS) dataset. The Consumer Credit dataset can be further subdivided into Attributes, Clarity, and Tradelines.

| Dataset | Categories | Period | Available Samples |
|---|---|---|---|
| Attributes | Consumer Records (Annual Spend, Trade Types, Inquiries) | 2004-2022* | 1pct, 100pct |
| BIS | Small Businesses | 2009-2020* | 100pct |
| Clarity | Alternative Financial Dataset | 2013-2019* | 10pct, 100pct |
| Tradelines | Credit Accounts (Account Balance, Credit Limit, Actual Payment) | 2004-2022 | 100pct |
Note: Data Availability Clarification

Some datasets are listed as available through 2022, but not every table in them contains data through 2022. To confirm the years available for a specific table, check its year column.
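The check can be sketched in Python as a small query builder. The sketch only produces the SQL text. It assumes the column is literally named `year`, and it assumes the schema and table names are lowercase, as in the table lists on this page. The helper names `quote_ident` and `year_coverage_query` are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def year_coverage_query(schema: str, table: str, year_column: str = "year") -> str:
    """Build a query listing the distinct years present in one table.

    Assumption: the year column is called "year"; pass another name
    if a table uses a different one.
    """
    col = quote_ident(year_column)
    return (
        f"SELECT DISTINCT {col} FROM {quote_ident(schema)}.{quote_ident(table)} "
        f"ORDER BY {col};"
    )


print(year_coverage_query("tradelines", "tl_2010_2018"))
# SELECT DISTINCT "year" FROM "tradelines"."tl_2010_2018" ORDER BY "year";
```

Any PostgreSQL client can run the resulting string, for example psycopg2's `cursor.execute()`.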

Please disregard the public schema during dataset navigation, as it is used exclusively for logging queries.

The data_dictionary schema contains descriptions for available columns. While not all column descriptions are currently provided, we are actively working to expand the data dictionary coverage.
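To see what the data dictionary currently covers, one option is to list its tables through PostgreSQL's standard `information_schema.tables` view. The Python sketch below assumes the schema is named exactly `data_dictionary`. This page does not describe how the descriptions are laid out inside that schema, so the sketch stops at listing its tables. The helper name is hypothetical.

```python
# Standard PostgreSQL catalog query; %s is a placeholder in the DB-API
# "format" paramstyle used by psycopg2.
DICTIONARY_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = %s ORDER BY table_name;"
)


def dictionary_tables_query(schema: str = "data_dictionary") -> tuple:
    """Return (sql, params) for listing the tables in the data dictionary schema."""
    return DICTIONARY_TABLES_SQL, (schema,)


sql, params = dictionary_tables_query()
print(params)  # ('data_dictionary',)
```

Passing the schema name as a parameter, rather than formatting it into the string, leaves quoting to the database driver.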

The other datasets are linked to the Attributes dataset. The table below shows how each one relates to Attributes and lists its main limitations.

| Dataset | Relationship with Attributes | Limitations |
|---|---|---|
| Attributes | Not applicable | Demographic data starts from 2011; sparse dataset |
| BIS | 8% of borrowers | Sample selection and survivorship bias |
| Clarity | 11% of borrowers | Duplicate records, representativeness, self-reported data |
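One reading of the percentages above is the share of distinct Attributes borrowers who also appear in the other dataset. Under that assumption, the share could be computed as in the sketch below. The borrower identifier used to link the datasets is hypothetical, since this page does not document the join key. The data in the example is made up.

```python
def linkage_share(attribute_ids, other_ids) -> float:
    """Share of distinct Attributes borrowers that also appear in another dataset.

    Both arguments are iterables of a borrower identifier shared by the two
    datasets (hypothetical; the actual join key is not documented here).
    Converting to sets also drops duplicate records, which matters for Clarity.
    """
    attrs = set(attribute_ids)
    if not attrs:
        raise ValueError("attribute_ids is empty")
    return len(attrs & set(other_ids)) / len(attrs)


# Toy data: 25 borrowers in Attributes, 2 of them also found in BIS.
attributes = range(1, 26)
bis = [3, 3, 17, 99]  # the duplicate 3 and the non-Attributes 99 do not count
print(linkage_share(attributes, bis))  # 0.08
```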

Below is an introductory overview of the datasets aimed at researchers who are in the initial stages of exploration:

1. Attributes

The dataset includes loan amounts, inquiries, credits, trades, public records, and delinquencies for consumers from 2004 to 2018. Demographic information such as age, gender, income, education, occupation, homeowner status, and number of children is available from 2011 onward.

The terms "1 pct" and "100 pct" refer to specific subsets of the dataset:

"1 pct" is a 1% sample drawn from the 100 pct dataset.

"100 pct" is a representative random sample drawn from the Experian Consumer Attributes dataset for UIUC.

The "100 pct" sample covers 2 million to 3 million US borrowers per year.
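As a quick sanity check on table sizes, the sketch below derives the expected number of borrowers per year in the 1 pct sample from the figures above. It assumes the 1 pct sample is a plain 1% draw of borrowers from the 100 pct sample. The helper name is hypothetical.

```python
def expected_sample_rows(population: int, fraction: float) -> int:
    """Expected number of borrowers in a sample drawn at the given fraction."""
    return round(population * fraction)


# The 100 pct sample holds roughly 2M to 3M borrowers per year,
# so the 1 pct sample should hold roughly 20,000 to 30,000 per year.
low = expected_sample_rows(2_000_000, 0.01)
high = expected_sample_rows(3_000_000, 0.01)
print(low, high)  # 20000 30000
```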

Some table names also include the years they cover. For example, the table eirc_rev_2019_2022 holds estimated interest rate calculations for revolving credit from 2019 to 2022, and taps4_2019_2022 holds total annual plastic spending for the same period.
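When iterating over many tables, the year range can be read from the name programmatically. The sketch below assumes every year-named table ends in a `_YYYY_YYYY` suffix, as all such tables listed on this page do. It also assumes both years are inclusive. Per the note on data availability, the year column remains the authority on actual coverage.

```python
import re

# Matches a trailing "_YYYY_YYYY" suffix, e.g. "eirc_rev_2019_2022".
_YEAR_SUFFIX = re.compile(r"_(\d{4})_(\d{4})$")


def years_from_table_name(table: str):
    """Return (first_year, last_year) encoded in a table name, or None."""
    match = _YEAR_SUFFIX.search(table)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(years_from_table_name("eirc_rev_2019_2022"))  # (2019, 2022)
print(years_from_table_name("attributes_1pct"))     # None
```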

2. BIS

Experian's Small Business dataset (BIS) covers small and midsize businesses, including business credit scores, from 2009 to 2020.

3. Clarity

Clarity covers underbanked, near-prime, and subprime consumers from 2013 to 2019. Its data comes from a range of financial service providers, including online small-dollar lenders, online installment lenders, single-payment lenders, line-of-credit lenders, storefront small-dollar lenders, auto title lenders, and rent-to-own stores. The Clarity dataset contains both inquiries and tradelines.

There are two subsets within the Clarity dataset:
• "10 pct" denotes a 10% sample.
• "100 pct" denotes the full sample.
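Totals computed on the 10 pct tables can be scaled up to approximate the 100 pct level, as the sketch below shows. It assumes the 10 pct tables are a simple random sample of the 100 pct tables, which this page does not state. The count in the example is made up. Ratios and averages need no scaling; only totals do.

```python
def scale_to_full(sample_count: float, sample_pct: float) -> float:
    """Scale a total from a percentage sample up to the full-dataset level.

    Assumes a simple random sample, so each sampled record stands in
    for 100 / sample_pct records in the full data.
    """
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be in (0, 100]")
    return sample_count * (100 / sample_pct)


# e.g. 1,234 matching rows in tradelines_10pct -> about 12,340 in tradelines_100pct
print(scale_to_full(1_234, 10))  # 12340.0
```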

4. Tradelines

In the Experian data, tradelines describe a consumer's individual credit accounts. Each record includes the account type, outstanding balance, payment history, and status (for example open, closed, in good standing, or delinquent). This data is central to credit scoring, credit risk assessment, and lending decisions.
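A common derived measure from tradeline data is credit utilization, meaning the balance divided by the credit limit. The sketch below computes it for one account. The argument names mirror `acct_balance_am` and `acct_credit_limit_am`, which are listed as significant attributes of tl_2010_2018 further down this page. Treating them as numeric columns in the same currency unit is an assumption.

```python
from typing import Optional


def utilization(acct_balance_am: Optional[float],
                acct_credit_limit_am: Optional[float]) -> Optional[float]:
    """Balance divided by credit limit for one tradeline.

    Returns None when either value is missing or the limit is not positive,
    since the ratio is undefined there (e.g. accounts with no reported limit).
    """
    if acct_balance_am is None or acct_credit_limit_am is None:
        return None
    if acct_credit_limit_am <= 0:
        return None
    return acct_balance_am / acct_credit_limit_am


print(utilization(1_500, 5_000))  # 0.3
print(utilization(1_500, 0))      # None
```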

The GCCP can also be categorized by loan quality (prime vs. subprime), as well as by borrower type. Under this view it has two main components: the Experian umbrella dataset (prime loans) and Clarity (subprime loans). The Experian umbrella dataset further splits into Experian Consumers (Attributes, Tradelines, and Inquiries) and Experian Small Businesses (BIS).
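The loan-quality grouping can be written as a small lookup structure, which helps when tagging results by prime or subprime source. The dictionary and helper below are illustrative only. Their keys are the dataset names used on this page, not schema or table names.

```python
# The loan-quality grouping described above, as nested dicts.
GCCP_BY_LOAN_QUALITY = {
    "prime": {  # Experian umbrella dataset
        "Experian Consumers": ["Attributes", "Tradelines", "Inquiries"],
        "Experian Small Businesses": ["BIS"],
    },
    "subprime": {
        "Clarity": ["Clarity"],
    },
}


def loan_quality(dataset: str) -> str:
    """Return 'prime' or 'subprime' for a dataset name used on this page."""
    for quality, groups in GCCP_BY_LOAN_QUALITY.items():
        for members in groups.values():
            if dataset in members:
                return quality
    raise KeyError(dataset)


print(loan_quality("BIS"))      # prime
print(loan_quality("Clarity"))  # subprime
```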

Database Schema Overview

The diagram above gives an overview of the schemas in the dsrsdb database: Attributes, BIS, Clarity, and Tradelines.

Note: While navigating the GCCP dataset in the PostgreSQL database, you may encounter the 'public' schema. It is used for database maintenance and can be ignored.
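When browsing the database from code, the schemas worth exploring can be filtered from PostgreSQL's standard `information_schema.schemata` view. The sketch below skips 'public' along with PostgreSQL's own system schemas. The schema list in the example is hypothetical and assumes lowercase schema names.

```python
# Standard PostgreSQL catalog query listing schemas in the current database.
SCHEMAS_SQL = "SELECT schema_name FROM information_schema.schemata ORDER BY schema_name;"

# Schemas to skip: "public" (maintenance only) plus PostgreSQL system schemas.
SKIPPED = {"public", "information_schema"}


def research_schemas(schema_names):
    """Keep only the schemas worth browsing for research, sorted by name."""
    return sorted(
        name for name in schema_names
        if name not in SKIPPED and not name.startswith("pg_")
    )


rows = ["attributes", "bis", "clarity", "data_dictionary",
        "information_schema", "pg_catalog", "public", "tradelines"]
print(research_schemas(rows))
# ['attributes', 'bis', 'clarity', 'data_dictionary', 'tradelines']
```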

The tables in each schema are listed below, along with their significant attribute groups where applicable.

1. Attributes

| Tables in Attributes (8) | Significant Attributes |
|---|---|
| attributes_1pct | premier, eirc, taps |
| attributes_100pct | premier, eirc, taps |
| eirc_mort_2019_2022 | eirc_mtf, eirc_mts |
| eirc_rev_2019_2022 | eirc_rv |
| premiers_view_ab | premier_v1_2 |
| premiers_view_cd | premier_v1_2 |
| scoresdemos_2019_2022 | |
| taps4_2019_2022 | taps |

2. BIS

Tables in BIS (4)Significant Attributes
biz_aggs
input_seqnum
mar_seqtot_trd, tot_bal, tot_limit
score_crdbscorefactor

3. Clarity

| Tables in Clarity (4) | Significant Attributes |
|---|---|
| inquiries_10pct | |
| inquiries_100pct | |
| tradelines_10pct | |
| tradelines_100pct | |

4. Tradelines

| Tables in Tradelines (6) | Significant Attributes |
|---|---|
| inquiry_2004_2018 | |
| inquiry_2019_2022 | |
| public_2004_2018 | |
| public_2019_2022 | |
| tl_2004_2010 | acct, delq |
| tl_2010_2018 | acct, delq, acct_balance_am, acct_credit_limit_am, actual_pymt_am, eirc, taps, last_payment_dt |
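Each Tradelines table covers a fixed span of years, so a query for one year must pick the right table. Judging by the names, tl_2004_2010 and tl_2010_2018 both cover 2010, so a 2010 analysis may need to check both and de-duplicate. No account-level (tl_) table for 2019 onward appears in the list above. The sketch below hard-codes the ranges implied by the table names, treats both ends as inclusive, and returns the tables that cover a given year. The helper name is hypothetical, and the year column remains the authority on actual coverage.

```python
# Year ranges implied by the Tradelines table names (both ends assumed inclusive).
TRADELINE_TABLES = {
    "inquiry_2004_2018": (2004, 2018),
    "inquiry_2019_2022": (2019, 2022),
    "public_2004_2018": (2004, 2018),
    "public_2019_2022": (2019, 2022),
    "tl_2004_2010": (2004, 2010),
    "tl_2010_2018": (2010, 2018),
}


def tables_for_year(prefix: str, year: int) -> list:
    """Names of Tradelines tables with the given prefix whose range covers year."""
    return sorted(
        name
        for name, (first, last) in TRADELINE_TABLES.items()
        if name.startswith(prefix + "_") and first <= year <= last
    )


print(tables_for_year("tl", 2010))       # ['tl_2004_2010', 'tl_2010_2018']
print(tables_for_year("inquiry", 2020))  # ['inquiry_2019_2022']
print(tables_for_year("tl", 2020))       # []
```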