VerityData API – Technical Details¶
General Overview¶
The VerityData API allows authorized users to query directly from VerityData's vast database of both processed and unprocessed SEC filings.
This documentation outlines API usage instructions, filing structure, response fields, and also touches on how filings are internally processed for accuracy.
Additional information and sample API calls can be found at https://www.infilings.com/api/.
VerityData Coverage Summary¶
Filing Type | Structured Filings | Structured Sections |
---|---|---|
10-K | 40,000+ | 13,000,000+ |
10-Q | 100,000+ | 15,000,000+ |
IPO Filing | 5,000+ | 3,500,000+ |
Proxy | 30,000+ | 4,250,000+ |
Our coverage spans over 5,500 companies and includes most 10-Ks filed since 2013 and most IPOs since 2016.
Pre-authorization Requirements¶
Accessing the VerityData API requires an authorization token provided by VerityData. Please contact your Customer Success Manager for information and pricing related to obtaining a token.
Format¶
API requests must be submitted using the POST method with a JSON payload to: https://www.infilings.com/api/api.php.
Requests must include the user's numerical VerityData id, authorization token, action, and any arguments required for the action.
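As a rough illustration of the request format described above, the following Python sketch builds (but does not send) a POST request with a JSON payload. The payload key names `id`, `token`, and `action`, and the choice to place action arguments at the top level of the payload, are assumptions made for illustration; confirm the exact field names against the sample calls before relying on them.

```python
import json
import urllib.request

API_URL = "https://www.infilings.com/api/api.php"


def build_request(user_id, token, action, **arguments):
    """Build (but do not send) a POST request carrying a JSON payload."""
    # "id", "token", and "action" are assumed key names, not confirmed ones.
    payload = {"id": user_id, "token": token, "action": action}
    # Assumption: action arguments sit alongside the credentials.
    payload.update(arguments)
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials; replace with your own id and token.
request = build_request(12345, "YOUR-TOKEN", "FilingList", companyid="AAPL", limit=10)
```

Passing `request` to `urllib.request.urlopen` would submit it; the sketch stops short of that so it can be inspected without network access.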
API Requests¶
The following details various response objects and "core" actions available via the API, along with their respective argument options. Information on additional actions can be found at: https://www.infilings.com/api/api.php.
Response Objects¶
Field | Type | Description |
---|---|---|
status | string | Status of the response. ok: no issues in the response. fail: the response failed. partial: a filing within the response encountered an error (MultiFilingContent and Search actions only). |
filings | array | Object containing data for the individual filings included in the response. |
incompleteData | boolean | Shows True if the request encountered an error on a single filing but still includes complete data for preceding filings included in the request. In this case, the last filing in the response encountered an error and has incomplete data. MultiFilingContent and Search actions only. |
reason | string | Description of error, if error occurred. |
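
One way a client might act on these fields is sketched below in Python. It assumes the response has already been parsed from JSON into a dictionary and that booleans arrive as JSON `true`/`false`; the sample response is invented for illustration. It also uses the per-filing `incompleteData` flag described in the Filings Object table.

```python
def complete_filings(response):
    """Return only the filings that carry complete data."""
    if response.get("status") == "fail":
        # "reason" holds the error description when one occurred.
        raise RuntimeError(response.get("reason", "request failed"))
    filings = response.get("filings", [])
    # A filing with no incompleteData value is documented as complete.
    return [f for f in filings if not f.get("incompleteData")]


# Hypothetical "partial" response: the last filing hit an error.
sample = {
    "status": "partial",
    "incompleteData": True,
    "filings": [{"iacc": 1}, {"iacc": 2, "incompleteData": True}],
}
good = complete_filings(sample)
```

With a `partial` status, the caller keeps the filings that preceded the error and can re-request the incomplete one separately.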
Filings Object¶
Field | Type | Description |
---|---|---|
ticker | string | |
CUSIP | string | |
companyname | string | |
iacc | int | VerityData identifier for the filing. |
cik | int | SEC CIK for the filer. |
allfilerciks | int[] | All CIKs included in the SEC filing submission header. |
accessionnum | string | EDGAR accession number. |
formtype | string | |
datefiled | date | |
filedtimestamp | timestamp | Date & time when the filing was received by the SEC. |
received | timestamp | Date & time when the filing was received by VerityData. |
spotapproved | boolean | True if the filing has been processed and structured by VerityData. |
spotapproveddate | timestamp | Date & time when filing was approved. |
seclink | URL | Link to view filing on EDGAR. |
documentperiodenddate | date | Document period date taken from DocumentEntityInformation. |
documentfiscalperiodfocus | string | Fiscal period for filing taken from DocumentEntityInformation. |
documentfiscalyearfocus | int | Fiscal year for filing taken from DocumentEntityInformation. |
toc | array | Array containing "table of contents" related information. |
sections | array | Array containing data for each section within the filing (body, title, etc.) |
incompleteData | boolean | Shows True if an error occurred while processing the IACC. All filings in the response with no value in this field are valid with complete data. MultiFilingContent and Search actions only. |
Sections Object¶
Field | Type | Description |
---|---|---|
sectionid | integer | Identifier of the individual section in the respective filing. |
prevsectionid | integer | Identifier of the corresponding section from the prior filing. |
itemid | integer | Identifier for the parent item of the section. You can consider an item as a ‘container' for one or more sections. |
tlitem | string | The "Top-Level" item which contains the section. Top-level items are fundamentally based on the SEC's required items for a given form-type. |
intro | boolean | 'True' if the section is the first section of its parent item else 'False'. Often, intro sections will only contain a title for the item and nothing more, but that is not always the case. |
title | string | The title for the section. |
filingorder | integer | Numerical identifier indicating relative position within the filing. Smallest values are at the beginning of the filing while largest are at the end. |
changetype | char | This is the type of change for the respective item or section using our proprietary change algorithm: N: New Disclosure D: Deleted Disclosure U: Unchanged Disclosure F: Full / Major Change B: Big Change M: Medium Change S: Small Change |
boilerplate | boolean | 'True' if the text within the Top Level Item, Item, or Section can be considered “boilerplate”, else 'False'. This distinction is made using a combination of automated textual analysis and manual analyst review. |
itemidpath | [integer] | Array listing all itemids which contain the section. Multiple itemids indicate that the section is located within nested items. |
tags | [string] | Array of categorical tags for the section. |
body | string | The complete text of the respective section or item. There are five different output versions for the body content: Plain: this is a plain text version of the disclosure text and scrubs the output of any HTML or formatting from the file. For disclosures that contain tables, we remove the HTML and include the raw contents of each table cell in the output. HTML: this is the ‘original’ HTML of the source filing for the respective disclosure text. Diff: this is the ‘original’ HTML of the source filing for the respective disclosure text, along with our proprietary change algorithm output. See more details below on the Diff output including the stylesheet definitions to better understand how to denote added and deleted text. Machine: this is a JSON representation of the Diff of the body scrubbed of any HTML and punctuation; we also ‘stem’ the output of words (using the industry standard snowball stemmer). For disclosures that contain numeric tables (typically tables where a majority of cells are numbers or have color banding), we insert a [NUMERIC TABLE] marker into the output where the table was located. The output takes into account our proprietary “diffing” software to allow users to easily see what text is added, removed, and changed. This allows users to focus on the core text and facilitates textual analysis, machine learning, and more. See more below on the machine output. Plain Machine: this is a JSON representation of the tokenized and stemmed “plain” body. As with the “Machine” output, words are stemmed using the snowball stemmer. |
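
The fields above lend themselves to simple client-side filtering. The Python sketch below keeps non-boilerplate sections whose `changetype` is New, Full, or Big, and returns them in filing order. The default set of change types and the sample sections are illustrative choices, not a recommendation from VerityData.

```python
def notable_sections(sections, change_types=("N", "F", "B")):
    """Pick non-boilerplate sections with the given change types, in filing order."""
    picked = [
        s for s in sections
        if s.get("changetype") in change_types and not s.get("boilerplate")
    ]
    # Smaller filingorder values appear earlier in the filing.
    return sorted(picked, key=lambda s: s["filingorder"])


# Hypothetical sections, shaped like the Sections Object fields above.
sections = [
    {"sectionid": 3, "title": "Cyber risk", "changetype": "N", "boilerplate": False, "filingorder": 40},
    {"sectionid": 1, "title": "Overview", "changetype": "U", "boilerplate": False, "filingorder": 10},
    {"sectionid": 2, "title": "Liquidity", "changetype": "B", "boilerplate": False, "filingorder": 20},
    {"sectionid": 4, "title": "Mine safety", "changetype": "N", "boilerplate": True, "filingorder": 30},
]
titles = [s["title"] for s in notable_sections(sections)]
```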
TOC Object (Table of Contents)¶
Field | Type | Description |
---|---|---|
order | [string] | Array listing the Top Level Items, in order, for the filing. |
.tlitem.[name].itemtitle | string | Title for the Top Level Item. These titles typically match our internal name for the Top Level Item. |
.tlitem.[name].itemid | integer | The itemid for the Top Level item. |
.tlitem.[name].level | integer | The Top Level Item's level within the filing. (Top Level Items are always Level 1) |
.tlitem.[name].parent | string | Parent item title for the Top Level Item. (By definition, Top Level Items have no parent items, so this will yield a 'None' value) |
.tlitem.[name].tlitem | string | The name of the Top Level Item. (By definition, yields itself) |
.tlitem.[name].content | [string] | Array listing all items and sections contained within the Top Level item, along with various table of content identifiers. For items, output includes:
|
API Calls / Actions¶
FilingList¶
Outputs a list of filings meeting the argument criteria. Response includes Filings Object.
Arguments¶
Argument | Type | Required | Description |
---|---|---|---|
companyid | int or string | | Identifier for the company. Accepted identifiers are: Ticker, CIK, CUSIP |
dateStart | date or timestamp | | Start date for the query. (Inclusive) |
dateEnd | date or timestamp | | End date for the query. (Inclusive) |
dateType | string | | The date/time to use for dateStart/dateEnd arguments. Valid values are "datefiled", "sec_received", "infilings_received", and "reviewdate". Defaults to "datefiled" |
reviewed | boolean | | If True, only return reviewed filings. If False, only return unreviewed filings. If null or not present, return filings of either status. |
mcapLow | int | | Lower bound for optional marketcap filter. |
mcapHigh | int | | Upper bound for optional marketcap filter. |
formTypes | string[] | | Array of formtypes to include. |
limit | int | | Output limit. Defaults to 100. |
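
The following Python sketch shows one way to assemble FilingList arguments while checking `dateType` against the documented values. The helper name `filing_list_args` is hypothetical, and serializing dates in ISO 8601 form (`YYYY-MM-DD`) is an assumption, since the accepted date format is not specified here.

```python
from datetime import date

VALID_DATE_TYPES = {"datefiled", "sec_received", "infilings_received", "reviewdate"}


def filing_list_args(companyid=None, date_start=None, date_end=None,
                     date_type="datefiled", form_types=None, limit=100):
    """Assemble FilingList arguments, omitting any that were not supplied."""
    if date_type not in VALID_DATE_TYPES:
        raise ValueError(f"unsupported dateType: {date_type!r}")
    if date_start and date_end and date_start > date_end:
        raise ValueError("dateStart must not be after dateEnd")
    args = {"dateType": date_type, "limit": limit}
    if companyid is not None:
        args["companyid"] = companyid
    if date_start is not None:
        args["dateStart"] = date_start.isoformat()  # assumed date format
    if date_end is not None:
        args["dateEnd"] = date_end.isoformat()  # assumed date format
    if form_types:
        args["formTypes"] = list(form_types)
    return args


args = filing_list_args("AAPL", date(2023, 1, 1), date(2023, 12, 31), form_types=["10-K"])
```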
FilingContent¶
Get full or partial contents of a filing. Response includes Filings Object (shows as "filing" instead of "filings" for this call), TOC Object, and Sections Object.
Arguments¶
Argument | Type | Required | Description |
---|---|---|---|
iacc | int | Y | VerityData identifier for the filing |
body | string | Y | Body output type (plain/html/diff/machine/plainmachine). More information on these output types is included below. |
tlitem | string | | Limit the response to a specific Top Level item. |
includeexhibits | boolean | | Include exhibit content in response. Defaults to False |
changetype | [char] | | Limit response to sections of given change type(s). |
includedeletes | boolean | | True to include deleted sections, False to exclude. Only for plain/html/plainmachine outputs. |
MultiFilingContent¶
Get full or partial results from multiple filings, using a combination of arguments from the FilingList and FilingContent actions. The FilingList arguments select the filings to retrieve, the FilingContent arguments select the output formats. Response includes Filings Object, TOC Object, and Sections Object. Note - the toc and sections objects are located within the filings object for this call.
Arguments¶
Argument | Type | Required | Description |
---|---|---|---|
companyid | int or string | | Identifier for the company. Accepted identifiers are: Ticker, CIK, CUSIP |
multiiaccs | [int] | | Array of IACCs to be included in the request. Maximum of 500. |
dateStart | date or timestamp | | Start date for the query. (Inclusive) |
dateEnd | date or timestamp | | End date for the query. (Inclusive) |
dateType | string | | The date/time to use for dateStart/dateEnd arguments. Valid values are "datefiled", "sec_received", "infilings_received", and "reviewdate". Defaults to "datefiled" |
reviewed | boolean | | If True, only return reviewed filings. If False, only return unreviewed filings. If null or not present, return filings of either status. |
mcapLow | int | | Lower bound for optional marketcap filter. |
mcapHigh | int | | Upper bound for optional marketcap filter. |
formTypes | [string] | | Array of formtypes to include. |
body | string | Y | Body output type (plain/html/diff/machine/plainmachine). More information on these output types is included below. |
tlitem | string | | Limit the response to a specific Top Level item. |
includeexhibits | boolean | | Include exhibit content in response. Defaults to False |
changetype | [char] | | Limit response to sections of given change type(s). |
includedeletes | boolean | | True to include deleted sections, False to exclude. Only for plain/html/plainmachine outputs. |
limit | int | | Output limit. Defaults to 100. Ignored when using the multiiaccs argument. |
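
Because `multiiaccs` accepts at most 500 IACCs per request, a longer list has to be split across several calls. A minimal Python sketch of that batching follows; the helper name `iacc_batches` and the sample IACC values are made up for illustration.

```python
MAX_IACCS = 500  # documented maximum for the multiiaccs argument


def iacc_batches(iaccs, size=MAX_IACCS):
    """Yield consecutive slices of at most `size` IACCs."""
    for start in range(0, len(iaccs), size):
        yield iaccs[start:start + size]


# 1,200 hypothetical IACCs split into batches of 500, 500, and 200.
batches = list(iacc_batches(list(range(1, 1201))))
```

Each batch can then be sent as the `multiiaccs` value of its own MultiFilingContent request.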
Search¶
Retrieve results for curated and saved searches from the inFilings web platform via the API. Note: searches cannot be created through the API; it can only return results for searches that were previously created and saved on the web platform.
Arguments¶
Argument | Type | Required | Description |
---|---|---|---|
searchid | int | Y | ID for the search you'd like to use. This can be obtained from the web platform's URL when running the saved search via the web platform (i.e. infilings.com/search.php?searchid=XXXXXX) |
companyid | string | | Limit search results to a specific company identifier (ticker, CIK, or CUSIP) |
limit | int | | Maximum number of results (defaults to 75, max is 75) |
page | int | | For use when the search finds more results than the limit (75 by default). With the default limit of 75, a 2 in this field returns results 76-150, a 3 returns results 151-225, etc. |
body | string | | Body output type (plain/html/diff/machine/plainmachine). |
timeframe | int | | Override for the timeframe set in the saved search. The integer value is the number of days for the search to include. |
sort | string | | The sort order for the search results. Possible values are score (match score) or ftstamp (date filed). Default is ftstamp. |
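
The relationship between `page`, `limit`, and the result positions returned can be written out as a small Python helper. It assumes page numbering starts at 1, which is consistent with the page 2 example above but is not stated explicitly.

```python
def result_range(page, limit=75):
    """Return the 1-based (first, last) result positions covered by a page."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if not 1 <= limit <= 75:
        raise ValueError("limit must be between 1 and 75")
    first = (page - 1) * limit + 1
    last = page * limit
    return first, last


second_page = result_range(2)            # default limit of 75
third_page_small = result_range(3, 20)   # custom limit of 20
```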
Response¶
Field | Type | Description |
---|---|---|
ticker | string | |
CUSIP | string | |
companyname | string | |
iacc | int | VerityData identifier for the filing. |
cik | int | SEC CIK for the filer. |
formtype | string | Form-type of the search results. |
datefiled | date | |
filedtimestamp | timestamp | Date & time when the filing was received by the SEC. |
mcap | float | Market-cap for the company. |
boilerplate | boolean | 'True' if the text within the Top Level Item, Item, or Section can be considered “boilerplate”, else 'False'. This distinction is made using a combination of automated textual analysis and manual analyst review. |
title | string | The title for the section. |
filingorder | integer | Numerical identifier indicating relative position within the filing. Smallest values are at the beginning of the filing while largest are at the end. |
itemid | integer | Identifier for the parent item of the section. You can consider an item as a ‘container' for one or more sections. |
reviewed | boolean | True if the filing has been processed and structured by VerityData. |
tlitem | string | The "Top-Level" item which contains the section. Top-level items are fundamentally based on the SEC's required items for a given form-type. |
snippets | string | Snippet of text matching the search parameters |
TableOfContents¶
Get the Table of Contents for a filing. Response includes TOC Object.
Arguments¶
Argument | Type | Required | Description |
---|---|---|---|
iacc | int | Y | VerityData identifier for the filing |
tlitem | string | | Limit the response to a specific Top Level item. |
RiskFactorCounts¶
Get various Risk Factor statistics for a group of companies. The arguments for company selection are similar to the FilingList action's arguments. The response is detailed below.
Arguments¶
Argument | Type | Required | Description |
---|---|---|---|
companyid | int or string | | Identifier for the company. Accepted identifiers are: Ticker, CIK, CUSIP |
dateStart | date or timestamp | | Start date for the query. (Inclusive) |
dateEnd | date or timestamp | | End date for the query. (Inclusive) |
dateType | string | | The date/time to use for dateStart/dateEnd arguments. Valid values are "datefiled", "sec_received", "infilings_received", and "reviewdate". Defaults to "datefiled" |
reviewed | boolean | | If True, only return reviewed filings. If False, only return unreviewed filings. If null or not present, return filings of either status. |
mcapLow | int | | Lower bound for optional marketcap filter. |
mcapHigh | int | | Upper bound for optional marketcap filter. |
formTypes | string[] | | Array of formtypes to include. |
latest | boolean | | Limit counts to the latest spot-approved 10-K / 10-Q for the company. |
limit | int | | Output limit. Defaults to 100. |
Response¶
Field | Type | Description |
---|---|---|
estdatefiled | date | |
unusual | integer | Count of risk factors marked as "unusual" by our system. This attribute is determined using a combination of the filer's disclosure history as well as broader macro reporting trends. |
new | integer | Count of new risk factors. See the changetype field in the Sections Object table for more information on this change type. |
deleted | integer | Count of deleted risk factors. See the changetype field in the Sections Object table for more information on this change type. |
risk_totalcount | integer | Total count of risk factors disclosed in the filing. |
totalchange | integer | Total count of changed risk factors in the filing. |
risk_unchangedcount | integer | Total count of unchanged risk factors disclosed in the filing. |
bigchange | integer | Total count of risk factors with "big" changes. See the changetype field in the Sections Object table for more information on this change type. |
mediumchange | integer | Total count of risk factors with "medium" changes. See the changetype field in the Sections Object table for more information on this change type. |
smallchange | integer | Total count of risk factors with "small" changes. See the changetype field in the Sections Object table for more information on this change type. |
tinychange | integer | Total count of risk factors with "tiny" changes. See the changetype field in the Sections Object table for more information on this change type. |
wordcount | integer | Count of words within the Risk Factors Top Level Item (stop words excluded). |
wordcountpct | double-precision | Risk Factor word count as a percentage of total words in all Top Level Items (excluding Exhibits). |
jaccard | double-precision | Computed jaccard score. See Lazy Prices academic study for more information. |
topadded | [JSON] | JSON output listing the most added words in the filing's Risk Factor disclosure along with their added count. Added words are determined by comparing the filing's Risk Factor disclosure to prior Risk Factor disclosures by the company (comparisons are made up to and including the prior 10-K). The word is identified by the 'w' header in the JSON. |
topdeleted | [JSON] | JSON output listing the most deleted words in the filing's Risk Factor disclosure along with their delete count. The methodology and output is the same as the topadded field above. |
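
For readers unfamiliar with the `jaccard` field, the standard Jaccard similarity of two word sets is the size of their intersection divided by the size of their union. The Python sketch below shows that textbook definition only; how VerityData tokenizes, stems, or otherwise prepares the text before computing its score is not described here, so the values will not necessarily match the API output.

```python
def jaccard(words_a, words_b):
    """Textbook Jaccard similarity between two collections of words."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Illustrative sentences, not taken from a real filing.
prior = "we face intense competition".split()
current = "we face new intense competition risks".split()
score = jaccard(prior, current)  # 4 shared words out of 6 distinct words
```

A score of 1.0 means the two word sets are identical, while lower scores indicate more added or removed vocabulary.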
Filing Structure and Top Level Items¶
10-K and 10-Q Top Level Items¶
The following table includes a list of Top Level Items in 10-K and 10-Q filings. Some Top Level Items are 10-K only, some are shared between 10-Ks and 10-Qs, and some are 10-Q only.
Top Level Item | Internal Name | 10-K Location | 10-Q Location | Boilerplate |
---|---|---|---|---|
Business Description | businessdesc | Item 1 | | |
Risk Factors | risk | Item 1A | Part II – Item 1A | Yes |
Unresolved Staff Comments | unresolved | Item 1B | | Yes |
Properties | properties | Item 2 | Part II – Item 1 | |
Legal Proceedings | legal | Item 3 | | Yes |
Mine Safety Disclosures | minesafety | Item 4 | Part II – Item 4 | Yes |
Executive Officers | execofficers | Item 4A | | |
Stockholder Matters | stockholdermatters | Item 5 | | |
Selected Financial Data | selectfinancial | Item 6 | | |
Management Discussion and Analysis | mgmtdiscussion | Item 7 | Part I – Item 2 | |
Quantitative Disclosures About Market Risk | quantdisclosures | Item 7A | Part I – Item 3 | Yes |
Critical Accounting Policies | criticalaccounting | Item 7CA (VerityData Specific) | Part I – Item 2CA (VerityData Specific) | Yes |
Financial Statements | finstatements | Item 8 | Part I – Item 1 | |
Footnotes | notestoconsolidated | Item 8N (VerityData Specific) | Part I – Item 1N (VerityData Specific) | |
Accountant Changes and Disagreements | acctdisagreements | Item 9 | | Yes |
Controls and Procedures | controlprocs | Item 9A | Part I – Item 4 | Yes |
Other Information | otherinfo | Item 9B | Part II – Item 5 | Yes |
Disclosure Regarding Foreign Jurisdictions that Prevent Inspections | inspectionprevention | Item 9C | | Yes |
Directors, Officers, and Corporate Governance | dirandexecs | Item 10 | | Yes |
Executive Compensation | execcompensation | Item 11 | | Yes |
Security Ownership of Related Stockholder | beneowners | Item 12 | | Yes |
Certain Relationships and Related Transactions | relations | Item 13 | | Yes |
Accountant Fees | acctfees | Item 14 | | Yes |
Exhibits | finstatementschedules | Item 15 | Part II – Item 6 | |
Summary | summary | Item 16 | | |
Unregistered Sales | unregisteredsales | | Part II – Item 2 | Yes |
Defaults Upon Senior Securities | seniordefaults | | Part II – Item 3 | Yes |
IPO Filing Top Level Items¶
The following table includes a list of Top Level Items in IPO filings. We process the following types of IPO filings: DRS, DRS/A, S-1, S-1/A and 424B4. As you’ll see in the table, most of the Top Level Items are specific to VerityData.
Top Level Item | Internal Name | IPO Location | Additional Notes |
---|---|---|---|
Prospectus | prospectus | VerityData Specific | |
Risk Factors | risk | VerityData Specific | The company’s first 10-Q or 10-K filing is compared against the company’s 424B4 for the most accurate changes |
Selected Financial and Operating Data | selectfinancial | VerityData Specific | |
Management Discussion and Analysis | mgmtdiscussion | VerityData Specific | The company’s first 10-K filing is compared against the company’s 424B4 for the most accurate changes |
Critical Accounting Policies | criticalaccounting | VerityData Specific | The company’s first 10-K filing is compared against the company’s 424B4 for the most accurate changes |
Quantitative Disclosures About Market Risk | quantdisclosures | VerityData Specific | |
Business | businessdesc | VerityData Specific | The company’s first 10-K filing is compared against the company’s 424B4 for the most accurate changes |
Regulation / Legal Matters | legalmatters | VerityData Specific | |
Management | management | VerityData Specific | |
Financial Statements | finstatements | VerityData Specific | The company’s first 10-K filing is compared against the company’s 424B4 for the most accurate changes |
Footnotes | notestoconsolidated | VerityData Specific | The company’s first 10-K filing is compared against the company’s 424B4 for the most accurate changes |
Other Expenses of Issuance and Distribution | otherexpenses | Item 13 | |
Indemnification of Directors and Officers | directors | Item 14 | |
Recent Sales of Unregistered Securities | unregistered | Item 15 | |
Exhibits | finstatementschedules | Item 16 | Exhibits are not included in our standard file; contact us if you are interested in Exhibits |
Undertakings | undertakings | Item 17 | |
Other | other |
Top Level Items - Additional Information¶
Overview¶
10-K and 10-Q filings generally follow a consistent outline as required by the SEC. We refer to the largest blocks as Top Level Items, which correspond to "Item" filing elements, e.g.
- Item 1. Business Description
- Item 1A. Risk Factors
- Item 2. Properties
- etc.
To build our Top Level Items structure, our proprietary software looks for markers that identify the beginning of each Top Level Item in the filing. See the 10-K and 10-Q Top Level Items table above for a complete list of Top Level Items available in 10-Ks and 10-Qs.
It’s important to note that filers don’t always follow the standard Top Level Item structure. Top Level Items are often placed in Exhibits or other sections. In rarer cases a company may incorporate content from an annual report "by reference," requiring us to parse that additional document. Our SEC Specialists check the Top Level filing consistency of every filing to ensure the correct overall structure.
VerityData Top Level Items¶
To allow clients easy access to important disclosures, we’ve created a handful of proprietary Top Level Items. These include Item 7CA. Critical Accounting Policies and Item 8N. Footnotes. We restructure filings for consistency: for example, if a filing has footnotes in the Exhibits, we move them into Item 8N. Footnotes.
IPO Filings¶
IPO filings follow a less consistent overall structure than 10-K and 10-Q filings. They have fewer explicitly defined “Item X. Item Name” breakouts within the filing. However, companies are generally consistent in disclosure names, like “Prospectus”, “Risk Factors” and so on.
When processing these IPO filings, we synthesize appropriate VerityData-specific Top Level Items. See the table above for a complete list of IPO Top Level Items.
Companies with Multiple Financials or Footnotes¶
For companies with subsidiaries or complex corporate structures that file multiple sets of Financial Statements and/or Footnotes, we create unique items for each set. We will include the name of the subsidiary within the title. Each set is considered independent of other sets, and will be matched with the corresponding set in any prior filings.
Detail Scopes¶
Below the Top Level Items, we structure the filings into "Sections," which consist of a title and the specific disclosure text. Sections are often collected into categories we call "Items." Items are collections of Sections, and may or may not have text of their own.
This structure is generated by our sophisticated software, which parses titles and callouts in the original filing, and can even identify structure by the way the filing's text is formatted.
For clarity, the software may re-organize the filing into more consistent Top Level Items, Items, and Sections, but it will maintain the vast majority of the original text, tables, and formatting. Tables, in particular, will be maintained in their original form.
Threading (Matching) between Filings¶
Most of the threading (matching) is done on like filings within the same Top Level Item. For example, Business Description in a 10-K is compared to Business Description in the prior 10-K. We call out important disclosures like Forward-Looking Statements to ensure accurate threading.
Inter-filing section matches do not need to be exact: the algorithms are able to adapt to wording changes, timeframe/period changes, and often even titles which have been renamed entirely. Human SEC specialists continually evaluate their performance.
Some sections are particularly sensitive to matching; these are described below.
With new data, there will not be existing data to match it to. When we determine a title was not disclosed in the prior filing, we classify it as a "New Disclosure."
Our technology intelligently uses prior filings to help it find matching data in new ones. Additionally, our team of SEC specialists reviews threads and edits the item-level and section-level matches when necessary. This allows us to capture disclosures that dramatically change location or title from one filing to the next, and provides data to continually improve the software matches.
A special note as it relates to threading sections:
Our SEC Specialists attempt to structure specific disclosures across a company’s history, but sometimes:
* a specific section or disclosure dramatically changes structure between filings, or
* a company significantly changes the Top Level Item in which it makes a specific disclosure
In these cases, a similar disclosure may appear as New in one place and Deleted in another. Such events are rare, but we consider them a mis-categorization, and are constantly working to eliminate them.
Threading in "Risk Factor" and "Critical Accounting Policy"¶
The SEC has specific disclosure requirements for Risk Factor and Critical Accounting Policy updates. We analyze both 10-Ks and 10-Qs to accurately identify changes. For IPOs, we also include the company's final 424B4 IPO prospectus in the analysis.
For Risk Factors, we break out every individual risk factor by its title and description, since these often change title and disclosure order. Some companies also list only new or changed risk factors in their 10-Q disclosures. Because we have the 10-K, 10-Q, and IPO filings available, our software can account for these changes and provide the full list, calling out true "New" disclosures as well. As with other matches, risk factor matches are resilient in the face of wording changes in the titles or descriptions.
If you want to do detailed analysis on risk factors, ask us about our Risk Factor specific feed offerings. The Lazy Prices study highlighted changes in risk factors as an opportunity for alpha. With VerityData you’ll have access to more granular data and specific modeling opportunities.
Management Discussion & Analysis (MD&A) Threading:¶
MD&A sections typically include disclosures related to 3-, 6-, and 9-month results, depending on the fiscal period of the 10-Q filing. The software takes care to match like periods; this typically requires matching the same period from the previous fiscal year's filing, not the different-length period of the previous filing.
For example, a "6-month" results disclosure will often be compared to the "6-month" disclosure from three 10-Qs (one year) ago. This prevents comparing periods of different length, or mis-identifying the period as being a new disclosure. Note that this applies only to 10-Q MD&A disclosures; 10-Ks are always compared to the prior 10-K.
IPO Filing Threading:¶
We thread IPO filings by starting with the company’s initial IPO-related filing. A typical IPO thread will consist of Draft Registration Statement (DRS), to DRS/A(s), to the S-1, to S-1/A(s), through the company’s final 424B4 Prospectus. Except as noted above for Risk Factors, we thread disclosures within the same IPO Top Level Items. When a company files its first 10-K, we will thread the Footnotes, Business Description, MD&A, and more back to the company’s 424B4 to allow for more context when analyzing the 10-K.
Change Scores¶
Once we've matched sections and items to their equivalents in previous filings, we generate a Change Score for each one. This reflects the amount of change between a disclosure and the same disclosure in a previous filing; the higher the Change Score, the greater the difference (and likely the greater the importance of the change).
The algorithms for calculating these differences are extremely sophisticated, built on the work of Eugene Myers. Punctuation differences; different phrasing; different forms of the same word; dates, ages, time periods, page numbers; items marked as "copy"; and other mundane or expected changes will not significantly affect the Change Score. Conversely, "small" changes with large meanings will affect it. These include things like opposite phrases (e.g. "will" to "will not"), changes in dollar or share values, the addition or removal of numeric tables, etc.
We do not incorporate changes in sentiment into the Change Score.
Other Linguistics Analysis¶
Boilerplate Designation¶
A number of Top Level Items contain non-specific, general language, including short text like ‘None’ or ‘Not Applicable’. We classify these disclosures as "boilerplate." If a Top Level Item changes from boilerplate to actual content, the change type will appear as "New," to capture the fact that it is a new disclosure. Additionally, the change type for Subsequent Event disclosures in 10-K and 10-Q filings with the new content will also appear as New.
IPO filings are handled slightly differently: Subsequent Event disclosures will display the respective change type, because companies often update their Subsequent Event disclosures as they update their IPO filings.
SEC Specialists review boilerplate settings to ensure accuracy. The boilerplate setting is especially helpful for Other Information and Subsequent Event disclosures, allowing you to focus analysis on disclosures that contain content. See more information in the section on Top Level Items, above.
Tags¶
We ‘tag,’ or categorize, important disclosures such as Forward-Looking Statements and Critical Audit Matters. This allows for consistent threading across filings. Tags are created by utilizing Machine Learning and other proprietary software. SEC Specialists review tags as necessary. New tags are added over time.
Historical filing changes¶
Sometimes changes work both ways, and an older filing will be changed and re-approved in our system to make matching with modern filings more consistent. A change like this would result in re-approval(s) of the historical filing(s).
For example, in its most recent filing, a company decides to break a previously large section into smaller disclosure sections. In these cases, we may change the historical filings to more accurately thread (match) the disclosures.
As we add more Tags to the system, we may also need to modify the filing structure. Whenever Tags are added to a filing, applied retroactively, or the filing structure is modified, the affected historical filings are re-approved.
Additional Output Information and Examples¶
More on the Diff Output Body Type:¶
If you want to create a front-end application, the Diff output is likely the best output to use. It includes the ‘original’ HTML and the results of our proprietary software that identifies changes. Details on the stylesheet and the classes you’ll see in the JSON are included at the end of this document.
More on the Machine Output Body Type:¶
If you are looking to perform advanced textual analysis and machine learning, we suggest that you review the Machine output files first.
The output is an array of objects. Each object has a command (add / delete / copy) followed by a list of words.
In the example below:
- word1 and word2 were added
- unchanged and word3 were copied (left unchanged)
- word4 and word5 were deleted
"diff": [
    {
        "command": "add",
        "words": [
            "word1",
            "word2"
        ]
    },
    {
        "command": "copy",
        "words": [
            "unchanged",
            "word3"
        ]
    },
    {
        "command": "delete",
        "words": [
            "word4",
            "word5"
        ]
    }
]
If an add object also has an “oldwords” key, the change is considered an ‘unimportant’ replacement; it is almost always a date or time period replacement. In that case, the “oldwords” are replaced by the current “words”.
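As an illustration of how the add / copy / delete commands fit together, here is a minimal Python sketch that rebuilds the prior and current wording from a Machine diff array. The function name `reconstruct_texts` is hypothetical, and the sketch assumes that “oldwords”, when present, is a list of words in the same shape as “words”; check an actual response before relying on that.

```python
from typing import Dict, List, Tuple


def reconstruct_texts(diff: List[Dict]) -> Tuple[str, str]:
    """Rebuild (old_text, new_text) from a Machine output diff array."""
    old_words: List[str] = []
    new_words: List[str] = []
    for block in diff:
        command = block["command"]
        words = block.get("words", [])
        if command == "copy":
            # Unchanged words belong to both versions.
            old_words.extend(words)
            new_words.extend(words)
        elif command == "add":
            new_words.extend(words)
            # Assumption: "oldwords" is a list holding the replaced words.
            old_words.extend(block.get("oldwords", []))
        elif command == "delete":
            old_words.extend(words)
    return " ".join(old_words), " ".join(new_words)


sample_diff = [
    {"command": "add", "words": ["word1", "word2"]},
    {"command": "copy", "words": ["unchanged", "word3"]},
    {"command": "delete", "words": ["word4", "word5"]},
]

old_text, new_text = reconstruct_texts(sample_diff)
print(old_text)  # unchanged word3 word4 word5
print(new_text)  # word1 word2 unchanged word3
```

Joining on single spaces loses the original whitespace and punctuation spacing, so the result is suited to text analysis and not to display.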
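Here is some sample Python code that counts the added and deleted words in the Machine output. It also collects the added words, which makes it easy to feed added text into, say, an RNN to learn what is commonly added:
# machine_diff_output is assumed to be the parsed "diff" array from the response.
added_words = 0
deleted_words = 0
added_text = []  # added words, e.g. to build training input for a model
for obj in machine_diff_output:
    if obj["command"] == "add":
        added_words += len(obj["words"])
        added_text.extend(obj["words"])
    elif obj["command"] == "delete":
        deleted_words += len(obj["words"])
print(f"Added: {added_words} - Deleted: {deleted_words} Words")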
Diff Class Details and Stylesheet¶
See these descriptions for the more impactful classes within the Diff output:
Class | Description |
---|---|
.opAdd | Denotes added text as a ‘block’ add. These are cases where viewing changes inline is too difficult to read. For practical purposes, treat it the same as .opChangeAdd. |
.opChangeAdd | Denotes added text ‘inline’ within the paragraph. |
.opDelete | Denotes deleted text as a ‘block’ delete, similar to .opAdd as noted above. |
.opChangeDelete | Denotes deleted text ‘inline’ within the paragraph. |
.opDeletedTable | The class for deleted tables. |
.opTooltip | The class for tool-tips; we show tool-tips for date / year / quarter changes. |
.opOpposite | Denotes ‘opposite’ changes based on a proprietary dictionary, e.g. are -> were, gain -> loss. |
.opPositiveGain | We display number changes differently; this class is for number changes that we calculate to be positive changes. |
.opNegativeGain | We display number changes differently; this class is for number changes that we calculate to be negative changes. |
.opSmallCopy | We use this class for short amounts of text between adds and deletes; it is purely for readability, so these unchanged words don’t get lost. |
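If you need the changed text itself from the Diff output, and not just its styling, the classes above can be used as selectors. The following is a minimal sketch using Python's standard `html.parser`. The class name `DiffClassExtractor`, the sample HTML, and the use of `span` / `div` elements are assumptions for illustration; the actual markup in a response may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set

# Classes from the table above that mark added or deleted text.
TRACKED = frozenset({"opAdd", "opChangeAdd", "opDelete", "opChangeDelete"})
# Void elements have no closing tag, so they must not be pushed on the stack.
VOID = frozenset({"br", "hr", "img", "input", "meta", "link", "col", "wbr"})


class DiffClassExtractor(HTMLParser):
    """Collect the text found inside elements carrying a tracked class."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, List[str]] = {name: [] for name in TRACKED}
        self._stack: List[Set[str]] = []  # tracked classes per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self._stack.append(classes & TRACKED)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text inherits the tracked classes of every enclosing element.
        for name in set().union(*self._stack):
            self.found[name].append(text)


# Hypothetical Diff output fragment, for illustration only.
sample_html = (
    '<p>Revenue <span class="opChangeDelete">decreased</span> '
    '<span class="opChangeAdd">increased</span> in 2023.</p>'
    '<div class="opAdd"><p>New risk factor.</p></div>'
)

extractor = DiffClassExtractor()
extractor.feed(sample_html)
print(extractor.found["opChangeAdd"])     # ['increased']
print(extractor.found["opChangeDelete"])  # ['decreased']
print(extractor.found["opAdd"])           # ['New risk factor.']
```

The stack-based approach assumes reasonably well-formed HTML; for filings with unbalanced tags, a more tolerant parser may be a better choice.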
Class Stylesheet:¶
.opAdd
{
    background-color: #EAFCD9 !important;
}
.opDelete
{
    color: #757575 !important;
}
td.opDelete
{
    text-decoration: line-through;
}
.opDelete .strikeBorder
{
    border: solid 1px #eee;
}
.opDelete .strike div
{
    text-decoration: line-through;
}
.opDelete table
{
    color: #757575 !important;
}
.opDeletedTable
{
    opacity: 0.75;
    border: solid 1px #c2c2c2;
    margin-top: 5px;
    padding-left: 5px;
    padding-right: 5px;
}
.opDeletedTable > div
{
    padding: 5px;
}
.opDeleteNoBG
{
    color: #999999 !important;
}
.opChangeDelete
{
    color: #999999 !important;
}
.opChangeAdd
{
    background-color: #EAFCD9 !important;
}
.opTooltip
{
    background-color: #DEEDFF !important;
    border-top: solid 1px #cccccc;
    border-bottom: solid 1px #cccccc;
}
.opTooltip ins
{
    text-decoration: none;
}
.opOpposite
{
    background-color: #F0A3B4 !important;
    font-weight: bold;
}
.opPositiveGain
{
    background-color: #EAFCD9 !important;
}
.opPositiveGain ins
{
    text-decoration: none;
}
.opNegativeGain
{
    background-color: #FFD1DE !important;
}
.opNegativeGain ins
{
    text-decoration: none;
}
.opSmallCopy
{
    background-color: #FFF4E3;
}
Please contact datafeeds@verityplatform.com with any questions.