API Definition Harvesting

Apr 26

API definition harvesting, in the context of cybersecurity, is the process of automatically discovering, extracting, and collecting machine-readable descriptions of Application Programming Interfaces (APIs). These descriptions, or "definitions," provide details about an API's functionality, structure, and how to interact with it.

Here's a more detailed explanation:

Discovery: This involves actively searching for API definition files or resources. This can include:

Scanning web servers for common file names (e.g., openapi.json, swagger.yaml).
Crawling web applications to find links to definition files.
Analyzing network traffic to identify API definition exchanges.
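
As a rough illustration of the file-name scanning step, the sketch below builds candidate URLs from a short list of common definition file names and offers an optional probe. The path list, `candidate_urls`, and `probe` are assumptions made for this example, not ThreatNG's implementation, and the probe should only be pointed at hosts you are authorized to test.

```python
from urllib.parse import urljoin
import urllib.request

# Illustrative list only (an assumption): real deployments use many other paths.
COMMON_DEFINITION_PATHS = [
    "openapi.json",
    "openapi.yaml",
    "swagger.json",
    "swagger.yaml",
]


def candidate_urls(base_url: str) -> list[str]:
    """Build candidate definition URLs directly under a base URL."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in COMMON_DEFINITION_PATHS]


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers a HEAD request with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False
```

Keeping URL generation separate from the network call makes the candidate list easy to review before anything is sent to a target.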

Extraction: Once located, the API definitions are extracted. This might involve:

Downloading files.
Parsing web page content.
Intercepting and saving network communications.
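
The "parsing web page content" step could look something like the following sketch, which scans a page's HTML for links that end in a known definition file name. The suffix list and the names `DefinitionLinkParser` and `extract_definition_links` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed file names for the example; extend as needed.
DEFINITION_SUFFIXES = ("openapi.json", "openapi.yaml", "swagger.json", "swagger.yaml")


class DefinitionLinkParser(HTMLParser):
    """Collect href/src values whose path ends with a known definition file name."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page they appeared on.
                absolute = urljoin(self.page_url, value)
                if urlparse(absolute).path.lower().endswith(DEFINITION_SUFFIXES):
                    self.found.append(absolute)


def extract_definition_links(page_url: str, html: str) -> list[str]:
    """Return absolute URLs of definition files linked from the given HTML."""
    parser = DefinitionLinkParser(page_url)
    parser.feed(html)
    return parser.found
```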

Collection: The extracted definitions are then collected and stored for further analysis. This collection can be centralized to provide a comprehensive view of an organization's API landscape.
Formats: API definitions are available in various formats, with the OpenAPI Specification (OAS) being the most common. Others include Swagger 2.0 (the predecessor of OAS 3.x), RAML, and API Blueprint.
Purpose: API definition harvesting is crucial for:

Security Analysis: Understanding the API's functionality to identify potential vulnerabilities.
Attack Surface Mapping: Determining all the ways an API can be accessed and interacted with.
Automated Testing: Generating test cases to assess the API's security and functionality.
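
To make attack surface mapping concrete, here is a minimal sketch that lists every method and path pair declared in an OpenAPI or Swagger document that has already been parsed into a dictionary. The function name `list_operations` is an assumption for this example; the `paths` object and the HTTP method keys come from the OpenAPI Specification.

```python
# HTTP method keys allowed on a path item in the OpenAPI Specification.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(definition: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared under the 'paths' object."""
    operations = []
    for path, item in definition.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for key in item:
            # Skip non-method keys such as 'parameters' or 'summary'.
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)
```

The resulting list can serve as a starting inventory for security analysis or as input for generating test cases.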

Here's how ThreatNG can assist with API definition harvesting:

1. External Discovery

ThreatNG's external discovery is the foundation for API definition harvesting. It enables the platform to scan an organization's entire external attack surface, identifying potential sources of API definitions. Since API definitions can be located in various places, including standard URLs, developer portals, and code repositories, ThreatNG's broad discovery capabilities are essential.

2. External Assessment

While ThreatNG doesn't have a specific "API definition harvesting assessment," its assessment capabilities provide valuable context:

Web Application Hijack Susceptibility: If ThreatNG identifies vulnerabilities that could allow an attacker to hijack a web application, any exposed API definitions for that application become a greater concern, because attackers can use them to understand how to exploit those vulnerabilities.
Cyber Risk Exposure: ThreatNG's assessment of cyber risk can help prioritize the analysis of harvested API definitions. For example, if ThreatNG identifies an API with a high cyber risk exposure, the definitions for that API should be examined more closely.

3. Reporting

ThreatNG's reporting capabilities can present information about harvested API definitions, including their location, format, and the APIs they describe. This helps security teams organize their findings and gain a deeper understanding of the organization's API landscape.
The reports also include context, such as risk levels and recommendations, to help organizations assess and address API-related security concerns.
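
As an illustration of recording a definition's location and format, the sketch below guesses the format from well-known markers: the top-level `openapi` or `swagger` key, the `#%RAML` header line, and the `FORMAT: 1A` metadata line used by API Blueprint. The functions `detect_format` and `report_row` are hypothetical, and the YAML check is a simple line test rather than a full parse, so treat the result as a hint only.

```python
import json


def detect_format(text: str) -> str:
    """Guess the definition format from markers in the document text."""
    stripped = text.lstrip()
    if stripped.startswith("#%RAML"):
        return "RAML"
    if stripped.startswith("FORMAT: 1A"):
        return "API Blueprint"
    try:
        document = json.loads(stripped)
    except json.JSONDecodeError:
        document = None
    if isinstance(document, dict):
        if "openapi" in document:
            return "OpenAPI"
        if "swagger" in document:
            return "Swagger"
        return "unknown"
    # Not JSON: fall back to a cheap check for top-level YAML keys.
    for line in stripped.splitlines():
        if line.startswith("openapi:"):
            return "OpenAPI"
        if line.startswith("swagger:"):
            return "Swagger"
    return "unknown"


def report_row(location: str, text: str) -> dict[str, str]:
    """Build one report entry with the definition's location and format."""
    return {"location": location, "format": detect_format(text)}
```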

4. Continuous Monitoring

ThreatNG's continuous monitoring of the external attack surface ensures that any new or updated API definitions are discovered promptly. This is crucial because APIs and their definitions can change frequently.
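
One simple way to notice new or updated definitions between monitoring runs is to fingerprint each definition and compare snapshots. The sketch below is an assumption about how this could be done, not a description of ThreatNG's internals; `fingerprint` and `diff_snapshots` are hypothetical names, and a snapshot is taken to be a mapping from definition URL to fingerprint.

```python
import hashlib
import json


def fingerprint(definition: dict) -> str:
    """Hash a parsed definition in a way that ignores key order and whitespace."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {url: fingerprint} snapshots and report what changed."""
    shared = previous.keys() & current.keys()
    return {
        "new": sorted(current.keys() - previous.keys()),
        "changed": sorted(url for url in shared if previous[url] != current[url]),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

Hashing the canonical JSON form rather than the raw bytes avoids false alarms when a definition is merely re-serialized.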

5. Investigation Modules

ThreatNG's investigation modules provide capabilities that aid in API definition harvesting:

Domain Intelligence:

The Domain Overview module discovers SwaggerHub instances, which provide interactive API documentation and specifications. Since SwaggerHub is a platform for API development and documentation, it serves as a key source for API definitions.
The Subdomain Intelligence module can discover subdomains where API definitions might reside and identify API endpoints.

Archived Web Pages: This module can discover older versions of web pages, which might contain previous versions of API definitions.
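
For readers who want to experiment with archived copies on their own, the sketch below builds a query URL for the Internet Archive's public CDX search endpoint, asking for captures under a host whose original URL ends with a given file name. This is an independent illustration and says nothing about how ThreatNG's module works; `cdx_query_url` is a hypothetical helper, and the query only returns what the archive happened to capture.

```python
import re
from urllib.parse import urlencode

# Public CDX search endpoint of the Internet Archive's Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(host: str, filename: str) -> str:
    """Build a CDX query for archived URLs under a host ending in filename."""
    params = {
        "url": f"{host}/*",            # trailing /* requests a prefix match
        "output": "json",
        "fl": "timestamp,original",    # fields to return per capture
        "filter": f"original:.*{re.escape(filename)}$",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```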

6. Intelligence Repositories

While ThreatNG's intelligence repositories do not directly store API definitions, they provide context that is important for assessing the risk associated with APIs. For example, knowing about compromised credentials can help security teams understand the potential impact if those credentials are used to access APIs described by harvested definitions.

7. Working with Complementary Solutions

ThreatNG can enhance the effectiveness of other security tools by providing them with information about harvested API definitions:

API testing tools: ThreatNG can provide a list of API definitions to API testing tools, which can then utilize these definitions to generate test cases and automatically validate API functionality and security.
Vulnerability scanners: ThreatNG can assist vulnerability scanners by identifying APIs and their associated definitions, enabling the scanners to concentrate their efforts on assessing the API's security.

8. Examples of ThreatNG Helping

ThreatNG identifies an API definition file on a non-standard URL that was previously missed by security audits.
ThreatNG identifies an older version of an API definition that reveals deprecated API endpoints with known vulnerabilities.
ThreatNG's continuous monitoring detects the deployment of a new API definition, prompting a security review to ensure the API is implemented securely.
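
The deprecated-endpoint scenario can be sketched as follows: given a parsed OpenAPI or Swagger document, list the operations that set the specification's `deprecated` flag. The function name `deprecated_operations` is an assumption for this example, and a definition only reveals deprecation if its authors marked it.

```python
# HTTP method keys allowed on a path item in the OpenAPI Specification.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def deprecated_operations(definition: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs whose operation has deprecated: true."""
    flagged = []
    for path, item in definition.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            if operation.get("deprecated") is True:
                flagged.append((method.upper(), path))
    return sorted(flagged)
```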

9. Examples of ThreatNG Working with Complementary Solutions

ThreatNG provides the URL of an OpenAPI Specification file to an API testing tool, which then uses the file to generate security tests for the API automatically.
ThreatNG identifies an API and its associated definition, which is then used by a vulnerability scanner to focus its analysis on the API's authentication and authorization mechanisms.
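
To show how a definition can steer attention toward authentication and authorization, here is a minimal sketch that lists operations declaring no effective security requirement in an OpenAPI 3 document. It assumes the specification's rules that a top-level `security` list applies by default, that an operation-level `security` list overrides it, and that an empty requirement object makes authentication optional. The name `operations_without_security` is hypothetical, and a missing declaration in the definition does not prove the live API is unprotected.

```python
# HTTP method keys allowed on a path item in the OpenAPI Specification.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations_without_security(definition: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs with no mandatory security requirement."""
    default_security = definition.get("security", [])
    unprotected = []
    for path, item in definition.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            # Operation-level 'security' overrides the document-level default.
            effective = operation.get("security", default_security)
            # An empty list means no auth; an empty object {} means auth is optional.
            if not effective or any(requirement == {} for requirement in effective):
                unprotected.append((method.upper(), path))
    return sorted(unprotected)
```

A testing tool or scanner could take this list as a first set of endpoints to check by hand.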

Threat NG Staff
