API Keys, Tokens & Secrets: How They Leak and How Developers Can Avoid It

Veronica Peter

December 30, 2025 (Updated: December 30, 2025)

Credential exposures are one of the fastest paths to compromise. In 2024 alone, GitHub detected over 39M leaked secrets across its platform, prompting new security protections and organization‑wide scanning features. Independent studies consistently show how severe and long-lasting this problem is: millions of API keys and credentials appear online every year, and, more worrying still, a large share of them remain active days after exposure. In this article, we'll look at how these leaks happen, how to detect and respond to them, and how to protect yourself and your organization.

Disclaimer: The techniques shared in this article are provided for defensive, authorized security monitoring of your own assets or those you have explicit permission to test. Do not scan third-party code without consent and always follow responsible disclosure practices if you discover exposures.

How Secrets Leak on GitHub

Secrets often leak because developers prioritize speed and convenience over security during development. According to GitHub, accidental exposure typically happens when sensitive files are committed unintentionally or when .gitignore rules are misconfigured.

Even after removal, secrets can remain buried deep within Git history, accessible to anyone who knows where to look. This persistence means deleting a file isn't enough; proper secret rotation and history rewriting are critical.

Here are the most common ways secrets escape on GitHub:

Temporary hardcoding during development/testing: Developers often embed keys "just to try something," then forget to remove them before committing. Even if the key is replaced later, Git history retains the past exposure unless it is rewritten.
Misconfigured ignore rules & config files: Credentials in .env, YAML/TOML/JSON/JS config files, or language-specific settings sometimes slip past .gitignore, especially in multi-repo or monorepo setups.
CI/CD workflow logs: In continuous integration and deployment pipelines, secrets can accidentally appear in build logs or stored artifacts. If these logs aren't properly secured, they become an easy target. Even worse, a malicious pull request could exploit overly broad permissions or insert code that prints sensitive credentials during the build process.
Private-to-public repository transitions: Sometimes a private repository is accidentally made public, exposing everything inside it. Even if the mistake is corrected quickly, copies of that code, such as forks or snippets saved as gists, can keep those secrets alive outside the organization's control. Once they're out, it's almost impossible to guarantee complete removal.

Ultimately, these leaks stem from a combination of rushed workflows, convenience-driven shortcuts, and blind spots in the development lifecycle.
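
The CI/CD log risk described above can be reduced by masking known secret values before a line is written. The following is a minimal Python sketch, assuming the secret values are available to the process as a list of strings; `mask_secrets` is a hypothetical helper for illustration, not a feature of any CI product.

```python
def mask_secrets(line: str, secrets: list[str], mask: str = "***") -> str:
    """Replace every known secret value in a log line with a mask."""
    # Mask longer secrets first so a secret that contains another
    # secret as a substring is still hidden completely.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # an empty string would match everywhere, so skip it
            line = line.replace(secret, mask)
    return line


# Obviously fake value, used only for illustration.
fake_token = "example-token-1234"
print(mask_secrets(f"Authorization: Bearer {fake_token}", [fake_token]))
```

Masking only covers values the pipeline already knows about, so it complements scanning rather than replacing it.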

How Attackers (and Defenders) Discover Secret Leaks

1. GitHub Dorking:

One of the easiest ways to identify leaked secrets is through GitHub dorking, which uses targeted, advanced search queries within GitHub's search bar to scan its global codebase for common patterns that may indicate exposed API keys, tokens, or configuration files with sensitive information.

Sample GitHub search queries

1. AWS_ACCESS_KEY_ID
2. "Authorization: Bearer"
3. filename:.env
4. ext:json "apiKey"
5. ext:py "sk-"

You can also tailor the search queries to your organization to see if your developers accidentally leaked secrets:

org:YourOrgName "AWS_ACCESS_KEY_ID"
org:YourOrgName "Authorization: Bearer"
org:YourOrgName filename:.env
org:YourOrgName ext:json "apiKey"
org:YourOrgName ext:py "sk-"

Pro tip: Combine terms with different file extensions (.env, .json, .yaml) and exclude false positives such as "example", "test", and "demo":

org:YourOrgName ext:json -example -test -demo "API_Key"
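
When the list of patterns grows, the org-scoped queries above can be generated instead of typed by hand. This is a small Python sketch; `build_org_queries` and its parameters are hypothetical names, and the qualifiers it emits are simply the ones used in the queries above.

```python
def build_org_queries(org, patterns, extensions=(), exclude=()):
    """Return one search query string per (pattern, extension) pair."""
    exclusions = " ".join(f"-{word}" for word in exclude)
    queries = []
    for pattern in patterns:
        # With no extensions given, emit a single query without an ext: filter.
        for ext in extensions or (None,):
            parts = [f"org:{org}"]
            if ext:
                parts.append(f"ext:{ext}")
            if exclusions:
                parts.append(exclusions)
            # Quote the pattern so multi-word strings are searched as a phrase.
            parts.append(f'"{pattern}"')
            queries.append(" ".join(parts))
    return queries


for query in build_org_queries("YourOrgName", ["API_Key"], ["json", "yaml"], ["example", "test", "demo"]):
    print(query)
```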

Find other helpful GitHub Dorking resources here: Mindmap/Github Dorks/Github Dorks.pdf at main · Ignitetechnologies/Mindmap · GitHub

2. Google Dorking:

Secrets also leak beyond GitHub. Sometimes developers mirror repos, expose CI/CD logs, or publish code on personal sites, and Google indexes many of these exposures:

Sample Google dorks:

# GitHub repositories
site:github.com ext:php "api-key"
site:github.com ext:php "api_key"
site:github.com ext:php "api-token"
site:github.com ext:php "api_token"
site:github.com ext:php "access-token"
site:github.com ext:php "access_token"
site:github.com ext:php "x-api-key"
site:github.com ext:php "x_api_key"
site:github.com ext:php "x-api-token"
site:github.com ext:php "x_api_token"
site:github.com ext:php "x-access-token"
site:github.com ext:php "x_access_token"
# GitLab repositories
site:gitlab.com ext:php "api-key"
# AWS
site:github.com ext:py "ap-northeast-1.amazonaws.com" "x-api-key"
# Google APIs
site:github.com ext:js "googleapis.com" "?key="
# OpenAI
site:github.com ext:py "https://api.openai.com/v1/models" "Authorization: Bearer"
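
The twelve GitHub dorks at the top of this list are the same three key names written with two separators and an optional "x" prefix, so they can be generated rather than maintained by hand. A short Python sketch follows; the function names are hypothetical and the defaults mirror the sample dorks.

```python
from itertools import product


def header_name_variants():
    """Generate the 12 key-name spellings used in the sample dorks."""
    bases = [("api", "key"), ("api", "token"), ("access", "token")]
    variants = []
    for with_x, (left, right), sep in product((False, True), bases, ("-", "_")):
        words = (["x"] if with_x else []) + [left, right]
        variants.append(sep.join(words))
    return variants


def google_dorks(site="github.com", ext="php"):
    """Combine each key-name variant with a site and file-extension filter."""
    return [f'site:{site} ext:{ext} "{name}"' for name in header_name_variants()]


for dork in google_dorks():
    print(dork)
```

Changing `site` or `ext` produces the equivalent list for another host or file type, for example `google_dorks("gitlab.com", "py")`.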

3. Misconfigured Services

Secrets don't only leak through source code. Misconfigured applications and exposed internet-facing services can also reveal sensitive data through directory listings, configuration files, HTTP headers, error pages, banners, open indexes, unsecured storage, and debug endpoints.

Both adversaries and security teams leverage internet-wide scanners and search engines such as Shodan, Censys, and ZoomEye to identify these exposures at scale. These platforms crawl the internet and index metadata, banners, and HTTP responses, meaning hardcoded tokens, API keys, and credentials embedded in responses can become discoverable.

Misconfigured services significantly expand the attack surface beyond source code and underscore a critical principle: API keys and other secrets should never be accessible on public internet-facing services.

Sample Shodan queries:

1. http.title:"Index of" "api_key" // Finds directory listings with api_key.
2. "Authorization: Bearer" "openai" // Finds OpenAI bearer tokens in headers or body.
3. "AWS_SECRET_ACCESS_KEY" // Finds exposed AWS secret keys in HTTP responses or banners.
4. http.title:"Index of" ".env" // Finds servers exposing .env files, which often contain credentials.
5. http.title:"Index of" ("credentials" OR "secrets") // Finds directory listings with files named credentials or secrets.
6. "x-api-key" // Finds HTTP headers or responses containing x-api-key, common in API Gateway usage.
7. "BEGIN RSA PRIVATE KEY" OR "AWS_SECRET_ACCESS_KEY" // Finds exposed private keys or AWS secrets in responses.
8. "DATABASE_URL=" OR "SECRET_KEY=" // Finds database connection strings or app secret keys in exposed files.
9. "AWS_ACCESS_KEY_ID" OR "AWS_SECRET_ACCESS_KEY" // Finds AWS access keys or secret keys in HTTP responses.
10. "X-Amz-Credential" // Finds AWS signature headers, often seen in misconfigured endpoints.
11. "googleapis.com" "key=" // Finds Google API calls with keys embedded in URLs.
12. "maps.googleapis.com" ("?key=" OR "&key=") // Finds Google Maps API requests with hardcoded keys.
13. "AccountKey=" "DefaultEndpointsProtocol=" // Common Azure storage connection string pattern; indicates exposed Azure credentials.
14. "Authorization: Bearer" "api.openai.com" // Finds OpenAI API bearer tokens in headers or responses.
15. "openai" "sk-" // Finds OpenAI secret keys (prefix sk-) in exposed content.
16. "ghp_" // Finds GitHub Personal Access Tokens (prefix ghp_).
17. "glpat-" // Finds GitLab Personal Access Tokens (prefix glpat-).
18. product:Elasticsearch "cluster_name" // Finds Elasticsearch instances; often exposed without authentication.
19. http.title:"Kibana" // Finds Kibana dashboards that may be publicly accessible.
20. "Prometheus Time Series Collection and Processing Server" // Finds exposed Prometheus endpoints leaking metrics and possibly secrets.
21. http.title:"Grafana" // Finds publicly accessible Grafana dashboards.
22. http.title:"MinIO Browser" OR "x-amz-bucket-region" // Finds exposed MinIO or S3-compatible storage endpoints.
23. port:9200 product:Elasticsearch // Finds Elasticsearch clusters on the default port; misconfigured ones can leak logs and sensitive data.
24. port:27017 product:MongoDB // Finds MongoDB instances on the default port; those without authentication can leak entire datasets, including credentials.
25. http.title:"Kubernetes Dashboard" // Finds exposed Kubernetes dashboards that may reveal cluster secrets or allow unauthorized access.

Pro tip: To tailor the above searches to your organization for targeted auditing:

Combine sensitive patterns with your company's domain name or subdomains, e.g. "AWS_SECRET_ACCESS_KEY" ssl:"yourcompany.com" or "Authorization: Bearer" "api.openai.com" hostname:"*.yourcompany.com"
If you know your public IP ranges, include them, e.g. net:"203.0.113.0/24"
Many misconfigured services expose org names in banners or TLS certs; you can find them by adding org:"YourCompanyName" to the query.
If you have inventory asset tags in banners, include them, e.g. "X-Company-Env: production"
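
The scoping tips above amount to appending a few filters to a base pattern. A minimal Python sketch of that composition is shown below; `scope_shodan_query` is a hypothetical helper, and it only uses the hostname, net, and org filters already named in the tips.

```python
def scope_shodan_query(pattern, hostname=None, net=None, org=None):
    """Append organization-specific filters to a quoted base pattern."""
    parts = [f'"{pattern}"']
    if hostname:
        parts.append(f'hostname:"{hostname}"')
    if net:
        parts.append(f'net:"{net}"')
    if org:
        parts.append(f'org:"{org}"')
    return " ".join(parts)


# Placeholder values taken from the examples in the tips above.
print(scope_shodan_query("AWS_SECRET_ACCESS_KEY", net="203.0.113.0/24"))
print(scope_shodan_query("x-api-key", org="YourCompanyName"))
```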

To run the above searches with the Shodan CLI, add "shodan search" before your query: shodan search "<your query here>"

4. CLI Tools

Command Line Interface tools like cURL and grep are great for quick targeted searches within your organization's codebase. They are useful for automating audits or integrating checks into CI/CD pipelines.

i. GitHub Search API

GitHub's Search API lets you query across all repositories in your organization without manually browsing each repo. By combining cURL with GitHub search qualifiers such as org:YourOrgName, filename:.env, or ext:py, you can programmatically detect exposed secrets like API keys, tokens, and credentials.

# Code search requires an authenticated request; $GITHUB_TOKEN is set up in the steps below.

# Basic curl search for AWS keys
curl -s -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/search/code?q=org:YourOrgName+AWS_ACCESS_KEY_ID+NOT+sample+NOT+test"

# Search for Bearer tokens in Python files
curl -s -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/search/code?q=org:YourOrgName+%22Authorization:+Bearer%22+ext:py"

# Discover .env files (high risk)
curl -s -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/search/code?q=org:YourOrgName+filename:.env"

Pro tip: Authenticate your requests with a Personal Access Token (PAT). GitHub's code search endpoint requires authentication, and a token also lets the search include private repositories you can access. Search has its own rate limit, separate from the 5,000 requests/hour allowed for most authenticated REST calls (60/hour unauthenticated): code search is limited to 10 requests per minute, so pace automated sweeps accordingly. PAT visibility follows the token owner: results include only the repos that user can access (based on role and token scopes), and a PAT does not grant extra access. For org‑wide attack surface coverage, use a token owned by an org owner or a GitHub App installed at the org level with read‑only permissions.

Step-by-Step Guide on how to generate a Personal Access Token (PAT) on GitHub and use it in your cURL searches:

Step 1: Generate a Personal Access Token (PAT)

1. Log in to GitHub with your account.
2. Go to Settings → Developer settings → Personal access tokens.
3. Choose Classic PAT (older method) or Fine-grained PAT (recommended for security).
4. Click Generate new token:
   - For a classic PAT, select scopes: repo (full access to private repos, needed for org-wide scans) and read:org (if you need org-level metadata).
   - For a fine-grained PAT, select Repository access: All repositories (or specific ones), and Permissions: Contents: Read (minimum needed for search).
5. Set an expiration date (best practice: short-lived tokens).
6. Click Generate token and copy it immediately (you won't see it again).

Step 2: Store the PAT securely

Never hardcode the token in scripts; use an environment variable instead:

export GITHUB_TOKEN="ghp_yourtokenhere"

Or store in a secure secrets manager (Vault, AWS Secrets Manager, etc.).

Step 3: Verify your setup

Run:

curl -s -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user

If it returns your user info, the token works.

Step 4: How to Use PAT in cURL searches

Simply add the Authorization header to your cURL command:

curl -s -H "Accept: application/vnd.github.v3+json" \
-H "Authorization: Bearer $GITHUB_TOKEN" \
"https://api.github.com/search/code?q=org:YourOrgName+AWS_ACCESS_KEY_ID+NOT+sample+NOT+test"

Best Practices

Rotate tokens regularly.
Use fine-grained PATs for least privilege.
For large orgs, consider GitHub Apps for better auditability and rate limits.
Never commit PATs to code; keep any file that holds them in .gitignore and store them as CI/CD secrets.
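
For audits that run on a schedule, the authenticated search from Step 4 can also be scripted. The sketch below uses only Python's standard library and builds the request without sending it; `build_search_request` is a hypothetical helper, and the endpoint, headers, and query are the ones used in the cURL examples above.

```python
import os
import urllib.parse
import urllib.request

API_URL = "https://api.github.com/search/code"


def build_search_request(query, token):
    """Build (but do not send) an authenticated code-search request."""
    url = f"{API_URL}?{urllib.parse.urlencode({'q': query})}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Read the token from the environment instead of hardcoding it.
token = os.environ.get("GITHUB_TOKEN", "")
request = build_search_request(
    "org:YourOrgName AWS_ACCESS_KEY_ID NOT sample NOT test", token
)
print(request.full_url)
# To send it, pass `request` to urllib.request.urlopen() and parse the JSON body.
```

Keeping request construction separate from sending makes the query encoding easy to check before any call counts against the rate limit.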

ii. Local Grep Sweep (Cloned Repo)

When you have a repository cloned locally, one of the fastest ways to uncover secrets is by sweeping through the codebase with grep. Unlike remote searches, local scans give you full visibility into everything including branches, configuration files, and artifacts that GitHub's search engine might not index.

Secrets often follow predictable patterns. By using recursive grep with regular expressions, you can quickly identify these patterns across the entire repo:

Sample Commands:

1. grep -RniE '(AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|Authorization:\s*Bearer|api[_-]?key|SECRET(_KEY)?|PASSWORD|TOKEN)' .
2. grep -RniE 'sk_(live|test)_[A-Za-z0-9]{16,}' .

The above commands scan every file in the current directory and its subdirectories, returning matches with file paths and line numbers. To reduce noise in your scan, try the following enhancements:

# Exclude noisy directories where secrets rarely live
grep -RniE '(AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|Authorization:\s*Bearer|api[_-]?key|secret)' \
  . --exclude-dir={.git,node_modules,vendor}
grep -RniE 'sk_(live|test)_[A-Za-z0-9]{16,}' . --exclude-dir={.git,node_modules,vendor}
# Target specific file extensions (keep the braces outside the quotes so the shell expands them)
grep -RniE '(AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|Authorization:\s*Bearer|api[_-]?key|SECRET(_KEY)?|PASSWORD|TOKEN)' \
  --include='*.'{env,json,yaml,yml,py,js,ts,go,rb,properties} .
# Colorize matches for better readability
grep --color=always -RniE '(api[_-]?key|SECRET(_KEY)?|PASSWORD|TOKEN|Authorization:\s*Bearer)' .
# Identify JWTs specifically
grep -RniE 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' .
# Skip binary files and large folders
grep -RniI --exclude-dir={.git,node_modules,vendor,dist} \
  -E '(AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|Authorization:\s*Bearer|api[_-]?key|SECRET(_KEY)?|PASSWORD|TOKEN)' .
# Search Git history (all branches) for removed secrets; no -R here, so grep reads the piped log instead of the working tree
git log -p --all | grep -niE '(AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|Authorization:[[:space:]]*Bearer|api[_-]?key|SECRET(_KEY)?|PASSWORD|TOKEN|eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}|sk_(live|test)_[A-Za-z0-9]{16,}|xox[bap]-|ghp_[A-Za-z0-9]{36}|AIza[0-9A-Za-z_-]{35})'
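
The same patterns can be applied from a script when you want structured results rather than terminal output. The following Python sketch reuses a handful of the regular expressions from the grep commands above; the rule names are illustrative labels, and `scan_text` is a hypothetical helper that reports only line numbers and rule names so matched values are never echoed.

```python
import re

# Patterns taken from the grep commands above; the keys are illustrative labels.
RULES = {
    "aws_variable": re.compile(r"AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY)"),
    "bearer_header": re.compile(r"Authorization:\s*Bearer", re.IGNORECASE),
    "sk_prefixed_key": re.compile(r"sk_(live|test)_[A-Za-z0-9]{16,}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "jwt": re.compile(r"eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"),
}


def scan_text(text):
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# Fake placeholder content; the token-shaped string is built from repeated letters.
sample = "name = 'demo'\nAWS_ACCESS_KEY_ID=placeholder\ntoken = 'ghp_" + "a" * 36 + "'\n"
print(scan_text(sample))
```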

5. Automated Secret-Scanning Tools

Manual checks like grep are powerful, but they don't scale well across large organizations or complex Git histories. This is where automated secret-scanning tools come in. These tools are designed to detect hardcoded credentials, API keys, and other sensitive data in source code, configuration files, and historical commits using advanced techniques like entropy analysis and pattern matching.
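
To make the entropy idea concrete, here is a minimal Python sketch of Shannon entropy applied to candidate strings. It illustrates the general technique only and is not how any particular scanner is implemented; the threshold and minimum length in `looks_random` are assumed values chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(value: str, threshold: float = 4.0, min_length: int = 20) -> bool:
    """Flag long strings whose entropy exceeds an assumed threshold."""
    return len(value) >= min_length and shannon_entropy(value) > threshold


print(shannon_entropy("abcd"))   # four equally likely characters: 2.0 bits each
print(looks_random("a" * 40))    # long but repetitive, so not flagged: False
```

Entropy alone produces false positives on hashes, UUIDs, and minified code, which is why scanners pair it with pattern matching.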

TruffleHog:

TruffleHog is a popular open-source tool that scans Git repositories for secrets in both the current state and the entire commit history. It looks for high-entropy strings (which often indicate cryptographic keys) and known secret patterns.

Sample Commands:

1. trufflehog git https://github.com/your-org/your-repo.git
2. trufflehog github --org ORGNAME --token "$GITHUB_TOKEN" --include-forks --include-members --include-wikis --issue-comments  --pr-comments --gist-comments --json --fail | tee trufflehog_out.json

Gitleaks:

Gitleaks is a widely used open-source command-line tool for detecting secrets in code repositories, either before a push or as developers make changes to the code. It uses pattern matching (regex) and entropy analysis to identify things like API keys, passwords, and tokens. Gitleaks is highly configurable, allowing you to define custom rules and exclusions. It integrates smoothly with CI/CD pipelines and can enforce checks during pre-commit or pre-push hooks, helping developers catch and remove secrets before they enter the codebase. This makes it an effective solution for preventing credential leaks early in the development workflow. You can read up on this tool here.

Other organization-wide secret monitoring and scanning tools:

GitHub Secret Scanning & Push Protection
GitGuardian
detect-secrets
Spectral

These tools monitor across the entire SDLC, from developer workstations to production pipelines.

Incident Response Playbook: Exposed Secret in Repository

1. Immediate Containment (Highest Priority)

Revoke or disable the exposed secret immediately in the source system (cloud provider, API platform, database, etc.).
Rotate the secret and generate a new one.
Update all dependent services and applications to use the new secret.
Confirm the old secret is no longer valid.

2. Impact Assessment

Review logs and monitoring data for:
Unauthorized access attempts
Suspicious API calls or resource usage
Data exfiltration or privilege escalation

Determine:

Duration of exposure
Whether the secret had read-only or privileged access
Systems, data, or users potentially impacted
Escalate internally if required (security lead, cloud team, legal, compliance).
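
Determining the duration of exposure and narrowing the logs to that period can be sketched in a few lines of Python. The sketch assumes you already know when the secret first became visible and when it was revoked, and that log events are dictionaries with a timezone-aware "time" field; both function names and the event shape are hypothetical.

```python
from datetime import datetime, timezone


def exposure_window(first_exposed: datetime, revoked: datetime):
    """Return how long the secret was usable after it became visible."""
    if revoked < first_exposed:
        raise ValueError("revocation time precedes exposure time")
    return revoked - first_exposed


def events_in_window(events, first_exposed, revoked):
    """Keep only log events whose timestamp falls inside the exposure window."""
    return [e for e in events if first_exposed <= e["time"] <= revoked]


# Illustrative timestamps only.
exposed = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)
revoked = datetime(2025, 1, 1, 12, 30, tzinfo=timezone.utc)
log = [
    {"time": datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc), "action": "ListBuckets"},
    {"time": datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc), "action": "GetObject"},
]
print(exposure_window(exposed, revoked))
print([e["action"] for e in events_in_window(log, exposed, revoked)])
```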

3. Codebase Remediation

Remove the secret from:
Source files
Configuration files
Environment examples (.env, sample configs)

Replace hardcoded secrets with:

Environment variables
Secret managers (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, etc.)
Add the affected file patterns to .gitignore where appropriate.
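
As an illustration of the environment-variable approach, the Python sketch below reads a secret at startup and fails loudly if it is missing, so a misconfigured deployment never falls back to a hardcoded default. `require_secret` and `MissingSecretError` are hypothetical names introduced for this example.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, environ=None):
    """Fetch a secret from the environment and fail loudly if it is missing."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        # Report only the variable name, never a value.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# The optional second argument makes the helper easy to exercise without touching os.environ.
api_key = require_secret("API_KEY", {"API_KEY": "placeholder-value"})
print(len(api_key))
```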

4. Repository History Sanitization (disruptive, so proceed carefully)

Confirm all secrets have already been revoked and rotated.
Search the full Git history to locate all instances of the exposed secret.
Use BFG Repo-Cleaner or git filter-repo to remove the secret from commit history.
Force push the rewritten history to the remote repository.

Notify all collaborators to:

Re-clone the repository, or
Reset their local history to the updated version.

Note: Rewriting Git history does not revoke exposed credentials or stop active abuse; it only reduces the risk of future accidental exposure. Always revoke and rotate secrets first before attempting any history cleanup.
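
git filter-repo accepts a --replace-text option that takes a file of expressions, one per line, where a line of the form secret==>replacement swaps the literal text on the left for the text on the right. The Python sketch below builds the contents of such a file from a list of already-revoked secrets; `replacement_rules` is a hypothetical helper, and the resulting file should be kept outside the repository and deleted after the rewrite.

```python
def replacement_rules(secrets, placeholder="***REMOVED***"):
    """Build the contents of an expressions file for --replace-text."""
    lines = []
    for secret in secrets:
        # Each expression must fit on one line and must not contain the separator.
        if "\n" in secret or "==>" in secret:
            raise ValueError("secret cannot be expressed as a single literal rule")
        lines.append(f"{secret}==>{placeholder}")
    return "\n".join(lines) + "\n"


# Fake value for illustration; real input would be the revoked secrets.
print(replacement_rules(["old-revoked-value-123"]), end="")
```

Run the rewrite on a fresh clone first and inspect the result before force pushing.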

5. Validation & Monitoring

Verify:

The old secret no longer works
The new secret functions correctly
No sensitive data remains in the repository or history

Enable enhanced monitoring for:

Abuse attempts using the old credential
Abnormal access patterns post-incident

6. Communication & Documentation

Document the incident:

Root cause
Timeline
Actions taken
Impact assessment

Notify stakeholders as required:

Engineering
Management
Compliance or legal (if applicable)
Prepare external communication if exposure affected customers or public systems.

7. Lessons Learned & Prevention

Implement preventive controls:

Pre-commit secret scanning (Gitleaks, TruffleHog)
CI/CD secret scanning
Repository secret scanning alerts
Enforce secure development practices:
No hardcoded secrets
Mandatory secret rotation policies
Least privilege for API keys and credentials
Update incident response runbooks based on lessons learned.

How Developers can Avoid Secret Leakage in Code

Pre-commit hooks: Catch secrets before they enter history. Always run scanners in redact mode to prevent credentials from being printed to the terminal or logs.
GitHub Secret Scanning & Push Protection: Enable secret scanning and push protection by default for both public and private repositories. Organization-wide scans help uncover legacy leaks and improve overall visibility.
Ephemeral credentials (OIDC): Avoid storing permanent secrets in build pipelines. Instead, let pipelines request temporary access when needed, so any leaked credentials expire quickly.
Least privilege & token scoping: When generating keys and tokens, apply the principle of least privilege so credentials get only the permissions they actually need and a leak causes minimal damage.
Action hygiene & third-party reviews: Review what your CI/CD workflows and third-party actions can access, and avoid printing sensitive values in logs.
Continuous education: Train contributors on .gitignore hygiene, secret storage best practices, and responsible disclosure.
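
To show what a pre-commit check with redaction might look like, here is a minimal Python sketch that inspects only the lines a commit adds and reports the variable name without the value. It is a simplified illustration rather than a replacement for Gitleaks or TruffleHog; the function names are hypothetical and the pattern is a reduced version of the grep expressions used earlier.

```python
import re

SECRET_PATTERN = re.compile(
    r"(AWS_SECRET_ACCESS_KEY|api[_-]?key|SECRET(_KEY)?|PASSWORD|TOKEN)\s*[:=]\s*\S+",
    re.IGNORECASE,
)


def added_lines(diff_text):
    """Yield lines added by a unified diff, skipping the '+++' file headers."""
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def find_secrets_in_diff(diff_text):
    """Return redacted findings for added lines that look like secret assignments."""
    hits = []
    for line in added_lines(diff_text):
        match = SECRET_PATTERN.search(line)
        if match:
            # Redact: keep the variable name, hide everything after it.
            hits.append(f"{match.group(1)}=<redacted>")
    return hits


example_diff = "\n".join([
    "+++ b/settings.py",
    "@@ -1,2 +1,3 @@",
    " DEBUG = False",
    "+API_KEY = 'not-a-real-key'",
    "-OLD = 1",
])
print(find_secrets_in_diff(example_diff))
```

In a hook, the input would typically be the output of git diff --cached, and a non-empty result would abort the commit.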
