Static analysis

activerecon archangel teamcw anthem

For starters, you may use CTRL+U or right-click and "View page source".

While uncommon, in easy CTFs or niche challenges (both can be checked with the console snippet after this list):

  • The website title may contain information
  • Emails may expose internal domains
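
For instance, both can be pulled from the browser console. This is a minimal sketch; the email regex is a rough assumption rather than an exhaustive pattern:

// page title
document.title
// rough email match against the raw HTML (simplified regex, may miss edge cases)
document.documentElement.outerHTML.match(/[\w.+-]+@[\w.-]+\.[a-z]{2,}/gi)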

HTML tags

walkinganapplication ctfcollectionvol1 lianyu easypeasyctf html_disabled_buttons

A webpage often has hidden or disabled HTML tags.

The sample script below fetches every hidden HTML tag, aside from some uninteresting ones.

➡️ Find hidden input fields, hidden content...

Array.from(document.querySelectorAll('*')).filter(x => {
    // These elements were never displayed to the user in the first place
    if (x.nodeName === "HEAD" || x.nodeName === "META"
        || x.nodeName === "LINK" || x.nodeName === "STYLE"
        || x.nodeName === "SCRIPT" || x.nodeName === "TITLE") return false
    // hidden attribute in any form: hidden, hidden="", hidden="hidden"
    if (x.hidden === true) return true
    // style (visibility/display/font-size)
    const style = window.getComputedStyle(x)
    if (style.visibility === 'hidden') return true
    if (style.display === 'none') return true
    if (style.fontSize === "0px") return true
    return false
}).map(x => x.outerHTML)

Links

contentdiscovery adventofcyber2 picklerick gamingserver surfer archangel teamcw anthem devvortex cap http_directory_indexing

To make sure you have visited every page, you may want to check every link on every page. You can use this snippet in the browser console:

// list every element with a href
document.querySelectorAll('*[href]') 
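
If you want the resolved URLs rather than the elements, here is a small variation (it also grabs src attributes, assuming loaded resources interest you too):

// list every resolved link/resource URL, deduplicated
[...new Set(Array.from(document.querySelectorAll('[href], [src]')).map(x => x.href || x.src))]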

Another way is to use onectf or ZAProxy.

$ # crawl internal URLs (same domain)
$ onectf crawl -u https://example.com/

As this process is also done by search engines, websites may provide a sitemap.xml file listing every page of the website, but there is no guarantee that all pages were added to it.

👉 Common URL if present: https://example.com/sitemap.xml
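
If a sitemap exists, its URLs can be listed straight from the console. This is a minimal sketch assuming a standard <loc>-based sitemap served from the same origin (otherwise CORS blocks the fetch):

// fetch the sitemap and extract every <loc> entry
fetch('/sitemap.xml')
    .then(r => r.text())
    .then(xml => new DOMParser().parseFromString(xml, 'text/xml'))
    .then(doc => Array.from(doc.querySelectorAll('loc')).map(l => l.textContent))
    .then(urls => console.log(urls))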

There is also a robots.txt file listing the pages that robots should not crawl/index, which may indicate vulnerable/sensitive pages.

👉 Common URL if present: https://example.com/robots.txt
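
Similarly, here is a quick console sketch to pull the disallowed paths out of robots.txt (again assuming same origin):

// fetch robots.txt and keep the Disallow entries
fetch('/robots.txt')
    .then(r => r.text())
    .then(txt => txt.split('\n').filter(line => /^disallow:/i.test(line.trim())))
    .then(lines => console.log(lines))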


Comments

howwebsiteswork walkinganapplication picklerick wgelctf gamingserver cyborgt8 nibbles html_source_code http_directory_indexing phpbb_install_files

You can use JavaScript to fetch every HTML comment (rt/96517):

// collapse newlines to spaces so multi-line comments are matched (line breaks are lost)
document.querySelector('html').innerHTML.replaceAll('\n', ' ').match(/<!--.*?-->/g)
// same idea, but the original line breaks are restored in the output
document.querySelector('html').innerHTML.replaceAll('\n', '/n').match(/<!--.*?-->/g).map(x => x.replaceAll('/n', '\n'))

You may append this snippet to remove empty comments:

[...].filter(r => r !== "<!---->")
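
As an alternative to the regex above (not in the original notes, just a sketch), you can walk comment nodes directly with the DOM TreeWalker API:

// collect every HTML comment node in the document
const walker = document.createTreeWalker(document.documentElement, NodeFilter.SHOW_COMMENT)
const comments = []
let node
while ((node = walker.nextNode())) comments.push(node.nodeValue.trim())
// drop empty comments
comments.filter(c => c !== "")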

You can use onectf to crawl a website and display HTML comments:

$ onectf crawl -u https://example.com/ --comments

Analyze the JavaScript

server_side_attacks walkinganapplication pythonplayground glitch javascript_authentication javascript_authentication_2 javascript_source

If needed, you may use the browser debugger, after adding a breakpoint in the JavaScript, to analyze the code.

📚 Some developers might use a well-known script name such as jquery.js to hide their scripts.
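
To spot a script hiding behind a familiar name, you can first list everything the page loads; a minimal console sketch:

// list every script loaded by the page (inline scripts have no src)
Array.from(document.scripts).map(s => s.src || '[inline]')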

➡️ It's hard, so feel free to explore other techniques first.