
Smart Extract

Paste any mixed-format text. SmartDevBox scans it and pulls out every embedded JSON object, XML element, HTML block, date, URL, and email address, each tagged with its exact line, column, and character offset.

What Is Smart Extract?

Real-world text is rarely clean. Log files contain serialised JSON objects between plaintext lines. API responses embed XML fragments. Database exports mix ISO dates with prose. Emails scatter URLs and addresses through unstructured content.

Smart Extract is a dedicated panel in SmartDevBox that scans any input — no matter how mixed — and identifies every recognisable structured section within it. It returns each section with its type, content, and precise position metadata (line number, column, character offset range) so you always know exactly where it came from.

What Gets Extracted

The extractor recognises six section types. Each type gets its own colour badge in the results panel:

JSON objects & arrays

Balanced brace/bracket matching with string-aware parsing. Only spans that pass JSON.parse() are kept — no false positives from partial structures.

Embedded API payloads in log lines
Config blobs inside plaintext reports
Serialised objects in CSV fields
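The JSON pass described above (balanced brace/bracket matching that ignores braces inside strings, then a JSON.parse() gate) can be sketched in TypeScript as follows. This is a minimal illustration under our own assumptions, not SmartDevBox's source; `JsonSpan`, `findBalancedEnd`, and `extractJson` are hypothetical names.

```typescript
// Half-open [start, end) character offsets into the input string.
interface JsonSpan {
  start: number;
  end: number;
  text: string;
}

// From an opening '{' or '[' at `start`, walk forward counting depth while
// ignoring braces inside string literals. Returns the index just past the
// closer that brings depth back to zero, or -1 if the text never balances.
function findBalancedEnd(input: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < input.length; i++) {
    const ch = input[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

function extractJson(input: string): JsonSpan[] {
  const spans: JsonSpan[] = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === "{" || ch === "[") {
      const end = findBalancedEnd(input, i);
      if (end !== -1) {
        const text = input.slice(i, end);
        try {
          JSON.parse(text); // validation gate: reject anything JSON.parse refuses
          spans.push({ start: i, end, text });
          i = end; // continue after the span, so nested objects are not re-reported
          continue;
        } catch {
          // Balanced but not valid JSON: try again from the next character.
        }
      }
    }
    i++;
  }
  return spans;
}
```

Jumping to the end of each accepted span keeps only outermost objects, and a balanced but invalid span such as `{broken: 1}` is simply passed over. Rescanning from every opener is quadratic in the worst case, which is acceptable for a sketch.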

XML elements

Tag-balanced extraction that recurses through nested children. Self-closing and void elements are skipped; only complete open/close pairs are returned.

SOAP envelopes inside mixed responses
XML config blocks in log output
Data fragments in templated text
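A stack-based sketch of tag-balanced extraction is shown below. It is an assumption about one way to do this, not the tool's implementation: it skips self-closing tags, and instead of a hard-coded void-element list it tolerates unclosed inner tags such as `<br>` by popping back to the matching opener. `ElementSpan`, `TAG`, and `extractElements` are hypothetical names.

```typescript
interface ElementSpan {
  name: string;  // root element name
  start: number; // offset of the root's '<'
  end: number;   // offset just past the root's closing '>'
  text: string;
}

// Opening, closing, or self-closing tag. Comments, CDATA, and attribute
// values containing '>' are not handled in this sketch.
const TAG = /<(\/?)([A-Za-z_][\w.:-]*)\b[^>]*?(\/?)>/g;

function extractElements(input: string): ElementSpan[] {
  const spans: ElementSpan[] = [];
  const stack: { name: string; start: number }[] = [];
  TAG.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = TAG.exec(input)) !== null) {
    const [whole, slash, name, selfClose] = m;
    if (selfClose) continue; // <x/> is never a section on its own
    if (!slash) {
      stack.push({ name, start: m.index });
      continue;
    }
    // Closing tag: find the nearest open element with the same name.
    let j = stack.length - 1;
    while (j >= 0 && stack[j].name !== name) j--;
    if (j < 0) continue; // stray closing tag: ignore it
    const opener = stack[j];
    // Pop the opener plus any unclosed tags above it (such as a void <br>).
    stack.length = j;
    if (j === 0) {
      // The stack is empty again, so this pair is an outermost element.
      const end = m.index + whole.length;
      spans.push({ name, start: opener.start, end, text: input.slice(opener.start, end) });
    }
  }
  return spans;
}
```

Because only pairs that empty the stack are recorded, nested children are included in their parent's span rather than reported separately, and an opener that is never closed produces nothing.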

HTML blocks

Same tag-balancing engine as XML, with a recognised set of HTML element names (div, p, table, form, section, article, and more). HTML sections are reported separately from XML.

Rendered HTML snippets in API responses
Email body HTML in raw message dumps
Template fragments in mixed documents
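Assuming HTML and XML sections are told apart by the name of the span's root element, the split could look like this. The `HTML_ELEMENTS` set is an illustrative subset, not SmartDevBox's actual list, and `classifySpan` is a hypothetical helper.

```typescript
// Illustrative subset of HTML element names (assumed, not the tool's real list).
const HTML_ELEMENTS = new Set([
  "html", "body", "div", "span", "p", "ul", "ol", "li", "table", "tr", "td",
  "form", "section", "article", "header", "footer", "nav", "main",
]);

// Classify a tag-balanced span by the name of its root element.
function classifySpan(spanText: string): "HTML" | "XML" {
  const m = /^<([A-Za-z_][\w.:-]*)/.exec(spanText);
  // HTML tag names are case-insensitive, so compare in lower case.
  const root = m ? m[1].toLowerCase() : "";
  return HTML_ELEMENTS.has(root) ? "HTML" : "XML";
}
```

Under this rule only the root decides: `<config><div/></config>` is still XML, while `<TABLE>…</TABLE>` counts as HTML.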

Dates & timestamps

Multiple regex patterns covering ISO 8601 (with optional time and timezone), named-month formats (Jan 1, 2024), slash/dash/dot-separated numeric dates, and Unix millisecond timestamps (13-digit).

Created-at / updated-at fields in logs
Report dates in plaintext exports
Timestamps scattered across mixed output
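The four date families could be approximated with regexes along these lines. The exact expressions, the `plausibleNumericDate` range check (modelled on the FAQ answer about version numbers), and the overlap rule are assumptions for illustration; all names are hypothetical.

```typescript
interface DateMatch {
  text: string;
  start: number;
}

interface DatePattern {
  re: RegExp; // must use the global flag
  validate?: (text: string) => boolean; // optional range check
}

// Numeric dates: a 4-digit first part means Y-M-D; otherwise the last part is
// the year (2 or 4 digits) and the first two are day/month in either order.
function plausibleNumericDate(text: string): boolean {
  const parts = text.split(/[\/.-]/);
  if (parts.length !== 3) return false;
  const [a, , c] = parts;
  const n = parts.map(Number);
  const ok = (month: number, day: number) => month >= 1 && month <= 12 && day >= 1 && day <= 31;
  if (a.length === 4) return ok(n[1], n[2]);
  if (c.length !== 2 && c.length !== 4) return false;
  return ok(n[0], n[1]) || ok(n[1], n[0]);
}

const MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec";

const DATE_PATTERNS: DatePattern[] = [
  // ISO 8601 date, optional time, fractional seconds, and timezone.
  { re: /\b\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?(?!\d)/g },
  // Named-month dates such as "Jan 1, 2024" or "March 15 2023".
  { re: new RegExp(`\\b(?:${MONTHS})[a-z]*\\.? \\d{1,2},? \\d{4}\\b`, "g") },
  // Slash-, dash-, or dot-separated numeric dates, range-checked.
  { re: /\b\d{1,4}[\/.-]\d{1,2}[\/.-]\d{2,4}\b/g, validate: plausibleNumericDate },
  // 13-digit Unix millisecond timestamps.
  { re: /\b\d{13}\b/g },
];

function extractDates(input: string): DateMatch[] {
  const found: DateMatch[] = [];
  for (const { re, validate } of DATE_PATTERNS) {
    re.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = re.exec(input)) !== null) {
      if (!validate || validate(m[0])) found.push({ text: m[0], start: m.index });
    }
  }
  // Several patterns can hit the same text (an ISO date is also a dash-separated
  // numeric date), so keep the earliest, longest match and drop overlaps.
  found.sort((x, y) => x.start - y.start || y.text.length - x.text.length);
  const result: DateMatch[] = [];
  let lastEnd = 0;
  for (const d of found) {
    if (d.start >= lastEnd) {
      result.push(d);
      lastEnd = d.start + d.text.length;
    }
  }
  return result;
}
```

With these patterns a string like `version 1.2.3` produces no match, because the last component is a single digit, while `31/12/2023` passes the range check as day/month/year.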

URLs

Matches absolute http:// and https:// URIs, stopping at whitespace or common delimiters. Captures full paths, query strings, and fragments.

API endpoints buried in log lines
Links embedded in plaintext responses
Redirect chains in request traces
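A minimal URL matcher in the spirit of the description above follows. The delimiter set and the trimming of trailing sentence punctuation are assumptions, not documented behaviour; `URL_RE`, `trimTrailing`, and `extractUrls` are hypothetical names.

```typescript
// Absolute http(s) URLs; stop at whitespace, quotes, angle brackets, and
// bracket characters (the delimiter set is an assumption).
const URL_RE = /\bhttps?:\/\/[^\s"'<>()\[\]{}]+/g;

// Trailing sentence punctuation is usually not part of the URL.
function trimTrailing(url: string): string {
  return url.replace(/[.,;:!?]+$/, "");
}

function extractUrls(input: string): string[] {
  const urls: string[] = [];
  URL_RE.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = URL_RE.exec(input)) !== null) {
    urls.push(trimTrailing(m[0]));
  }
  return urls;
}
```

Paths, query strings, and fragments contain none of the stop characters, so they stay attached to the match.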

Email addresses

RFC 5321-style pattern: local part (letters, digits, dots, hyphens, plus signs) + @ + domain with at least one dot. Handles subdomains and multi-part TLDs.

Contact addresses in text exports
Sender/recipient fields in raw email dumps
User emails in serialised records
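The email pattern as described translates to a regex like the one below. `EMAIL_RE` and `extractEmails` are hypothetical names, and the exact expression SmartDevBox uses may differ. Following the description, underscores are not in the local-part character set, so `a_b@example.com` would yield only `b@example.com` here; widen the class if that matters.

```typescript
// Local part: letters, digits, dots, hyphens, plus signs. Domain: one or more
// dot-terminated labels followed by a final label, so at least one dot.
const EMAIL_RE = /[A-Za-z0-9.+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]+/g;

function extractEmails(input: string): string[] {
  // String.prototype.match with a global regex returns every match, or null.
  return input.match(EMAIL_RE) ?? [];
}
```

Backtracking in the repeated domain group means a sentence-ending dot is left out of the match, and a dotless host such as `user@localhost` is not matched at all.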

Position Metadata

Every extracted section carries precise location information, not just its content:

Line & column: 1-based position of the section's first character in the original input
Character offset range: exact start and end indices into the source string
Size: character count and kilobyte equivalent, displayed in the section card

This metadata is especially useful when the source is a large log file or API trace — you can tell at a glance which line a JSON payload appeared on, and jump to it in your editor.
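Converting a character offset into a 1-based line and column takes one pass to record where each line starts, then a binary search per section. This is a sketch under our own assumptions (`LineColumn`, `lineStarts`, and `positionAt` are hypothetical names); offsets and columns count UTF-16 code units, which is what JavaScript string indices measure.

```typescript
interface LineColumn {
  line: number;   // 1-based
  column: number; // 1-based
}

// Offsets at which each line begins; line 1 always starts at 0.
function lineStarts(input: string): number[] {
  const starts = [0];
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "\n") starts.push(i + 1);
  }
  return starts;
}

// Binary search for the last line start <= offset, then derive the column.
function positionAt(starts: number[], offset: number): LineColumn {
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (starts[mid] <= offset) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo + 1, column: offset - starts[lo] + 1 };
}
```

Building the `lineStarts` table once and reusing it keeps each lookup logarithmic, which helps when a large log yields many sections.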

Smart De-nesting

Structured data is often nested: a JSON object may contain an XML string, or an HTML block may carry JSON in a data-* attribute. The extractor applies an outermost-wins rule per type: when one span fully contains another span of the same type, only the outer span is returned. This prevents the results from being flooded with every intermediate object or element inside a larger structure. Because the rule is applied per type, it does not remove a nested span of a different type, such as an XML string inside a JSON object.
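The outermost-wins rule can be sketched as a single sorted pass that tracks, per type, the furthest end offset among kept spans. `Section` and `outermostPerType` are hypothetical names, and offsets are assumed to be half-open `[start, end)` ranges.

```typescript
interface Section {
  type: string;  // e.g. "JSON" or "XML"
  start: number; // inclusive
  end: number;   // exclusive
}

// Drop any section fully contained in another section of the same type.
function outermostPerType(sections: Section[]): Section[] {
  // Earliest start first; at equal starts the longer span comes first, so an
  // outer span is always visited before the spans it contains.
  const sorted = [...sections].sort((a, b) => a.start - b.start || b.end - a.end);
  const furthestEnd = new Map<string, number>(); // per type, among kept spans
  const kept: Section[] = [];
  for (const s of sorted) {
    const end = furthestEnd.get(s.type);
    if (end !== undefined && s.end <= end) continue; // nested inside a kept span
    kept.push(s);
    furthestEnd.set(s.type, s.end);
  }
  return kept;
}
```

Since every earlier span of the same type starts at or before `s`, ending at or before the furthest kept end means some kept span contains it. Spans of other types never affect each other.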

How to Use It

1. Open SmartDevBox and paste your mixed-format text into the input area.
2. SmartDevBox suggests Extract Sections automatically if it detects mixed structured content. You can also select it manually from the Text category in the sidebar.
3. The detection bar shows a summary, e.g. 3 JSON · 2 DATE · 1 URL, before you commit to extracting.
4. Click Extract → to open the section picker popup.
5. Filter by type (JSON / XML / HTML / DATE / URL / EMAIL), search by content, expand previews, then select the section you want.
6. Click Extract to send the selected section to the output panel, ready to copy or pipe into another tool.

From the output panel, click Use as Input to pipe the extracted content straight into another tool: for example, extract a JSON section, then immediately format it with the JSON Formatter.
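The detection-bar summary in step 3 amounts to counting detected sections per type. A minimal sketch, assuming the badges are listed in a fixed order (the order below is a guess); `SectionType` and `detectionSummary` are hypothetical names.

```typescript
type SectionType = "JSON" | "XML" | "HTML" | "DATE" | "URL" | "EMAIL";

// Display order for the badges (an assumption).
const ORDER: SectionType[] = ["JSON", "XML", "HTML", "DATE", "URL", "EMAIL"];

function detectionSummary(detected: SectionType[]): string {
  const counts = new Map<SectionType, number>();
  for (const t of detected) counts.set(t, (counts.get(t) ?? 0) + 1);
  return ORDER.filter((t) => counts.has(t))
    .map((t) => `${counts.get(t)} ${t}`)
    .join(" · ");
}
```

Types with no detections are left out, so an input with nothing structured yields an empty summary.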

Use Cases

Pull a JSON payload out of a multi-line server log for formatting or inspection
Extract XML response bodies embedded in SOAP trace output
Harvest all dates from a report to audit timestamps
Collect every URL from a raw HTTP response dump
Extract email addresses from a batch of exported records
Isolate an HTML fragment from a mixed API response for preview or conversion
Grab a nested JSON config block from a large deployment manifest

Privacy

All extraction runs entirely in the browser using client-side JavaScript. Your text never leaves your machine — there is no server processing and no data retention. See the Privacy & Security page for full details.

Smart Extract vs Manual Regex or grep

The traditional approach to extracting structured data from mixed text is to write a regular expression or use command-line tools like grep or jq. Smart Extract eliminates that friction:

grep / regex approach: Write and test a pattern for each format, handle multi-line JSON manually, and pipe through jq. This requires shell access and regex knowledge.

SmartDevBox Smart Extract: Paste mixed text. JSON, XML, HTML, dates, URLs, and email addresses are extracted automatically with line/column positions, in the browser, with no shell required.

Smart Extract is particularly useful when working with log files, API response dumps, or documentation that contains embedded data you need to inspect or reuse. The position metadata (line, column, character offset) makes it easy to locate the original span in large files.

Frequently Asked Questions

What if the JSON in my text is invalid?

Only spans that pass JSON.parse() validation are returned as JSON sections. Invalid JSON fragments are silently skipped: they remain in your input, but they won't be tagged as JSON.

Can I extract multiple sections at once?

The current picker allows selecting one section at a time to send to the output. To process multiple sections, extract them one at a time and use the Use as Input flow for each.

How does it handle dates that look like version numbers (e.g. 1.2.3)?

The date patterns require day, month, and year components in plausible ranges. A version string like 1.2.3 does not match because the year component is not a 2- or 4-digit value. Some ambiguous strings may still be surfaced — use the type filter and preview to confirm before extracting.

Does it work on very large inputs?

The extractor runs synchronously in the browser on every input change, with a short debounce. For very large files (hundreds of KB), detection may take a noticeable fraction of a second. There is no hard size limit.
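A debounce like the one mentioned above can be written in a few lines. The 150 ms delay and the `runDetection` callback are hypothetical placeholders; the tool's actual delay is not documented here.

```typescript
// Return a wrapper that waits until `waitMs` ms pass with no further calls,
// then invokes `fn` once with the most recent arguments.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Hypothetical wiring: runDetection stands in for the real extractor.
const runDetection = (text: string): void => {
  console.log(`scanning ${text.length} characters`);
};
const onInput = debounce(runDetection, 150);
onInput("a");
onInput("ab");
onInput("abc"); // only this call reaches runDetection, about 150 ms later
```

Because each keystroke resets the timer, a burst of typing costs one synchronous detection pass rather than one per change, which is what keeps large inputs usable.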