Smart Extract
Paste any mixed-format text. SmartDevBox scans it and pulls out every embedded JSON object, XML element, HTML block, date, URL, and email address — each tagged with its exact line, column, and byte offset.
What Is Smart Extract?
Real-world text is rarely clean. Log files contain serialised JSON objects between plaintext lines. API responses embed XML fragments. Database exports mix ISO dates with prose. Emails scatter URLs and addresses through unstructured content.
Smart Extract is a dedicated panel in SmartDevBox that scans any input — no matter how mixed — and identifies every recognisable structured section within it. It returns each section with its type, content, and precise position metadata (line number, column, character offset range) so you always know exactly where it came from.
What Gets Extracted
The extractor recognises six section types: JSON, XML, HTML, DATE, URL, and EMAIL. Each type gets its own colour badge in the results panel.
Position Metadata
Every extracted section carries precise location information, not just its content:
- Line & column: 1-based position of the section's first character in the original input
- Character offset range: byte-accurate start and end indices into the source string
- Size: character count and kilobyte equivalent displayed in the section card
This metadata is especially useful when the source is a large log file or API trace — you can tell at a glance which line a JSON payload appeared on, and jump to it in your editor.
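As a rough illustration of how such metadata can be derived, here is a minimal sketch. The `locate` helper and its field names are hypothetical and not taken from SmartDevBox. It assumes offsets are JavaScript string indices (UTF-16 code units), which equal byte offsets only for ASCII input.

```javascript
// Hypothetical helper: derive position metadata for the span [start, end)
// of `source`. `end` is exclusive; line and column are 1-based.
function locate(source, start, end) {
  let line = 1;
  let lineStart = 0; // index of the first character of the current line
  for (let i = 0; i < start; i++) {
    if (source[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  const length = end - start;
  return {
    line,
    column: start - lineStart + 1,
    start,
    end,
    length,
    sizeKb: length / 1024,
  };
}

// Example: find where a JSON payload sits inside a two-line log.
const log = 'INFO boot ok\nWARN payload={"id":7} retry';
const start = log.indexOf("{");
const end = log.indexOf("}") + 1;
const meta = locate(log, start, end);
console.log(meta.line, meta.column, meta.start, meta.end);
```

With a line and column in hand, most editors can jump straight to the span through their "go to line" command.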
Smart De-nesting
Structured data is often nested: a JSON object may contain an XML string, or an HTML block may embed JSON in a data-* attribute. The extractor applies an outermost-wins rule per type: when one span fully contains another span of the same type, only the outer span is returned. This keeps the results from being flooded with every intermediate object or element inside a larger structure.
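The outermost-wins rule can be sketched as a simple containment filter. This is an illustration only: the `keepOutermost` function and the `{ type, start, end }` span shape are assumptions, not the tool's actual internals.

```javascript
// Hypothetical span shape: { type, start, end } with `end` exclusive.
// Drop every span that is fully contained in a larger span of the same type.
function keepOutermost(spans) {
  return spans.filter((inner) =>
    !spans.some((outer) =>
      outer !== inner &&
      outer.type === inner.type &&
      outer.start <= inner.start &&
      outer.end >= inner.end &&
      // Require a strictly larger span so identical spans don't remove each other.
      (outer.end - outer.start) > (inner.end - inner.start)
    )
  );
}

const spans = [
  { type: "JSON", start: 0, end: 40 },
  { type: "JSON", start: 10, end: 20 }, // nested inside the first JSON span
  { type: "XML", start: 12, end: 18 },  // different type, so it is kept
];
const result = keepOutermost(spans);
console.log(result.map((s) => `${s.type}@${s.start}-${s.end}`));
```

Because the comparison is per type, an XML string inside a JSON object still appears as its own section, while inner JSON objects of that same payload do not.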
How to Use It
1. Open SmartDevBox and paste your mixed-format text into the input area.
2. SmartDevBox will suggest Extract Sections automatically if it detects mixed structured content. Alternatively, select it manually from the Text category in the sidebar.
3. The detection bar shows a summary (e.g. 3 JSON · 2 DATE · 1 URL) before you commit to extracting.
4. Click Extract → to open the section picker popup.
5. Filter by type (JSON / XML / HTML / DATE / URL / EMAIL), search by content, expand previews, then select the section you want.
6. Click Extract to send the selected section to the output panel, ready to copy or pipe into another tool.
From the output panel, you can use Use as Input to pipe the extracted content straight into another tool: for example, extract a JSON section, then immediately format it with the JSON Formatter.
Use Cases
- Pull a JSON payload out of a multi-line server log for formatting or inspection
- Extract XML response bodies embedded in SOAP trace output
- Harvest all dates from a report to audit timestamps
- Collect every URL from a raw HTTP response dump
- Extract email addresses from a batch of exported records
- Isolate an HTML fragment from a mixed API response for preview or conversion
- Grab a nested JSON config block from a large deployment manifest
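For the URL and email use cases above, the general technique can be sketched with two regular expressions. The patterns below are deliberately simplified assumptions and are not the ones SmartDevBox uses. Real URL and email grammars are broader, and this URL pattern would also capture trailing punctuation.

```javascript
// Simplified, assumed patterns for illustration only.
const PATTERNS = {
  URL: /https?:\/\/[^\s<>"']+/g,
  EMAIL: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
};

// Return every match of the given type, in order of appearance.
function collect(text, type) {
  return [...text.matchAll(PATTERNS[type])].map((m) => m[0]);
}

const dump =
  "GET https://api.example.com/v1/users?id=7 from ops@example.com\n" +
  "Location: http://example.org/next reply-to: dev.team@example.co.uk";
const urls = collect(dump, "URL");
const emails = collect(dump, "EMAIL");
console.log(urls, emails);
```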
Privacy
All extraction runs entirely in the browser using client-side JavaScript. Your text never leaves your machine — there is no server processing and no data retention. See the Privacy & Security page for full details.
Smart Extract vs Manual Regex or grep
The traditional approach to extracting structured data from mixed text is to write a regular expression or use command-line tools like grep or jq. Smart Extract eliminates that friction.
Smart Extract is particularly useful when working with log files, API response dumps, or documentation that contains embedded data you need to inspect or reuse. The position metadata (line, column, byte offset) makes it easy to locate the original span in large files.
Frequently Asked Questions
What if the JSON in my text is invalid?
Only spans that pass JSON.parse() validation are returned as JSON sections. Invalid JSON fragments are silently skipped. You may still see them detected as plain text, but they won't be tagged as JSON.
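A minimal sketch of this validate-then-tag behaviour follows. The bracket-matching scan and the `findBalancedEnd` and `extractJson` names are assumptions made for illustration. Only the use of JSON.parse() as the gate comes from the description above.

```javascript
// Return the index just past the bracket that balances the one at `open`,
// ignoring brackets inside string literals. Returns -1 if never balanced.
function findBalancedEnd(text, open) {
  let depth = 0;
  let inString = false;
  for (let i = open; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

// Collect candidate spans and keep only those JSON.parse accepts.
function extractJson(text) {
  const sections = [];
  let i = 0;
  while (i < text.length) {
    if (text[i] === "{" || text[i] === "[") {
      const end = findBalancedEnd(text, i);
      if (end !== -1) {
        const content = text.slice(i, end);
        try {
          JSON.parse(content);
          sections.push({ type: "JSON", start: i, end, content });
          i = end; // accepted: continue scanning after this span
          continue;
        } catch {
          // Invalid JSON: skip silently and keep scanning.
        }
      }
    }
    i++;
  }
  return sections;
}

const found = extractJson('ok {"a":[1,2]} bad {a:1} end');
console.log(found.map((s) => s.content));
```

In this example the unquoted-key fragment `{a:1}` balances its braces but fails JSON.parse(), so it never becomes a JSON section.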
Can I extract multiple sections at once?
The current picker allows selecting one section at a time to send to the output. To process multiple sections, extract them one at a time and use the Use as Input flow for each.
How does it handle dates that look like version numbers (e.g. 1.2.3)?
The date patterns require day, month, and year components in plausible ranges. A version string like 1.2.3 does not match because the year component is not a 2- or 4-digit value. Some ambiguous strings may still be surfaced — use the type filter and preview to confirm before extracting.
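The effect can be sketched with a single pattern plus range checks. The regular expression below is an assumption (day-first numeric dates with a 2- or 4-digit year) and is not SmartDevBox's actual pattern set, which likely covers more formats.

```javascript
// Assumed pattern: day-first numeric dates such as 31/12/2024 or 5.1.24.
// The year group accepts exactly 2 or 4 digits, so "1.2.3" cannot match.
const DATE_RE = /\b(\d{1,2})[./-](\d{1,2})[./-](\d{4}|\d{2})\b/g;

function findDates(text) {
  const hits = [];
  for (const m of text.matchAll(DATE_RE)) {
    const day = Number(m[1]);
    const month = Number(m[2]);
    // Keep only matches whose components fall in plausible ranges.
    if (day >= 1 && day <= 31 && month >= 1 && month <= 12) {
      hits.push({
        type: "DATE",
        start: m.index,
        end: m.index + m[0].length,
        content: m[0],
      });
    }
  }
  return hits;
}

const dates = findDates(
  "build 1.2.3 shipped 31/12/2024, patch 1.2.34, bad 45/13/2024"
);
console.log(dates.map((d) => d.content));
```

Here `1.2.3` is rejected by the year rule and `45/13/2024` by the range check, while `1.2.34` is still surfaced. That last case is the kind of ambiguous string worth confirming in the preview.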
Does it work on very large inputs?
The extractor runs synchronously in the browser on every input change, with a short debounce. For very large files (hundreds of KB), detection may take a noticeable fraction of a second. There is no hard size limit.
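For readers unfamiliar with the debounce technique mentioned here, a minimal sketch follows. The `debounce` function, the injectable `timers` parameter, and the 150 ms figure in the usage comment are all assumptions. The source only says the debounce is "short".

```javascript
// Hypothetical debounce: call `fn` only once input has been quiet for `waitMs`.
// `timers` is injectable so the behaviour can be checked without real delays.
function debounce(
  fn,
  waitMs,
  timers = {
    set: (cb, ms) => setTimeout(cb, ms),
    clear: (handle) => clearTimeout(handle),
  }
) {
  let handle = null;
  return (...args) => {
    // A newer call cancels the pending one, so only the last input is processed.
    if (handle !== null) timers.clear(handle);
    handle = timers.set(() => {
      handle = null;
      fn(...args);
    }, waitMs);
  };
}

// Usage sketch (assumed wiring): rerun detection 150 ms after typing stops.
// const onInput = debounce((text) => runDetection(text), 150);
```

The trade-off is that the synchronous scan still blocks the page while it runs. Debouncing only reduces how often that happens during rapid typing or pasting.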