Developing MarkLogic Applications I - Search Concepts

This is a continuation of my notes from the MarkLogic training course Developing MarkLogic Applications I - XQuery.

Day 3: Search Concepts

Search

Everything in the Information Studio can be scripted via the admin module

By default, cts:search is filtered.

cts:search(
  fn:collection(),
  cts:near-query((cts:word-query("cats"), cts:word-query("cats")), 2),
  "filtered"
)

By default, reindexing occurs as soon as a configuration value is changed, i.e. reindexing is automatic.

xdmp:plan(cts:search(fn:collection("books"), cts:word-query("dog")))//qry:final-plan
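To make the filtered/unfiltered distinction concrete, here is a minimal sketch that counts the results of the same query in both modes and then prints the final plan. The "books" collection comes from the example above; the qry namespace URI is my assumption of what xdmp:plan output uses, so check it against your own plan output.

```xquery
xquery version "1.0-ml";
(: Assumed namespace of the elements returned by xdmp:plan :)
declare namespace qry = "http://marklogic.com/cts/query";

let $query := cts:word-query("dog")
return (
  (: Index resolution only: fast, but may include false positives :)
  fn:count(cts:search(fn:collection("books"), $query, "unfiltered")),
  (: Default mode: index candidates are re-checked against the documents :)
  fn:count(cts:search(fn:collection("books"), $query, "filtered")),
  (: Shows which index lookups the search resolves to :)
  xdmp:plan(cts:search(fn:collection("books"), $query))//qry:final-plan
)
```

If the two counts differ, the indexes alone cannot fully resolve the query and filtering is doing real work.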

Geospatial Search

The basis of geospatial search is latitude and longitude. The basic MarkLogic type is cts:point.

Queries
  • cts:circle($radiusInMiles, cts:point(-33.8830, 151.2216))
  • cts:polygon($pointsSequence)
  • cts:box($south, $west, $north, $east)
  • cts:element-pair-geospatial-query()
  • cts:element-geospatial-query()
Index Types
  • Element - <element> lat, long </element>
  • Element Child - <element> <child>lat, long</child> </element>
  • Element Pair - <element> <lat>lat</lat><long>long</long></element>
  • Attribute - <element lat="lat" long="long" />

Information Studio workspaces are XML.

Examples
cts:search(
  fn:collection(),
  cts:element-pair-geospatial-query(
    xs:QName("place"), xs:QName("lat"), xs:QName("lon"),
    cts:circle( 200, cts:point(25.0, -80) )))
xdmp:http-get("some-url", $options)
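For comparison with the element-pair example, here is a sketch of the single-element form searched with a bounding box. The element name "location" and the coordinates are hypothetical. It assumes the element holds "lat, long" text as in the Element index type above, and that cts:box takes its bounds in south, west, north, east order.

```xquery
xquery version "1.0-ml";
(: Hypothetical data shape: <location>25.0, -80.0</location> :)
cts:search(
  fn:collection(),
  cts:element-geospatial-query(
    xs:QName("location"),
    (: south, west, north, east :)
    cts:box(24.0, -82.0, 27.0, -79.0)))
```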

Snippets, Highlighting, Sorting and Pagination

Snippets

Search returns a result set with snippets. Each snippet shows the matched text in context.

Included snippet elements

  • search:result
  • search:snippet

An option can be provided on the search request to apply a transform to the results.

<options xmlns="http://marklogic.com/appservices/search">
  <transform-results apply="snippet">
    <preferred-elements>
      <element ns="http://marklogic.com/mlu/top-songs" name="descr" />
    </preferred-elements>
  </transform-results>
</options>
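As a sketch of how these options are used, they can be passed as the second argument to search:search. The search term "beatles" is borrowed from the pagination example further down; the import path is the standard Search API module location.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <transform-results apply="snippet">
      <preferred-elements>
        <element ns="http://marklogic.com/mlu/top-songs" name="descr"/>
      </preferred-elements>
    </transform-results>
  </options>
(: Return only the snippet of each result :)
return search:search("beatles", $options)/search:result/search:snippet
```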
Highlighting

The result set includes a highlight around each matched string, which can then be wrapped with a class for styling.

Sort options

Sort options require range indexes

The options node defines how the search results are sorted.

  • Optimizing Sort
    • Sorting can use range indexes.
    • Range indexes can improve performance.
    • Adding <debug> to the options will show if range indexes are being used.
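A minimal sketch of sort options with debug enabled. The "title" element is a hypothetical choice, and a string range index on it with the default collation is assumed to exist.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <!-- Hypothetical: assumes a string range index on "title" -->
    <sort-order type="xs:string"
                collation="http://marklogic.com/collation/"
                direction="ascending">
      <element ns="http://marklogic.com/mlu/top-songs" name="title"/>
    </sort-order>
    <debug>true</debug>
  </options>
return search:search("beatles", $options)
```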
Pagination
  • Default to 10 results
  • Modified in the results set.
  • search:response includes total, start, and page-length
search:search("beatles", (), 11)
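The third argument to search:search is the start position and the optional fourth is the page length, so the call above starts at result 11 (page 2 at the default page length). A sketch that derives the start from a page number follows; the variable names are mine.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $page := 2
let $page-length := 10
(: Result positions are 1-based, so page 2 starts at 11 :)
let $start := ($page - 1) * $page-length + 1
let $response := search:search("beatles", (), $start, $page-length)
return
  <page total="{$response/@total}"
        start="{$response/@start}"
        page-length="{$response/@page-length}"/>
```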

Faceted Navigation

  • Facets are grouped search results, produced by adding constraints to the query.
  • Facets can be bucketed constraints (e.g. Decade 2010-2020). These have upper and lower bounds.
  • Facets require a range index.
  • String range indexes have a collation. A collation is a URI that identifies the string comparison rules (e.g. diacritics, case sensitivity).
  • Facets are returned in the search results.
Creation of facets
  1. Create a constraint (with a collation if needed).
    • This may include buckets for bucketed constraints.
  2. Configure the Facet options.
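The two steps above can be sketched as a bucketed range constraint. The "year" element and the bucket boundaries are hypothetical, and an xs:gYear range index on that element is assumed.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="decade">
      <!-- Hypothetical: assumes an xs:gYear range index on "year" -->
      <range type="xs:gYear" facet="true">
        <bucket name="2000s" ge="2000" lt="2010">2000s</bucket>
        <bucket name="2010s" ge="2010" lt="2020">2010s</bucket>
        <element ns="http://marklogic.com/mlu/top-songs" name="year"/>
      </range>
    </constraint>
  </options>
(: Each facet comes back with its per-bucket counts :)
return search:search("beatles", $options)/search:facet
```

With a constraint named "decade", a query such as "beatles decade:2010s" should narrow the results to that bucket.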

Updating Content and Transactions

MVCC: Multi-Version Concurrency Control
  • Inserts and Updates

    On commit, the document is given a create timestamp.

    1. The document and its indexes are inserted into the forest's in-memory stand, and
    2. An entry is made in the forest-level journal on disk.

    Once the stand reaches a limit (document count, memory size, or another configurable ceiling), the documents are pushed from the in-memory stand into an on-disk stand.

    Updates are the same except:

    1. The document uri exists.
    2. The old document receives a delete timestamp equal to the new document's create timestamp.
    3. Updates acquire a write lock on the document.

    MarkLogic is an append database

    xdmp:document-insert("uri", $xml)
    
    xdmp:node-replace(fn:doc("song1.xml")/top-song/title, <title>Trouble for Nothing</title>)
    
    xdmp:node-insert-child(fn:doc("title")/book, <chapter no="2">...</chapter>)
    xdmp:node-insert-after(fn:doc("title")/book/title, <author>Herman Melville</author>)
    
    xdmp:node-insert-child(fn:doc("moby_dick.xml")/book/author, attribute dob {"1819-08-01"})
    
    xdmp:node-replace(fn:doc("moby_dick.xml")/book/author/@dob, attribute dob {"unknown"})
    
  • Queries

    Queries run at a timestamp, so the data as of that time is returned. This means queries never need to lock documents, because they read a snapshot.

  • Merge / Update transactions

    A merge combines stands. It purges old deleted documents from the system, so the merged stand includes only the current data.

  • Delete

    Delete marks the document with a delete timestamp. Delete is a document update.

    xdmp:node-delete(fn:doc("moby_dick.xml")//chapter[2])
    
    xdmp:node-delete(fn:doc("moby_dick.xml")/book/author/@dob)
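To illustrate the point that queries run at a timestamp, here is a small sketch of a read-only statement that reports the timestamp it is running at alongside the data it reads. It reuses the moby_dick.xml document from the examples above. As far as I know, xdmp:request-timestamp returns the empty sequence inside an update statement, which is why this is kept as a pure query.

```xquery
xquery version "1.0-ml";
(: Read-only statement: runs lock-free at a fixed system timestamp :)
let $ts := xdmp:request-timestamp()
return
  <snapshot timestamp="{$ts}">
    { fn:doc("moby_dick.xml")/book/title }
  </snapshot>
```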