Searching Users With Elasticsearch or OpenSearch
Overview
You can search users in FusionAuth to see how many there are, what they are doing, and more. This document will walk you through how to use FusionAuth’s powerful search capabilities to retrieve such user data.
While this document references Elasticsearch, the same functionality is available if you use a supported version of OpenSearch.
Most of this document applies to searching Entities, which also leverage the Elasticsearch engine.
Examples are in curl, but you may also use any of the supported client libraries to run user searches in your favorite development language.
What Type of Engine
This document discusses the Elasticsearch search engine.
You can determine which search engine FusionAuth is using. It will be either the database search engine or the elasticsearch search engine.
To do so, in the administrative user interface, navigate to System -> About and scroll to the System section. There you will see the configured search engine, as well as the version of Elasticsearch, if available (for example, Elasticsearch 7.17.0).
If you are using the database search engine, you’ll want to consult the database search engine documentation for more information.
- Learn more about each type of search engine.
- Learn more about switching search engines.
An Introduction To The Elasticsearch Search Engine
With any search, you can make either a POST or a GET request. The functionality is exactly the same, but a POST request can be larger, while a GET request is easily shared. Pick what works for you. This document uses POST requests.
Four Types of Searches
There are four parameters for a search, and they are mutually exclusive.
Search Type Summary
Parameter | Uses Elasticsearch | Best For |
---|---|---|
ids | No | When you know the exact Ids of the users you are trying to retrieve. You may have stored the Id in another datastore, or be responding to a webhook. |
queryString | Yes | When you are searching on one field or want to search across multiple fields for strings. |
query | Yes | When you want to leverage the full power of Elasticsearch queries and need to look across different fields, including nested values, and/or use compound or complex search parameters such as ranges. |
nextResults | Yes | Allows you to continue paging forward in the results set after an initial query or queryString query. Useful for paging past the 10,000 results limit in Elasticsearch. See Limitations for more information. Available since version 1.48.0. |
Ids Searches
ids searches are useful when you know exactly which users you want to retrieve. This is the only search which is guaranteed to only query the database, never Elasticsearch.
QueryString Searches
The queryString search is a case-insensitive match. Whitespace is allowed in the search, but must be URL escaped; for example, using %20 for a space character. Elasticsearch compatible regular expressions may be used, so you may search by prefix or suffix using the * wildcard. You may also search particular fields by prefixing the query with a field name, such as email:.
There is some pre-processing done on the queryString before it is passed to Elasticsearch. If the queryString parameter has a : it will be passed as-is. Otherwise, if it has no spaces and contains a @ it is assumed to be an email address and will be passed with a prefix: email:. Otherwise, if it has no spaces it will have the wildcard * prepended and appended.
FusionAuth adds wildcards to unscoped queryString values in order to match the broadest set of results. Depending on the specifics of the system, a leading wildcard on a search term can significantly reduce search performance. To improve search performance, limit the search to a single field by prefixing the queryString with the field name followed by a :, which prevents the pre-processing by FusionAuth.
The above pre-processing occurs if you are interacting directly with the API. The FusionAuth administrative user interface does additional pre-processing and so results may differ.
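The pre-processing rules described above can be sketched as a small function. This is only an illustration of the documented behavior, not FusionAuth's actual implementation; the function name is hypothetical, and leaving a query with spaces and no : unchanged is an assumption, since the text above does not describe that case.

```python
def preprocess_query_string(query_string: str) -> str:
    """Approximate the documented pre-processing of a queryString value."""
    if ":" in query_string:
        # Field-scoped queries are passed through unchanged.
        return query_string
    if " " not in query_string:
        if "@" in query_string:
            # Looks like an email address: scope it to the email field.
            return "email:" + query_string
        # Single unscoped term: wrap it in wildcards for the broadest match.
        return "*" + query_string + "*"
    # Assumption: anything else (terms with spaces) is left as-is.
    return query_string


for raw in ["email:dinesh*", "dinesh@fusionauth.io", "dinesh"]:
    print(raw, "->", preprocess_query_string(raw))
```

Seeing the rules side by side makes it easier to predict when a leading wildcard will be added, and therefore when scoping the search to a field is worth the effort.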
You may use AND and OR clauses in this parameter to construct compound queries.
Query Searches
The query parameter search requires an escaped JSON string which is passed to Elasticsearch and therefore must be a valid Elasticsearch query. With a query search, you have the full power of the Elasticsearch query language.
Next Results Searches
Available since 1.48.0
You may perform a nextResults search after receiving a nextResults token in the response from an initial query or queryString query when using the Elasticsearch engine.
The token contains encoded information about the prior query and the previously returned results. When you provide it, the search returns the set of results that immediately follows the results from the previous response in the ordered query.
This is roughly equivalent to performing a normal paginated request using startRow. For example, suppose you submit a search request with a startRow of 0 and a numberOfResults of 25, and then submit the same query with a startRow of 25 and a numberOfResults of 25. The second request should return the same users as you would get by submitting the first request and then using the nextResults token from its response with a numberOfResults of 25.
Where this differs is the ability to continue paging through large result sets. Elasticsearch has a limitation that prevents a search from paging past 10,000 results. The nextResults token will internally perform a search_after query in Elasticsearch which can bypass that limitation. You can find more info in the Elasticsearch Documentation.
Field Mappings
For both queryString and query searches, you may search against specific fields.
The exact list of indexed fields is not documented, but if you are running FusionAuth self-hosted, you can find the list by retrieving the Elasticsearch mapping.
If you are using FusionAuth Cloud, please open a support ticket if you have a question about which fields are searchable.
Fields of the user object are indexed, as are the following relationships, which are available using the nested query type.
- memberships - group memberships for this user
- registrations - application registrations for this user
Learn more about the nested query type.
Results Of a User Search
The results of any search have the same format. There is a total field with the number of results and a users array containing user objects.
Sample Results From a Search
{
"expandable": [],
"nextResults": "eyJscyI6WyIxLjAwMTQ2MTkiLG51bGwsInRlc3R1c2VyOTkwOUBsb2NhbC5jb20iLCJjNmI4ZjQyNC0wOTRjLTQ1MWYtYWMxNS05Y2ZkODI3NTZlNGEiXSwicXMiOiIqIiwic2YiOltdfQ",
"total": 1,
"users": [
{
"active": true,
"birthDate": "1976-05-30",
"breachedPasswordLastCheckedInstant": 1471786483322,
"data": {
"displayName": "Johnny Boy",
"favoriteColors": [
"Red",
"Blue"
]
},
"email": "example@fusionauth.io",
"expiry": 1571786483322,
"firstName": "John",
"fullName": "John Doe",
"id": "00000000-0000-0001-0000-000000000000",
"imageUrl": "http://65.media.tumblr.com/tumblr_l7dbl0MHbU1qz50x3o1_500.png",
"lastLoginInstant": 1471786483322,
"lastName": "Doe",
"memberships": [
{
"data": {
"externalId": "cc6714c6-286c-411c-a6bc-ee413cda1dbc"
},
"groupId": "2cb5c83f-53ff-4d16-88bd-c5e3802111a5",
"id": "27218714-305e-4408-bac0-23e7e1ddceb6",
"insertInstant": 1471786482322
}
],
"middleName": "William",
"mobilePhone": "303-555-1234",
"passwordChangeRequired": false,
"passwordLastUpdateInstant": 1471786483322,
"preferredLanguages": [
"en",
"fr"
],
"registrations": [
{
"applicationId": "10000000-0000-0002-0000-000000000001",
"data": {
"displayName": "Johnny",
"favoriteSports": [
"Football",
"Basketball"
]
},
"id": "00000000-0000-0002-0000-000000000000",
"insertInstant": 1446064706250,
"lastLoginInstant": 1456064601291,
"preferredLanguages": [
"en",
"fr"
],
"roles": [
"user",
"community_helper"
],
"username": "johnny123",
"usernameStatus": "ACTIVE",
"verified": true,
"verifiedInstant": 1698772159415
}
],
"timezone": "America/Denver",
"tenantId": "f24aca2b-ce4a-4dad-951a-c9d690e71415",
"twoFactor": {
"methods": [
{
"authenticator": {
"algorithm": "HmacSHA1",
"codeLength": 6,
"timeStep": 30
},
"id": "35VW",
"method": "authenticator"
},
{
"id": "V7SH",
"method": "sms",
"mobilePhone": "555-555-5555"
},
{
"email": "example@fusionauth.io",
"id": "7K2G",
"method": "email"
}
]
},
"usernameStatus": "ACTIVE",
"username": "johnny123",
"verified": true,
"verifiedInstant": 1698772159415
}
]
}
User Search Examples
Below are examples of searches you can run. All examples are available to be run locally by downloading this GitHub repository and following the instructions in the README.
Each example uses the below shell script to run the search. Feel free to download it, update your API key and base FusionAuth URL, and experiment. The script uses the POST method so that the request body can be large. The value of the filename passed to the script changes each time, but everything else is the same.
search.sh: An Example Shell Script For Searching
#!/bin/sh
FILE=$1
API_KEY=90d8fb62-6f13-47d4-8ef6-1c3e687883c6
BASE_URL='http://localhost:9011'
#BASE_URL='https://sandbox.fusionauth.io'
curl -XPOST \
-H 'Content-type: application/json' \
-H "Authorization: $API_KEY" \
"$BASE_URL/api/user/search" \
-d "@$FILE"
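If you would rather not shell out to curl, the same request can be sketched with Python's standard library. This is a minimal illustration that mirrors search.sh; the function names are hypothetical, and the API key and base URL are the same placeholders the script uses.

```python
import json
import urllib.request

API_KEY = "90d8fb62-6f13-47d4-8ef6-1c3e687883c6"  # placeholder key from search.sh
BASE_URL = "http://localhost:9011"


def build_search_request(body: dict, base_url: str = BASE_URL,
                         api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) a POST to the user search endpoint."""
    return urllib.request.Request(
        url=base_url + "/api/user/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def search_users(body: dict) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_search_request(body)) as response:
        return json.load(response)
```

Calling search_users({"search": {"queryString": "dinesh"}}) against a running instance should return the same JSON that the shell script prints.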
Searching By User Ids
Here’s an example of an ids query.
{
"search": {
"ids": [
"00000000-0000-0000-0000-000000000001",
"00000000-0000-0000-0000-000000000002"
]
}
}
This file is the first argument for the search.sh shell script. Here’s how you’d run the query if the above JSON is in the ids-request.json file.
Running The Example Shell Script
./search.sh ids-request.json
Searching With queryString
Let’s look at some examples of queryString searches. Below, you are searching for dinesh, across all fields, and wildcards are prepended and appended.
Example Request JSON For Searching Across All Indexed Fields With queryString
{
"search": {
"numberOfResults": 50,
"queryString": "dinesh",
"sortFields": [
{
"missing": "_first",
"name": "email",
"order": "asc"
}
],
"startRow": 0
}
}
Here’s how you’d run it if the above JSON is in all-fields-data-request.json.
Running The Example Shell Script
./search.sh all-fields-data-request.json
Below, only the email field is searched for the string dinesh, because email: is specified.
Example Request JSON For Searching By Emails With queryString
{
"search": {
"numberOfResults": 50,
"queryString": "email:dinesh*",
"sortFields": [
{
"missing": "_first",
"name": "email",
"order": "asc"
}
],
"startRow": 0
}
}
If you change the value from email:dinesh* to email:dinesh this query will return 0 results, because there is no user with that email and no wildcarding is done. If, instead, you change the value to email:dinesh@fusionauth.io this query will return the user.
Finally, below, the search matches users whose email field starts with dinesh or whose verified field is true. The two clauses are joined with an OR clause, which returns a user if either clause is true, but you can also use AND for the intersection.
Example Request JSON For Searching By Emails And The Verified Field
{
"search": {
"numberOfResults": 50,
"queryString": "email:dinesh* OR verified:true",
"sortFields": [
{
"missing": "_first",
"name": "email",
"order": "asc"
}
],
"startRow": 0
}
}
Learn more about making queryString queries in the Elasticsearch documentation.
Searching With query
The query parameter is the most powerful way to search. The query parameter is an escaped Elasticsearch query which is passed through to the corresponding Elasticsearch server. Let’s walk through some examples.
Searching On One Field
First, build the Elasticsearch query. Below is a query matching all users with a user.data.Company attribute of PiedPiper.
The FusionAuth user.data field can hold arbitrary JSON and is useful if your users have custom data fields.
There is a similar .data field on many FusionAuth objects, though not all of them are indexed and searchable.
The key is case sensitive, so searching on user.data.company will return zero results. However, the value is case insensitive. Searching for values of PiedPiper, piedpiper and PIEDPIPER will return the same number of results.
Example Elasticsearch Query JSON Searching the Company User Attribute
{
"match": {
"data.Company": {
"query": "PiedPiper"
}
}
}
Next, escape any JSON characters in the string. This is a major difference when using the query method when compared to the queryString method. Because you can use the full Elasticsearch query language with the query method, but FusionAuth’s APIs also expect JSON, you must escape the Elasticsearch query JSON.
One option to escape the JSON is to use jq. If the Elasticsearch query above is in queryfile, escape it using the below command. (You may ignore or remove the newlines indicated by \n; they’ll be ignored by Elasticsearch.)
Escaping Elasticsearch Query String using jq
cat queryfile | jq -R -s '.'
That command outputs this string.
Example Escaped Elasticsearch Query JSON For Searching the Company User Attribute
"{\n \"match\": {\n \"data.Company\": {\n \"query\": \"PiedPiper\"\n }\n }\n}\n"
Next, add that string as the query parameter to a request.
Example Request JSON For Searching By the Company User Attribute With query
{
"search": {
"numberOfResults": 50,
"query":"{\n \"match\": {\n \"data.Company\": {\n \"query\": \"PiedPiper\"\n }\n }\n}\n",
"sortFields": [
{
"missing": "_first",
"name": "email",
"order": "asc"
}
],
"startRow": 0
}
}
Then, run the query using the search.sh script. Here’s how you’d run it if the above JSON is in user-data-simple-request.json.
Running The Example Shell Script
./search.sh user-data-simple-request.json
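The escaping step can also be done in code rather than with jq. The sketch below uses only Python's json module: serializing the Elasticsearch query to a string and then serializing the outer request body produces the escaped form the API expects. The variable names are illustrative.

```python
import json

# The Elasticsearch query from the example above, as a Python dict.
es_query = {"match": {"data.Company": {"query": "PiedPiper"}}}

request_body = {
    "search": {
        "numberOfResults": 50,
        # json.dumps turns the query into a plain string; serializing the
        # outer body below then escapes its quotes automatically.
        "query": json.dumps(es_query),
        "sortFields": [{"missing": "_first", "name": "email", "order": "asc"}],
        "startRow": 0,
    }
}

payload = json.dumps(request_body, indent=2)
print(payload)
```

Building the request this way avoids hand-editing escaped strings, which is where mismatched quotes and braces tend to creep in.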
Searching On Multiple Fields
You can also perform advanced searches, including with multiple clauses, nested queries and numeric ranges. Let’s take a closer look at a more complex query.
First, here’s the query.
Example Elasticsearch Complex Query JSON
{
"bool": {
"must": [
{
"range": {
"data.Salary": {
"lt": 100000
}
}
},
{
"match": {
"data.Company": {
"query": "PiedPiper"
}
}
},
{
"match": {
"verified": {
"query": true
}
}
},
{
"nested": {
"path": "registrations",
"query": {
"bool": {
"must": [
{
"match": {
"registrations.applicationId": "e9fdb985-9173-4e01-9d73-ac2d60d1dc8e"
}
},
{
"match": {
"registrations.data.paid": false
}
}
]
}
}
}
}
]
}
}
With this query, you are searching for users who meet the following criteria.
- A salary of less than 100000.
- Pied Piper employment.
- A verified email address.
- Registered for a particular application in your system. The registrations object is nested.
- For that particular application, having a paid attribute of false.
This is quite specific. In addition to matching a value in the user.data field like the first example, you are searching ranges, checking standard user attributes such as verified, finding users who have registered for a given application, and examining application specific fields.
With the query search method, if you can write the Elasticsearch query, you can find your users.
After creating the query, escape JSON characters in the query string. You can use jq to do the escaping. If the Elasticsearch query above is in queryfile, you can escape it using this command.
Escaping Elasticsearch Query String using jq
cat queryfile | jq -R -s '.'
That will display this escaped string:
Example Escaped Elasticsearch Complex Query JSON
"{\n \"bool\": {\n \"must\": [\n {\n \"range\": {\n \"data.Salary\": {\n \"lt\": 100000\n }\n }\n },\n {\n \"match\": {\n \"data.Company\": {\n \"query\": \"PiedPiper\"\n }\n }\n },\n {\n \"match\": {\n \"verified\": {\n \"query\": true\n }\n }\n },\n { \n \"nested\" : {\n \"path\" : \"registrations\",\n \"query\" : {\n \"bool\" : {\n \"must\" : [ {\n \"match\" : {\n \"registrations.applicationId\" : \"e9fdb985-9173-4e01-9d73-ac2d60d1dc8e\"\n }\n }, {\n \"match\" : {\n \"registrations.data.paid\" : false\n }\n } ]\n }\n }\n }\n }\n ]\n }\n}\n"
The next step is to add that string as the query parameter to a request.
Example Request JSON For a Complex Query
{
"search": {
"numberOfResults": 50,
"query": "{\n \"bool\": {\n \"must\": [\n {\n \"range\": {\n \"data.Salary\": {\n \"lt\": 100000\n }\n }\n },\n {\n \"match\": {\n \"data.Company\": {\n \"query\": \"PiedPiper\"\n }\n }\n },\n {\n \"match\": {\n \"verified\": {\n \"query\": true\n }\n }\n },\n { \n \"nested\" : {\n \"path\" : \"registrations\",\n \"query\" : {\n \"bool\" : {\n \"must\" : [ {\n \"match\" : {\n \"registrations.applicationId\" : \"e9fdb985-9173-4e01-9d73-ac2d60d1dc8e\"\n }\n }, {\n \"match\" : {\n \"registrations.data.paid\" : false\n }\n } ]\n }\n }\n }\n }\n ]\n }\n}\n",
"sortFields": [
{
"missing": "_first",
"name": "email",
"order": "asc"
}
],
"startRow": 0
}
}
Run the query using the search.sh script. Here’s how you’d run it if the above JSON is in user-data-complex-request.json.
Running The Example Shell Script
./search.sh user-data-complex-request.json
You’ll get back any users that match all the criteria.
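When the criteria vary between runs, it can help to assemble a compound query like the one above programmatically. The following is a sketch under the assumption that the field names (data.Salary, data.Company, registrations.data.paid) match your own data; the function name and its parameters are hypothetical.

```python
import json


def build_paid_status_query(company: str, max_salary: int,
                            application_id: str, paid: bool) -> dict:
    """Assemble a bool query with range, match and nested clauses."""
    return {
        "bool": {
            "must": [
                {"range": {"data.Salary": {"lt": max_salary}}},
                {"match": {"data.Company": {"query": company}}},
                {"match": {"verified": {"query": True}}},
                {
                    # registrations is a nested object, so it needs a nested query.
                    "nested": {
                        "path": "registrations",
                        "query": {
                            "bool": {
                                "must": [
                                    {"match": {"registrations.applicationId": application_id}},
                                    {"match": {"registrations.data.paid": paid}},
                                ]
                            }
                        },
                    }
                },
            ]
        }
    }


query = build_paid_status_query(
    "PiedPiper", 100000, "e9fdb985-9173-4e01-9d73-ac2d60d1dc8e", False
)
escaped_query = json.dumps(query)  # the string to place in the query parameter
```

Each condition is its own entry in a must array, so every clause has to match for a user to be returned.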
Pagination
You often need to paginate the results when running a query that matches many users or entities.
You can use the numberOfResults and startRow parameters to do so.
Pagination pseudocode to retrieve search results
// this is pseudocode and won't work out of the box
// adapt for whichever client and programming language you are using
startRow = 0
numberOfResults = 25
fullresults = [] // new array
results = client.search(query, startRow, numberOfResults)
count = results.length
fullresults.append(results)
while (count > 0) {
startRow = startRow + count
results = client.search(query, startRow, numberOfResults)
fullresults.append(results)
count = results.length
}
You may also set numberOfResults to a higher number (500 or 5000, for example) to retrieve more results.
However, processing results 25 or 50 at a time has less impact on the FusionAuth system.
Note that prior to version 1.48.0 you’ll only be able to get back 10,000 results no matter how you paginate. See the Limitations section for workarounds.
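For a concrete version of the pagination loop, here is a minimal Python sketch. The search argument is a hypothetical callable that you would supply: it takes a start row and a page size, performs the User Search request, and returns the decoded response. The sketch assumes an exhausted result set comes back as an empty or missing users array.

```python
from typing import Callable, Dict, List


def fetch_all_users(search: Callable[[int, int], Dict],
                    page_size: int = 25) -> List[Dict]:
    """Collect every user by advancing startRow until a page comes back empty."""
    all_users: List[Dict] = []
    start_row = 0
    while True:
        # Assumption: no more results means an empty or absent users array.
        page = search(start_row, page_size).get("users", [])
        if not page:
            break
        all_users.extend(page)
        start_row += len(page)
    return all_users
```

Passing the search call in as a parameter keeps the loop independent of any particular client library and makes it easy to exercise with a stub.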
Extended Pagination
Available since 1.48.0
Suppose you ran the following search query because you wanted to find all of your users. Note the use of accurateTotal to see the true number of available users.
Search Query To Find All Users
{
"search": {
"accurateTotal": true,
"numberOfResults": 25,
"queryString": "*",
"sortFields": [
{
"missing": "_first",
"name": "email",
"order": "asc"
}
],
"startRow": 0
}
}
You might get a response back like this:
Result Of Find All Query
{
"nextResults": "eyJscyI6WyIxLjAwMTQ2MTkiLG51bGwsInRlc3R1c2VyOTkwOUBsb2NhbC5jb20iLCJjNmI4ZjQyNC0wOTRjLTQ1MWYtYWMxNS05Y2ZkODI3NTZlNGEiXSwicXMiOiIqIiwic2YiOltdfQ",
"total": 12009,
"users": [
...
]
}
If you attempt to run a query to find results using a startRow greater than 10,000 you will receive an error. To get around this you can use the same query with a startRow of 9,975 and a numberOfResults of 25. You can then take the nextResults token from that response and run the following query:
Next Results Query
{
"search": {
"numberOfResults": 25,
"nextResults": "eyJscyI6WyIxLjAwMTQ2MTkiLG51bGwsInRlc3R1c2VyOTkwOUBsb2NhbC5jb20iLCJjNmI4ZjQyNC0wOTRjLTQ1MWYtYWMxNS05Y2ZkODI3NTZlNGEiXSwicXMiOiIqIiwic2YiOltdfQ"
}
}
You will receive a response containing users 10,001 through 10,025. You can then take the nextResults token from that response and repeat the request for users 10,026 through 10,050, and continue to repeat as needed to page through the results.
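The repeat-as-needed step can be sketched as a generator that keeps following nextResults tokens. As before, search is a hypothetical callable that POSTs a request body to the User Search API and returns the decoded response. Stopping on an empty users array or a missing token is an assumption about how the end of the results is signaled.

```python
from typing import Callable, Dict, Iterator


def iterate_with_next_results(first_body: Dict,
                              search: Callable[[Dict], Dict],
                              page_size: int = 25) -> Iterator[Dict]:
    """Yield users from an initial search, then follow nextResults tokens."""
    response = search(first_body)
    while True:
        users = response.get("users", [])
        if not users:
            return  # assumption: an empty page means the results are exhausted
        yield from users
        token = response.get("nextResults")
        if not token:
            return  # assumption: no token means there is nothing more to fetch
        # Follow-up requests carry only the token and a page size.
        response = search(
            {"search": {"numberOfResults": page_size, "nextResults": token}}
        )
```

Because it is a generator, callers can process users as they arrive instead of holding the full result set in memory.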
Re-indexing
Reindexing is an expensive operation, especially if your system has a large number of users, so it should not be run unless necessary.
It is possible, though rare, for an Elasticsearch index to become out of sync with the database. If you stand up FusionAuth with a database dump and restore or import users using the User Import API, you may need to run this operation. You may also be instructed to do so by FusionAuth support.
In general, even if a temporary outage occurs with Elasticsearch, the index will sync up automatically.
If you do need to run this, navigate to System -> Reindex in the FusionAuth admin UI to initiate a reindex of all users. This navigation item will only be displayed when the search engine is Elasticsearch.
Optionally, you can also reindex via API.
Supported Versions
Elasticsearch versions >= 7.6.1 and <= 7.17.x are currently supported. Later versions may work as well but may not have been tested for compatibility.
OpenSearch version 2.x should also function properly with FusionAuth version >= 1.42.0.
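If you want to check a cluster's version in a deployment script, the Elasticsearch range above can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and it assumes a plain dotted numeric version string such as 7.17.0.

```python
def in_tested_elasticsearch_range(version: str) -> bool:
    """True when version falls within 7.6.1 through 7.17.x inclusive."""
    parts = tuple(int(piece) for piece in version.split(".")[:3])
    parts = parts + (0,) * (3 - len(parts))  # pad "7.17" to (7, 17, 0)
    return (7, 6, 1) <= parts and parts[:2] <= (7, 17)


print(in_tested_elasticsearch_range("7.17.0"))
```

A False result only means the version is outside the range listed above; as noted, later versions may still work.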
Troubleshooting Elasticsearch Queries
When you are troubleshooting Elasticsearch queries, running them against the Elasticsearch server is helpful.
If you are running in FusionAuth Cloud, you cannot access Elasticsearch in the manner described here.
When running FusionAuth locally in Docker, you can expose the Elasticsearch port by adding the following lines to your docker-compose.yml file and restarting.
Exposing Elasticsearch Ports
# ... other stuff
  search:
    # ... other stuff
    ports:
      - 9200:9200
Once you’ve opened up the port, you can query Elasticsearch directly. Here’s an example of a single field query.
Example Of Querying Elasticsearch Directly
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"data.Company": {
"query": "PiedPiper"
}
}
}
}'
When you run this curl command, you’ll get back results similar to below.
Results Of Querying Elasticsearch Directly
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 10,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "fusionauth_user",
"_type" : "_doc",
"_id" : "00000000-0000-0000-0000-000000000003",
"_score" : 2.0
},
{
"_index" : "fusionauth_user",
"_type" : "_doc",
"_id" : "00000000-0000-0000-0000-000000000006",
"_score" : 2.0
},
{
"_index" : "fusionauth_user",
"_type" : "_doc",
"_id" : "00000000-0000-0000-0000-000000000005",
"_score" : 2.0
}
]
}
}
When debugging, examine the hits.hits array. The _id value corresponds to the user Id that matched your query. If the content or number of users returned are different than expected, modify the query.
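Pulling the matching Ids out of a direct Elasticsearch response takes only a few lines. The sketch below works on a decoded response shaped like the sample above; the function name is illustrative.

```python
from typing import Dict, List


def matched_user_ids(es_response: Dict) -> List[str]:
    """Return the _id of each hit, which corresponds to a FusionAuth user Id."""
    return [hit["_id"] for hit in es_response.get("hits", {}).get("hits", [])]


sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "fusionauth_user", "_id": "00000000-0000-0000-0000-000000000003"},
            {"_index": "fusionauth_user", "_id": "00000000-0000-0000-0000-000000000006"},
        ],
    }
}
print(matched_user_ids(sample))
```

The resulting list can be used directly as the ids parameter of a FusionAuth user search to retrieve the full user objects.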
Limitations
You cannot filter search results in FusionAuth to only return certain fields. Instead you must do this through post-processing. So if you want to retrieve only the firstName and birthDate fields of a set of users, the results will give you each entire user object and you must select desired fields. You can use the JSON processing facilities in your chosen language to do so, or use a tool such as jq.
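As an illustration of that post-processing, here is a short Python sketch that keeps only selected top-level fields from each user in a decoded search response. The function name is hypothetical.

```python
def pick_fields(search_response: dict, fields: list) -> list:
    """Keep only the requested top-level fields of each returned user."""
    return [
        {name: user[name] for name in fields if name in user}
        for user in search_response.get("users", [])
    ]


response = {
    "total": 1,
    "users": [
        {"firstName": "John", "lastName": "Doe", "birthDate": "1976-05-30"}
    ],
}
print(pick_fields(response, ["firstName", "birthDate"]))
```

Fields that a user does not have are skipped rather than filled with a placeholder, so the output reflects only the data that is actually present.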
Prior to version 1.48.0, when using the Elasticsearch search engine, the maximum number of users returned for any search is 10,000 users. For versions 1.48.0 and later, there is no limit on the number of users which can be returned if you paginate through the results.
There are no latency guarantees around the indexing of user data in Elasticsearch after the user has been updated using the API. The Elasticsearch index is eventually consistent.
The duration between an Update User API call and the changed attributes appearing in a search request would be the time it takes FusionAuth to make the request to Elasticsearch added to the time for Elasticsearch to refresh the index and make it visible to search.
The time it takes to expose index changes to searches in Elasticsearch is called the refresh_interval. This value defaults to one second, so that is the minimum practical delay.
Maximum Users Returned Workarounds
Available since 1.48.0
When using the Elasticsearch engine you can use the nextResults token to page past the 10,000 limit. See Extended Pagination for more information.
You can also work around the user search limit by writing one or more search queries that each return fewer than 10,000 users. You can tell whether a query returns more than that limit by using the accurateTotal request parameter. The User Search API contains more information about the accurateTotal parameter and its effect.
For example, if you needed to download all users, you could query for users with an email address starting with A, then starting with B, and so on. Here’s a sample shell script which retrieves all users using this strategy.
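The partitioning idea can be sketched in Python as a function that generates one request body per leading character. This is an illustration, not the linked shell script: the function name is hypothetical, and the default set of prefixes (letters and digits) is an assumption, so email addresses that start with any other character would need extra prefixes.

```python
import string
from typing import Dict, List


def email_prefix_requests(
    prefixes: str = string.ascii_lowercase + string.digits,
    page_size: int = 50,
) -> List[Dict]:
    """Build one email-scoped search body per leading character."""
    return [
        {
            "search": {
                # accurateTotal reveals whether a partition still exceeds the limit.
                "accurateTotal": True,
                "numberOfResults": page_size,
                "queryString": "email:" + prefix + "*",
                "startRow": 0,
            }
        }
        for prefix in prefixes
    ]


requests_to_run = email_prefix_requests()
print(len(requests_to_run), requests_to_run[0]["search"]["queryString"])
```

If any single partition reports a total above the limit, it can be split further with two-character prefixes.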
If you have access to the Elasticsearch server, you may also run your query directly against it. The Elasticsearch query result will return Elasticsearch document Ids. Each document Id corresponds to a FusionAuth user Id. To retrieve user data, which is most likely what you are after, use the User Search API with one or more user Ids.
Changing Data Field Types
FusionAuth provides data fields on many types of objects:
- Applications
- Tenants
- Groups
- Users
- Registrations
- Consents
If you are using the Elasticsearch search engine, the user.data, registration.data, and entity.data fields are indexed by Elasticsearch.
For example, you could create a field contained in user.data called migrated and store a boolean value. If you later set that field to an object value for any user, you won’t be able to search for that user. Other users added after this user will be found, however, as long as they have the correct boolean value for user.data.migrated (or no value).
Elasticsearch requires fields to have the same data type across all indexed objects. In the example above, once Elasticsearch “knows” that user.data.migrated is a boolean, it expects this field, if present, to be a boolean for all users.
Therefore, you should not change the data type of a value stored in these fields from one object to the next. This must be enforced by any software that updates these fields. There’s an open GitHub issue to allow FusionAuth to enforce the Elasticsearch schema.
Other object data fields may in the future be indexed by Elasticsearch. Therefore, it is recommended to maintain a consistent schema for all data contained in data fields.
This limitation applies only to installations using the Elasticsearch search engine. However, suppose you start with the database search engine and later need to switch to the Elasticsearch search engine because the database search engine no longer meets your needs. If you have not enforced consistency in the data field types, you will not be able to do so.
Dates that are stored in the data field must be valid. Dates such as “0000-00-00” will fail to parse, for example. Some databases will return that value for invalid timestamps. When setting data values, invalid dates should be set to null to keep the schema valid.
If you do not enforce the schema, objects will be mysteriously hidden from searches. It can also result in a MapperParsingException.
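One way to catch schema drift before it reaches the index is to compare value types across the data objects you are about to write. The sketch below is an assumption-laden illustration, not a FusionAuth feature: it inspects only top-level keys, the function name is hypothetical, and it treats Python int and float as different types, which may be stricter than Elasticsearch's own numeric handling.

```python
from typing import Any, Dict, Iterable, Set


def find_type_conflicts(data_objects: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Report top-level data keys whose non-null values differ in type."""
    seen: Dict[str, Set[str]] = {}
    for data in data_objects:
        for key, value in data.items():
            if value is None:
                continue  # a missing or null value does not conflict
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: kinds for key, kinds in seen.items() if len(kinds) > 1}


conflicts = find_type_conflicts(
    [{"migrated": True}, {"migrated": {"at": "2020-01-01"}}, {"migrated": None}]
)
print(conflicts)
```

Running a check like this in the code path that updates user.data gives you a chance to reject or normalize a value before it causes a user to disappear from search results.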
Additional Resources
The FusionAuth Elasticsearch API documentation has examples of JSON for various queries, as well as additional supported parameters.
All search examples shown above can be run locally by downloading this GitHub repository and following the instructions in the README.