
I am using Elasticsearch 8.5.1 with the matching 8.5.1 release of the Elasticsearch Python library, on Python 3.10.4. I was trying to follow this tutorial, but some of the software has clearly changed over the past year.

I am having trouble with Haystack's ElasticsearchDocumentStore. After following the Elasticsearch instructions here for deploying a single-node instance in a Docker container, I was able to run the following two code blocks successfully:


    import requests
    from datetime import datetime
    from elasticsearch import Elasticsearch
    from elasticsearch import RequestsHttpConnection
    
    client = Elasticsearch(
        [{'host': '127.0.0.1', 'port': 9200, 'scheme': 'https'}],
        ca_certs="../http_ca.crt",
        http_auth=('username', 'password'),
    )
    resp = client.info()
    resp  # this executed correctly

and this just for good measure:

    r = requests.get('https://localhost:9200/_cluster/health', verify="../http_ca.crt", 
    headers={"Authorization": 'Basic ' + TOKEN})
    r.json()  # this executed correctly

Then I tried

    from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore

    doc_store = ElasticsearchDocumentStore(
        host="localhost",
        port=9200,
        scheme="https",
        username="username",
        password="password",
        index="doc1",
    )

and no matter what I try above, I get this error:

    Output exceeds the size limit. Open the full output data in a text editor
    WARNING:elasticsearch:GET https://localhost:9200/ [status:N/A request:0.029s]
    Traceback (most recent call last):
      File "c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
        httplib_response = self._make_request(
      File "c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
        self._validate_conn(conn)
      File "c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\urllib3\connectionpool.py", line 1042, in _validate_conn
        conn.connect()
      File "c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\urllib3\connection.py", line 414, in connect
        self.sock = ssl_wrap_socket(
      File "c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
        ssl_sock = _ssl_wrap_socket_impl(
      File "c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
        return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
      File "C:\Python310\lib\ssl.py", line 512, in wrap_socket
        return self.sslsocket_class._create(
      File "C:\Python310\lib\ssl.py", line 1070, in _create
        self.do_handshake()
      File "C:\Python310\lib\ssl.py", line 1341, in do_handshake
        self._sslobj.do_handshake()
    ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):

        self.do_handshake()
      File "C:\Python310\lib\ssl.py", line 1341, in do_handshake
        self._sslobj.do_handshake()
    urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)

    ConnectionError                           Traceback (most recent call last)
    File c:\Users\k.mufti\Desktop\QA_system\.venv\lib\site-packages\haystack\document_stores\elasticsearch.py:272, in ElasticsearchDocumentStore._init_elastic_client(cls, host, port, username, password, api_key_id, api_key, aws4auth, scheme, ca_certs, verify_certs, timeout, use_system_proxy)
        271 if not status:
    --> 272     raise ConnectionError(
        273         f"Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance "
        274         f"at {hosts} and that it has finished the initial ramp up (can take > 30s)."
        275     )
        276 except Exception:

    ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': 9200}] and that it has finished the initial ramp up (can take > 30s).

    During handling of the above exception, another exception occurred:

    ConnectionError                           Traceback (most recent call last)
    Cell In [97], line 1
    ----> 1 doc_store = ElasticsearchDocumentStore(
          2     host="localhost",
          3     port=9200,
          4     scheme="https",
          5     username = "username",
          6     password = "password",
          7     index = "aurelius",
          8
          9 )

        278     f"Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at {hosts} and that it has finished the initial ramp up (can take > 30s)."
        279 )
        280 return client

    ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': 9200}] and that it has finished the initial ramp up (can take > 30s).

Any ideas or solutions? I have tried with and without the parameters that the function can take, and nothing works.

2 Answers


  1. Chosen as BEST ANSWER

    It turns out I simply forgot to pass the parameter ca_certs="../http_ca.crt" after copying the security certificate from the container to my local machine.

    doc_store = ElasticsearchDocumentStore(
        host="localhost",
        port=9200,
        ca_certs="../http_ca.crt",
        scheme="https",
        username="username",
        password="password",
        index="doc1",
    )

    If you installed Elasticsearch directly on your local machine rather than in a Docker container as I did, I am not sure where to get the certificate file from, but I imagine the process should be easier.


  2. Looking at the steps you took before asking the question, you also need to pass the certificate file path to the Haystack Elasticsearch document store connection.
    According to their documentation on the Elasticsearch document store, you can do it like this:

    document_store = ElasticsearchDocumentStore(
        host="localhost", username="elastic",
        password="***", index="document",
        scheme="https", ca_certs="./http_ca.crt"
    )
    

    You need to use https as the scheme because, when you install Elasticsearch and Kibana with Docker as described in the main documentation, the node is served over SSL (https).
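    To illustrate why switching to https alone is not enough, here is a minimal sketch using only Python's standard ssl module. It shows that a default client-side context verifies certificates and hostnames. It is an assumption that the Elasticsearch client ends up with equivalent defaults; this is not a description of Haystack's internals.

```python
import ssl

# A default client-side context, as an HTTPS client might build it when no
# CA file is given (assumption: this mirrors the client's behaviour).
default_ctx = ssl.create_default_context()

# Certificate verification and hostname checking are both on by default,
# so a server certificate signed by an unknown CA is rejected.
verifies_certs = default_ctx.verify_mode == ssl.CERT_REQUIRED
checks_hostname = default_ctx.check_hostname
print(verifies_certs, checks_hostname)
```

    Supplying the CA file lets that verification succeed without having to turn it off.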

    As for ca_certs, the Haystack documentation describes it as:

    "Root certificates for SSL: it is a path to certificate authority (CA) certs on disk" – ElasticSearchStore Haystack

    The Elasticsearch Docker installation guide explains how to get the cert file:

    docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .
    

    That copies the http_ca.crt file to the directory where you ran the command. When you set ca_certs in ElasticsearchDocumentStore, give it the path to that file. In my example, the file is in the directory where I ran the program.
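    Because relative paths such as "./http_ca.crt" or "../http_ca.crt" depend on the directory the notebook or script runs from, it can help to check the file before connecting. The following is a minimal sketch using only the standard library; resolve_ca_cert is a hypothetical helper name, not part of Haystack or the Elasticsearch client.

```python
import ssl
from pathlib import Path


def resolve_ca_cert(ca_path: str) -> str:
    """Return an absolute path to the CA file, failing early if it is unusable.

    Hypothetical helper for illustration only.
    """
    resolved = Path(ca_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"CA certificate not found: {resolved}")
    # Loading the file into an SSL context raises ssl.SSLError if it does not
    # contain a valid certificate, so a bad copy is caught before connecting.
    ssl.create_default_context(cafile=str(resolved))
    return str(resolved)
```

    The returned string could then be passed as ca_certs, for example ca_certs=resolve_ca_cert("./http_ca.crt"), so a wrong working directory shows up as a clear FileNotFoundError rather than a generic connection failure.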

    There is also a GitHub issue that addresses some of the problems you might encounter.
